Bug 1511525 - rpc.mountd crashes while building rpc response to mount DUMP if rmtab is very large
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: nfs-utils
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Target Milestone: rc
Assignee: Steve Dickson
QA Contact: Zhi Li
URL:
Whiteboard:
Depends On:
Blocks: 1420851
 
Reported: 2017-11-09 14:02 UTC by Frank Sorenson
Modified: 2018-03-24 11:11 UTC (History)
6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-24 11:11:21 UTC



Description Frank Sorenson 2017-11-09 14:02:34 UTC
Description of problem:

If the /var/lib/nfs/rmtab file contains a large number of entries, rpc.mountd will segfault after running out of stack when constructing the response to a mount dump (showmount -a) request.


Version-Release number of selected component (if applicable):

nfs-utils-1.3.0-0.48.el7.x86_64
libtirpc-0.2.4-0.10.el7.x86_64


How reproducible:

see below for reproducer

Steps to Reproduce:


# i=0 ; while [[ $i -lt 100000 ]] ; do echo "192.168.1.1:/exports:0x00000001" >> /var/lib/nfs/rmtab ; i=$(($i + 1)) ; done

# showmount -a
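(Aside, not in the original report: the padding loop in the first step takes a while at 100,000 iterations; a single pipeline produces the same file contents much faster.)

```shell
# Equivalent to the loop above: append 100,000 identical rmtab entries
# in one pipeline instead of one loop iteration per line.
yes '192.168.1.1:/exports:0x00000001' | head -n 100000 >> /var/lib/nfs/rmtab
```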


Actual results:

rpc.mountd crashes with segfault

Expected results:

no segfault


Additional info:

backtrace in coredump:


Core was generated by `rpc.mountd -d all -t 4'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f77b9cf389b in xdrrec_putbytes (xdrs=0x557433680b58, addr=0x557435989950 "172.22.73.81", len=12) at xdr_rec.c:304

(gdb) info thr
  Id   Target Id         Frame 
* 1    Thread 0x7f77ba75b880 (LWP 25573) 0x00007f77b9cf389b in xdrrec_putbytes (xdrs=0x557433680b58, 
    addr=0x557435989950 "192.168.1.1", len=12) at xdr_rec.c:304


#0  0x00007f77b9cf389b in xdrrec_putbytes (xdrs=0x557433680b58, addr=0x557435989950 "192.168.1.1", len=12) at xdr_rec.c:304
#1  0x00007f77b9cf2f8a in xdr_opaque (xdrs=0x557433680b58, cp=<optimized out>, cnt=<optimized out>) at xdr.c:605
#2  0x00007f77b9cf32b1 in xdr_string (xdrs=xdrs@entry=0x557433680b58, cpp=cpp@entry=0x557435989930, maxsize=maxsize@entry=255)
    at xdr.c:820
#3  0x0000557432caa566 in xdr_name (objp=0x557435989930, xdrs=0x557433680b58) at mount_xdr.c:83
#4  xdr_mountbody (xdrs=0x557433680b58, objp=0x557435989930) at mount_xdr.c:103
#5  0x00007f77b9cf4602 in xdr_reference (xdrs=xdrs@entry=0x557433680b58, pp=pp@entry=0x5574359899a0, size=size@entry=24, 
    proc=proc@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:90
#6  0x00007f77b9cf4749 in xdr_pointer (xdrs=xdrs@entry=0x557433680b58, objpp=objpp@entry=0x5574359899a0, 
    obj_size=obj_size@entry=24, xdr_obj=xdr_obj@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:137
#7  0x0000557432caa5a8 in xdr_mountlist (objp=0x5574359899a0, xdrs=0x557433680b58) at mount_xdr.c:93
#8  xdr_mountbody (xdrs=0x557433680b58, objp=0x557435989990) at mount_xdr.c:107
#9  0x00007f77b9cf4602 in xdr_reference (xdrs=xdrs@entry=0x557433680b58, pp=pp@entry=0x557435989a00, size=size@entry=24, 
    proc=proc@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:90

... <snip about 209,500 lines>

#209477 0x00007f77b9cf4602 in xdr_reference (xdrs=xdrs@entry=0x557433680b58, pp=pp@entry=0x557435ed7ae0, size=size@entry=24, 
    proc=proc@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:90
#209478 0x00007f77b9cf4749 in xdr_pointer (xdrs=xdrs@entry=0x557433680b58, objpp=objpp@entry=0x557435ed7ae0, 
    obj_size=obj_size@entry=24, xdr_obj=xdr_obj@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:137
#209479 0x0000557432caa5a8 in xdr_mountlist (objp=0x557435ed7ae0, xdrs=0x557433680b58) at mount_xdr.c:93
#209480 xdr_mountbody (xdrs=0x557433680b58, objp=0x557435ed7ad0) at mount_xdr.c:107
#209481 0x00007f77b9cf4602 in xdr_reference (xdrs=xdrs@entry=0x557433680b58, pp=pp@entry=0x557435ed7b40, size=size@entry=24, 
    proc=proc@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:90
#209482 0x00007f77b9cf4749 in xdr_pointer (xdrs=xdrs@entry=0x557433680b58, objpp=objpp@entry=0x557435ed7b40, 
    obj_size=obj_size@entry=24, xdr_obj=xdr_obj@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:137
#209483 0x0000557432caa5a8 in xdr_mountlist (objp=0x557435ed7b40, xdrs=0x557433680b58) at mount_xdr.c:93
#209484 xdr_mountbody (xdrs=0x557433680b58, objp=0x557435ed7b30) at mount_xdr.c:107
#209485 0x00007f77b9cf4602 in xdr_reference (xdrs=xdrs@entry=0x557433680b58, pp=pp@entry=0x7ffd62a69d50, size=size@entry=24, 
    proc=proc@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:90
#209486 0x00007f77b9cf4749 in xdr_pointer (xdrs=0x557433680b58, objpp=0x7ffd62a69d50, obj_size=obj_size@entry=24, 
    xdr_obj=xdr_obj@entry=0x557432caa550 <xdr_mountbody>) at xdr_reference.c:137
#209487 0x0000557432caa745 in xdr_mountlist (xdrs=<optimized out>, objp=<optimized out>) at mount_xdr.c:93
#209488 0x00007f77b9cf0069 in svc_vc_reply (xprt=<optimized out>, msg=<optimized out>) at svc_vc.c:705
#209489 0x00007f77b9cecef0 in svc_sendreply (xprt=xprt@entry=0x557433680aa0, xdr_results=<optimized out>, 
    xdr_location=xdr_location@entry=0x7ffd62a69d50) at svc.c:402
#209490 0x0000557432cadb59 in rpc_dispatch (rqstp=rqstp@entry=0x7ffd62a69dc0, transp=transp@entry=0x557433680aa0, 
    dtable=<optimized out>, dtable@entry=0x557432ebb260 <dtable>, nvers=nvers@entry=3, argp=argp@entry=0x7ffd62a69d40, 
    resp=resp@entry=0x7ffd62a69d50) at rpcdispatch.c:61
#209491 0x0000557432ca34d4 in mount_dispatch (rqstp=0x7ffd62a69dc0, transp=0x557433680aa0) at mount_dispatch.c:82
#209492 0x00007f77b9ced511 in svc_getreq_common (fd=fd@entry=11) at svc.c:682
#209493 0x0000557432ca7362 in my_svc_getreqset (readfds=0x7ffd62a6a380) at svc_run.c:84
#209494 my_svc_run () at svc_run.c:119
#209495 0x0000557432ca2099 in main (argc=<optimized out>, argv=<optimized out>) at mountd.c:936


This backtrace is from dumping the mount list, and it is stuck in the following recursive call chain:

xdr_mountlist -> xdr_pointer -> xdr_reference -> xdr_mountlist -> xdr_pointer -> ...


(No, I don't know how or why the file came to have 400,000 entries. In the customer's case, NFS is managed by a third-party cluster manager, and the rmtab consisted mostly of duplicate entries; only about 1/4 were unique.)
(Yes, I realize rmtab is unreliable.)

Comment 3 Scott Mayhew 2017-11-13 15:54:58 UTC
nfs-utils commit a15bd94860 ("mountd/exportfs: implement the -s/--state-directory-path option") would help with this.  

The idea behind this change was to move the etab and rmtab files (which HA software has no business messing with directly) out of the directory used to store the on-disk locking state files (which HA software does need to touch).  Since I hit some other stumbling points with my HA work, there wasn't really much need to get this into RHEL7 yet.

My intention was to move the etab and rmtab to /run/nfs (/run is the location provided by the systemd-tmpfiles service for runtime data).  Since /run is volatile, the rmtab would then only contain entries for clients that have mounted exports from the NFS server since it was last rebooted.  I'm assuming we're looking at months' or years' worth of entries for there to be 400,000 of them.  When I posted the patch, Neil Brown expressed a desire to get rid of the rmtab file altogether.

Note that if we were to go this route, we'd also have to ship a systemd-tmpfiles configuration file:

# cat /usr/lib/tmpfiles.d/nfs.conf
#Type Path           Mode  UID  GID  Age Argument
d    /run/nfs        0755  root root  -  -
f    /run/nfs/etab   0644  root root  -  -
f    /run/nfs/rmtab  0644  root root  -  -

and if SELinux is in enforcing mode, the correct context would need to be set on the directory (note this was on Fedora, where semanage barks at me if I use /run/nfs... that's why I'm using /var/run/nfs here instead):

# semanage fcontext -a -t var_lib_nfs_t "/var/run/nfs(/.*)?"

Or we'd need to get the above added to the default selinux policy.

That all seems like a lot of churn... maybe we should just recommend that people periodically clear out their rmtab file?
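If periodic cleanup is the recommendation, a sketch of it (hypothetical helper, not something shipped in nfs-utils) could be as simple as deduplicating the file in place. Note that mountd rewrites rmtab itself, so running this while mountd is actively handling mount/umount traffic is racy.

```shell
# prune_rmtab: drop duplicate lines from an rmtab-format file in place.
# Hypothetical helper, not part of nfs-utils; the default path on
# RHEL 7 is /var/lib/nfs/rmtab.  Writing to a temp file first avoids
# truncating the original if sort fails.
prune_rmtab() {
    file=$1
    sort -u "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, `prune_rmtab /var/lib/nfs/rmtab` from a daily cron job. In the customer's case this would only cut ~400,000 entries down to ~100,000 (about 1/4 were unique), so it postpones the crash rather than fixing the unbounded recursion.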

Comment 4 Dave Wysochanski 2018-03-24 11:11:21 UTC
Per comment #3, fixing this looks too invasive relative to how rarely it occurs and the fact that it can be avoided by periodically clearing out the rmtab file, so closing WONTFIX for now, pending further information.

