Summary: problems bringing up lockd after it has been taken down
Product: Red Hat Enterprise Linux 5
Version: 5.2
Component: kernel
Status: CLOSED NOTABUG
Reporter: Jeff Layton <jlayton>
Assignee: Jeff Layton <jlayton>
QA Contact: Martin Jenner <mjenner>
CC: ram_kesavan, staubach, steved
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Type: ---
Last Closed: 2008-09-29 12:02:02 UTC
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Description Jeff Layton 2008-07-07 14:52:16 UTC
Do this on a 2.6.18-92.1.6.el5debug kernel after a fresh reboot:

1) mount a TCP NFSv3 filesystem
2) unmount it
3) service nfs start

...nfsd will fail to start because lockd_up fails. From dmesg:

    FS-Cache: Loaded
    FS-Cache: netfs 'nfs' registered for caching
    SELinux: initialized (dev 0:17, type nfs), uses genfs_contexts
    Installing knfsd (copyright (C) 1996 email@example.com).
    SELinux: initialized (dev nfsd, type nfsd), uses genfs_contexts
    NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    NFSD: starting 90-second grace period
    lockd_up: makesock failed, error=-98
    lockd_down: no lockd running.
    nfsd: last server has exited
    nfsd: unexporting all filesystems

...then if you do a "service nfs restart":

    lockd_up: no pid, 2 users??
    NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state recovery directory
    NFSD: starting 90-second grace period
    nfsd: last server has exited
    nfsd: unexporting all filesystems

...so I think we have a couple of bugs here. Something is causing the makesock call to fail (error -98 is -EADDRINUSE), and when that happens, lockd_up isn't handling the error condition appropriately and is throwing off the nlmsvc_users counter. I suspect this is a regression from 5.1, but I need to confirm it.
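For illustration, here is a minimal userspace simulation of the suspected refcount leak. This is a hypothetical sketch: the names nlmsvc_users, nlmsvc_pid, lockd_up(), and make_socks() are borrowed from the kernel for readability, and the logic is condensed, not the actual RHEL5 fs/lockd/svc.c.

    /*
     * Hypothetical userspace simulation of the suspected lockd_up()
     * refcount leak; not the actual RHEL5 kernel source.
     */
    #include <stdio.h>
    #include <errno.h>

    static int nlmsvc_users;   /* clients that want lockd running */
    static int nlmsvc_pid;     /* "pid" of lockd; 0 means not running */

    /* Stand-in for the kernel's socket setup, failing as in the report. */
    static int make_socks(void)
    {
        return -EADDRINUSE;    /* -98, i.e. "makesock failed, error=-98" */
    }

    static int lockd_up(void)
    {
        int error = 0;

        /* The user count is bumped unconditionally on entry... */
        nlmsvc_users++;

        if (nlmsvc_pid)        /* lockd already running: nothing to do */
            return 0;

        /* Sanity check: with no lockd running, we should be user #1. */
        if (nlmsvc_users > 1)
            printf("lockd_up: no pid, %d users??\n", nlmsvc_users);

        error = make_socks();
        if (error < 0) {
            printf("lockd_up: makesock failed, error=%d\n", error);
            /*
             * BUG: this error path returns without decrementing
             * nlmsvc_users, so every failed start leaves the count
             * one too high -- hence the climbing "N users??" noise.
             */
            return error;
        }

        nlmsvc_pid = 1;        /* would start the lockd thread here */
        return 0;
    }

    int main(void)
    {
        int i;

        /* Each "service nfs start" retry leaks one more reference. */
        for (i = 0; i < 3; i++)
            lockd_up();
        return 0;
    }

Compiled and run, this prints the makesock failure followed by climbing "no pid, N users??" warnings, the same pattern as in the logs above and in comment 1 below.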
Comment 1 Jeff Layton 2008-07-07 16:27:14 UTC
Actually, this doesn't appear to be a regression. When I do the same test on -8.el5, I get these messages:

    lockd_up: makesock failed, error=-98
    lockd_up: no pid, 2 users??
    lockd_up: no pid, 3 users??
    lockd_up: no pid, 4 users??
    lockd_up: no pid, 5 users??
    lockd_up: no pid, 6 users??
    lockd_up: no pid, 7 users??
    lockd_up: no pid, 8 users??

...and lockd isn't started. Since no one has complained about this, I'll put this on 5.4 proposed for now. If the fix turns out to be simple, I may move it to 5.3...
Comment 2 Jeff Layton 2008-07-08 15:46:03 UTC
This problem has strangely "fixed itself". Yesterday, I could reliably reproduce this; today, I can't make it happen. The host where I saw this was a RHEL5 FV Xen guest. It looks like the power blinked at the office and the Xen dom0 rebooted. I brought my RHEL5 image back up and now this isn't happening anymore. It seems unlikely, but maybe this has something to do with being a guest on a long-running dom0? I'll leave this open for now in case it happens again...
Comment 3 Jeff Layton 2008-09-29 12:01:44 UTC
Closing this out. I've not seen this problem since, though it still worries me that I saw it at all. I'll reopen it if it returns.
Comment 4 Ram Kesavan 2009-05-20 19:29:53 UTC
I am not sure if this is important, but you will get this error if the portmapper is not running. Start the portmapper with /etc/init.d/portmapper and try the mount again; it will work properly.
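For reference, a quick programmatic way to check whether the portmapper is up: a hedged diagnostic sketch using the classic Sun RPC pmap_getmaps() call (shipped in RHEL 5's glibc; modern distributions need libtirpc, built with -ltirpc and -I/usr/include/tirpc). It dumps the local portmapper's registrations; lockd shows up as program 100021 (nlockmgr).

    /*
     * Hypothetical diagnostic: list the local portmapper's registrations.
     * Uses the classic Sun RPC API (glibc on RHEL 5; libtirpc elsewhere).
     */
    #include <stdio.h>
    #include <string.h>
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <rpc/rpc.h>
    #include <rpc/pmap_clnt.h>
    #include <rpc/pmap_prot.h>

    int main(void)
    {
        struct sockaddr_in addr;
        struct pmaplist *pml;

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(PMAPPORT);    /* port 111 */

        pml = pmap_getmaps(&addr);
        if (pml == NULL) {
            fprintf(stderr, "no portmapper answering on localhost:111\n");
            return 1;
        }

        /* Print each registration; lockd should appear as prog 100021. */
        for (; pml != NULL; pml = pml->pml_next)
            printf("prog %lu vers %lu proto %lu port %lu\n",
                   pml->pml_map.pm_prog, pml->pml_map.pm_vers,
                   pml->pml_map.pm_prot, pml->pml_map.pm_port);
        return 0;
    }

This is roughly what "rpcinfo -p" does; if it reports no portmapper, starting it as comment 4 suggests is the first thing to try.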