|Summary:||kernel panic when mounting filesystem|
|Product:||Red Hat Enterprise Linux 3||Reporter:||David L. Crow <crow>|
|Component:||autofs||Assignee:||Jeff Moyer <jmoyer>|
|Status:||CLOSED NOTABUG||QA Contact:||Brock Organ <borgan>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2005-05-25 17:11:25 UTC||Type:||---|
Description David L. Crow 2005-05-24 05:34:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
My system was upgraded to RHEL3 Update 5 over the weekend by up2date. I installed the new kernel (2.4.21-32.ELsmp) and rebooted to activate it, and the kernel panicked when the automounter mounted a filesystem in the /home map.

/etc/auto.master contains:

    /misc /etc/auto.misc
    /net /etc/auto.net
    /home /etc/auto.home

/etc/auto.home is an executable file that contains:

    #!/bin/sh
    key="$1"
    /usr/bin/ypmatch -k "$key" auto.home | sed "s,\&$,$key,"

(This is to work around the lack of support for the '&' token.)

I downgraded to autofs-4.1.3-47 and the problem went away.

I enabled netdump to capture the kernel panic via remote syslog. The first few lines look like this (I'll create an attachment with the entire message):

    Unable to handle kernel NULL pointer dereference at virtual address 00000040
    printing eip:
    c011ff4d
    *pde = 2810f001
    *pte = 420d0067
    Oops: 0000
    autofs4 netconsole nfsd lockd sunrpc usbserial lp parport tg3 floppy sg microcode loop keybdev mousedev hid input usb-ohci usbcore ext3 jbd lvm-mod aacraid sd
    CPU: 2
    EIP: 0060:[<c011ff4d>] Not tainted
    EFLAGS: 00010086
    EIP is at do_page_fault [kernel] 0x2d (2.4.21-32.ELsmp/i686)
    eax: 00000000 ebx: e80aa000 ecx: 00000040 edx: e80aa1a8
    esi: e80aa000 edi: c011ff20 ebp: c6f1abc0 esp: e80aa0dc
    ds: 0068 es: 0068 ss: 0068
    Process automount (pid: 3947, stackpage=e80a9000)
    Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000040
           00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
           00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This is the message I got with page space disabled (swapoff -a). When page space was enabled, the primary message was:

    Unable to handle kernel paging request at virtual address 92ad6380

I can get the entire dump if required.
Version-Release number of selected component (if applicable):
autofs-4.1.3-130

How reproducible:
Always

Steps to Reproduce:
1. ls /home/foo (where foo is an entry in the auto.home map)

Actual Results:
kernel panic (see description above)

Expected Results:
a directory listing of /home/foo

Additional info:
Comment 1 David L. Crow 2005-05-24 05:36:34 UTC
Created attachment 114763
Full kernel dump via netdump and syslog
Comment 2 Jeff Moyer 2005-05-24 14:48:34 UTC
First, autofs does in fact support the & token. So you could have an entry like:

    * server:/export/&

Can you please attach the vmcore file created by netdump?

Thanks.
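As a rough illustration of what the wildcard entry above does, the sketch below replaces each '&' in the location with the lookup key. It is not autofs code: `expand_entry` is a hypothetical helper name, and the server and path are taken from the example entry in the comment.

```shell
#!/bin/sh
# Illustration only: mimic how a wildcard map entry such as
#   * server:/export/&
# is expanded for a given lookup key. This is not how autofs is implemented.
# expand_entry is a hypothetical helper name.
expand_entry() {
    key="$1"        # lookup key, for example a user name
    location="$2"   # right-hand side of the map entry
    # Replace every '&' in the location with the key. This simple sed call
    # assumes the key contains no '|', '&' or backslash characters.
    printf '%s\n' "$location" | sed "s|&|$key|g"
}

expand_entry userid 'server:/export/&'
```

With the key `userid`, the call prints `server:/export/userid`, which is the location the automounter would mount on /home/userid under such an entry.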
Comment 3 David L. Crow 2005-05-25 16:45:49 UTC
Thanks for the clarification on the & token.

I tried to create a vmcore file with netdump, but was not able to. On the failing machine (the netdump client), I see the following in syslog output:

    May 25 00:40:45 host1 kernel: netlog: network logging started up successfully!
    May 25 00:40:45 host1 netdump: initializing netdump succeeded

My Ethernet device is a tg3, which isn't listed as supported in the white paper at <http://www.redhat.com/support/wpapers/redhat/netdump/>, but that white paper indicates that netdump will complain if it finds an unsupported adapter, and it didn't.

On the netdump server machine, the dump information is shown, followed by:

    May 24 22:04:37 host1 CPU#0 is executing netdump.
    May 24 22:04:37 host1 CPU#2 is frozen.
    May 24 22:04:37 host1 < netdump activated - performing handshake with the server. >
    May 24 22:05:25 host2 netdump: Got too many timeouts in handshaking, ignoring client 0x....a00e
    May 24 22:05:28 host2 netdump: Got too many timeouts waiting for SHOW_STATUS for client 0x....a00e, rebooting it

Any suggestions as to what might be the problem would be appreciated.
Comment 4 Jeff Moyer 2005-05-25 16:49:36 UTC
I've managed to reproduce the problem and get a netdump. The issue you are running into is a stack overflow, and after a stack overflow netdump can't be relied on to run properly. I am currently debugging the problem, and will keep this bug updated with status.

Thanks.
Comment 5 Jeff Moyer 2005-05-25 16:57:34 UTC
Could you please attach the output from your script when passed a valid home directory? It actually looks like you are triggering a recursive bind mount, which will definitely cause problems!
Comment 6 David L. Crow 2005-05-25 17:04:39 UTC
    userid@host1:/home/userid> sh /etc/auto.home userid
    userid server.central:/export/home11/userid

As an FYI, I changed the home configuration in auto.master to yp:auto.home and still saw the problem.
Comment 7 Jeff Moyer 2005-05-25 17:11:25 UTC
> userid server.central:/export/home11/userid

That line is wrong. Check the output from the auto.net script, i.e.:

    # sh /etc/auto.net somehost
    -fstype=nfs,hard,intr,nodev,nosuid \
    /vol/vol1 somehost:/vol/vol1

Notice that the key is not repeated in the output! That is because the daemon already knows what the key is; it just wants the rest of the entry. I'm guessing that this worked for you before by accident. =) So, for you it is bad that we "fixed" the broken behaviour.

Anyway, what will end up happening now is that autofs thinks that userid is the host from which to mount. When it determines that it isn't a host, it will fall back to using it as a local directory from which to bind mount. So, you end up with the equivalent of:

    mount --bind /home/userid /home/userid

Which is really bad.

As mentioned above, you can simply use the wildcard matching features of automount to achieve your goal. I am closing this as NOTABUG, since the kernel hang can only be triggered by a broken configuration (and hence, only by root).

Thanks.
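For anyone who wants to keep an executable map rather than switch to the wildcard entry, the sketch below shows one way to drop a repeated key from a looked-up line so that only the rest of the entry is printed. It is an assumption-laden example, not the reporter's script or anything shipped with autofs: `strip_key` is a hypothetical helper, and the input line is the one quoted from Comment 6.

```shell
#!/bin/sh
# Sketch of a helper for an executable map: print only the part of the
# entry that follows the key, since the daemon already knows the key.
# strip_key is a hypothetical name, not part of autofs.
strip_key() {
    key="$1"    # lookup key passed to the map script
    line="$2"   # line returned by the lookup, possibly starting with the key
    prefix="$key "
    case "$line" in
        "$prefix"*)
            # The line starts with "key ": drop that prefix.
            rest=${line#"$prefix"}
            printf '%s\n' "$rest"
            ;;
        *)
            # The key is not repeated: pass the line through unchanged.
            printf '%s\n' "$line"
            ;;
    esac
}

strip_key userid 'userid server.central:/export/home11/userid'
```

The call prints `server.central:/export/home11/userid`, which has the same shape as the auto.net output shown above: options and locations only, with no leading key.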