Bug 229780 - kernel panic when attempting to mount nfs4 filesystem twice
Summary: kernel panic when attempting to mount nfs4 filesystem twice
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Jeff Layton
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 234547
Reported: 2007-02-23 14:33 UTC by Jeff Layton
Modified: 2009-06-19 15:22 UTC (History)

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-08 04:53:07 UTC


Attachments
patch to check for NULL pointer before freeing (deleted)
2007-02-23 17:33 UTC, Jeff Layton
backported upstream patch for same problem (deleted)
2007-02-26 17:31 UTC, Jeff Layton
updated patch -- remove added nfs_free_iostats that would have caused double-free (deleted)
2007-02-26 23:03 UTC, Jeff Layton
patch -- don't free NULL pointer on error, also dont leak iostats (deleted)
2007-03-06 19:46 UTC, Jeff Layton


Links
System ID: Red Hat Product Errata RHBA-2007:0304
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 5
Last Updated: 2007-04-28 18:58:50 UTC

Description Jeff Layton 2007-02-23 14:33:01 UTC
Easily reproducible kernel panic -- mount an NFSv4 filesystem, and then attempt
to mount it again.

For instance, run this twice:

# mount -t nfs4 server:/ /mnt/server

Oops looks like this:

Unable to handle kernel paging request at ffffffffffffffff RIP: 
<ffffffff8015bd5a>{free_percpu+24}
PML4 103067 PGD 1727067 PMD 0 
Oops: 0000 [1] SMP 
CPU 0 
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 rpcsec_gss_krb5
auth_rpcgss des sunrpc loop xennet dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
xenblk sd_mod scsi_mod
Pid: 2133, comm: mount Not tainted 2.6.9-48.EL.mntcrash.1xenU
RIP: e030:[<ffffffff8015bd5a>] <ffffffff8015bd5a>{free_percpu+24}
RSP: e02b:ffffff801b7d5c18  EFLAGS: 00010286
RAX: 00000000ffffffff RBX: ffffffffffffffff RCX: ffffff801fe2be00
RDX: ffffff8001000000 RSI: 0000000000000042 RDI: 0000000000000000
RBP: 0000000000000000 R08: 00000000c43910ac R09: ffffff801fe2be00
R10: ffffff801fe2be00 R11: ffffff801fe2be00 R12: 0000000000000000
R13: ffffff801b037000 R14: ffffffffa019a1e0 R15: ffffff801b029000
FS:  0000002a95573b00(0000) GS:ffffffff8041d700(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process mount (pid: 2133, threadinfo ffffff801b7d4000, task ffffff801bdab030)
Stack: ffffffffff5fd000 ffffff801fe2be00 ffffffffa019a1e0 ffffffffa016561c 
       ffffff801fe03400 ffffffff801ce4de 0000000000000000 0000000000000000 
       0000000000000000 0000000000000000 
Call Trace:<ffffffffa016561c>{:nfs:nfs4_get_sb+1759}
<ffffffff801ce4de>{selinux_sb_copy_data+47} 
       <ffffffff8017b02d>{do_kern_mount+161} <ffffffff80190d33>{do_mount+1690} 
       <ffffffff802360c9>{sock_common_recvmsg+48}
<ffffffff80232d8a>{sock_aio_read+297} 
       <ffffffff80265895>{tcp_transmit_skb+2037}
<ffffffff80235e1f>{sk_reset_timer+15} 
       <ffffffff802663b0>{tcp_write_xmit+314}
<ffffffff80156ff4>{buffered_rmqueue+384} 
       <ffffffff8010ddc3>{error_exit+0} <ffffffff801571e4>{__alloc_pages+200} 
       <ffffffff801910d6>{sys_mount+186} <ffffffff8010d66e>{system_call+134} 
       <ffffffff8010d5e8>{system_call+0} 

Code: 48 8b 3b e8 9d f4 ff ff ff c5 48 83 c3 08 83 fd 1f 7e e0 58 
RIP <ffffffff8015bd5a>{free_percpu+24} RSP <ffffff801b7d5c18>
CR2: ffffffffffffffff
 <0>Kernel panic - not syncing: Oops

Reproduced so far on an x86_64 xen guest running a -48.EL kernel with the patch
for bz 226983. Not certain yet whether other arches are affected.

Comment 1 Jeff Layton 2007-02-23 14:53:01 UTC
To clarify, I've also seen the same panic on a stock -48 xenU kernel. I just
tried the patch in 226983 to see if it might fix this as well, but it didn't.


Comment 2 Jeff Layton 2007-02-23 15:15:21 UTC
Same panic on i686 xen guest as well:

general protection fault: 0000 [#1]
SMP 
Modules linked in: nfs lockd nfs_acl md5 ipv6 autofs4 sunrpc dm_mirror dm_mod
xennet ext3 jbd xenblk sd_mod scsi_mod
CPU:    0
EIP:    0061:[<c01424a3>]    Not tainted VLI
EFLAGS: 00010286   (2.6.9-48.ELxenU) 
EIP is at free_percpu+0x17/0x29
eax: ffffffff   ebx: 00000000   ecx: df357000   edx: f5392000
esi: ffffffff   edi: c1627200   ebp: dcf621a0   esp: df357e98
ds: 007b   es: 007b   ss: 0068
Process mount (pid: 2556, threadinfo=df357000 task=de933970)
Stack: c16244f8 c1624400 e1231c45 00000000 dceef000 00000000 c1630980 e125f7c0 
       c015eb35 e125f7c0 00000000 dceef000 dcf16000 dded5000 dd2bc000 dcf16000 
       00000015 dceef000 c0172c9d dd2bc000 00000000 dceef000 dcf16000 00000000 
Call Trace:
 [<e1231c45>] nfs4_get_sb+0x265/0x275 [nfs]
 [<c015eb35>] do_kern_mount+0x85/0x143
 [<c0172c9d>] do_new_mount+0x67/0xa4
 [<c01732ea>] do_mount+0x15f/0x179
 [<c0107507>] error_code+0x2b/0x30
 [<c026a298>] iret_exc+0xeb4/0x159c
 [<c0173140>] copy_mount_options+0x49/0x94
 [<c0173655>] sys_mount+0x9b/0x115
 [<c010737f>] syscall_call+0x7/0xb
Code: 0e 80 3a 00 74 09 5b 5e 5f 5d e9 b1 4b 0b 00 5b 5e 5f 5d c3 56 53 8b 74 24
0c 31 db f7 d6 0f a3 1d 24 59 3a c0 19 c0 85 c0 74 09 <ff> 34 9e e8 55 ff ff ff
59 43 83 fb 1f 7e e4 5b 5e c3 8b 44 24 
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

It looks like -42.26 does not panic on i686, so my guess is that this is a
regression introduced somewhere between -42.26 and -48. I'll see if I can
confirm when it was introduced.


Comment 3 Jeff Layton 2007-02-23 16:32:41 UTC
Looks like this was introduced in -42.27. The most likely culprit is the
nfs-stats patch detailed here:

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=199263

I'm going to try backing this out and seeing if it fixes the problem.


Comment 4 Jeff Layton 2007-02-23 17:33:12 UTC
Created attachment 148684 [details]
patch to check for NULL pointer before freeing

This patch corrected the oops. nfs4_get_sb needs to check if server->io_stats
is NULL before trying to free it.
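
As a rough illustration of why a NULL io_stats pointer faults at an all-ones address in both traces: the i686 Code: line contains f7 d6 (not %esi) just before the faulting instruction, which suggests that free_percpu complements the pointer it receives before dereferencing it. The Python sketch below is a toy model of that arithmetic and of a guarded free, not kernel code. The names percpu_real_address and free_iostats_checked are hypothetical and exist only for this example.

```python
# Toy model (not kernel code) of the failure described in this comment.
# Assumption: free_percpu() obtains the real address by bitwise-complementing
# the pointer it is handed, as the "f7 d6" (not %esi) bytes in the i686
# Code: line suggest.

NULL = 0


def percpu_real_address(cookie: int, bits: int = 64) -> int:
    """Return the address the modeled free_percpu would dereference."""
    mask = (1 << bits) - 1
    return ~cookie & mask


def free_iostats_checked(io_stats: int, bits: int = 64):
    """Hypothetical guarded free: skip the call when nothing was allocated.

    Returns the address that would be dereferenced, or None if the free
    was skipped because io_stats is NULL.
    """
    if io_stats == NULL:
        return None
    return percpu_real_address(io_stats, bits)


if __name__ == "__main__":
    # Unguarded free of a NULL pointer on 64-bit and 32-bit kernels.
    print(hex(percpu_real_address(NULL, 64)))
    print(hex(percpu_real_address(NULL, 32)))
    # Guarded free: nothing is dereferenced.
    print(free_iostats_checked(NULL))
```

With a NULL input the modeled address is all ones for either word size, which lines up with CR2 in the x86_64 oops and with esi in the i686 one. The guarded variant returns before the complement is ever computed.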

Comment 6 RHEL Product and Program Management 2007-02-23 17:44:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 RHEL Product and Program Management 2007-02-23 17:44:52 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 9 Jeff Layton 2007-02-26 17:31:21 UTC
Created attachment 148816 [details]
backported upstream patch for same problem

Actually, this patch, backported from here, might be better:

http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=01d0ae8beaee75d954900109619b700fe68707d9


This looks like it fixes the original problem and also addresses Peter's
concerns.

In addition to what was in the upstream patch, I also added a call to
nfs_free_iostats in the error path of nfs_sb_init. It looked like the iostats
would leak if getting a root inode or dentry failed.

Comment 13 Jeff Layton 2007-02-26 23:03:35 UTC
Created attachment 148838 [details]
updated patch -- remove added nfs_free_iostats that would have caused double-free

Peter pointed out that the nfs_free_iostats call that I added could cause a
double free, since kill_sb gets called in an error condition anyway. This patch
gets rid of that call and should be pretty much the same as the upstream patch.
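
The double free Peter describes can be sketched with a toy allocator that tracks live handles. This is a Python model under stated assumptions, not the kernel code path: ToyAllocator and mount_outcome are hypothetical names, and the only behavior taken from the report is that teardown (kill_sb) runs on the error path and frees the stats itself.

```python
class ToyAllocator:
    """Tracks live handles so a second free of the same handle is detectable."""

    def __init__(self):
        self._live = set()
        self._next_handle = 1

    def alloc(self) -> int:
        handle = self._next_handle
        self._next_handle += 1
        self._live.add(handle)
        return handle

    def free(self, handle: int) -> None:
        if handle not in self._live:
            raise RuntimeError(f"double free of handle {handle}")
        self._live.remove(handle)

    def live_count(self) -> int:
        return len(self._live)


def mount_outcome(extra_free_in_error_path: bool) -> str:
    """Model a mount whose superblock setup fails after the stats exist."""
    allocator = ToyAllocator()
    io_stats = allocator.alloc()      # stats allocated early in the mount
    try:
        # Superblock setup fails here (e.g. no root inode or dentry).
        if extra_free_in_error_path:
            allocator.free(io_stats)  # the free added in the first backport
        allocator.free(io_stats)      # teardown (kill_sb) runs on error anyway
    except RuntimeError:
        return "double free"
    return "leak" if allocator.live_count() else "ok"


if __name__ == "__main__":
    print(mount_outcome(extra_free_in_error_path=True))
    print(mount_outcome(extra_free_in_error_path=False))
```

In this model the stats are neither leaked nor freed twice once the extra free is dropped, because teardown already owns the release on the error path.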

Comment 14 Jeff Layton 2007-03-06 19:46:00 UTC
Created attachment 149372 [details]
patch -- don't free NULL pointer on error, also dont leak iostats

This patch should fix the problem as well, and doesn't pull in the changes to
nfs_sb_init.

Comment 15 Jason Baron 2007-03-07 19:13:57 UTC
Committed in stream U5 build 50. A test kernel with this patch is available from
http://people.redhat.com/~jbaron/rhel4/


Comment 18 Mike Gahagan 2007-04-02 17:40:35 UTC
Patch is in -52, already verified by at least one partner.


Comment 21 Red Hat Bugzilla 2007-05-08 04:53:07 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html

