Bug 593689 - kernel panic on NFSv4 server running bonnie++ over NFS
Summary: kernel panic on NFSv4 server running bonnie++ over NFS
Keywords:
Status: CLOSED DUPLICATE
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Red Hat Kernel Manager
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-19 13:58 UTC by Matt Bernstein
Modified: 2010-05-19 15:43 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-19 15:43:16 UTC
Target Upstream Version:


Attachments
all I could capture of the kernel panic (deleted)
2010-05-19 13:58 UTC, Matt Bernstein

Description Matt Bernstein 2010-05-19 13:58:53 UTC
Created attachment 415118
all I could capture of the kernel panic

Description of problem:

When an RHEL6 beta NFS client runs bonnie++ 1.93 against a filesystem exported by an RHEL6 beta NFS server, bonnie++ writes 15GB and then the server's kernel panics.

Version-Release number of selected component (if applicable):

2.6.32-19.el6.x86_64

How reproducible:

not reliably

Steps to Reproduce:
1. install bonnie++ from Fedora 13 src RPM on RHEL6 beta NFSv4 client
2. run bonnie++ -f -d /path/to/nfsmount
3. wait a few minutes
  
Actual results:

client process freezes, server kernel-panics (see attachment, nothing in logs)

Expected results:

no crashes, benchmark results

Additional info:

Here are the mount options:

landin:/iso on /import/iso type nfs4 (rw,nosuid,hard,intr,proto=tcp,rsize=65536,wsize=65536,sloppy,addr=138.37.88.245,clientaddr=138.37.88.218)
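
For anyone comparing this setup against another client, the mount line above can be split into its parts programmatically. The following is only a rough Python sketch: the function name parse_mount_line and the returned dictionary layout are made up for illustration, and it assumes the line has the usual "device on mountpoint type fstype (options)" shape printed by mount.

```python
import re

def parse_mount_line(line):
    """Split a mount-style line into device, mount point, fs type and options.

    Hypothetical helper for illustration; not part of any NFS tooling.
    """
    m = re.match(r"(\S+) on (\S+) type (\S+) \((.*)\)$", line.strip())
    if m is None:
        raise ValueError("unrecognised mount line: %r" % line)
    device, mountpoint, fstype, raw = m.groups()
    options = {}
    for item in raw.split(","):
        key, sep, value = item.partition("=")
        # Flag options such as "rw" or "hard" carry no value; store True.
        options[key] = value if sep else True
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options,
    }

line = ("landin:/iso on /import/iso type nfs4 "
        "(rw,nosuid,hard,intr,proto=tcp,rsize=65536,wsize=65536,sloppy,"
        "addr=138.37.88.245,clientaddr=138.37.88.218)")
info = parse_mount_line(line)
print(info["fstype"], info["options"]["rsize"], info["options"]["wsize"])
```

Option values are kept as strings, so rsize and wsize would need an int() conversion before any arithmetic.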

Client and server are Dells (M910 and R810 respectively), both with Xeon 6542 chips. Client has 128GB RAM, server 64GB.

Comment 2 Matt Bernstein 2010-05-19 15:20:27 UTC
I ran it a second time, and it wrote 225G before crashing. This time I caught more debugging:

May 19 15:35:56 landin kernel: kernel BUG at fs/ext4/inode.c:1852!
May 19 15:35:56 landin kernel: invalid opcode: 0000 [#1] SMP 
May 19 15:35:56 landin kernel: last sysfs file: /sys/devices/system/cpu/cpu23/cache/index2/shared_cpu_map
May 19 15:35:56 landin kernel: CPU 9 
May 19 15:35:56 landin kernel: Modules linked in: mptctl(U) mptbase(U) ipmi_msghandler(U) dell_rbu(U) nfsd(U) nfs_acl(U) auth_rpcgss(U) exportfs(U) lockd(U) sunrpc(U) bonding(U) nf_conntrack_ftp(U) ts_kmp(U) nf_conntrack_amanda(U) ip6t_REJECT(U) nf_conntrack_ipv6(U) ip6table_filter(U) ip6_tables(U) ipv6(U) dm_mirror(U) dm_region_hash(U) dm_log(U) sr_mod(U) cdrom(U) bnx2(U) iTCO_wdt(U) iTCO_vendor_support(U) ses(U) serio_raw(U) power_meter(U) enclosure(U) joydev(U) dcdbas(U) hwmon(U) sg(U) ext4(U) mbcache(U) jbd2(U) pata_acpi(U) ata_generic(U) dm_multipath(U) sd_mod(U) crc_t10dif(U) ata_piix(U) megaraid_sas(U) dm_mod(U) [last unloaded: speedstep_lib]
May 19 15:35:56 landin kernel: Pid: 2757, comm: nfsd Not tainted 2.6.32-19.el6.x86_64 #1 PowerEdge R810
May 19 15:35:56 landin kernel: RIP: 0010:[<ffffffffa009dcd3>]  [<ffffffffa009dcd3>] ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: RSP: 0018:ffff8808473d3590  EFLAGS: 00010297
May 19 15:35:56 landin kernel: RAX: 0000000000001c94 RBX: ffff881052eb24e0 RCX: 0000000000000154
May 19 15:35:56 landin kernel: RDX: 0000000000001c95 RSI: 0000000000001c94 RDI: 0000000000000153
May 19 15:35:56 landin kernel: RBP: ffff8808473d35f0 R08: 0000000000001c94 R09: ffff881059548ce0
May 19 15:35:56 landin kernel: R10: 0000000004198000 R11: 0000000000000000 R12: ffff88068dbca408
May 19 15:35:56 landin kernel: R13: ffff881052eb27b0 R14: ffff881052eb2430 R15: 0000000000000000
May 19 15:35:56 landin kernel: FS:  0000000000000000(0000) GS:ffff88089c480000(0000) knlGS:0000000000000000
May 19 15:35:56 landin kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
May 19 15:35:56 landin kernel: CR2: 00000000f77ab000 CR3: 0000000001001000 CR4: 00000000000006e0
May 19 15:35:56 landin kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
May 19 15:35:56 landin kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
May 19 15:35:56 landin kernel: Process nfsd (pid: 2757, threadinfo ffff8808473d2000, task ffff8808473d1580)
May 19 15:35:56 landin kernel: Stack:
May 19 15:35:56 landin kernel: ffff880880010c80 ffffea00014121c0 ffff88105b9a9000 ffffffffffff0000
May 19 15:35:56 landin kernel: <0> ffff881052eb24e0 00000000c2100000 ffff8808473d35f0 0000000000001000
May 19 15:35:56 landin kernel: <0> 0000000000001000 0000000000001000 00000000037c2100 0000000000000000
May 19 15:35:56 landin kernel: Call Trace:
May 19 15:35:56 landin kernel: [<ffffffff8118dde3>] __block_prepare_write+0x1e3/0x590
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118e334>] block_write_begin+0x64/0x100
May 19 15:35:56 landin kernel: [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff81102c0e>] generic_file_buffered_write+0x10e/0x2a0
May 19 15:35:56 landin kernel: [<ffffffff811047a0>] __generic_file_aio_write+0x250/0x480
May 19 15:35:56 landin kernel: [<ffffffff81104a3f>] generic_file_aio_write+0x6f/0xe0
May 19 15:35:56 landin kernel: [<ffffffffa0096160>] ? ext4_file_write+0x0/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa0096199>] ext4_file_write+0x39/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8115e29b>] do_sync_readv_writev+0xfb/0x140
May 19 15:35:56 landin kernel: [<ffffffff813edb2f>] ? release_sock+0xaf/0xc0
May 19 15:35:56 landin kernel: [<ffffffff8108dc00>] ? autoremove_wake_function+0x0/0x40
May 19 15:35:56 landin kernel: [<ffffffff811f967b>] ? selinux_file_permission+0xfb/0x150
May 19 15:35:56 landin kernel: [<ffffffff811ec646>] ? security_file_permission+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff8115f24f>] do_readv_writev+0xcf/0x1f0
May 19 15:35:56 landin kernel: [<ffffffff81120ea9>] ? kmemdup+0x29/0x50
May 19 15:35:56 landin kernel: [<ffffffff81096396>] ? groups_alloc+0x46/0xf0
May 19 15:35:56 landin kernel: [<ffffffff811ec9c6>] ? security_task_setgroups+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff81096165>] ? set_groups+0x25/0x1a0
May 19 15:35:56 landin kernel: [<ffffffff8115f3b6>] vfs_writev+0x46/0x60
May 19 15:35:56 landin kernel: [<ffffffffa0271fd0>] nfsd_vfs_write+0xe0/0x440 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa02707b2>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff810d4a15>] ? call_rcu_sched+0x15/0x20
May 19 15:35:56 landin kernel: [<ffffffffa0271aa1>] ? nfsd_permission+0x131/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0274589>] nfsd_write+0x99/0x100 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027f4d0>] nfsd4_write+0x100/0x130 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027fe51>] nfsd4_proc_compound+0x3d1/0x4d0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026d3fa>] nfsd_dispatch+0xba/0x250 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0208ec4>] svc_process_common+0x344/0x610 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa02094d0>] svc_process+0x110/0x150 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa026daf6>] nfsd+0xd6/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026da20>] ? nfsd+0x0/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff8108d8a6>] kthread+0x96/0xa0
May 19 15:35:56 landin kernel: [<ffffffff810141ca>] child_rip+0xa/0x20
May 19 15:35:56 landin kernel: [<ffffffff8108d810>] ? kthread+0x0/0xa0
May 19 15:35:56 landin kernel: [<ffffffff810141c0>] ? child_rip+0x0/0x20
May 19 15:35:56 landin kernel: Code: 48 8b 40 18 49 89 44 24 20 f0 41 80 0c 24 40 f0 41 80 4c 24 01 02 e9 e2 fd ff ff 0f 1f 44 00 00 41 bf e4 ff ff ff e9 d2 fd ff ff <0f> 0b eb fe 0f 0b eb fe 0f 0b 0f 1f 00 eb fb 66 66 66 66 66 2e 
May 19 15:35:56 landin kernel: RIP  [<ffffffffa009dcd3>] ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: RSP <ffff8808473d3590>
May 19 15:35:56 landin kernel: ---[ end trace 234f986d30d8da3c ]---
May 19 15:35:56 landin kernel: Kernel panic - not syncing: Fatal exception
May 19 15:35:56 landin kernel: Pid: 2757, comm: nfsd Tainted: G      D    2.6.32-19.el6.x86_64 #1
May 19 15:35:56 landin kernel: Call Trace:
May 19 15:35:56 landin kernel: [<ffffffff814bfd69>] panic+0x78/0x137
May 19 15:35:56 landin kernel: [<ffffffff814c3d1c>] oops_end+0xdc/0xf0
May 19 15:35:56 landin kernel: [<ffffffff8101723b>] die+0x5b/0x90
May 19 15:35:56 landin kernel: [<ffffffff814c35c4>] do_trap+0xc4/0x160
May 19 15:35:56 landin kernel: [<ffffffff81014cb5>] do_invalid_op+0x95/0xb0
May 19 15:35:56 landin kernel: [<ffffffffa009dcd3>] ? ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff811489e5>] ? ____cache_alloc_node+0x95/0x150
May 19 15:35:56 landin kernel: [<ffffffff81013f5b>] invalid_op+0x1b/0x20
May 19 15:35:56 landin kernel: [<ffffffffa009dcd3>] ? ext4_da_get_block_prep+0x2c3/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118dde3>] __block_prepare_write+0x1e3/0x590
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118e334>] block_write_begin+0x64/0x100
May 19 15:35:56 landin kernel: [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff81102c0e>] generic_file_buffered_write+0x10e/0x2a0
May 19 15:35:56 landin kernel: [<ffffffff811047a0>] __generic_file_aio_write+0x250/0x480
May 19 15:35:56 landin kernel: [<ffffffff81104a3f>] generic_file_aio_write+0x6f/0xe0
May 19 15:35:56 landin kernel: [<ffffffffa0096160>] ? ext4_file_write+0x0/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffffa0096199>] ext4_file_write+0x39/0xb0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8115e29b>] do_sync_readv_writev+0xfb/0x140
May 19 15:35:56 landin kernel: [<ffffffff813edb2f>] ? release_sock+0xaf/0xc0
May 19 15:35:56 landin kernel: [<ffffffff8108dc00>] ? autoremove_wake_function+0x0/0x40
May 19 15:35:56 landin kernel: [<ffffffff811f967b>] ? selinux_file_permission+0xfb/0x150
May 19 15:35:56 landin kernel: [<ffffffff811ec646>] ? security_file_permission+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff8115f24f>] do_readv_writev+0xcf/0x1f0
May 19 15:35:56 landin kernel: [<ffffffff81120ea9>] ? kmemdup+0x29/0x50
May 19 15:35:56 landin kernel: [<ffffffff81096396>] ? groups_alloc+0x46/0xf0
May 19 15:35:56 landin kernel: [<ffffffff811ec9c6>] ? security_task_setgroups+0x16/0x20
May 19 15:35:56 landin kernel: [<ffffffff81096165>] ? set_groups+0x25/0x1a0
May 19 15:35:56 landin kernel: [<ffffffff8115f3b6>] vfs_writev+0x46/0x60
May 19 15:35:56 landin kernel: [<ffffffffa0271fd0>] nfsd_vfs_write+0xe0/0x440 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa02707b2>] ? nfsd_setuser_and_check_port+0x62/0xb0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffff810d4a15>] ? call_rcu_sched+0x15/0x20
May 19 15:35:56 landin kernel: [<ffffffffa0271aa1>] ? nfsd_permission+0x131/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0274589>] nfsd_write+0x99/0x100 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027f4d0>] nfsd4_write+0x100/0x130 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa027fe51>] nfsd4_proc_compound+0x3d1/0x4d0 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026d3fa>] nfsd_dispatch+0xba/0x250 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa0208ec4>] svc_process_common+0x344/0x610 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa02094d0>] svc_process+0x110/0x150 [sunrpc]
May 19 15:35:56 landin kernel: [<ffffffffa026daf6>] nfsd+0xd6/0x190 [nfsd]
May 19 15:35:56 landin kernel: [<ffffffffa026da20>] ? nfsd+0x0/0x190 [nfsd]

FWIW the memory on both machines survives memtest86+.
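
In case it helps with matching this against other reports, here is a rough Python sketch for pulling the frames out of a call trace like the one above. Entries marked with "?" are addresses the kernel found on the stack but could not confirm as part of the real call chain, so the sketch skips them by default. The names FRAME_RE and parse_frames are invented for this example, and the regular expression is only assumed to cover the "[<addr>] symbol+0xoff/0xsize [module]" form seen in this log.

```python
import re

# Matches entries such as
#   [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
# The pattern is an assumption based on the log format in this report.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)

def parse_frames(lines, include_unreliable=False):
    """Return one dict per stack frame found in the given log lines."""
    frames = []
    for line in lines:
        m = FRAME_RE.search(line)
        if m is None:
            continue  # not a stack frame line
        if m.group("unreliable") and not include_unreliable:
            continue  # "?" entries are not confirmed callers
        frames.append({
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),  # bytes into the function
            "size": int(m.group("size"), 16),      # total function size
            "module": m.group("module"),           # None when no module tag is printed
        })
    return frames

sample = """\
May 19 15:35:56 landin kernel: [<ffffffff8118dde3>] __block_prepare_write+0x1e3/0x590
May 19 15:35:56 landin kernel: [<ffffffffa009da10>] ? ext4_da_get_block_prep+0x0/0x2e0 [ext4]
May 19 15:35:56 landin kernel: [<ffffffff8118e334>] block_write_begin+0x64/0x100
May 19 15:35:56 landin kernel: [<ffffffffa00a052d>] ext4_da_write_begin+0x17d/0x290 [ext4]
""".splitlines()

frames = parse_frames(sample)
for frame in frames:
    print(frame["symbol"], hex(frame["offset"]), frame["module"])
```

Feeding it the whole log rather than the four sample lines should give the nfsd to ext4 write path without the unconfirmed entries, which makes it easier to compare with the trace in another bug.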

Comment 3 Josef Bacik 2010-05-19 15:43:16 UTC

*** This bug has been marked as a duplicate of bug 576202 ***

