
Bug 453897

Summary: Consistent kernel panics with most of our 3 GFS nodes all pointing to the same line and file: Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c
Product: [Retired] Red Hat Cluster Suite
Component: GFS-kernel
Version: 3
Hardware: i686
OS: Linux
Status: CLOSED INSUFFICIENT_DATA
Severity: urgent
Priority: low
Reporter: Dennis <n3dlinux>
Assignee: Ben Marzinski <bmarzins>
QA Contact: Cluster QE <mspqa-list>
CC: edamato
Doc Type: Bug Fix
Last Closed: 2008-09-26 16:22:20 UTC

Description Dennis 2008-07-03 02:42:40 UTC

Description of problem:
We're getting consistent kernel panics on most of our GFS nodes, all
pointing to the same line and file:

Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c

Three nodes are configured as lock_gulm servers and also as GFS clients, backed by one GNBD storage server. Two of the nodes are IBM x3650s and the third is an IBM x346; another IBM x346 acts as the GNBD server.

The GNBD server exports only one GFS file system: /home, 500 GB in size, mounted on all three nodes and used as the mailbox store for our mail systems.

All nodes currently run the same kernel, GFS, and GFS-modules versions,
listed below:

- kernel 2.4.21-37.ELsmp
- GFS-modules-smp-6.0.2.27-0
- GFS-6.0.2.27-0

The panic as logged on drgenesis:
Jun 28 20:35:49 drgenesis kernel: e5b61bac f8ea8b72 00000246 00001000 e5a44100 f8fc4000 e5a44100 f8ea8d70
Jun 28 20:35:49 drgenesis kernel:        00000246 000001f0 00000000 553a5ba8 f4d1f330 00000005 00000004 ffffffff
Jun 28 20:35:49 drgenesis kernel:        f8ec26d0 f8ec9c8f f8ec9bc4 000004cb 00000016 efcf3e00 00000006 f8fc4000
Jun 28 20:35:49 drgenesis kernel: Call Trace:   [<f8ea8b72>] gfs_asserti [gfs] 0x32 (0xe5b61bb0)
Jun 28 20:35:49 drgenesis kernel: [<f8ea8d70>] gmalloc [gfs] 0x20 (0xe5b61bc8)
Jun 28 20:35:49 drgenesis kernel: [<f8ec26d0>] blkalloc_internal [gfs] 0x130 (0xe5b61bec)
Jun 28 20:35:49 drgenesis kernel: [<f8ec9c8f>] .rodata.str1.1 [gfs] 0x1da3 (0xe5b61bf0)
Jun 28 20:35:49 drgenesis kernel: [<f8ec9bc4>] .rodata.str1.1 [gfs] 0x1cd8 (0xe5b61bf4)
Jun 28 20:35:49 drgenesis kernel: [<f8ec2b8b>] gfs_blkalloc [gfs] 0x7b (0xe5b61c20)
Jun 28 20:35:49 drgenesis kernel: [<f8e9c90c>] get_datablock [gfs] 0xfc (0xe5b61c4c)
Jun 28 20:35:49 drgenesis kernel: [<f8e9cc43>] gfs_block_map [gfs] 0x333 (0xe5b61c70)
Jun 28 20:35:49 drgenesis kernel: [<c0149093>] find_or_create_page [kernel] 0x63 (0xe5b61c9c)
Jun 28 20:35:49 drgenesis kernel: [<f8e8d08c>] gfs_dgetblk [gfs] 0x3c (0xe5b61cec)
Jun 28 20:35:49 drgenesis kernel: [<f8ec17bb>] gfs_rgrp_read [gfs] 0xab (0xe5b61d10)
Jun 28 20:35:49 drgenesis kernel: [<f8e96239>] get_block [gfs] 0xb9 (0xe5b61d28)
Jun 28 20:35:49 drgenesis kernel: [<c016814b>] __block_prepare_write [kernel] 0x1ab (0xe5b61d64)
Jun 28 20:35:49 drgenesis kernel: [<c0168b09>] block_prepare_write [kernel] 0x39 (0xe5b61da8)
Jun 28 20:35:49 drgenesis kernel: [<f8e96180>] get_block [gfs] 0x0 (0xe5b61dbc)
Jun 28 20:35:49 drgenesis kernel: [<f8e968fc>] gfs_prepare_write [gfs] 0x12c (0xe5b61dc8)
Jun 28 20:35:49 drgenesis kernel: [<f8e96180>] get_block [gfs] 0x0 (0xe5b61dd8)
Jun 28 20:35:49 drgenesis kernel: [<c014c053>] do_generic_file_write [kernel] 0x1e3 (0xe5b61df4)
Jun 28 20:35:49 drgenesis kernel: [<f8e90bab>] do_do_write [gfs] 0x2ab (0xe5b61e48)
Jun 28 20:35:49 drgenesis kernel: [<f8e90feb>] do_write [gfs] 0x18b (0xe5b61e94)
Jun 28 20:35:49 drgenesis kernel: [<f8e8ef1e>] gfs_walk_vma [gfs] 0x12e (0xe5b61ed0)
Jun 28 20:35:49 drgenesis kernel: [<f8eab4d7>] gfs_glock_nq_init [gfs] 0x37 (0xe5b61f2c)
Jun 28 20:35:49 drgenesis kernel: [<f8eab513>] gfs_glock_dq_uninit [gfs] 0x13 (0xe5b61f3c)
Jun 28 20:35:49 drgenesis kernel: [<f8e8ede7>] gfs_llseek [gfs] 0xc7 (0xe5b61f48)
Jun 28 20:35:49 drgenesis kernel: [<f8e910c1>] gfs_write [gfs] 0x91 (0xe5b61f6c)
Jun 28 20:35:49 drgenesis kernel: [<f8e90e60>] do_write [gfs] 0x0 (0xe5b61f80)
Jun 28 20:35:49 drgenesis kernel: [<c0164b27>] sys_write [kernel] 0x97 (0xe5b61f94)
Jun 28 20:35:49 drgenesis kernel:
Jun 28 20:35:49 drgenesis kernel: Kernel panic: GFS: Assertion failed on line 1227 of file rgrp.c
Jun 28 20:35:49 drgenesis kernel: GFS: assertion: "x <= length"
Jun 28 20:35:49 drgenesis kernel: GFS: time = 1214656549
Jun 28 20:35:49 drgenesis kernel: GFS: fsid=alpha:home.2: RG = 64975595
Jun 28 20:35:49 drgenesis kernel:
Jun 29 11:00:06 drgenesis syslogd 1.4.1: restart.
Jun 29 11:00:06 drgenesis syslog: syslogd startup succeeded
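The `GFS: time = ...` field in the panic message appears to be a Unix timestamp (seconds since the epoch). A minimal Python sketch for decoding it, assuming that interpretation; the helper name `gfs_panic_time` is hypothetical. The value 1214656549 decodes to 2008-06-28 12:35:49 UTC, while syslog stamped the same panic 20:35:49, which suggests the node's clock runs at UTC+8 (an inference, not something stated in the report).

```python
from datetime import datetime, timedelta, timezone

def gfs_panic_time(epoch_seconds: int) -> datetime:
    """Convert a 'GFS: time = N' value, assumed to be Unix seconds, to UTC."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

# Value logged in the drgenesis panic above.
utc = gfs_panic_time(1214656549)
print(utc.isoformat())  # 2008-06-28T12:35:49+00:00

# Assumed UTC+8 host clock, inferred from the syslog stamp (not stated in the report).
print(utc.astimezone(timezone(timedelta(hours=8))).isoformat())  # 2008-06-28T20:35:49+08:00
```

The same conversion turns the value from the later drexodus panic, 1215137891, into 2008-07-04 02:18:11 UTC, again eight hours behind its 10:18:11 syslog stamp.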



Version-Release number of selected component (if applicable):
kernel-2.4.21-37.ELsmp, GFS-modules-smp-6.0.2.27-0, GFS-6.0.2.27-0

How reproducible:
Always


Steps to Reproduce:
When one of the three nodes fails, we rejoin it manually:
1. Load the GFS modules (gnbd, gfs, pool, lock_gulm).
2. Run gnbd_import.
3. Start pool, ccsd, lock_gulmd, and GFS.
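The rejoin order in the steps above follows from what each step needs. A small Python sketch that writes those needs down as a dependency graph and derives the start order with a topological sort. The edges are our reading of a GFS 6.0 setup, not something stated in the report: pool volumes sit on the GNBD-imported devices, ccsd reads the cluster configuration from a pool device, lock_gulmd gets its configuration from ccsd, and the GFS mount needs the lock server. The step names are shorthand labels, not commands.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must already be done.
# Edges are an assumption about GFS 6.0, chosen to match the report's order.
REJOIN_DEPS = {
    "modules": set(),                      # gnbd, gfs, pool, lock_gulm
    "gnbd_import": {"modules"},            # needs the gnbd module
    "pool": {"gnbd_import"},               # pool volumes sit on imported devices
    "ccsd": {"pool"},                      # config archive lives on a pool device
    "lock_gulmd": {"ccsd"},                # reads cluster config from ccsd
    "gfs": {"lock_gulmd", "modules"},      # mount needs lock server and gfs module
}

def rejoin_order() -> list:
    """Return one valid start order for the rejoin steps."""
    return list(TopologicalSorter(REJOIN_DEPS).static_order())

print(rejoin_order())
# ['modules', 'gnbd_import', 'pool', 'ccsd', 'lock_gulmd', 'gfs']
```

Because the graph is a single chain (the extra `modules` edge on `gfs` is redundant), only one order is valid, and it matches the manual procedure.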

Actual Results:
About 8 to 12 hours after the node rejoins the cluster, one or two nodes panic with errors like the ones above.


Comment 1 Ben Marzinski 2008-07-03 20:29:37 UTC
I see that you filed the bug under gnbd-kernel, has anything happened to lead
you to believe that gnbd is the cause of this problem?

Also, is it possible to upgrade to the most recent kernel and GFS-modules packages?

Comment 2 Ben Marzinski 2008-07-03 20:38:56 UTC
What kind of load are you running on the filesystems?

Comment 3 Dennis 2008-07-04 07:32:36 UTC
(In reply to comment #1)
> I see that you filed the bug under gnbd-kernel, has anything happened to lead
> you to believe that gnbd is the cause of this problem?
> 
> Also, is it possible to upgrade to the most recent kernel and GFS-modules packages?

We already upgraded to newer kernel and GFS packages yesterday
(kernel-smp-2.4.21-50.EL.i686.rpm, GFS-modules-smp-6.0.2.30-0.i386.rpm, and
GFS-6.0.2.30-0.i386.rpm), but to no avail. We'll try running gfs_fsck on the
file system, but it might take 8 to 11 hours for 300+ GB.


Comment 4 Dennis 2008-07-04 07:38:23 UTC
Here's the new error we encountered on one of our nodes after the kernel and
GFS upgrade. This server serves POP.

Jul  4 10:18:11 drexodus kernel: Bad metadata at 64975751, should be 5
Jul  4 10:18:11 drexodus kernel:   mh_magic = 0x01161970
Jul  4 10:18:11 drexodus kernel:   mh_type = 4
Jul  4 10:18:11 drexodus kernel:   mh_generation = 375
Jul  4 10:18:11 drexodus kernel:   mh_format = 400
Jul  4 10:18:11 drexodus kernel:   mh_incarn = 123
Jul  4 10:18:11 drexodus kernel: db6a3b8c f8f1afa2 00000001 c0387e98 00000000 00000246 00000012 00000000
Jul  4 10:18:11 drexodus kernel:        c01298c3 0000000a 00000400 f8f3b831 db6a3bfc cde536b0 00000030 00000000
Jul  4 10:18:11 drexodus kernel:        f8f0052d f8f3c848 f8f3a43a 000004e5 00000013 f8f67000 db6a3bf8 cde53810
Jul  4 10:18:11 drexodus kernel: Call Trace:   [<f8f1afa2>] gfs_asserti [gfs] 0x32 (0xdb6a3b90)
Jul  4 10:18:11 drexodus kernel: [<c01298c3>] printk [kernel] 0x153 (0xdb6a3bac)
Jul  4 10:18:11 drexodus kernel: [<f8f3b831>] .rodata.str1.1 [gfs] 0x14c5 (0xdb6a3bb8)
Jul  4 10:18:11 drexodus kernel: [<f8f0052d>] gfs_get_meta_buffer [gfs] 0x29d (0xdb6a3bcc)
Jul  4 10:18:11 drexodus kernel: [<f8f3c848>] .rodata.str1.4 [gfs] 0x3bc (0xdb6a3bd0)
Jul  4 10:18:11 drexodus kernel: [<f8f3a43a>] .rodata.str1.1 [gfs] 0xce (0xdb6a3bd4)
Jul  4 10:18:11 drexodus kernel: [<f8f0ec3b>] gfs_block_map [gfs] 0x2eb (0xdb6a3c2c)
Jul  4 10:18:11 drexodus kernel: [<c011c610>] flush_tlb_all_ipi [kernel] 0x0 (0xdb6a3c54)
Jul  4 10:18:11 drexodus kernel: [<c01629a8>] map_new_virtual [kernel] 0x1a8 (0xdb6a3c9c)
Jul  4 10:18:11 drexodus kernel: [<f8f08249>] get_block [gfs] 0xb9 (0xdb6a3ce4)
Jul  4 10:18:11 drexodus kernel: [<c0168dd6>] block_read_full_page [kernel] 0x2e6 (0xdb6a3d20)
Jul  4 10:18:11 drexodus kernel: [<c0159ba4>] __alloc_pages [kernel] 0xc4 (0xdb6a3d60)
Jul  4 10:18:11 drexodus kernel: [<f8f086e2>] gfs_readpage [gfs] 0x82 (0xdb6a3d84)
Jul  4 10:18:11 drexodus kernel: [<f8f08190>] get_block [gfs] 0x0 (0xdb6a3d8c)
Jul  4 10:18:11 drexodus kernel: [<c0148cca>] add_to_page_cache_unique [kernel] 0x5a (0xdb6a3d90)
Jul  4 10:18:11 drexodus kernel: [<c0148f21>] page_cache_read [kernel] 0xe1 (0xdb6a3da4)
Jul  4 10:18:11 drexodus kernel: [<c0149947>] generic_file_readahead [kernel] 0xd7 (0xdb6a3dcc)
Jul  4 10:18:11 drexodus kernel: [<c0149f24>] do_generic_file_read [kernel] 0x4d4 (0xdb6a3de8)
Jul  4 10:18:11 drexodus kernel: [<c014a7db>] generic_file_new_read [kernel] 0xbb (0xdb6a3e28)
Jul  4 10:18:11 drexodus kernel: [<c014a620>] file_read_actor [kernel] 0x0 (0xdb6a3e38)
Jul  4 10:18:11 drexodus kernel: [<c014a91f>] generic_file_read [kernel] 0x3f (0xdb6a3e7c)
Jul  4 10:18:11 drexodus kernel: [<f8f01aa4>] do_read [gfs] 0x1a4 (0xdb6a3e9c)
Jul  4 10:18:11 drexodus kernel: [<f8f00f3e>] gfs_walk_vma [gfs] 0x12e (0xdb6a3ed0)
Jul  4 10:18:11 drexodus kernel: [<c0134f2d>] update_process_time_intertick [kernel] 0x7d (0xdb6a3f30)
Jul  4 10:18:11 drexodus kernel: [<f8f00d40>] gfs_llseek [gfs] 0x0 (0xdb6a3f38)
Jul  4 10:18:11 drexodus kernel: [<f8f00d8c>] gfs_llseek [gfs] 0x4c (0xdb6a3f48)
Jul  4 10:18:11 drexodus kernel: [<f8f01b1e>] gfs_read [gfs] 0x6e (0xdb6a3f6c)
Jul  4 10:18:11 drexodus kernel: [<f8f01900>] do_read [gfs] 0x0 (0xdb6a3f80)
Jul  4 10:18:11 drexodus kernel: [<c0165127>] sys_read [kernel] 0x97 (0xdb6a3f94)
Jul  4 10:18:11 drexodus kernel: [<c02af06f>] no_timing [kernel] 0x7 (0xdb6a3fc0)
Jul  4 10:18:11 drexodus kernel:
Jul  4 10:18:11 drexodus kernel:
Jul  4 10:18:11 drexodus kernel: Kernel panic: GFS: Assertion failed on line 1253 of file linux_dio.c
Jul  4 10:18:11 drexodus kernel: GFS: assertion: "metatype_check_magic == GFS_MAGIC && metatype_check_type == ((height) ? (5) : (4))"
Jul  4 10:18:11 drexodus kernel: GFS: time = 1215137891
Jul  4 10:18:11 drexodus kernel: GFS: fsid=alpha:home.2
Jul  4 10:18:11 drexodus kernel:
Jul  4 11:33:10 drexodus syslogd 1.4.1: restart.
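The failed assertion in linux_dio.c compares the header of the metadata block being read against what the caller expects: the magic must equal GFS_MAGIC, and the type must be 5 when `height` is nonzero and 4 otherwise. A minimal Python sketch of that check, transcribed from the assertion text and fed with the header fields logged above; the function name `metatype_ok` is hypothetical. We believe 0x01161970 is GFS_MAGIC and that types 4 and 5 are the dinode and indirect-block metadata types in the GFS on-disk headers, but treat those names as assumptions. Read that way, block 64975751 has a valid magic but holds a dinode where an indirect block was expected, which is consistent with the on-disk corruption suspected in the comments.

```python
GFS_MAGIC = 0x01161970   # matches mh_magic in the log above
GFS_METATYPE_DI = 4      # dinode (assumed name from GFS on-disk headers)
GFS_METATYPE_IN = 5      # indirect block (assumed name from GFS on-disk headers)

def metatype_ok(mh_magic: int, mh_type: int, height: int) -> bool:
    """Mirror of the logged assertion:
    magic == GFS_MAGIC && type == (height ? 5 : 4)."""
    expected = GFS_METATYPE_IN if height else GFS_METATYPE_DI
    return mh_magic == GFS_MAGIC and mh_type == expected

# Header fields logged for block 64975751 on drexodus. "should be 5" implies a
# nonzero height; the value 1 is used here only for illustration.
print(metatype_ok(0x01161970, 4, height=1))  # False
```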

Comment 5 Ben Marzinski 2008-07-07 19:17:31 UTC
Have you run gfs_fsck? That definitely looks like it could be filesystem corruption.

Comment 6 Dennis 2008-07-08 08:04:10 UTC
(In reply to comment #5)
> Have you run gfs_fsck? That definitely looks like it could be filesystem corruption.

Yes, we did. We're monitoring it now; if it doesn't panic for 24 hours,
we'll consider this resolved. :-)

Comment 7 Ben Marzinski 2008-09-26 16:22:20 UTC
There's no way to know the cause of this corruption. Since it's not reproducible, there's really nothing that can be done.