Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 235686 - Kernel BUG at dm_cmirror_server while recovering region
Summary: Kernel BUG at dm_cmirror_server while recovering region
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cmirror
Version: 4
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-04-09 16:27 UTC by Corey Marthaler
Modified: 2010-01-12 02:03 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-05 21:41:53 UTC


Attachments (Terms of Use)

Description Corey Marthaler 2007-04-09 16:27:16 UTC
Description of problem:
I was running a standard cmirror failure scenario and while the recovering was
taking place, link-08 died:

Senario: Kill primary leg of synced 2 leg mirror

****** Mirror hash info for this scenario ******
* name:      fail_primary_synced_2_legs
* sync:      1
* disklog:   1
* failpv:    /dev/sdb1
* legs:      2
* pvs:       /dev/sdb1 /dev/sdf1 /dev/sde1
************************************************

Creating mirror on link-02...
qarsh root@link-02 lvcreate -m 1 -n fail_primary_synced_2_legs -L 800M
helter_skelter /dev/sdb1:0-500 /dev/sdf1:0-500 /dev/sde1:0-50

Waiting for mirror to sync
Verifying that the mirror is fully syncd, currently at
 ...6.50% ...17.50% ...28.50% ...39.00% ...49.50% ...60.50% ...70.50% ...81.00%
...91.50% ...100.00%

Creating gfs on top of mirror on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-02...
Mounting gfs on link-02...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-04...
Mounting gfs on link-04...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-07...
Mounting gfs on link-07...
Creating mnt point /mnt/fail_primary_synced_2_legs on link-08...
Mounting gfs on link-08...

Disabling device sdb on link-02
Disabling device sda on link-04
Disabling device sdb on link-07
Disabling device sdb on link-08

Attempting I/O to cause mirror down conversion on link-02
10+0 records in
10+0 records out
Verifying the down conversion from mirror to linear
  /dev/sdb1: open failed: No such device or address
[HANG]



LINK-04:
GFS: fsid=LINK_128:gfs.3: Joined cluster. Now mounting FS...
GFS: fsid=LINK_128:gfs.3: jid=3: Trying to acquire journal lock...
GFS: fsid=LINK_128:gfs.3: jid=3: Looking at journal...
GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: Joined cluster. Now
mounting FS...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Trying to
acquire journal lock...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Looking at
journal...
Apr  9 06:23:23 link-08 kernel: GFS: fsid=LINK_128:gfs.3: jid=3: Done
Apr  9 06:23:23 link-08 qarshd[5509]: That's enough
Apr  9 06:23:28 link-08 qarshd[5541]: Talking to peer 10.15.80.47:35237
Apr  9 06:23:28 link-08 qarshd[5541]: Running cmdline: echo offline >
/sys/block/sdb/device/state
Apr  9 06:23:28 link-08 qarshd[5541]: That's enough
dm-cmirror: Recovering region mismatch: (5/18446744073709551615)
----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at dm_cmirror_server:703
invalid operand: 0000 [1] SMP
CPU 0
Modules linked in: lock_dlm(U) gfs(U) lock_harness(U) dm_cmirror(U) dlm(U)
cman(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc ds yenta_socket
pcmcia_core button battery ac ohci_hcd hw_random k8_edac edac_mc tg3 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc
mptscsih mptsas mptspi mptscsi mptbase sd_mod scsi_mod
Pid: 5467, comm: cluster_log_ser Not tainted 2.6.9-50.ELsmp
RIP: 0010:[<ffffffffa025be6e>]
<ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721}
RSP: 0018:000001001a805e28  EFLAGS: 00010212
RAX: 0000000000000044 RBX: 000001003d66c920 RCX: ffffffff803e5768
RDX: ffffffff803e5768 RSI: 0000000000000246 RDI: ffffffff803e5760
RBP: 000001003c1f2680 R08: ffffffff803e5768 R09: 000001003d66c920
R10: 0000000100000000 R11: ffffffff8011e9a4 R12: 000001001f789400
R13: ffffffffa02617a0 R14: 0000000000000001 R15: 000001003c1f26a0
FS:  0000002a95562b00(0000) GS:ffffffff804ee200(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00000031d2f2f8d8 CR3: 0000000000101000 CR4: 00000000000006e0
Process cluster_log_ser (pid: 5467, threadinfo 000001001a804000, task
000001001b6887f0)
Stack: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
       0000000000000000 0000000000000000 000001003b5434c0 000001003b5434c0
       000001001a805eb8 0000000000000010
Call Trace:<ffffffff80134660>{default_wake_function+0}
<ffffffff80110f47>{child_rip+8}
       <ffffffffa025afe5>{:dm_cmirror:cluster_log_serverd+0}
       <ffffffff80110f3f>{child_rip+0}

Code: 0f 0b 52 e1 25 a0 ff ff ff ff bf 02 49 c7 44 24 40 ff ff ff
RIP <ffffffffa025be6e>{:dm_cmirror:cluster_log_serverd+3721} RSP <000001001a805e28>
 <0>Kernel panic - not syncing: Oops


Version-Release number of selected component (if applicable):
2.6.9-50.ELsmp
cmirror-kernel-2.6.9-28.0

How reproducible:
Only once so far

Comment 1 Jonathan Earl Brassow 2007-04-09 16:54:59 UTC
no machines killed during this test?


Comment 2 Corey Marthaler 2007-04-09 17:55:57 UTC
no machines were killed as apart of this test.

Comment 3 Jonathan Earl Brassow 2007-04-09 17:59:19 UTC
Here is the code causing the machine to panic:

			if (lr->u.lr_region != lc->recovering_region) {
				DMERR("Recovering region mismatch: (%Lu/%Lu)",
				      lr->u.lr_region, lc->recovering_region);
				BUG();
			}

The question is, how do we have a record state saying that there is a region in
recovery _and_ lc->recovering_region saying that there is not a region in recovery.



Comment 4 Jonathan Earl Brassow 2007-04-10 07:14:00 UTC
Several fixes have gone in to fix the handling of this bug:
1) During server relocation (which can happen due to machine failure
or normal mirror suspension), the server value could get set before
the client had a chance to clean-up.  This caused the server to
become confused and issue a BUG().

2) perform a flush of the log before suspending.  This ensures
that regions which are in-sync get correctly flushed to the disk
log.  Without this, there will always be recovery work to be done
when a mirror starts up - even if it was properly in-sync during
shutdown.

3) clean-up memory used to record region users when a mirror is
shutdown.  It was possible for some regions to be left over
(causing a memory leak) during certain fault scenarios.

4) properly initialize the state field (ru_rw) in the region
user structure when a mark occurs.  Without the initialization,
it was sometimes possible for the region to be misinterpretted
as recovering instead of marked.

5) resolve and unhandled case in server_complete_resync_work

6) reset a variable in cluster_complete_resync_work.  Failure
to do so was causing a retry to include the wrong value for
the completion of the resync work - confusing the server.

assigned -> post


Comment 5 Rob Kenna 2007-04-10 10:55:44 UTC
Important repair for cluster mirror release in 4.5.  pm-ack

Comment 6 Corey Marthaler 2007-04-10 18:20:07 UTC
This BUG can still be tripped when attempting the 'non synced secondary leg
failure' scenario. Marking back to ASSIGNED, with the QA Whiteboard tag.

Comment 7 Jonathan Earl Brassow 2007-04-10 18:23:26 UTC
Here are the comments I put into the code validating the scenario encountered
and obviating the need for the BUG();

<               BUG();
---
>               /*
>                * This is a valid case, when the following happens:
>                * 1) a region is recovering and has waiting writes
>                * 2) recovery fails and calls complete_resync_work (w/ failure)
>                * 2.1) RU is removed from our list
>                * 3) waiting writes are released
>                * 3.1) writes do not mark, because b/c region state != RH_CLEAN
>                * 4) write fails and calls complete_resync_work (w/ failure)
>                * 5) boom, we are here.
>                *
>                * Not a bug to be here
>                */

assigned -> post


Comment 8 Jonathan Earl Brassow 2007-08-22 17:37:25 UTC
post -> modified.


Comment 9 Corey Marthaler 2007-08-24 18:42:06 UTC
Fix verified with the latest code.
cmirror-kernel-2.6.9-33.2
2.6.9-56.ELsmp

Comment 10 Chris Feist 2008-08-05 21:41:53 UTC
Fixed in current release (4.7).


Note You need to log in before you can comment on or make changes to this bug.