Bug 233706 - failed cluster node causes mirror recovery region requests to get stuck in loop
Summary: failed cluster node causes mirror recovery region requests to get stuck in loop
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cmirror
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-03-23 21:27 UTC by Corey Marthaler
Modified: 2010-04-27 15:00 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-04-27 15:00:03 UTC



Description Corey Marthaler 2007-03-23 21:27:06 UTC
Description of problem:
I was running revolver on the 4 node x86_64 link cluster with 2 gfs filesystems
(one w/ 2 legs and one with 3 legs). During the second iteration I failed
link-04 and that caused the mirror recovery to go haywire. When I brought
link-04 back into the cluster and attempted to remount the first gfs, it hung.

I'll leave the cluster in this state if you'd like to gather more info than
provided below:


Messages from link-07:
[...]
dm-cmirror: unable to notify server of completed resync work
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (8415)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (5793)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (5795)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:50 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (6273)
Mar 23 15:19:50 link-07 kernel: dm-cmirror: Reason :: 1
Mar 23 15:19:51 link-07 kernel: dm-cmirror: unable to notify server of completed resync work
dm-cmirror: unable to get server (3) to mark region (8192)
dm-cmirror: Reason :: 1
Mar 23 15:20:00 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (8192)
Mar 23 15:20:00 link-07 kernel: dm-cmirror: Reason :: 1
dm-cmirror: unable to get server (3) to mark region (2067)
dm-cmirror: Reason :: 1
Mar 23 15:20:35 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (2067)
Mar 23 15:20:35 link-07 kernel: dm-cmirror: Reason :: 1
dm-cmirror: unable to get server (3) to mark region (2067)
dm-cmirror: Reason :: 1
Mar 23 15:20:35 link-07 kernel: dm-cmirror: unable to get server (3) to mark region (2067)
Mar 23 15:20:35 link-07 kernel: dm-cmirror: Reason :: 1


Messages from link-08 (looping over and over):
[...]
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:23 link-08 kernel: dm-cmirror: Mark requester   : 4
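
To confirm from a saved log that the same region is being requested over and over, the repeated messages can be tallied. The following is only a rough sketch: the function name and the sample text are illustrative, and the regular expression assumes the message wording shown above (it tolerates the line being wrapped). The "C33UfFkJ" suffix appears to match the tail of the revolver-mirror1 log UUID in the dmsetup output.

```python
import re
from collections import Counter

# Matches the link-08 message whether or not the log line was wrapped.
MARK_RE = re.compile(
    r"Attempt to mark a region\s+(\d+)/(\w+)\s+which is being recovered"
)


def count_mark_attempts(log_text: str) -> Counter:
    """Count mark-during-recovery attempts per (region, log suffix) pair."""
    return Counter(MARK_RE.findall(log_text))


# Small excerpt in both wrapped and unwrapped form (sample data only).
sample = """\
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Attempt to mark a region
5578/C33UfFkJ which is being recovered.
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Current recoverer: 1
Mar 23 11:44:21 link-08 kernel: dm-cmirror: Mark requester   : 4
Mar 23 11:44:22 link-08 kernel: dm-cmirror: Attempt to mark a region 5578/C33UfFkJ which is being recovered.
"""

counts = count_mark_attempts(sample)
for (region, suffix), n in counts.most_common():
    print(f"region {region} on log *{suffix}: {n} attempts")
```

A single region dominating the counts, with a total that keeps growing between samples, is consistent with the loop described here.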


[root@link-07 ~]# dmsetup table
revolver-mirror2_mimage_2: 0 10485760 linear 8:49 384
revolver-mirror1_mlog: 0 8192 linear 8:113 384
revolver-mirror2_mimage_1: 0 10485760 linear 8:33 384
revolver-mirror2_mimage_0: 0 10485760 linear 8:1 10486144
revolver-mirror2_mlog: 0 8192 linear 8:17 10486144
revolver-mirror1_mimage_1: 0 10485760 linear 8:17 384
revolver-mirror1_mimage_0: 0 10485760 linear 8:1 384
revolver-mirror2: 0 10485760 mirror clustered_disk 5 253:6 1024 LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3U9Iz0MnCtDa0QV7z8Qpsi749eaAIovqe nosync block_on_error 3 253:7 0 253:8 0 253:9 0
VolGroup00-LogVol01: 0 4063232 linear 3:2 151781760
revolver-mirror1: 0 10485760 mirror clustered_disk 5 253:2 1024 LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34coTfAbowArYJ4dLpVYZKagWC33UfFkJ nosync block_on_error 2 253:3 0 253:4 0
VolGroup00-LogVol00: 0 151781376 linear 3:2 384
[root@link-07 ~]# dmsetup info
Name:              revolver-mirror2_mimage_2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 9
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3isDcxNcz4wZdJDn4Xe8iZUruUQJ4ZRTe

Name:              revolver-mirror1_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 2
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34coTfAbowArYJ4dLpVYZKagWC33UfFkJ

Name:              revolver-mirror2_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 8
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3mZUe0RrJKkU91zuIoqHmYHWD8uboH8f5

Name:              revolver-mirror2_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 7
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik34L8Fe5dimlQkHP0J4VRTo6VYGF1Vl6yp

Name:              revolver-mirror2_mlog
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 6
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3U9Iz0MnCtDa0QV7z8Qpsi749eaAIovqe

Name:              revolver-mirror1_mimage_1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 4
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3kUiOVkdLj6U18LdGB92sLcSI1TO7Rgts

Name:              revolver-mirror1_mimage_0
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 3
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3W26y7KjUX3f6NjgBr0GjYrdSuxMZHA4b

Name:              revolver-mirror2
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 10
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik3OTZEJ9N5A6CnpxcgWLLsFYES7vrRWrGE

Name:              VolGroup00-LogVol01
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 1
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xQjIYEP6InPgUU5nsDPYSZl5EAEKqRWcY

Name:              revolver-mirror1
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      1
Major, minor:      253, 5
Number of targets: 1
UUID: LVM-xVWv7JiOsSgNPv95Lg9FU6ckwsTQeik33eay0GrlKTcJaZKdaEAig2hY2MNmHS5q

Name:              VolGroup00-LogVol00
State:             ACTIVE
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      253, 0
Number of targets: 1
UUID: LVM-8qGbKfLuKYoljGNFE1gsS77AYQM3dC4xrapcuzOGNgADTzIRUNTk0MZBbtWAyXhh



Version-Release number of selected component (if applicable):
2.6.9-50.ELsmp
cmirror-kernel-2.6.9-25.0

Comment 1 Jonathan Earl Brassow 2007-03-24 04:30:59 UTC
'Reason' should be a negative number.  This suggests that the client is
receiving a message from the server that is not the response it is expecting.

Sequence numbers were put in (3/22/2007) to fix this problem.  The
cmirror-kernel package you are using was built 3/14/2007.
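
For readers unfamiliar with the technique, the general idea of matching responses to requests by sequence number can be sketched as follows. This is not the dm-cmirror code: the class, field, and method names are hypothetical, and the negative errno-style error value is an assumption based on the remark that 'Reason' should be negative.

```python
import errno
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Optional


@dataclass
class Response:
    seq: int    # sequence number echoed back by the server (hypothetical field)
    error: int  # 0 on success, negative errno-style value on failure (assumed)


class Client:
    """Hypothetical client that tags every request with a sequence number."""

    def __init__(self) -> None:
        self._seq = count(1)

    def next_seq(self) -> int:
        """Allocate the sequence number for the next outgoing request."""
        return next(self._seq)

    def wait_for(self, seq: int, incoming: Iterable[Response]) -> Optional[Response]:
        """Return the first response whose seq matches, skipping all others."""
        for resp in incoming:
            if resp.seq == seq:
                return resp
            # Stale or unrelated message: ignore it rather than treating
            # its payload as the answer to the current request.
        return None


client = Client()
first = client.next_seq()   # earlier request whose reply arrives late
second = client.next_seq()  # the request we are actually waiting on
incoming = [
    Response(seq=first, error=0),
    Response(seq=second, error=-errno.EBUSY),
]
reply = client.wait_for(second, incoming)
```

Without the sequence check, the late reply to the first request would be taken as the answer to the second, which is the kind of mismatch described above.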

new -> post

Comment 2 Jonathan Earl Brassow 2007-04-03 20:12:40 UTC
post -> modified

Comment 3 Corey Marthaler 2007-04-12 18:39:37 UTC
Fix verified in cmirror-kernel-2.6.9-30.0.

Comment 5 Alasdair Kergon 2010-04-27 15:00:03 UTC
Assuming this VERIFIED fix got released.  Closing.
Reopen if it's not yet resolved.

