Bug 228104 - greater than 2 legged cluster mirrors do not down convert when a leg fails
Summary: greater than 2 legged cluster mirrors do not down convert when a leg fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: cmirror
Version: 4
Hardware: All
OS: Linux
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Jonathan Earl Brassow
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-02-09 23:36 UTC by Corey Marthaler
Modified: 2010-01-12 02:02 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-08-05 21:42:50 UTC



Description Corey Marthaler 2007-02-09 23:36:03 UTC
Description of problem:
This is basically the root cause of bz 228067 and bz 228070, and this may be a
single node mirroring issue and not cluster specific. Unlike a two legged
mirror, when a three or greater legged mirror has a leg failure, it isn't
properly down converted, thus causing problems with whatever is currently using
the mirror.

Version-Release number of selected component (if applicable):
2.6.9-46.ELsmp
lvm2-2.02.21-1.el4
lvm2-cluster-2.02.21-3.el4

How reproducible:
every time

Comment 1 Corey Marthaler 2007-02-09 23:44:38 UTC
Hmmm, this appears to work just fine in single node mirroring.


[root@link-07 ~]# lvs -a -o +devices
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  10.70 mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)
  [mirror_mimage_0] vg   iwi-ao 10.00G                                       /dev/sdh1(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)


# FAIL /dev/sdh and wait...

[root@link-07 ~]# lvs -a -o +devices
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh1: read failed after 0 of 2048 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                VG   Attr   LSize  Origin Snap%  Move Log         Copy%  Devices
  mirror            vg   mwi-ao 10.00G                    mirror_mlog  13.12 mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)
  [mirror_mimage_1] vg   iwi-ao 10.00G                                       /dev/sda1(0)
  [mirror_mimage_2] vg   iwi-ao 10.00G                                       /dev/sdb1(0)
  [mirror_mimage_3] vg   iwi-ao 10.00G                                       /dev/sdc1(0)
  [mirror_mlog]     vg   lwi-ao  4.00M                                       /dev/sdd1(0)
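
For anyone comparing runs like the two above, here is a minimal Python sketch that diffs the Devices field of the top-level mirror LV before and after the failure to see which legs were dropped. The helper name leg_names is made up for illustration, and the two strings are copied from the lvs output in this comment; nothing here is part of LVM itself.

```python
import re

def leg_names(devices_field):
    """Return the set of sub-LV names in an lvs Devices field such as 'a(0),b(0)'.

    Hypothetical helper: strips the trailing '(extent)' suffix from each entry.
    """
    return {re.sub(r"\(\d+\)$", "", item) for item in devices_field.split(",") if item}

# Devices field of the 'mirror' LV, copied from the two lvs runs above.
before = "mirror_mimage_0(0),mirror_mimage_1(0),mirror_mimage_2(0),mirror_mimage_3(0)"
after = "mirror_mimage_3(0),mirror_mimage_1(0),mirror_mimage_2(0)"

# Legs present before the failure but gone afterwards were down converted.
dropped = leg_names(before) - leg_names(after)
print(sorted(dropped))
```

An empty result after a leg failure would correspond to the cluster case reported in this bug, where the failed leg stays listed in the mirror.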


Comment 2 Corey Marthaler 2007-02-12 19:53:58 UTC
Here's what's actually happening from the user's viewpoint...

Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/vg-cmirror1
                      9.5G   20K  9.5G   1% /mnt/gfs1

[root@link-07 ~]# lvs -a -o +devices
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  71.33 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G                                         /dev/sdh2(0)
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
[root@link-07 ~]# ls -lrt /mnt/gfs1
total 3936
-rw-rw-rw-  1 root root 1000000 Feb 12 08:33 link-02
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-08
-rw-rw-rw-  1 root root 1000000 Feb 12 09:00 link-04
-rw-rw-rw-  1 root root 1000000 Feb 12 14:07 link-07


[FAIL /dev/sdh]


[root@link-07 ~]# ls -lrt /mnt/gfs1
ls: /mnt/gfs1: Input/output error
[root@link-07 ~]# touch /mnt/gfs1/foo
touch: cannot touch `/mnt/gfs1/foo': Input/output error

Filesystem            Size  Used Avail Use% Mounted on
df: `/mnt/gfs1': Input/output error

# The leg "remains" in the mirror
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)

All the nodes running I/O to GFS on cmirror lose their connection to the machine:
[...]
<xior magic="0xfeed10"><read syscall="readv"><path>/mnt/gfs1/link-07</path><oflags>O_RDONLY</oflags><offset>0</offset><count>163676</count></read></xior>
<xior magic="0xfeed10"><write syscall="write"><path>/mnt/gfs1/link-07</path><oflags>O_RDWR</oflags><offset>0</offset><count>974966</count><pattern>D</pattern></write></xior>
Connection to link-07 closed.


[...]
<xior magic="0xfeed10"><read syscall="read"><path>/mnt/gfs1/link-04</path><oflags>O_RDONLY</oflags><offset>0</offset><count>814950</count></read></xior>
<xior magic="0xfeed10"><write syscall="writev"><path>/mnt/gfs1/link-04</path><oflags>O_RDWR</oflags><offset>0</offset><count>703955</count><pattern>P</pattern></write></xior>
Connection to link-04 closed.

# sync % is stuck
[root@link-08 ~]# lvs -a -o +devices
  /dev/dm-3: read failed after 0 of 4096 at 0: Input/output error
  /dev/dm-6: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh: read failed after 0 of 4096 at 0: Input/output error
  /dev/sdh2: read failed after 0 of 2048 at 0: Input/output error
  LV                  VG   Attr   LSize  Origin Snap%  Move Log           Copy%  Devices
  cmirror1            vg   mwi-ao 10.00G                    cmirror1_mlog  86.17 cmirror1_mimage_0(0),cmirror1_mimage_1(0),cmirror1_mimage_2(0)
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G                                         /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G                                         /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M                                         /dev/sdg2(0)
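
The symptom above (a mimage sub-LV still listed, but with an empty Devices column) can be spotted mechanically. The following is only a rough Python sketch: the function name legs_without_device is invented for illustration, it assumes each LV row sits on a single line, and it relies on the image rows having exactly the columns seen in this report (name, VG, Attr, LSize, then the device), so a short row is taken to mean a missing device. The sample rows are abbreviated from the output in this comment.

```python
def legs_without_device(lvs_text):
    """Return names of mirror image sub-LVs whose Devices column is empty.

    Heuristic, not an LVM API: an image row normally splits into five
    whitespace-separated fields; fewer than five means no device is shown.
    """
    missing = []
    for line in lvs_text.splitlines():
        fields = line.split()
        # Hidden sub-LVs are printed in brackets, e.g. [cmirror1_mimage_0].
        if fields and fields[0].startswith("[") and "_mimage_" in fields[0]:
            if len(fields) < 5:
                missing.append(fields[0].strip("[]"))
    return missing

# Rows abbreviated from the post-failure lvs output on link-08.
sample = """\
  [cmirror1_mimage_0] vg   iwi-ao 10.00G
  [cmirror1_mimage_1] vg   iwi-ao 10.00G   /dev/sde1(0)
  [cmirror1_mimage_2] vg   iwi-ao 10.00G   /dev/sdf1(0)
  [cmirror1_mlog]     vg   lwi-ao  4.00M   /dev/sdg2(0)
"""

print(legs_without_device(sample))
```

Pairing this with a check that Copy% is unchanged between two runs (86.17 in both captures above) would cover the "sync % is stuck" observation as well.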

Comment 3 Jonathan Earl Brassow 2007-02-13 18:51:05 UTC
I propose setting a restriction that mirrors are limited to 2 sides for 4.5.
This would defuse this bug.  Once we agree on that, I'll open an RFE for 4.6 and
make this bug dependent on that.


Comment 4 Corey Marthaler 2007-04-12 18:32:06 UTC
Greater than 2 legged mirrors have been verified to down convert during leg
failures.

Comment 5 Chris Feist 2008-08-05 21:42:50 UTC
Fixed in current release (4.7).

