Bug 453672

Summary: system appears to deadlock (OOM) during 3-way cmirror I/O plus failure
Product: [Retired] Red Hat Cluster Suite
Component: cmirror-kernel
Version: 4
Status: CLOSED WONTFIX
Severity: medium
Priority: medium
Hardware: All
OS: Linux
Reporter: Corey Marthaler <cmarthal>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Cluster QE <mspqa-list>
CC: iannis
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Last Closed: 2010-05-14 19:59:27 UTC
Attachments:
  log and kern dump from taft-01 (flags: none)

Description Corey Marthaler 2008-07-01 19:57:36 UTC
Description of problem:
I created three 3-way mirrors and then started I/O to all three mirrors from all four nodes
(taft-0[1234]). I noticed that taft-01 started to slow way down right away.
Then, after failing /dev/sdh it became almost unresponsive. I then killed
taft-02 (in an attempt to test bz 233034). That caused taft-01 to just about
completely lock up. All the other nodes' recovery is stuck waiting for taft-01 to
fence taft-02.

  mirror1            taft  Mwi-ao 15.00G  mirror1_mlog  100.00  mirror1_mimage_0(0),mirror1_mimage_1(0),mirror1_mimage_2(0)
  [mirror1_mimage_0] taft  iwi-ao 15.00G                        /dev/sdb1(0)
  [mirror1_mimage_1] taft  iwi-ao 15.00G                        /dev/sdc1(0)
  [mirror1_mimage_2] taft  iwi-ao 15.00G                        /dev/sdd1(0)
  [mirror1_mlog]     taft  lwi-ao  4.00M                        /dev/sdh1(0)

  mirror2            taft  Mwi-ao 15.00G  mirror2_mlog  100.00  mirror2_mimage_0(0),mirror2_mimage_1(0),mirror2_mimage_2(0)
  [mirror2_mimage_0] taft  iwi-ao 15.00G                        /dev/sde1(0)
  [mirror2_mimage_1] taft  iwi-ao 15.00G                        /dev/sdf1(0)
  [mirror2_mimage_2] taft  iwi-ao 15.00G                        /dev/sdg1(0)
  [mirror2_mlog]     taft  lwi-ao  4.00M                        /dev/sdd1(3840)

  mirror3            taft  Mwi-ao 15.00G  mirror3_mlog  100.00  mirror3_mimage_0(0),mir[...]age_2(0)
  [mirror3_mimage_0] taft  iwi-ao 15.00G                        /dev/sdh1(1)
  [mirror3_mimage_1] taft  iwi-ao 15.00G                        /dev/sdb1(3840)
  [mirror3_mimage_2] taft  iwi-ao 15.00G                        /dev/sdc1(3840)
  [mirror3_mlog]     taft  lwi-ao  4.00M                        /dev/sdd1(3841)

I'll attach a kern dump from taft-01.


Version-Release number of selected component (if applicable):
2.6.9-71.ELsmp

lvm2-2.02.37-3.el4    BUILT: Thu Jun 12 10:09:19 CDT 2008
lvm2-cluster-2.02.37-3.el4    BUILT: Thu Jun 12 10:22:07 CDT 2008
device-mapper-1.02.25-2.el4    BUILT: Mon Jun  9 09:28:41 CDT 2008
cmirror-1.0.1-1    BUILT: Tue Jan 30 17:28:02 CST 2007
cmirror-kernel-2.6.9-41.4    BUILT: Tue Jun  3 13:54:29 CDT 2008

Comment 1 Corey Marthaler 2008-07-01 20:38:38 UTC
Looks like this is some kind of memory leak.

Comment 2 Corey Marthaler 2008-07-01 20:40:05 UTC
Created attachment 310717
log and kern dump from taft-01

Comment 4 Jonathan Earl Brassow 2010-05-14 19:59:27 UTC
There are no 3-way cluster mirrors on RHEL 4.

If the bug is present in the rewrite in later releases, please open new bug(s).