
Bug 235948

Summary: clvm 2-way mirrored volume with log crashes if one mirror leg and the log is lost
Product: [Retired] Red Hat Cluster Suite
Component: lvm2-cluster
Version: 4
Hardware: i686
OS: Linux
Reporter: Mattias Haern <mattias.haern>
Assignee: Jonathan Earl Brassow <jbrassow>
QA Contact: Corey Marthaler <cmarthal>
CC: agk, ccaulfie, dwysocha, jbrassow, mbroz, prockai, rkenna
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
URL: http://intranet.corp.redhat.com/ic/intranet/ClusterMirrorBeta45.html
Fixed In Version: beta1
Doc Type: Bug Fix
Last Closed: 2007-04-19 18:42:35 UTC
Attachments:
  Cluster configuration file

Description Mattias Haern 2007-04-10 23:07:30 UTC
Description of problem:
After creating a 2-way clustered LVM2 mirror with a log, the volume crashes if
one mirror leg and the mirror log are removed at the same time.

Version-Release number of selected component (if applicable): 4.5 beta

How reproducible:
Every time.

Steps to Reproduce:
1. Install RHEL 4.5 beta
2. Install RHEL 4.5 cluster beta
3. Configure a mirrored clustered LVM2 volume with a log
4. Remove one mirror leg and the log (see the sketch below)
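
For concreteness, a sketch of steps 3 and 4 follows. All device names
(/dev/sdb, /dev/sdc, /dev/sdd) and the VG/LV names are hypothetical; the
commands assume the mirror syntax of the LVM2 2.02 series shipped with
RHEL 4.5, and the sysfs delete trick assumes a 2.6 kernel:

    # Hypothetical shared-SAN PVs; -c y marks the VG as clustered for clvmd.
    pvcreate /dev/sdb /dev/sdc /dev/sdd
    vgcreate -c y testvg1 /dev/sdb /dev/sdc /dev/sdd

    # 2-way mirror with an on-disk mirror log; allocation permitting, the
    # legs land on the first two PVs and the log on the third.
    lvcreate -m 1 --mirrorlog disk -L 1G -n testlv1 testvg1

    # Simulate sudden loss of one mirror leg plus the log disk, e.g. by
    # deleting the SCSI devices (or by pulling the SAN ports).
    echo 1 > /sys/block/sdc/device/delete
    echo 1 > /sys/block/sdd/device/delete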

Actual results:
Volume crashed.

Expected results:
Volume continues to be available, with only one copy.

Additional info:

Test environment
----------------
Infrastructure:
*	2 x IBM xSeries 346 installed with Red Hat ES 4U5beta_64
*	EMC SAN with shared disks (2 x Emulex LP10000 HBAs on each server)

Cluster configuration:
*	2 nodes
*	Fencing based on RSA II
*	Cluster service based on the following resources:
	o	IP address
	o	Logical volume on shared disk
	o	Mount of LVM based filesystem

Tests with cluster (all tests are done on SAN disk)
---------------------------------------------------
* Convert linear volume to mirror volume with mirror log on disk
  OK.
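
  A sketch of this conversion, with the same hypothetical names as above and
assuming the LVM2 2.02 lvconvert syntax:

    # Add a second leg and an on-disk log to an existing linear LV; the
    # trailing PVs are allocation hints for the new leg and the log.
    lvconvert -m 1 --mirrorlog disk testvg1/testlv1 /dev/sdc /dev/sdd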

* Initially create mirror volume with mirror log on disk
  OK.

* Initially create mirror volume with mirror log in memory (corelog)
  OK.
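
  The corelog variant differs only in the log type; a sketch with the same
hypothetical names:

    # Keep the mirror log in kernel memory instead of on disk
    # (LVM2 of this era also accepted a --corelog shorthand).
    lvcreate -m 1 --mirrorlog core -L 1G -n testlv1 testvg1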

* Force sudden removal of mirror disk with mirror log volume intact
  OK. The volume was automatically converted to a linear volume; cluster status unchanged.

* Force sudden removal of mirror disk and mirror log disk
  Not OK as expected. Volume crash when log disk is removed. Writing to the file
system stopped and corruption occurred.

* Force sudden power off of the active node in the cluster (log disk), with both
sides of the mirror intact
  OK. The mirrored volume is moved to the remaining node in the cluster.

* Force sudden removal of mirror disk and mirror log disk (corelog)
  Not OK. The volume is online and can be accessed, but its status is strange:

	[root@tnscl02cn001 ~]# vgdisplay -v testvg1
	    Loaded external locking library liblvm2clusterlock.so
	    Using volume group(s) on command line
	    Finding volume group "testvg1"
	    Wiping cache of LVM-capable devices
	  Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
	  Couldn't find all physical volumes for volume group testvg1.
	  Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
	  Couldn't find all physical volumes for volume group testvg1.
	  Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
	  Couldn't find all physical volumes for volume group testvg1.
	  Couldn't find device with uuid 'jccjQF-Ql0I-CYAp-N5Ak-tRaV-Z2IR-uWFLh4'.
	  Couldn't find all physical volumes for volume group testvg1.
	  Volume group "testvg1" not found

This is confusing: we removed only one leg of the mirror, yet the vgdisplay
output indicates problems. It is still possible to write to the file system.
But when the node fails (taking the mirror log with it, since the log is kept
in the memory of the failing node), the service fails to come up on the
failover node because the logical volume cannot be activated.
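
For reference, the activation that fails on the failover node would be
attempted with something like the following (hypothetical VG name as above):

    # Activate all LVs in the VG on this node; this is the step that fails
    # once the in-memory log has been lost with the dead node.
    vgchange -a y testvg1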

* Force sudden removal of mirror disk and mirror log disk (corelog) and
simultaneously force a sudden power-off of the active node in the cluster
  Not OK. The cluster tries to fail over the volume, but the volume is in the
same strange state as in the previous test and cannot be reactivated.

Comment 1 Mattias Haern 2007-04-10 23:07:30 UTC
Created attachment 152187 [details]
Cluster configuration file

Comment 2 Jonathan Earl Brassow 2007-04-11 15:01:59 UTC
Please perform an lvmdump to gather LVM/device-mapper information.
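
For reference, a typical invocation (output directory hypothetical):

    # Collect LVM2 and device-mapper state for attachment to the bug.
    lvmdump -d /tmp/lvmdump-bz235948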

"* Force sudden removal of mirror disk and mirror log disk
  Not OK as expected. Volume crash when log disk is removed. Writing to the file
system stopped and corruption occurred."

"Volume crash" - what does this mean?  What was printed/logged?
"corruption occurred" - what kind of corruption?  Data corruption?  Metadata corruption?


Comment 3 Mattias Haern 2007-04-19 14:43:40 UTC
New tests with beta1 showed that this no longer occurs.


Comment 4 Jonathan Earl Brassow 2007-04-19 15:13:01 UTC
If your continued testing shows that this is truly fixed, please close the bug.

assigned -> modified