Bug 1513729 - additionally created virt volume after pool meta repair causes activation issue for existing virt volumes
Summary: additionally created virt volume after pool meta repair causes activation issue for existing virt volumes
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.5
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: low
Target Milestone: rc
Target Release: ---
Assignee: Joe Thornber
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-15 21:15 UTC by Corey Marthaler
Modified: 2018-11-02 01:55 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:


Attachments
lvchange -vvvv (deleted), 2017-11-15 21:17 UTC, Corey Marthaler

Description Corey Marthaler 2017-11-15 21:15:17 UTC
Description of problem:

Existing virt volumes can no longer be activated if additional virt volumes are created and removed *after* a metadata corruption and repair. Without the additionally created virt volumes, activation works fine.
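
For convenience, the reproduction steps from the transcript below, condensed into a single script (a sketch only; it assumes a volume group named snapper_thinp already exists and that its contents are disposable):

# condensed reproducer, assembled from the transcript below
lvcreate --thinpool POOL -L 1G --poolmetadatasize 4M snapper_thinp
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n origin

# first corruption/repair cycle
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
vgchange -an snapper_thinp
lvconvert --yes --repair snapper_thinp/POOL
vgchange -ay snapper_thinp

# the step that appears to trigger the problem
lvcreate --virtualsize 1G -T snapper_thinp/POOL -n virt1
lvremove -f snapper_thinp/virt1

# second corruption/repair cycle: activation of "origin" now fails
dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
vgchange -an snapper_thinp
lvconvert --yes --repair snapper_thinp/POOL
vgchange -ay snapper_thinp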


[root@host-026 ~]# lvcreate  --thinpool POOL -L 1G --poolmetadatasize 4M snapper_thinp
  Using default stripesize 64.00 KiB.
  Thin pool volume with chunk size 64.00 KiB can address at most 15.81 TiB of data.
  Logical volume "POOL" created.
[root@host-026 ~]# lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n origin
  Using default stripesize 64.00 KiB.
  Logical volume "origin" created.

[root@host-026 ~]# lvs -a -o +devices
  LV              VG            Attr       LSize   Pool Origin Data%  Meta% Devices       
  POOL            snapper_thinp twi-aotz--   1.00g             0.00   1.07  POOL_tdata(0) 
  [POOL_tdata]    snapper_thinp Twi-ao----   1.00g                          /dev/sdh1(1)  
  [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                          /dev/sda1(0)
  [lvol0_pmspare] snapper_thinp ewi-------   4.00m                          /dev/sdh1(0)  
  origin          snapper_thinp Vwi-a-tz--   1.00g POOL        0.00

## Attempt the first corruption/repair iteration
[root@host-026 ~]# dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00240324 s, 213 kB/s
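
(Side note, not part of the original report: with bs=1, seek=4096 and count=512, the dd above overwrites bytes 4096 through 4607 of the metadata device, i.e. 512 bytes inside the second 4 KiB metadata block, assuming the default 4 KiB thin metadata block size; the superblock in block 0 is left intact.)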

[root@host-026 ~]# vgchange -an snapper_thinp
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@host-026 ~]# lvconvert --yes --repair snapper_thinp/POOL
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  WARNING: LV snapper_thinp/POOL_meta0 holds a backup of the unrepaired metadata. Use lvremove when no longer required.
  WARNING: New metadata LV snapper_thinp/POOL_tmeta might use different PVs.  Move it with pvmove if required.

[root@host-026 ~]# vgchange -ay snapper_thinp
  WARNING: Not using lvmetad because a repair command was run.
  3 logical volume(s) in volume group "snapper_thinp" now active

## This additional virt creation and removal is apparently key to causing this issue!
## Without it, you can corrupt and repair as many times as you like

[root@host-026 ~]# lvcreate  --virtualsize 1G -T snapper_thinp/POOL -n virt1
  WARNING: Not using lvmetad because a repair command was run.
  Using default stripesize 64.00 KiB.
  Logical volume "virt1" created.
[root@host-026 ~]# lvs -a -o +devices
  WARNING: Not using lvmetad because a repair command was run.
  LV              VG            Attr       LSize   Pool Origin Data%  Meta% Devices       
  POOL            snapper_thinp twi-aotz--   1.00g             0.00   1.17  POOL_tdata(0) 
  POOL_meta0      snapper_thinp -wi-a-----   4.00m                          /dev/sda1(0)  
  [POOL_tdata]    snapper_thinp Twi-ao----   1.00g                          /dev/sdh1(1)  
  [POOL_tmeta]    snapper_thinp ewi-ao----   4.00m                          /dev/sdh1(0)  
  [lvol1_pmspare] snapper_thinp ewi-------   4.00m                          /dev/sdh1(257)
  origin          snapper_thinp Vwi-a-tz--   1.00g POOL        0.00
  virt1           snapper_thinp Vwi-a-tz--   1.00g POOL        0.00
[root@host-026 ~]# lvremove -f snapper_thinp/virt1
  WARNING: Not using lvmetad because a repair command was run.
  Logical volume "virt1" successfully removed

## Attempt another corruption iteration
[root@host-026 ~]# dd if=/dev/urandom of=/dev/mapper/snapper_thinp-POOL_tmeta count=512 seek=4096 bs=1
512+0 records in
512+0 records out
512 bytes (512 B) copied, 0.00238367 s, 215 kB/s

[root@host-026 ~]# vgchange -an snapper_thinp
  WARNING: Not using lvmetad because a repair command was run.
  0 logical volume(s) in volume group "snapper_thinp" now active

[root@host-026 ~]# lvconvert --yes --repair snapper_thinp/POOL
  WARNING: Disabling lvmetad cache for repair command.
  WARNING: Not using lvmetad because of repair.
  WARNING: LV snapper_thinp/POOL_meta1 holds a backup of the unrepaired metadata. Use lvremove when no longer required.
  WARNING: New metadata LV snapper_thinp/POOL_tmeta might use different PVs.  Move it with pvmove if required.

[root@host-026 ~]# lvs -a -o +devices
  WARNING: Not using lvmetad because a repair command was run.
  LV              VG            Attr       LSize   Pool Origin Data%  Meta% Devices       
  POOL            snapper_thinp twi---tz--   1.00g                          POOL_tdata(0) 
  POOL_meta0      snapper_thinp -wi-------   4.00m                          /dev/sda1(0)  
  POOL_meta1      snapper_thinp -wi-------   4.00m                          /dev/sdh1(0)  
  [POOL_tdata]    snapper_thinp Twi-------   1.00g                          /dev/sdh1(1)  
  [POOL_tmeta]    snapper_thinp ewi-------   4.00m                          /dev/sdh1(257)
  [lvol2_pmspare] snapper_thinp ewi-------   4.00m                          /dev/sdh1(258)
  origin          snapper_thinp Vwi---tz--   1.00g POOL

[root@host-026 ~]# vgchange -ay snapper_thinp
  WARNING: Not using lvmetad because a repair command was run.
  device-mapper: reload ioctl on  (253:6) failed: No data available
  3 logical volume(s) in volume group "snapper_thinp" now active
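
A hypothetical diagnostic step, not part of the original report: the failing dm device 253:6 can be mapped back to an LV, e.g. with

# match the Maj/Min columns of "dmsetup info -c"
dmsetup info -c | awk '$2 == 253 && $3 == 6'
# or report the kernel device numbers per LV
lvs -a -o lv_name,lv_kernel_major,lv_kernel_minor snapper_thinp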

[root@host-026 ~]# lvchange -ay -vvvv snapper_thinp/origin > /tmp/lvchange 2>&1


Version-Release number of selected component (if applicable):
3.10.0-772.el7.x86_64

lvm2-2.02.176-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
lvm2-libs-2.02.176-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
lvm2-cluster-2.02.176-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
lvm2-lockd-2.02.176-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
lvm2-python-boom-0.8-4.el7    BUILT: Wed Nov 15 04:23:09 CST 2017
cmirror-2.02.176-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
device-mapper-1.02.145-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
device-mapper-libs-1.02.145-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
device-mapper-event-1.02.145-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
device-mapper-event-libs-1.02.145-4.el7    BUILT: Wed Nov 15 04:21:19 CST 2017
device-mapper-persistent-data-0.7.3-2.el7    BUILT: Tue Oct 10 04:00:07 CDT 2017

Comment 2 Corey Marthaler 2017-11-15 21:17:14 UTC
Created attachment 1352996 [details]
lvchange -vvvv

Comment 3 Zdenek Kabelac 2018-04-03 08:17:42 UTC
This more or less looks like there was major data loss caused by the 2nd corruption attempt.

It all relates to how resistant thin-pool metadata should be against the loss of a single sector.

At the moment it's rather visible that if the right sector is killed, it can cause complete data loss for the whole pool.

Passing to Joe to reconsider whether the protection of some root nodes could somehow be improved.

On the lvm2 side, this probably needs some more advanced cooperation while running --repair: i.e. the list of thin devices present in the kernel metadata should be matched against the list known to lvm2, so that when the repair physically recovers nothing (or fewer devices than expected) while lvm2 knows there should be a certain number of devices, the command objects loudly right away instead of happily carrying on until the very last moment, when it tries to activate a thin device that no longer exists in the kernel metadata.

Once the thin_repair tool is enhanced, we can enhance the lvm2 tool level.
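
A rough sketch of the kind of cross-check suggested above (illustrative only; it assumes the repaired metadata can be read with thin_dump from device-mapper-persistent-data, e.g. after swapping it into a standalone LV, and the device path /dev/mapper/vg-repaired_meta is hypothetical):

# list thin device IDs present in the repaired metadata
thin_dump /dev/mapper/vg-repaired_meta | grep -o 'dev_id="[0-9]*"' | grep -o '[0-9]*' | sort -n > /tmp/meta_ids
# list thin device IDs that lvm2 metadata expects to exist
lvs --noheadings -o thin_id snapper_thinp | awk 'NF {print $1}' | sort -n > /tmp/lvm_ids
# refuse to continue if the repair silently dropped thin devices
diff -u /tmp/meta_ids /tmp/lvm_ids || echo "repair lost thin devices known to lvm2 - abort"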

