Bug 955661 - [vdsm] remove mapping fails after extend storage domain with force=True
Summary: [vdsm] remove mapping fails after extend storage domain with force=True
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: x86_64
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Daniel Erez
QA Contact: Elad
URL:
Whiteboard: storage
Depends On:
Blocks: rhev3.4beta 1142926
 
Reported: 2013-04-23 13:33 UTC by Elad
Modified: 2016-02-10 16:53 UTC
CC List: 13 users

Fixed In Version: ovirt-3.4.0-alpha1
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-07-04 11:31:01 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments
logs (deleted): 2013-04-23 13:33 UTC, Elad
logs 2 (deleted): 2013-07-27 22:34 UTC, Elad
logs (deleted): 2013-08-14 11:10 UTC, Aharon Canan


Links: oVirt gerrit 19633

Description Elad 2013-04-23 13:33:40 UTC
Created attachment 738998 [details]
logs

Description of problem:

[vdsm] vdsm fails to extend storage domain due to remove mapping failure (with force=True).


Version-Release number of selected component (if applicable):

vdsm-4.10.2-15.0.el6ev.x86_64


How reproducible:

Happens only when force-overriding a LUN while extending a storage domain; not known for sure whether it is 100% reproducible.

Steps to Reproduce:

On FC env.:
1) Extend storage domain with force=true (extend the storage domain with a 'used' LUN) 

  
Actual results:

vdsm fails to extend storage domain with:

Thread-146025::INFO::2013-04-23 11:42:59,918::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59',
devlist=['3514f0c542a000796'], force=True, options=None)
Thread-146025::DEBUG::2013-04-23 11:42:59,919::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Request
was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' for lock type 'shared'
Thread-146025::DEBUG::2013-04-23 11:42:59,920::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a' is free. Now locking as 'shared' (1 active user)
Thread-146025::DEBUG::2013-04-23 11:42:59,921::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a`ReqID=`4dfb89c4-c178-48b8-b2fc-d8fa0980fd40`::Granted req
uest
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::811::TaskManager.Task::(resourceAcquired) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::_resourcesAcquired: Storage.151df758-5a9d-4935-b4aa-c2edf61e1c9a (shared)
Thread-146025::DEBUG::2013-04-23 11:42:59,922::task::974::TaskManager.Task::(_decref) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::ref 1 aborting False
Thread-146025::INFO::2013-04-23 11:42:59,923::blockSD::779::Storage.StorageDomain::(readMetadataMapping) META MAPPING: {u'PV0': {u'uuid': u'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm', u'pestart': u'0', u'mapoffset':
u'0', u'pecount': u'157', 'guid': u'3514f0c542a00077e'}}
Thread-146025::ERROR::2013-04-23 11:43:00,013::task::850::TaskManager.Task::(_setError) Task=`b3075fd1-4083-48e8-b5c7-b2003a8d9ac3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 682, in extendStorageDomain
    pool.extendSD(sdUUID, devlist, force)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 2014, in extendSD
    sdCache.produce(sdUUID).extend(devlist, force)
  File "/usr/share/vdsm/storage/blockSD.py", line 695, in extend
    lvm.extendVG(self.sdUUID, devices, force)
  File "/usr/share/vdsm/storage/lvm.py", line 914, in extendVG
    _initpvs(pvs, int(vg.vg_mda_size) / 2 ** 20, force)
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 667, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 166, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 137, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 76, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 67, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`
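(As an aside: the doubled dashes in the mapping name above are device-mapper's escaping convention - dashes inside a VG or LV name are doubled, so the single dash joining VG and LV stays unambiguous. A tiny illustrative helper, not vdsm code:)

```python
def dm_name(vg, lv):
    """Build the device-mapper name for an LV the way dm does:
    dashes inside the VG and LV names are doubled, and a single
    (un-doubled) dash then unambiguously joins the two parts."""
    return "%s-%s" % (vg.replace("-", "--"), lv.replace("-", "--"))
```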



Expected results:

extendStorageDomain should succeed.


Additional info: see the attached logs.

Comment 1 Ayal Baron 2013-04-28 10:05:42 UTC
> 
> 
> Expected results:
> 
> extendStorageDomain should succeed.
> 

Using 'force' does not guarantee success (nothing can).
1. We need to understand why vdsm failed to remove the mapping (it could be because the mapping is in use, or because of something we can actually fix) - what was 'using' the device in this case?
2. The expected behaviour in case of failure is a faster recovery.

In either case, this is not very severe.

Comment 2 Elad 2013-04-29 07:12:19 UTC
(In reply to comment #1)

Those LUNs were indeed in use (they were probably present in an old setup), but I don't think that should cause vdsm to fail to remove the mapping. AFAIK, vdsm is supposed to override the data on the 'used' LUN in a specific order, regardless of whether it is 'used'.

Comment 3 Ayal Baron 2013-05-08 11:23:07 UTC
From the log it appears the dmsetup remove failed, which probably indicates that the LV was still open (in which case there is nothing vdsm can do).
Can you reproduce this?

Comment 4 Ayal Baron 2013-05-08 11:24:35 UTC
Unfortunately, superVdsm does not log the error there. It might be failing because, by the time we wanted to remove the device, it was no longer there (in which case we should check for eexist and proceed). Anyway, this is guesswork.
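The check-and-proceed idea in comment 4 (treat an already-gone mapping as success) could be sketched like this; `dmsetup_remove` is a hypothetical stand-in for the supervdsm removal call, not vdsm's actual API:

```python
import errno

def remove_mapping_if_present(device_name, dmsetup_remove):
    """Remove a device-mapper mapping, tolerating the case where the
    mapping disappeared between enumeration and removal."""
    try:
        dmsetup_remove(device_name)
    except OSError as e:
        # ENOENT/ENODEV: the mapping vanished before we got to it,
        # which is the desired end state anyway - not an error.
        if e.errno in (errno.ENOENT, errno.ENODEV):
            return False  # nothing left to remove
        raise  # EBUSY and friends are real failures
    return True  # mapping removed by us
```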

Comment 5 Haim 2013-07-04 11:41:47 UTC
The problem here is that 'dmsetup remove' fails because the removal is not done in the required order.

Each device depends on a different one, so you need to start from the last one in the list.

Usually it works; not sure it's worth the effort. Adding the devmapper keyword.
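The ordering constraint described in comment 5 - a mapping can only be removed once everything holding it is gone - could be sketched as a holder-first teardown. The helper names are hypothetical, not vdsm code, and the sketch assumes the holder graph is acyclic (as real dm stacks are):

```python
def remove_in_dependency_order(mappings, holders_of, remove):
    """Remove device-mapper mappings so that a mapping is only removed
    after every mapping holding it open has been removed first.

    mappings   -- iterable of mapping names to tear down
    holders_of -- callable: name -> list of mapping names holding it
    remove     -- callable performing the actual removal
    """
    removed = set()

    def _remove(name):
        if name in removed:
            return
        for holder in holders_of(name):  # tear down holders first
            _remove(holder)
        remove(name)
        removed.add(name)

    for name in mappings:
        _remove(name)
    return removed
```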

Comment 6 Federico Simoncelli 2013-07-09 11:33:58 UTC
The domain extension request was for domain 151df758-5a9d-4935-b4aa-c2edf61e1c9a, using device 3514f0c542a000796:

Thread-145763::INFO::2013-04-23 11:32:48,786::logUtils::40::dispatcher::(wrapper) Run and protect: extendStorageDomain(sdUUID='151df758-5a9d-4935-b4aa-c2edf61e1c9a', spUUID='885fe5c0-67d3-4060-a24b-849bebcb5d59', devlist=['3514f0c542a000796'], force=True, options=None)

The problem is that this device was already part of the storage domain. Combining data from getDeviceList and VGTagMetadataRW:

     {'GUID': '3514f0c542a000796',
      'capacity': '107374182400',
      'devtype': 'FCP',
      'fwrev': '1.05',
      'logicalblocksize': '512',
      'pathlist': [],
      'pathstatus': [{'lun': '1',
                      'physdev': 'sde',
                      'state': 'active',
                      'type': 'FCP'},
                     {'lun': '1',
                      'physdev': 'sdc',
                      'state': 'active',
                      'type': 'FCP'}],
      'physicalblocksize': '4096',
      'productID': 'XtremApp',
      'pvUUID': 'edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm',
      'serial': 'SXtremIO_XtremApp_e5b3e7136cb2400e91fdb955bdf9caf9',
      'status': 'used',
      'vendorID': 'XtremIO',
      'vgUUID': 'bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1'},

Thread-49121::DEBUG::2013-04-23 07:05:50,018::persistentDict::234::Storage.PersistentDict::(refresh) read lines (VGTagMetadataRW)=['VGUUID=bq1SbR-yrR6-FErS-4Ta4-JxpK-Iq3n-puyCu1', 'PHYBLKSIZE=4096', 'LOCKPOLICY=', 'VERSION=3', 'LOGBLKSIZE=512', 'SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a', 'LEASERETRIES=3', 'LOCKRENEWALINTERVALSEC=5', 'CLASS=Data', u'PV0=pv:3514f0c542a00077e,uuid:edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm,pestart:0,pecount:157,mapoffset:0', 'DESCRIPTION=SD-FC1', 'TYPE=FCP', 'LEASETIMESEC=60', 'IOOPTIMEOUTSEC=10', 'MASTER_VERSION=7', 'ROLE=Master', 'POOL_DESCRIPTION=DC-FC', 'POOL_UUID=885fe5c0-67d3-4060-a24b-849bebcb5d59', u'POOL_DOMAINS=70cb8838-3a86-4e9a-a2ea-4e5bb0d6c303:Active,151df758-5a9d-4935-b4aa-c2edf61e1c9a:Active', 'POOL_SPM_ID=1', '_SHA_CKSUM=31f9809ca71b59d26a3ddb7a86e0c7a67348572a', 'POOL_SPM_LVER=1']

Looking at pvUUID, you'll see that SDUUID=151df758-5a9d-4935-b4aa-c2edf61e1c9a is using edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm and device 3514f0c542a000796 has pvUUID edKPNs-XuXn-qVYt-7vIn-PO2j-VGnI-l7vzDm as well.

In fact, what removeMappingsHoldingDevice is trying to do is remove the "ids" mapping, which is in use because the storage domain is active:

Exception: Could not remove mapping `151df758--5a9d--4935--b4aa--c2edf61e1c9a-ids`

This should have been prevented by the frontend and backend.
We could consider adding a check in vdsm as well (even though, API-wise, force=True is a last-resort option meaning you know what you're doing).
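The vdsm-side check suggested here could be sketched as a guard run before any destructive force handling. The function and parameter names below are hypothetical illustrations, not vdsm's actual API:

```python
def validate_extend_devices(devlist, domain_pv_guids):
    """Reject an extendStorageDomain request naming a device that
    already backs the same domain, before any destructive pvcreate
    or mapping-removal work is attempted.

    devlist         -- device GUIDs requested for the extension
    domain_pv_guids -- GUIDs of the PVs already in the domain's VG
    """
    overlap = set(devlist) & set(domain_pv_guids)
    if overlap:
        raise ValueError(
            "devices already part of the storage domain: %s"
            % ", ".join(sorted(overlap)))
    return devlist
```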

Comment 7 Ayal Baron 2013-07-24 09:01:49 UTC
Fede, correct me if I'm wrong, but there are 2 LUNs here with the same PV UUID?

Haim, can you reproduce this without replicating LUNs?

Comment 8 Elad 2013-07-27 22:29:54 UTC
I've managed to reproduce it:

Thread-39575::ERROR::2013-07-24 09:22:54,884::task::850::TaskManager.Task::(_setError) Task=`063e7c51-b4c9-4786-8ea9-49fa3f39a2e5`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2034, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `35fb7c84--28db--40c7--8047--c6074d0a041f-ids`



It occurred because the engine let me use one of the master domain's devices when I tried to create a new SD.

logs will be attached

Comment 9 Elad 2013-07-27 22:34:38 UTC
Created attachment 779240 [details]
logs 2

Comment 10 Aharon Canan 2013-08-14 11:06:45 UTC
Seems like the same issue, reproduced using is9.1 when trying to create a new SD.

On 10.35.116.3 (SPM): Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`

[root@camel-vdsc ~]# lvs
  LV                                   VG                                   Attr      LSize   Pool Origin Data%  Move Log Cpy%Sync Convert
  ids                                  508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-ao--- 128.00m                                             
  inbox                                508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m                                             
  leases                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a----   2.00g                                             
  master                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a----   1.00g                                             
  metadata                             508ba8de-379c-4d39-bb81-b0c0c03bacac -wi------ 512.00m                                             
  outbox                               508ba8de-379c-4d39-bb81-b0c0c03bacac -wi-a---- 128.00m                                             
  4807dcb6-6f42-4de2-a2e4-d49bcbcb21b3 a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi------  10.00g                                             
  ids                                  a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao--- 128.00m                                             
  inbox                                a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m                                             
  leases                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a----   2.00g                                             
  master                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-ao---   1.00g                                             
  metadata                             a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 512.00m                                             
  outbox                               a8b4e8aa-e0e3-416e-996f-4fd8785ab329 -wi-a---- 128.00m                                             
  lv_root                              vg0                                  -wi-ao--- 224.88g                                             
  lv_swap                              vg0                                  -wi-ao---   7.81g          

from logs
=========
Thread-3133::ERROR::2013-08-14 13:44:20,307::task::850::TaskManager.Task::(_setError) Task=`0e0a27f8-0bf0-495a-97ec-ccaeeadd2565`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 2051, in createVG
    (force.capitalize() == "True")))
  File "/usr/share/vdsm/storage/lvm.py", line 874, in createVG
    _initpvs(pvs, metadataSize, force)
  File "/usr/share/vdsm/storage/lvm.py", line 687, in _initpvs
    _initpvs_removeHolders()
  File "/usr/share/vdsm/storage/lvm.py", line 677, in _initpvs_removeHolders
    os.path.basename(device))
  File "/usr/share/vdsm/storage/devicemapper.py", line 146, in removeMappingsHoldingDevice
    removeMapping(getDevName(holder))
  File "/usr/share/vdsm/storage/devicemapper.py", line 117, in removeMapping
    return getProxy().removeDeviceMapping(deviceName)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in removeDeviceMapping
  File "/usr/lib64/python2.6/multiprocessing/managers.py", line 740, in _callmethod
    raise convert_to_error(kind, result)
Exception: Could not remove mapping `508ba8de--379c--4d39--bb81--b0c0c03bacac-ids`

Comment 11 Aharon Canan 2013-08-14 11:10:13 UTC
Created attachment 786501 [details]
logs

Comment 12 Ayal Baron 2013-09-02 15:15:08 UTC
Daniel,

The issue is that they're using a LUN that already belongs to an SD to create a new SD. The question is how this is allowed.
In any event, vdsm failing is more than fine.

Comment 15 Allon Mureinik 2013-09-29 11:17:38 UTC
Daniel, please add a link to gerrit in the tracker

Comment 16 Sandro Bonazzola 2014-01-14 08:45:03 UTC
ovirt 3.4.0 alpha has been released

Comment 17 Elad 2014-01-15 14:12:02 UTC
Using a LUN which is in use by another domain in the setup is now blocked by the backend. Checked using REST; I get the following message when I try to do it:

<fault>
<reason>Operation Failed</reason>
<detail>
[Cannot add Storage. The following LUNs are already part of existing storage domains: 3514f0c54d5c00006 (iscsi1-1).]
</detail>
</fault>


Checked on ovirt-3.4.0-alpha1:
ovirt-engine-3.4.0-0.2.master.20140106180914.el6.noarch

Comment 19 Itamar Heim 2014-06-12 14:11:51 UTC
Closing as part of 3.4.0

Comment 20 Daniel Erez 2014-09-15 16:36:17 UTC
*** Bug 1133938 has been marked as a duplicate of this bug. ***

