Bug 822262 - [vdsm] [ovirt] host fails connecting to pool due to wrong master domain
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.3.4
Assignee: Eduardo Warszawski
QA Contact: Haim
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-05-16 21:17 UTC by Haim
Modified: 2016-02-10 17:15 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-07-25 14:11:37 UTC
oVirt Team: Storage


Attachments
vdsm log (deleted), 2012-05-16 21:17 UTC, Haim

Description Haim 2012-05-16 21:17:54 UTC
Created attachment 585060
vdsm log

The following problem has been hit by many users in the community so far, so I decided to document it:

- storage type: NFS
- host loses SPM
- new SPM is elected
- connectStoragePool is sent with correct params, but the command fails stating: wrong master domain.

I compared the params we got in the command, and they exactly match the ones written in the metadata.
I asked the user to restart vdsmd and try again: same error.

See logs.
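
The comparison between the command params and the metadata can be made repeatable with a small script. This is only a sketch: the function name `diff_pool_params` is hypothetical, and the `on_disk` dict is assumed to be filled in by hand from the domain metadata, keyed by the command's argument names rather than by the real on-disk key names. The values shown are the ones from the connectStoragePool call in the log.

```python
# Sketch: compare the params sent in connectStoragePool with the values read
# from the storage domain metadata. All names here are illustrative.
_MISSING = "<missing>"


def diff_pool_params(sent, on_disk):
    """Return {key: (sent_value, on_disk_value)} for every key that differs.

    Values are compared as strings because metadata is stored as text
    while the command may carry integers (e.g. masterVersion=1).
    """
    diffs = {}
    for key in sorted(set(sent) | set(on_disk)):
        a = str(sent.get(key, _MISSING))
        b = str(on_disk.get(key, _MISSING))
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Values taken from the connectStoragePool call in the log.
sent = {
    "spUUID": "af5bcc86-898a-11e1-9632-003048c85226",
    "msdUUID": "e12a0f53-ee72-44bc-ad26-93f9b4613c6c",
    "masterVersion": 1,
}

# Assumption: copied by hand from the metadata; in this report they matched.
on_disk = {
    "spUUID": "af5bcc86-898a-11e1-9632-003048c85226",
    "msdUUID": "e12a0f53-ee72-44bc-ad26-93f9b4613c6c",
    "masterVersion": "1",
}

print(diff_pool_params(sent, on_disk))
```

An empty result means the params match, which points the investigation away from the arguments and toward whether the master domain is reachable at all.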

Comment 1 Eduardo Warszawski 2012-07-22 16:41:54 UTC
The provided log only covers the period after the vdsm restart, and the only error in it is that the domain was not found.
Despite the connectStorageServer call, no mount is issued, so the MSD (master storage domain) can't be found.
Therefore the error is StoragePoolMasterNotFound.


MainThread::INFO::2012-05-11 14:46:01,783::vdsm::70::vds::(run) I am the actual vdsm 4.9-0
Thread-18::INFO::2012-05-11 14:46:14,257::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='af5bcc86-898a-11e1-9632-003048c85226', conList=[{'connection': 'cmcd-heilig.in.hwlab:/exports/iso', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': 'd3f7bc58-898a-11e1-acd5-003048c85226', 'port': ''}], options=None)
Thread-18::INFO::2012-05-11 14:46:14,259::logUtils::39::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'd3f7bc58-898a-11e1-acd5-003048c85226'}]}
Thread-19::INFO::2012-05-11 14:46:14,278::logUtils::37::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='af5bcc86-898a-11e1-9632-003048c85226', hostID=2, scsiKey='af5bcc86-898a-11e1-9632-003048c85226', msdUUID='e12a0f53-ee72-44bc-ad26-93f9b4613c6c', masterVersion=1, options=None)

StoragePoolMasterNotFound: Cannot find master domain: 'spUUID=af5bcc86-898a-11e1-9632-003048c85226, msdUUID=e12a0f53-ee72-44bc-ad26-93f9b4613c6c'
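
To pull the spUUID/msdUUID pair out of a "Run and protect" line like the one above without copying it by hand, something along these lines may help. It is a rough sketch: `parse_call` is a hypothetical helper, not part of vdsm, and it only handles flat `key=value` arguments, so the nested `conList=[...]` of connectStorageServer is out of scope.

```python
import re

# Matches "Run and protect: name(arg1=..., arg2=...)" at the end of a log line.
CALL_RE = re.compile(r"Run and protect: (\w+)\((.*)\)\s*$")
# Matches key='quoted value' or key=bare_value (flat arguments only).
ARG_RE = re.compile(r"(\w+)=('[^']*'|[^,]+)")


def parse_call(line):
    """Return (verb, {arg: value}) for a call line, or None if it is not one."""
    match = CALL_RE.search(line)
    if match is None:
        return None
    name, raw_args = match.groups()
    args = {key: value.strip().strip("'") for key, value in ARG_RE.findall(raw_args)}
    return name, args


# The connectStoragePool line from the log above, unchanged.
LINE = (
    "Thread-19::INFO::2012-05-11 14:46:14,278::logUtils::37::dispatcher::(wrapper) "
    "Run and protect: connectStoragePool(spUUID='af5bcc86-898a-11e1-9632-003048c85226', "
    "hostID=2, scsiKey='af5bcc86-898a-11e1-9632-003048c85226', "
    "msdUUID='e12a0f53-ee72-44bc-ad26-93f9b4613c6c', masterVersion=1, options=None)"
)

print(parse_call(LINE))
```

All values come back as strings, so `masterVersion` is `'1'` and `options` is `'None'`.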



The complete connectStorageServer thread log:


Thread-18::DEBUG::2012-05-11 14:46:14,256::BindingXMLRPC::164::vds::(wrapper) [10.0.8.19]
Thread-18::DEBUG::2012-05-11 14:46:14,256::task::588::TaskManager.Task::(_updateState) Task=`3b868a60-fa19-4fb6-a52d-21caecc9435d`::moving from state init -> state preparing
Thread-18::INFO::2012-05-11 14:46:14,257::logUtils::37::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=1, spUUID='af5bcc86-898a-11e1-9632-003048c85226', conList=[{'connection': 'cmcd-heilig.in.hwlab:/exports/iso', 'iqn': '', 'portal': '', 'user': '', 'password': '******', 'id': 'd3f7bc58-898a-11e1-acd5-003048c85226', 'port': ''}], options=None)
Thread-18::DEBUG::2012-05-11 14:46:14,257::lvm::476::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-18::DEBUG::2012-05-11 14:46:14,258::lvm::478::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-18::DEBUG::2012-05-11 14:46:14,258::lvm::488::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' got the operation mutex
Thread-18::DEBUG::2012-05-11 14:46:14,258::lvm::490::OperationMutex::(_invalidateAllVgs) Operation 'lvm invalidate operation' released the operation mutex
Thread-18::DEBUG::2012-05-11 14:46:14,258::lvm::509::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' got the operation mutex
Thread-18::DEBUG::2012-05-11 14:46:14,259::lvm::511::OperationMutex::(_invalidateAllLvs) Operation 'lvm invalidate operation' released the operation mutex
Thread-18::INFO::2012-05-11 14:46:14,259::logUtils::39::dispatcher::(wrapper) Run and protect: connectStorageServer, Return response: {'statuslist': [{'status': 0, 'id': 'd3f7bc58-898a-11e1-acd5-003048c85226'}]}
Thread-18::DEBUG::2012-05-11 14:46:14,259::task::1172::TaskManager.Task::(prepare) Task=`3b868a60-fa19-4fb6-a52d-21caecc9435d`::finished: {'statuslist': [{'status': 0, 'id': 'd3f7bc58-898a-11e1-acd5-003048c85226'}]}
Thread-18::DEBUG::2012-05-11 14:46:14,259::task::588::TaskManager.Task::(_updateState) Task=`3b868a60-fa19-4fb6-a52d-21caecc9435d`::moving from state preparing -> state finished
Thread-18::DEBUG::2012-05-11 14:46:14,260::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-18::DEBUG::2012-05-11 14:46:14,260::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-18::DEBUG::2012-05-11 14:46:14,260::task::978::TaskManager.Task::(_decref) Task=`3b868a60-fa19-4fb6-a52d-21caecc9435d`::ref 0 aborting False
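
One way to check which code paths a thread went through is to split each line on `::` and count the module field per thread. This is a sketch, not a vdsm tool: the field names are inferred from the lines above (thread, level, timestamp, module, line number, logger, message), and `parse_line` and `modules_for_thread` are hypothetical helpers.

```python
from collections import Counter


def parse_line(line):
    """Split a vdsm log line into its fields, or return None if it does not fit.

    maxsplit=6 keeps any '::' inside the message (e.g. "Task=`...`::moving") intact.
    """
    parts = line.rstrip("\n").split("::", 6)
    if len(parts) != 7:
        return None  # continuation lines, tracebacks, blank lines
    thread, level, timestamp, module, lineno, logger, message = parts
    return {
        "thread": thread,
        "level": level,
        "timestamp": timestamp,
        "module": module,
        "lineno": lineno,
        "logger": logger,
        "message": message,
    }


def modules_for_thread(lines, thread):
    """Count how many log lines each module emitted for the given thread."""
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None and record["thread"] == thread:
            counts[record["module"]] += 1
    return counts


# A few lines copied from the log above.
sample = [
    "MainThread::INFO::2012-05-11 14:46:01,783::vdsm::70::vds::(run) I am the actual vdsm 4.9-0",
    "Thread-18::DEBUG::2012-05-11 14:46:14,256::BindingXMLRPC::164::vds::(wrapper) [10.0.8.19]",
    "Thread-18::DEBUG::2012-05-11 14:46:14,257::lvm::476::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' got the operation mutex",
    "Thread-18::DEBUG::2012-05-11 14:46:14,258::lvm::478::OperationMutex::(_invalidateAllPvs) Operation 'lvm invalidate operation' released the operation mutex",
    "Thread-18::DEBUG::2012-05-11 14:46:14,260::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}",
]

print(modules_for_thread(sample, "Thread-18"))
```

Run over the full Thread-18 log above, the only modules that show up are BindingXMLRPC, task, logUtils, lvm and resourceManager, which is consistent with the observation that no mount was issued.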

Comment 2 Eduardo Warszawski 2012-07-22 16:43:14 UTC
Is this the correct log?

