Bug 966037 - [engine-backend] in a case of missing device, the domain is inaccessible but engine reports it as up
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Unspecified
Target Milestone: ---
Target Release: 3.4.0
Assignee: Liron Aravot
QA Contact: Elad
Whiteboard: storage
Depends On:
Blocks: rhev3.4beta 1142926
Reported: 2013-05-22 10:38 UTC by Elad
Modified: 2016-02-10 17:38 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2013-12-10 16:38:19 UTC
oVirt Team: Storage
Target Upstream Version:

Attachments
logs (deleted)
2013-05-22 10:38 UTC, Elad

Description Elad 2013-05-22 10:38:02 UTC
Created attachment 751654 [details]

Description of problem:

Although the host cannot perform connectStorageServer, the engine still reports the domain as 'up'.

Version-Release number of selected component (if applicable):


How reproducible:

Steps to Reproduce: 
With one host and one iSCSI domain:
1. Start extending the domain and, while the extension is in progress, remove the device (PV) being added to the domain from the host with:

'multipath -f 1elad1313678616'

2. vdsm will fail to extend the domain and then will fail in connectStorageServer:

2013-05-22 13:15:01,794 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) START, ConnectStoragePoolVDSCommand(HostName = nott-vds1, HostId = 61ada6ee-b58a-11e2-b34e-001a4a169734, storagePoolId = e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, vds_spm_id = 1, masterDomainId = da07317a-eaa1-4cf8-aaae-ac41c4b1fd87, masterVersion = 1), log id: 2e09ca40
2013-05-22 13:15:03,214 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (QuartzScheduler_Worker-77) No string for UNASSIGNED type. Use default Log
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87']]
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-4-thread-49) HostName = nott-vds1
2013-05-22 13:15:04,218 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-4-thread-49) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=e5ab1ab3-f38e-4aef-9dfa-b4ebcad11ed4, msdUUID=da07317a-eaa1-4cf8-aaae-ac41c4b1fd87'
2013-05-22 13:15:04,218 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-4-thread-49) FINISH, ConnectStoragePoolVDSCommand, log id: 2e09ca40
2013-05-22 13:15:04,219 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-4-thread-49) Could not connect host nott-vds1 to pool iscsi

3. The pool becomes non-responsive and the host non-operational
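
When checking whether a reproduction hit the same failure, the status code in the engine log from step 2 (mCode=304, "Cannot find master domain") is the quickest thing to look for. A minimal sketch in Python, assuming the log line format quoted above and assuming that a status code of 0 means success; the function name `find_vds_failures` and the regular expression are illustrative only and are not part of ovirt-engine or vdsm:

```python
import re

# Matches the status fragment as it appears in the engine log quoted above, e.g.
# "mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: '...']]"
STATUS_RE = re.compile(r"mCode=(?P<code>\d+), mMessage=(?P<message>.*?)\]\]")


def find_vds_failures(lines):
    """Return (code, message) pairs for every non-zero status found in the lines.

    Assumption: a code of 0 is treated as success and skipped.
    """
    failures = []
    for line in lines:
        match = STATUS_RE.search(line)
        if match is None:
            continue
        code = int(match.group("code"))
        if code != 0:
            failures.append((code, match.group("message")))
    return failures
```

Usage would be along the lines of `find_vds_failures(open("engine.log"))`, where the file path is whatever engine log is being inspected.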

Actual results:
Engine reports that the domain is up even though there are no active hosts in the pool and the pool is non-responsive. There is nothing the user can do to remove the damaged domain.

Expected results:
The domain should become unknown

Additional info: logs

Comment 1 Liron Aravot 2013-12-05 09:43:44 UTC
Elad, the attached logs don't match the ones that you quoted.
Please try to reproduce; that shouldn't happen.
If it does, please attach the correct logs.

Comment 2 Elad 2013-12-10 16:38:19 UTC
No reproduction so far, checked on 3.2.5. Closing for now as WORKSFORME, will re-open if necessary.
