Bug 1595303 - HE VM migration fails with libvirtError: resource busy: Failed to acquire lock: Lease is held by another host
Summary: HE VM migration fails with libvirtError: resource busy: Failed to acquire lock: Lease is held by another host
Keywords:
Status: NEW
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.HostedEngine
Version: 4.2.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.4.0
Assignee: Doron Fediuck
QA Contact: Polina
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-26 14:49 UTC by Polina
Modified: 2019-04-10 10:13 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: SLA
pm-rhel: ovirt-4.4+


Attachments
logs (deleted) - 2018-06-26 14:49 UTC, Polina

Description Polina 2018-06-26 14:49:33 UTC
Created attachment 1454685 [details]
logs

Description of problem: HE migration sometimes fails with libvirtError: resource busy: Failed to acquire lock: Lease is held by another host.

Version-Release number of selected component (if applicable): rhv-release-4.2.4-6-001.noarch

How reproducible: intermittent; not easily reproduced.

Steps to Reproduce:
1. Run an HE environment with three hosts, with the hosted-engine storage domain on iSCSI.
2. Occasionally the HE VM migration fails with the following traceback (please see lynx16_vdsm.log):

2018-06-23 15:52:15,547+0300 ERROR (vm/96b4f434) [virt.vm] (vmId='96b4f434-de9e-4be6-b842-adae55933dc2') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2876, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: resource busy: Failed to acquire lock: Lease is held by another host

Expected results: migration succeeds

Additional info: the attached archive contains: agent.log, broker.log, engine.log, logs, lynx14_vdsm.log, lynx16_vdsm.log, lynx17_vdsm.log.
The migration attempt was from lynx16 to lynx17.

Comment 1 Michal Skrivanek 2018-06-27 10:53:22 UTC
https://bugzilla.redhat.com/show_bug.cgi?id=1459829#c31 ?

Comment 2 Martin Sivák 2018-06-27 11:34:14 UTC
Hi Polina, can you please attach some additional information?

- What kind of storage domain did you use for hosted engine (NFSv3, NFSv4, iSCSI, ...)?
- How did sanlock look just before the migration (sanlock client status output from both the source and destination hosts)?

Meital already answered our question about how the migration was started - by clicking the Migrate button in the UI. Is that correct?
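The sanlock client status output requested above can be captured and split by record type for comparison between the two hosts. A minimal sketch, assuming sanlock's convention of prefixing lockspace lines with "s " and resource (lease) lines with "r "; the sample input in the test below is purely illustrative, not real output from these hosts:

```python
import subprocess


def parse_sanlock_status(text):
    """Split `sanlock client status` output into lockspace ('s ') and
    resource/lease ('r ') entries; daemon and process lines are ignored."""
    lockspaces, resources = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("s "):
            lockspaces.append(line[2:])
        elif line.startswith("r "):
            resources.append(line[2:])
    return lockspaces, resources


def sanlock_status():
    """Run the real command on a host (requires sanlock installed and
    sufficient privileges) and parse its output."""
    out = subprocess.run(["sanlock", "client", "status"],
                         capture_output=True, text=True, check=True).stdout
    return parse_sanlock_status(out)
```

Running this just before triggering the migration on both the source and destination hosts would show whether the destination already holds (or knows about) the hosted-engine lease.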

Comment 3 Martin Sivák 2018-06-27 11:35:19 UTC
(In reply to Michal Skrivanek from comment #1)
> https://bugzilla.redhat.com/show_bug.cgi?id=1459829#c31 ?

Maybe, but I do not think so.

The error here is "libvirtError: resource busy: Failed to acquire lock: Lease is held by another host", which seems to imply that sanlock on the other host knew about the lockspace.

Comment 4 Polina 2018-07-05 06:43:39 UTC
(In reply to Martin Sivák from comment #2)

Hi, the Hosted Engine disk is on iSCSI for this environment.

About "how the migration started" - no UI button was clicked; the failures happened during an automation build run.

The tests send a REST action:

2018-06-23 15:52:16,246 - MainThread - art.ll_lib.vms - INFO - Migrate VM HostedEngine
2018-06-23 15:52:16,246 - MainThread - vms - DEBUG - Action request content is --  url:/ovirt-engine/api/vms/96b4f434-de9e-4be6-b842-adae55933dc2/migrate body:<action>
    <async>false</async>
    <force>true</force>
    <grace_period>
        <expiry>10</expiry>
    </grace_period>
    <host id="074db613-5fb8-4722-8801-130797dc18b1"/>
</action>

sanlock client status output is not reported to the logs.
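The request body logged above can be reconstructed programmatically. A minimal sketch using only the standard library; it only builds the <action> XML shown in the log - actually sending it to /ovirt-engine/api/vms/<vm_id>/migrate would additionally require authentication against the engine, which is omitted here:

```python
import xml.etree.ElementTree as ET


def build_migrate_action(host_id, force=True, async_=False, grace_expiry=10):
    """Build the <action> body the automation POSTs to the oVirt API
    migrate endpoint, mirroring the logged request."""
    action = ET.Element("action")
    ET.SubElement(action, "async").text = str(async_).lower()
    ET.SubElement(action, "force").text = str(force).lower()
    grace = ET.SubElement(action, "grace_period")
    ET.SubElement(grace, "expiry").text = str(grace_expiry)
    ET.SubElement(action, "host", id=host_id)
    return ET.tostring(action, encoding="unicode")
```

Note that force=true in the logged request means the engine is asked to migrate even under conditions it would normally reject, which is worth keeping in mind when analyzing the lease contention.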

Comment 5 Ryan Barry 2019-04-08 17:36:55 UTC
Polina, is this still reproducible?

Comment 6 Polina 2019-04-10 05:37:31 UTC
Yes. In the last automation runs we saw this problem twice on 4.3.3.2.
Since the engine was down for a long time afterwards and the whole environment could not run the tests, we had to reprovision and rebuild everything, so the logs were not saved. The next time I see it, I will update the bug with the logs.

