Bug 1516712 - HA VM with lease could not be started because the lease was not created properly (EngineLock)
Summary: HA VM with lease could not be started because the lease was not created properly
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: General
Version: future
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ovirt-4.1.9
Target Release: ---
Assignee: Eyal Shenitzky
QA Contact: Polina
URL:
Whiteboard:
Depends On: 1524119
Blocks: 1516322
Reported: 2017-11-23 10:12 UTC by Polina
Modified: 2018-01-24 10:40 UTC (History)
6 users

Fixed In Version: ovirt-engine-4.1.9
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-24 10:40:55 UTC
oVirt Team: Storage
rule-engine: ovirt-4.1+
rule-engine: ovirt-4.2+


Attachments
engin.log and screen shot (deleted)
2017-11-23 10:12 UTC, Polina


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 85137 master MERGED core: update VM lease even if running configuration needed 2017-12-18 14:09:45 UTC
oVirt gerrit 85572 ovirt-engine-4.1 MERGED core: update VM lease even if running configuration needed 2017-12-19 08:56:19 UTC

Description Polina 2017-11-23 10:12:32 UTC
Created attachment 1358114 [details]
engin.log and screen shot

Description of problem:
Sometimes a situation arises in which an HA VM with a lease cannot be started (screenshot and engine.log are attached).

Version-Release number of selected component (if applicable):
rhvm-4.2.0-0.5.master.el7.noarch

How reproducible: 80%


Steps to Reproduce:
1. Create a VM with a SCSI disk.
2. Check the HA checkbox and select the scsi_0 lease storage domain. Wait until the task is completed.
3. Then try to run the VM.

Actual results:
Error: Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.

Expected results:
The VM starts successfully.

Additional info:
In engine.log (attached) we see:
START, IsVmDuringInitiatingVDSCommand( IsVmDuringInitiatingVDSCommandParameters:{vmId='76185518-a843-4ea5-83cf-3758068a241d'}), log id: 265fc1b6
2017-11-23 11:06:36,211+02 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 265fc1b6
2017-11-23 11:06:36,282+02 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] Validation of action 'RunVm' failed for user admin@internal-authz. Reasons: VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_INVALID_VM_LEASE
2017-11-23 11:06:36,284+02 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (default task-20) [d766f754-ffa6-4ffa-9f0c-46d12c7c5524] Lock freed to object 'EngineLock:{exclusiveLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]', sharedLocks=''}'

Comment 1 Michal Skrivanek 2017-11-23 10:24:54 UTC
Tal, anything to improve around lease allocation?

Comment 2 Tal Nisan 2017-11-29 10:59:18 UTC
About the actual creation? I really doubt it; it should be a quick operation. The task polling takes most of the time, I guess.

Comment 3 Michal Skrivanek 2017-11-29 11:14:43 UTC
So why does the message say "Please note that it may take few minutes to create the lease."? :)
Without a way to detect that the lease is still being created, it is problematic not to lock the VM. I suppose it's a similar situation as with disks, but there you have the ImageLocked state to check for individual disks; here you have nothing to look at.
If it is indeed quick, we can just lock the VM for the duration, or provide a different way to check whether the lease is ready.
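The "lock the VM until the lease is ready" idea above amounts to polling a readiness check with a timeout before freeing the lock. A minimal, hypothetical sketch of that loop (not ovirt-engine code; all names are illustrative):

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of the "wait until the lease is ready" idea from the
// comment above; this is NOT actual ovirt-engine code.
public class LeaseReadyPoller {

    // Polls the supplied readiness check until it returns true or the
    // timeout elapses. Returns true if the lease became ready in time.
    static boolean waitForLease(BooleanSupplier leaseReady,
                                long timeoutMillis,
                                long pollIntervalMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (System.nanoTime() < deadline) {
            if (leaseReady.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollIntervalMillis);
        }
        return leaseReady.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a lease that becomes ready after ~50 ms.
        long readyAt = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(50);
        boolean ok = waitForLease(() -> System.nanoTime() >= readyAt, 2000, 10);
        System.out.println(ok ? "lease ready" : "timed out");
    }
}
```

The VM lock would be held for the duration of the wait and freed (or the run aborted with a clear error) once the loop returns.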

Comment 4 Polina 2017-11-30 08:59:43 UTC
The bug is not about the timing. Sometimes, after adding the lease, the VM can never be started. I have just reproduced this behavior:
1. VM is up. Enable HA with the nfs_0 lease. Stop the VM.
2. Try to run it - you get the message that it takes time to add the lease, but actually there are no tasks in progress. This VM will not run no matter how long you wait.
I learned that there is a command to see the leases on the host, and I actually can't see the added lease there:


sanlock client status
daemon 13448199-e9e0-4368-8763-6a2df3fedc9c.cougar05.s
p -1 helper
p -1 listener
p -1 status
s 19cc5ef8-f4ea-463a-b6fd-025c624dfbbf:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__1/19cc5ef8-f4ea-463a-b6fd-025c624dfbbf/dom_md/ids:0
s 26d8f98f-ff9c-444e-bb9c-b60c84b5dc10:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_QE__images_GEs_GE__compute__3/26d8f98f-ff9c-444e-bb9c-b60c84b5dc10/dom_md/ids:0
s ee3ad6ec-c3f6-4801-997a-b8027d99837b:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__0/ee3ad6ec-c3f6-4801-997a-b8027d99837b/dom_md/ids:0
s c0d9097d-09a3-476f-9209-8853ba1205e9:2:/rhev/data-center/mnt/yellow-vdsb.qa.lab.tlv.redhat.com\:_Compute__NFS_GE_compute-ge-3_nfs__2/c0d9097d-09a3-476f-9209-8853ba1205e9/dom_md/ids:0
s 585d4e5d-2c4d-4c3a-9683-50db5d87a4cd:2:/dev/585d4e5d-2c4d-4c3a-9683-50db5d87a4cd/ids:0
s 946c3c7b-5175-4833-9419-a3eb124c6171:2:/rhev/data-center/mnt/glusterSD/gluster01.scl.lab.tlv.redhat.com\:_virt__local__ge1__volume__0/946c3c7b-5175-4833-9419-a3eb124c6171/dom_md/ids:0
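Each lockspace line above (prefixed `s`) begins with a storage-domain UUID, so the missing lease can be spotted by checking whether the VM's lease storage domain appears among them. An illustrative helper for that check (hypothetical, not part of oVirt or sanlock):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (not part of oVirt): extracts the storage-domain UUIDs
// from `sanlock client status` lockspace lines ("s <uuid>:<host_id>:<path>:<offset>")
// so one can check whether a given domain holds a lockspace on this host.
public class SanlockStatusCheck {

    static List<String> lockspaceDomains(String sanlockOutput) {
        List<String> domains = new ArrayList<>();
        for (String rawLine : sanlockOutput.split("\n")) {
            String line = rawLine.trim();
            if (line.startsWith("s ")) {
                // The UUID is the field before the first ':'.
                domains.add(line.substring(2).split(":")[0]);
            }
        }
        return domains;
    }

    public static void main(String[] args) {
        // Shortened sample in the shape of the output quoted above.
        String sample =
            "daemon 13448199-e9e0-4368-8763-6a2df3fedc9c.cougar05.s\n" +
            "p -1 helper\n" +
            "s 19cc5ef8-f4ea-463a-b6fd-025c624dfbbf:2:/rhev/data-center/mnt/nfs__1/dom_md/ids:0\n" +
            "s 585d4e5d-2c4d-4c3a-9683-50db5d87a4cd:2:/dev/585d4e5d/ids:0\n";
        List<String> domains = lockspaceDomains(sample);
        System.out.println(domains.contains("19cc5ef8-f4ea-463a-b6fd-025c624dfbbf"));
        System.out.println(domains.contains("00000000-0000-0000-0000-000000000000"));
    }
}
```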

From engine log:

2017-11-30 10:51:43,597+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [dfc758d7-3235-4c65-8d54-0ccb33766ad7] Lock Acquired to object 'EngineLock:{exclusiveLocks='[vm_from_templ1=VM_NAME]', sharedLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]'}'
2017-11-30 10:51:43,679+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [dfc758d7-3235-4c65-8d54-0ccb33766ad7] Running command: UpdateVmCommand internal: false. Entities affected :  ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:43,785+02 INFO  [org.ovirt.engine.core.bll.UpdateRngDeviceCommand] (default task-3) [7789d8f0] Running command: UpdateRngDeviceCommand internal: true. Entities affected :  ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:43,998+02 INFO  [org.ovirt.engine.core.bll.UpdateGraphicsDeviceCommand] (default task-3) [53ef1c0c] Running command: UpdateGraphicsDeviceCommand internal: true. Entities affected :  ID: 76185518-a843-4ea5-83cf-3758068a241d Type: VMAction group EDIT_VM_PROPERTIES with role type USER
2017-11-30 10:51:44,031+02 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-3) [53ef1c0c] EVENT_ID: USER_UPDATE_VM(35), VM vm_from_templ1 configuration was updated by admin@internal-authz.
2017-11-30 10:51:44,039+02 INFO  [org.ovirt.engine.core.bll.UpdateVmCommand] (default task-3) [53ef1c0c] Lock freed to object 'EngineLock:{exclusiveLocks='[vm_from_templ1=VM_NAME]', sharedLocks='[76185518-a843-4ea5-83cf-3758068a241d=VM]'}'

Comment 5 Eyal Shenitzky 2017-12-04 15:22:02 UTC
The root cause of this bug and of https://bugzilla.redhat.com/show_bug.cgi?id=1507214 seems to be the same.

The issue in both bugs is that when a VM update is started while the VM status is not UP / DOWN, the creation of the lease does not take place, yet the engine still sets the selected storage domain as the lease holder.

Then, when trying to run the VM, there is a validation that checks that the lease info, which should be initialized at the end of the AddVmLease command, is not null. This validation fails, and the error presented is that the lease is invalid and the VM cannot run.

So I suggest setting this bug as dependent on / a duplicate of bug https://bugzilla.redhat.com/show_bug.cgi?id=1507214.
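The failing check described above boils down to a null-check on the lease info that the update path never populated. A sketched reconstruction of that logic (illustrative only; class, field, and method names are made up, not the actual ovirt-engine code):

```java
// Illustrative reconstruction of the RunVm lease validation described in
// this comment; NOT actual ovirt-engine code.
public class LeaseValidationSketch {

    // Stand-in for the lease info that AddVmLease should populate.
    static class VmLeaseInfo { }

    // The validation: lease info must be non-null, otherwise RunVm fails
    // with ACTION_TYPE_FAILED_INVALID_VM_LEASE (the error seen in the UI).
    static String validateVmLease(VmLeaseInfo leaseInfo) {
        if (leaseInfo == null) {
            return "ACTION_TYPE_FAILED_INVALID_VM_LEASE";
        }
        return "VALID";
    }

    public static void main(String[] args) {
        // The bug: the update path skipped lease creation, so leaseInfo
        // stays null and RunVm validation fails on every attempt.
        System.out.println(validateVmLease(null));
        System.out.println(validateVmLease(new VmLeaseInfo()));
    }
}
```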

Comment 6 Eyal Shenitzky 2017-12-05 09:41:12 UTC
Polina, did you stop the VM right after the VM update?

Comment 7 Polina 2017-12-06 06:48:09 UTC
Hi Eyal,

here is a scenario where it is reproducible quite easily:

1. Run the VM, open the Edit window, check HA, and choose lease nfs_0 (one of three NFS leases: nfs_0, nfs_1, nfs_2). As a result you get the "Pending VM changes" dialog; click OK in it.
2. Power off the VM, then Run it again - the VM will never run.
You get this window:

Error while executing action: 
golden_env_mixed_virtio_2:

    Cannot run VM. Invalid VM lease. Please note that it may take few minutes to create the lease.

Comment 8 Polina 2018-01-15 15:07:04 UTC
Verified in ovirt-engine-4.2.1.1-0.1.el7.noarch.

Comment 9 Sandro Bonazzola 2018-01-24 10:40:55 UTC
This bug is included in the oVirt 4.1.9 release, published on Jan 24th 2018.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

