Bug 1052024 - After a power outage two VMs marked as HA failed to start automatically, they were required to be started manually.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: high
Target Milestone: ---
Target Release: 3.4.0
Assignee: Gilad Chaplik
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On:
Blocks: 1074478 rhev3.4beta 1142926
 
Reported: 2014-01-13 06:34 UTC by Aval
Modified: 2018-12-09 17:25 UTC
CC: 15 users

Fixed In Version: av3
Doc Type: Bug Fix
Doc Text:
Previously, some virtual machines did not automatically restart after a power failure. As a result, they would have to be manually restarted. Now, the issue has been corrected and all virtual machines restart as expected.
Clone Of:
: 1074478 (view as bug list)
Environment:
Last Closed: 2014-06-09 15:08:41 UTC
oVirt Team: SLA
Target Upstream Version:


Attachments


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2014:0506 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Virtualization Manager 3.4.0 update 2014-06-09 18:55:38 UTC
oVirt gerrit 24651 None None None Never
oVirt gerrit 25461 None None None Never

Comment 2 Doron Fediuck 2014-01-13 15:25:58 UTC
Hi, can you please provide the exact rhev version, and the relevant engine log files?

Comment 3 Aval 2014-01-13 23:05:07 UTC
Created attachment 849684 [details]
engine.log

Comment 4 Aval 2014-01-13 23:22:05 UTC
(In reply to Doron Fediuck from comment #2)
> Hi, can you please provide the exact rhev version, and the relevant engine
> log files?

- Version-Release number of selected component (if applicable):

rhevm-3.2.2-0.41.el6ev.noarch

- Attached engine.log file

Comment 8 Itamar Heim 2014-03-07 12:13:10 UTC
Description of problem: After a power outage, two of the eight VMs marked as HA failed to start automatically. The customer had to start them manually after waiting a few hours, expecting RHEV-M to handle this automatically.

Environment details : 
- 2 Hypervisors with 24GB RAM each
- 10 VMs 

- Example of the HA VM "mastro03srv" failing to start (two VMs had already started successfully before this one). The customer later started it manually without any error.

~~~
2013-12-29 07:10:25,456 INFO  [org.ovirt.engine.core.bll.VdsEventListener] (QuartzScheduler_Worker-13) [33217069] Failed to start Highly Available VM. Attempting to restart. VM Name: mastro03srv, VM Id:c52b7bdb-9c3a-4d76-9f06-42cbb7687a17
2013-12-29 07:10:25,468 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,476 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 583aa7de
2013-12-29 07:10:25,477 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 583aa7de
2013-12-29 07:10:25,490 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Running command: RunVmCommand internal: true. Entities affected :  ID: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 Type: VM
2013-12-29 07:10:25,514 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,514 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Failed to run desktop mastro03srv, rerun 
2013-12-29 07:10:25,519 INFO  
[org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, UpdateVdsDynamicDataVDSCommand(HostName = rhev-hv01.xxxxxx.com, HostId = 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf, vdsDynamic=org.ovirt.engine.core.common.businessentities.VdsDynamic@5bdabd1b), log id: 4e7d3ce3
2013-12-29 07:10:25,521 INFO  [org.ovirt.engine.core.vdsbroker.UpdateVdsDynamicDataVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, UpdateVdsDynamicDataVDSCommand, log id: 4e7d3ce3

[...]

2013-12-29 07:10:25,549 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock Acquired to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
2013-12-29 07:10:25,566 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] START, IsVmDuringInitiatingVDSCommand( vmId = c52b7bdb-9c3a-4d76-9f06-42cbb7687a17), log id: 474630fa
2013-12-29 07:10:25,566 INFO  [org.ovirt.engine.core.vdsbroker.IsVmDuringInitiatingVDSCommand] (QuartzScheduler_Worker-13) [33217069] FINISH, IsVmDuringInitiatingVDSCommand, return: false, log id: 474630fa
2013-12-29 07:10:25,576 INFO  [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069]  VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf have failed running this VM in the current selection cycle VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:25,577 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_CLUSTER  

2013-12-29 07:10:25,577 INFO  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] Lock freed to object EngineLock [exclusiveLocks= key: c52b7bdb-9c3a-4d76-9f06-42cbb7687a17 value: VM
, sharedLocks= ]
~~~
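The log above shows the engine's host-selection loop: a host that already failed to run the VM in the current selection cycle is excluded, and when the remaining hosts are not Up or are outside the VM's cluster, CanDoAction of RunVm fails and the HA restart attempt gives up. A minimal sketch of that filtering logic (hypothetical names and structures for illustration, not the actual ovirt-engine code):

```python
# Hypothetical sketch of the host-selection filtering seen in the log.
# Names and data structures are illustrative, not real ovirt-engine code.

def select_host(hosts, vm, failed_this_cycle):
    """Return the first host eligible to run vm, or None.

    A host is skipped if it already failed this VM in the current
    selection cycle, is not Up, or is not in the VM's cluster.
    """
    for host in hosts:
        if host["name"] in failed_this_cycle:
            continue  # "have failed running this VM in the current selection cycle"
        if host["status"] != "Up" or host["cluster"] != vm["cluster"]:
            continue  # "is not in up status or belongs to the VM's cluster"
        return host
    return None  # no candidate left -> CanDoAction of RunVm fails

hosts = [
    {"name": "rhev-hv01", "status": "Up", "cluster": "c1"},
    {"name": "rhev-hv02", "status": "NonResponsive", "cluster": "c1"},
]
vm = {"name": "mastro03srv", "cluster": "c1"}

# hv01 already failed this cycle and hv02 is still down after the
# outage, so no eligible host remains and the HA restart fails.
assert select_host(hosts, vm, failed_this_cycle={"rhev-hv01"}) is None
```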

After the above failure, the second HA VM "mx" failed with this error:
~~~
2013-12-29 07:10:36,839 INFO  [org.ovirt.engine.core.bll.VdsSelector] (QuartzScheduler_Worker-13) [33217069]  VDS rhev-hv01.xxxxxx.com 0a4f8d16-ed7e-4d54-8199-ccb3f5e31baf has insufficient memory to run the VM VDS rhev-hv02.xxxxxx.com 4e942526-ac3a-4a46-b969-4bbe139c67d5 is not in up status or belongs to the VM's cluster
2013-12-29 07:10:36,839 WARN  [org.ovirt.engine.core.bll.RunVmCommand] (QuartzScheduler_Worker-13) [33217069] CanDoAction of action RunVm failed. Reasons:VAR__ACTION__RUN,VAR__TYPE__VM,ACTION_TYPE_FAILED_VDS_VM_MEMORY
~~~

The customer disputes this, since the total memory required by all the VMs is 15 GB and there are two hosts with 24 GB each.
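The "insufficient memory" message comes from the engine's scheduling check rather than from the hypervisor itself, so it can fire even when physical memory looks ample. One plausible mechanism (a simplified sketch under assumed accounting rules, not the actual RHEV-M scheduler): during a burst of HA restarts, memory for VMs that are still starting is held as "pending", and the admission check compares free minus pending memory against the VM's requested size plus a per-guest overhead.

```python
# Simplified sketch of pending-memory admission control during an HA
# restart burst. The accounting rules and the overhead constant are
# assumptions for illustration, not the actual RHEV-M scheduler.

GUEST_OVERHEAD_MB = 65  # assumed fixed per-VM overhead

def can_admit(host_free_mb, pending_mb, vm_mem_mb):
    """Admission check: free memory minus memory already reserved for
    VMs still starting must cover the new VM plus its overhead."""
    return host_free_mb - pending_mb >= vm_mem_mb + GUEST_OVERHEAD_MB

def restart_burst(host_free_mb, vm_sizes_mb):
    """Try to start a list of VMs back-to-back on one host, reserving
    pending memory for each successful start. Returns the failures."""
    pending = 0
    failed = []
    for size in vm_sizes_mb:
        if can_admit(host_free_mb, pending, size):
            pending += size + GUEST_OVERHEAD_MB
        else:
            failed.append(size)
    return failed

# One 24 GB host restarting eight 4 GB HA VMs: the pending reservations
# exhaust the host before the last VMs are admitted, even though the
# cluster as a whole has enough memory for the full set.
print(restart_burst(24 * 1024, [4096] * 8))
```

With the second host still down after the outage, every reservation lands on one host, so the tail of the restart burst can be rejected for memory; this matches the pattern of some HA VMs starting and others failing.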

Version-Release number of selected component (if applicable):
rhevm-3.2.2-0.41.el6ev.noarch

How reproducible:
No consistent way to reproduce; it happened after a power outage on one customer's setup.

Steps to Reproduce:

Actual results:
2 out of 8 VMs marked as HA failed to start automatically after the power outage.

Expected results:
All VMs marked as HA should be started automatically by RHEV-M.

Additional info:

Comment 10 Artyom 2014-03-17 17:07:20 UTC
Verified on av3.
Added a host with 16 GB of RAM and ran four HA VMs on it (three with 4096 MB and one with 2048 MB). Powered off the host, waited 5 minutes, and powered it back on; all VMs started fine.
Also tested under the 'None' cluster policy.

Comment 11 errata-xmlrpc 2014-06-09 15:08:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html

