|Summary:||migration_timeout not honoured, live migration goes on beyond it|
|Product:||Red Hat Enterprise Virtualization Manager||Reporter:||Julio Entrena Perez <jentrena>|
|Component:||vdsm||Assignee:||Vinzenz Feenstra [evilissimo] <vfeenstr>|
|Status:||CLOSED ERRATA||QA Contact:||Lukas Svaty <lsvaty>|
|Version:||3.1.4||CC:||acathrow, bazulay, eedri, flo_bugzilla, iheim, jentrena, jkt, lbopf, lpeer, lsvaty, lyarwood, mavital, michal.skrivanek, pbandark, pstehlik, sbonazzo, sputhenp, vfeenstr, yeylon|
|Target Milestone:||---||Keywords:||Triaged, ZStream|
|Fixed In Version:||ovirt-3.4.0-beta2||Doc Type:||Bug Fix|
Live migration operations now honour the 300 second migration_timeout limit and no longer continue beyond it.
|:||1069220 (view as bug list)||Environment:|
|Last Closed:||2014-06-09 13:24:50 UTC||Type:||Bug|
|oVirt Team:||---||RHEL 7.3 requirements from Atomic Host:|
|Cloudforms Team:||---||Target Upstream Version:|
|Bug Depends On:||1015887|
|Bug Blocks:||1069220, 1069731, 1078909, 1142926|
Comment 2 Saveliev Peter 2013-06-05 13:14:26 UTC
The confusion is caused by the variable naming. migration_timeout is actually counted not from the start of the migration but from the moment the migration stalls, so in that respect it worked as designed. But the issue raises more than a naming question — the naming is easy to fix and will be. More serious is the behaviour of the destination host, which is completely wrong. That is being investigated.
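The stall-based semantics described above can be sketched as a timer that restarts whenever the migration makes progress, so the timeout fires only after migration_timeout seconds without progress. This is a minimal illustration of the idea; the function and the (elapsed, bytes_remaining) sample shape are hypothetical, not vdsm's actual monitoring code.

```python
def monitor_migration(progress_samples, stall_timeout=300):
    """Abort only after `stall_timeout` seconds *without progress*.

    `progress_samples` is an iterable of (elapsed_seconds, bytes_remaining)
    pairs; the names and shapes are illustrative, not vdsm's real API.
    """
    last_progress_at = 0
    last_remaining = None
    elapsed = 0
    for elapsed, remaining in progress_samples:
        if last_remaining is None or remaining < last_remaining:
            last_progress_at = elapsed   # progress made: restart the clock
            last_remaining = remaining
        elif elapsed - last_progress_at >= stall_timeout:
            return ('aborted', elapsed)  # stalled for stall_timeout seconds
    return ('completed', elapsed)
```

Under these semantics a migration that keeps making progress for, say, 400 seconds completes even though it exceeds 300 seconds total, while one whose remaining byte count stops shrinking is aborted 300 seconds after its last progress — which matches the "counted from the moment the migration is stalled" behaviour described above.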
Comment 3 Julio Entrena Perez 2013-06-05 13:39:47 UTC
(In reply to Saveliev Peter from comment #2)
> The confusion is caused by variable naming.

According to /usr/share/doc/vdsm-4.10.2/vdsm.conf.sample:

# Maximum time the destination waits for migration to end. Source
# waits twice as long (to avoid races).
# migration_timeout = 300

> Actually, migration_timeout is counted not from the migration start, but
> from the moment the migration is stalled, so here it worked as designed.

If that's the case we still need to rephrase the above comment (and explain the behaviour around migration_timeout properly somewhere).
Comment 4 Saveliev Peter 2013-06-05 16:28:19 UTC
Yes, surely. It will be done as well.
Comment 5 Michal Skrivanek 2013-07-03 04:03:39 UTC
We also need to address/verify the engine error on timeout, since the migration seems to fail with: Migration failed due to Error: Internal Engine Error (VM: dev31bc4a, Source Host: devrhev06).
Comment 6 Saveliev Peter 2013-07-09 14:32:38 UTC
(In reply to Michal Skrivanek from comment #5)
> also need to address/verify engine error on timeout as it seems the
> migration fails with Migration failed due to Error: Internal Engine Error
> (VM: dev31bc4a, Source Host: devrhev06).

Ok.
Comment 7 Martin Kletzander 2013-08-15 14:05:18 UTC
*** Bug 965172 has been marked as a duplicate of this bug. ***
Comment 10 Vinzenz Feenstra [evilissimo] 2013-11-06 09:07:49 UTC
The internal error happened due to a 'ClassCastException' in the vdsbroker:

2013-05-17 12:34:00,569 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) START, MigrateStatusVDSCommand(HostName = i-mpapp3, HostId = 1a62f776-695e-11e2-a97a-fb8bf5530f36, vmId=d6446340-b00a-4068-8778-2227f89776fd), log id: 3b3e8edd
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand] (pool-3-thread-49) Failed in MigrateStatusVDS method, for vds: i-mpapp3; host: 10.204.125.31
2013-05-17 12:34:00,607 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command MigrateStatusVDS execution failed. Exception: ClassCastException: java.util.HashMap cannot be cast to java.lang.Integer
2013-05-17 12:34:00,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.MigrateStatusVDSCommand] (pool-3-thread-49) FINISH, MigrateStatusVDSCommand, log id: 3b3e8edd
2013-05-17 12:34:00,781 INFO [org.ovirt.engine.core.bll.VdsSelector] (pool-3-thread-49) VDS i-mpapp1 419a3eb6-4452-11e2-ab96-575e82ebec1e is not in up status or belongs to the VM's cluster VDS i-mpapp4 2bb65ff4-5bd0-11e2-8088-8f3b14835353 have failed running this VM in the current selection cycle VDS jtest02 1948e33c-490b-11e2-8443-1b53e1383a1a is not in up status or belongs to the VM's cluster VDS i-mpweb2 33ff1c5e-7a9e-11e2-ab5e-170d2d7c2bd6 is not in up status or belongs to the VM's cluster VDS jtest01 c5ea366a-43a0-11e2-b207-ff9e163144da is not in up status or belongs to the VM's cluster VDS i-mpapp2 3550eabc-5b43-11e2-af4e-5b3ed4fe7828 is not in up status or belongs to the VM's cluster VDS i-mpweb1 92af67dc-4938-11e2-baf4-eb85f55b5ed5 is not in up status or belongs to the VM's cluster
2013-05-17 12:34:00,781 WARN [org.ovirt.engine.core.bll.MigrateVmCommand] (pool-3-thread-49) CanDoAction of action MigrateVm failed. Reasons: ACTION_TYPE_FAILED_VDS_VM_CLUSTER, VAR__ACTION__MIGRATE, VAR__TYPE__VM

This is most likely due to receiving a different value (probably an error message) from VDSM than was expected.
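The failure pattern above — a structured error map arriving where a plain integer was expected, then blowing up as a ClassCastException — can be avoided by type-checking the response before using it. A minimal sketch of that defensive idea in Python (the 'progress' key and error shape are hypothetical, not the real VDSM wire schema):

```python
def parse_migrate_status(response):
    """Tolerate an error map where an integer status was expected.

    `response` is a decoded transport dict; the 'progress' key and the
    error shape here are illustrative, not VDSM's actual schema.
    """
    value = response.get('progress')
    if isinstance(value, int):
        return {'ok': True, 'progress': value}
    if isinstance(value, dict):  # server sent a structured error instead
        return {'ok': False, 'error': value.get('message', 'unknown error')}
    return {'ok': False, 'error': 'unexpected progress type: %r' % (value,)}
```

The point is that the caller gets a well-formed failure result instead of an unchecked cast failing deep inside the broker and surfacing as "Internal Engine Error".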
Comment 15 Eyal Edri 2014-02-10 10:31:34 UTC
Moving to 3.3.2 since 3.3.1 was built and moved to QE. Please make sure to backport into z-stream.
Comment 19 Lukas Svaty 2014-02-27 15:29:54 UTC
FailedQA.

Changing migration_max_time_per_gib_mem to a smaller value (5) makes the migration time out. An appropriate message about this should be displayed in the event log. Instead we get two errors:

2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress (VM: a, Source: host1, Destination: host2).
2014-Feb-27, 16:22 Migration failed due to Error: Migration not in progress. Trying to migrate to another Host (VM: a, Source: host1, Destination: host2).

A message like "Migration timed out after %d seconds." should be displayed instead.
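The migration_max_time_per_gib_mem setting mentioned above scales the total migration time budget with the guest's memory size, which is why lowering it to 5 forces a quick timeout on any reasonably sized VM. A sketch of that arithmetic (the function name is hypothetical, and 64 s/GiB is used only as an illustrative default):

```python
def migration_timeout_seconds(mem_gib, per_gib=64):
    """Total migration time budget scaling with guest memory size.

    `per_gib` plays the role of migration_max_time_per_gib_mem; 64 is an
    illustrative default. Comment 19 lowered it to 5 to force a timeout.
    """
    return max(1, mem_gib) * per_gib  # at least one GiB's worth of budget
```

For example, a 4 GiB guest gets 4 * 64 = 256 seconds at the illustrative default, but only 4 * 5 = 20 seconds with the value used in the FailedQA test.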
Comment 22 Michal Skrivanek 2014-02-28 11:45:56 UTC
The error message is tracked as bug 1071260. Moving back to ON_QA as the functionality is not affected.
Comment 23 Lukas Svaty 2014-02-28 15:32:49 UTC
Functionality working, moving to VERIFIED.
Comment 24 errata-xmlrpc 2014-06-09 13:24:50 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2014-0504.html