Bug 1356027

Summary: During cluster level upgrade - reconfig VMs to old cluster level compatibility level until they restart
Product: [oVirt] ovirt-engine
Reporter: Michal Skrivanek <michal.skrivanek>
Component: BLL.Virt
Assignee: Marek Libra <mlibra>
Status: CLOSED CURRENTRELEASE
QA Contact: sefi litmanovich <slitmano>
Severity: high
Docs Contact:
Priority: high
Version: 4.0.0
CC: aperotti, baptiste.agasse, bugs, c.handel, eedri, jentrena, jiri.slezka, mavital, mgoldboi, michal.skrivanek, mkalinin, mlibra, redhat, rs, sbonazzo, sites-redhat, slitmano, tjelinek
Target Milestone: ovirt-4.0.3
Flags: rule-engine: ovirt-4.0.z+, rule-engine: blocker+, mgoldboi: planning_ack+, michal.skrivanek: devel_ack+, mavital: testing_ack+
Target Release: 4.0.3
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Changing the Cluster Compatibility Version was forbidden until all VMs in the cluster were down.
Consequence: Changing the cluster version was complicated to perform in a production environment.
Result: After the Cluster Compatibility Version is changed in the Cluster Edit dialog, the user is asked to shut down and restart all running or suspended VMs as soon as possible. To make this visible, all running or suspended VMs are marked with the Next-Run icon (triangle with '!'). Hosted Engine and external VMs are excluded from this Next-Run setting. The custom compatibility version of a running or suspended VM is temporarily set to the previous cluster version until restart. Changing the Cluster Compatibility Version is not allowed while a VM snapshot is in preview; the user has to either commit or undo such a preview.
Known issues: A highly available VM is missing the Next-Run mark after a crash and automatic restart.
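
The behaviour described in the Doc Text can be sketched roughly as follows. This is a minimal illustration in Python, not the ovirt-engine implementation: the `Vm` class, its field names, the `origin` values and `change_cluster_version` are all hypothetical. Applying the Hosted Engine/external exclusion to the compatibility override as well as to the Next-Run mark is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vm:
    # All fields are hypothetical stand-ins for the engine's VM model.
    name: str
    status: str                           # "up", "suspended" or "down"
    origin: str = "ovirt"                 # "ovirt", "hosted_engine" or "external"
    snapshot_in_preview: bool = False
    next_run_marked: bool = False         # the Next-Run icon
    custom_compat: Optional[str] = None   # None means "inherit from the cluster"


def change_cluster_version(vms: List[Vm], old: str, new: str) -> str:
    """Return the new cluster level, or raise if a snapshot is in preview."""
    previews = [vm.name for vm in vms if vm.snapshot_in_preview]
    if previews:
        # The change is not allowed until the preview is committed or undone.
        raise ValueError(f"commit or undo snapshot preview first: {previews}")

    for vm in vms:
        live = vm.status in ("up", "suspended")
        # Assumption: excluded VMs get neither the mark nor the override.
        if live and vm.origin == "ovirt":
            vm.next_run_marked = True
            vm.custom_compat = old        # temporary, until the VM restarts
    return new
```

With one running VM, one Hosted Engine VM and one VM that is down, only the first ends up marked and pinned to the old level; adding a VM with a snapshot in preview makes the call fail before anything is changed.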
Story Points: ---
Clone Of: 1348907
Environment:
Last Closed: 2016-08-31 09:34:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Virt
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On: 1348907, 1356194, 1357513
Bug Blocks:

Description Michal Skrivanek 2016-07-13 09:53:58 UTC
+++ This bug was initially created as a clone of Bug #1348907 +++

--- Additional comment from Michal Skrivanek on 2016-07-13 11:46:45 CEST ---

We can take advantage of the VM custom compatibility override introduced in 4.0 and temporarily set the VM's compat level to the old cluster level. We can then use the next_run config to revert to the default (no override, inheriting the cluster's level) on VM shutdown.
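
A rough sketch of that idea, in Python for brevity. Every name here (`Vm`, `custom_compat`, `next_run`, `upgrade_cluster`, `shutdown`) is hypothetical and only models the mechanism described in this comment; it is not the engine's actual code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Vm:
    name: str
    running: bool
    custom_compat: Optional[str] = None   # None = inherit the cluster's level
    # Pending configuration that takes effect once the VM goes down.
    next_run: Dict[str, Optional[str]] = field(default_factory=dict)


def effective_compat(vm: Vm, cluster_level: str) -> str:
    """The level the VM actually runs with."""
    return vm.custom_compat or cluster_level


def upgrade_cluster(vms: List[Vm], old_level: str, new_level: str) -> str:
    for vm in vms:
        if vm.running and vm.custom_compat is None:
            vm.custom_compat = old_level        # temporary override
            vm.next_run["custom_compat"] = None  # revert to "inherit" later
    return new_level


def shutdown(vm: Vm) -> None:
    vm.running = False
    # Apply the pending next_run config, which drops the override.
    if "custom_compat" in vm.next_run:
        vm.custom_compat = vm.next_run.pop("custom_compat")
```

In this model a running VM keeps reporting the old level after `upgrade_cluster` and switches to the new cluster level only after `shutdown`, while a VM that was already down inherits the new level immediately.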

Comment 1 Marina 2016-07-13 15:40:54 UTC
Not to be confused with the original bug:
<mskrivanek> mku: that's another improvement of the process, but that is not backportable to 3.6 as it depends on a different 4.0 feature

Comment 2 Michal Skrivanek 2016-08-04 10:00:51 UTC
The warnings during the cluster level (CL) upgrade should work as follows:
during CL update - warn on suspended VMs, VMs with snapshots with RAM, running VMs, paused VMs
after CL upgrade - show the reconfig icon on suspended VMs, running VMs, paused VMs
on snapshot preview - block restoring with RAM
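
The first two rules can be written down as simple predicates. The sketch below is only an illustration in Python; `VmState`, `warn_during_update` and `show_reconfig_icon` are hypothetical names, and the snapshot-preview rule is left out because the comment does not say which snapshots it applies to.

```python
from enum import Enum


class VmState(Enum):
    DOWN = "down"
    RUNNING = "running"
    PAUSED = "paused"
    SUSPENDED = "suspended"


# States in which the VM still carries configuration from the old level.
LIVE_STATES = {VmState.RUNNING, VmState.PAUSED, VmState.SUSPENDED}


def warn_during_update(state: VmState, has_ram_snapshot: bool) -> bool:
    """Warn on suspended, running and paused VMs and on snapshots with RAM."""
    return state in LIVE_STATES or has_ram_snapshot


def show_reconfig_icon(state: VmState) -> bool:
    """After the upgrade, only live VMs get the reconfig icon."""
    return state in LIVE_STATES
```

Note the asymmetry: a VM that is down but has a snapshot with RAM triggers the warning during the update, yet gets no reconfig icon afterwards.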

Comment 6 Michal Skrivanek 2016-08-23 13:54:23 UTC
Pending testing with HE.
May still cause issues with many running VMs due to bug 1366786, which is targeted at 4.0.4.

Comment 7 sefi litmanovich 2016-08-30 15:55:28 UTC
Tested upgrade from cluster 3.6 (2 hosts with vdsm-4.17.33-1) to cluster 4.0 (upgraded the hosts to vdsm-4.18.11-1). On 3.6, created and ran VMs with various configurations and kept the VMs running during and after the upgrade.

Tested flows/configurations:

1. Snapshot creation before upgrade (with and without memory) and restore after upgrade:
without memory - the VM snapshot is restored and the VM starts with 4.0 XML.
with memory - we get the expected warning message; after confirming, the VM is restored and starts with 3.6 XML.
Both are expected behaviours.
2. Snapshot create-preview-undo-clone-remove after upgrade - passed.
3. Migration after upgrade - passed.
5. Run once + cloud-init - tested run once before upgrade and used cloud-init to set hostname/username/password, set a custom CPU type, and changed the console from SPICE to VNC - all configurations work as expected. After the upgrade, powered off the VM - the VM's configuration reverts and the VM starts with 4.0 XML. - passed.
6. Consoles - both VNC and SPICE consoles had no regression after upgrade. On SPICE, checked USB support, file transfer, and copy/paste support. - passed.
7. Memory hotplug after upgrade - passed.
8. CPU hotplug after upgrade - passed. Tested twice: once with 'HotPlugCpuSupported' set to 'false' for 3.6 with arch x86_64 - hot plug was refused with the expected error message; once with it set to 'true' - hot plug succeeded.
9. NIC hotplug - passed.
10. Disk hotplug - passed.
11. HA VM - after upgrade, killed the VM's process and saw that it restarts right away - passed.
12. Hyper-V enlightenment for Windows VMs - passed: checked at configuration level only - VMs created as Windows VMs had all the Hyper-V flags enabled in the XML, whereas Linux VMs did not.
13. I/O threads - set a VM with 4 I/O threads - the configuration wasn't changed after upgrade. - passed.

Problems:
Both problems that were found in the pre-integration build persist:
1. An HA VM isn't updated correctly when its process is killed at the qemu level and the engine invokes a restart - https://bugzilla.redhat.com/show_bug.cgi?id=1369521
2. When upgrading a cluster with an HE VM, the HE VM's XML doesn't change and no restart mark appears - https://bugzilla.redhat.com/show_bug.cgi?id=1370120

Verifying, since open bugs already track these issues and the upgrade works as expected in most flows.