Bug 1672859 - Cannot correctly upgrade an hosted engine env from 4.2 to 4.3 if the specific CPU type disappeared in 4.3
Keywords:
Status: ON_QA
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: BLL.Virt
Version: 4.3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.3.3-1
Target Release: ---
Assignee: Steven Rosenberg
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1699913
Blocks: 1694787
 
Reported: 2019-02-06 04:10 UTC by Juhani Rautiainen
Modified: 2019-04-16 11:57 UTC (History)

Fixed In Version: ovirt-engine-4.3.3.5
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1694787 (view as bug list)
Environment:
Last Closed:
oVirt Team: Virt
pm-rhel: ovirt-4.3+
mtessun: planning_ack+
rbarry: devel_ack+
mavital: testing_ack+


Attachments
Virsh capabilities (deleted), 2019-02-11 16:09 UTC, Juhani Rautiainen
VDSM host capabilities (deleted), 2019-02-11 16:09 UTC, Juhani Rautiainen


Links
System | ID | Priority | Status | Summary | Last Updated
Github | oVirt ovirt-ansible-cluster-upgrade issues 40 | None | None | None | 2019-02-06 14:28:57 UTC
oVirt gerrit | 98137 | master | MERGED | engine: Update Deprecated CPU Types on Upgrade | 2019-03-31 18:43:12 UTC
oVirt gerrit | 99097 | ovirt-engine-4.3 | MERGED | engine: Update Deprecated CPU Types on Upgrade | 2019-04-02 08:34:21 UTC

Description Juhani Rautiainen 2019-02-06 04:10:08 UTC
Description of problem:

It's not possible to change the cluster CPU type. I have a 2-node cluster with Epyc processors. It was originally installed with 4.2, so it chose Opteron G3 as the CPU type (no Epyc support back then). In Engine 4.3, Epyc is available as a CPU type when I choose Compatibility Version: 4.3. The big problem is that the engine doesn't allow changing the CPU type because not all hosts are in maintenance: "Error while executing action: Cannot change Cluster CPU type unless all Hosts attached to this Cluster are in Maintenance". Putting all hosts into maintenance is impossible because the Engine is hosted in the cluster. I tried Global HA maintenance, but that didn't help.


Version-Release number of selected component (if applicable):
4.3

How reproducible:


Steps to Reproduce:
1. Install oVirt 4.2 on Epyc hardware with self hosted engine
2. Upgrade to 4.3
3. Try to change the CPU type from Opteron G3 to Epyc

Actual results:
Can't change the CPU type because of this: "Error while executing action: Cannot change Cluster CPU type unless all Hosts attached to this Cluster are in Maintenance"


Expected results:
You can change the CPU type. As it stands, upgrading hardware still locks you to the old CPU type.


Additional info:

Comment 1 Simone Tiraboschi 2019-02-06 14:30:08 UTC
A manual workaround procedure is:

* set HE global maintenance mode
* set one of the hosted-engine hosts into maintenance mode
* move it to a different temporary cluster
* shut down the engine VM
* manually restart the engine VM on the host in the temporary cluster by running 'hosted-engine --vm-start' directly on that host
* connect again to the engine
* set all the hosts of the initial cluster into maintenance mode
* upgrade the cluster
* shut down the engine VM again
* manually restart the engine VM on one of the hosts of the initial cluster
* move the host from the temporary cluster back to its initial cluster

but this could be a bit challenging on the user side.
Let's see if we can automate it with ovirt-ansible-cluster-upgrade.
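To make the ordering constraint in the steps above easier to follow, here is a toy Python model of the workaround. It is only a sketch: the classes and function names (Host, Cluster, run_workaround, and so on) are hypothetical and are not part of the oVirt API or SDK, and the rule that the host running the engine VM cannot enter maintenance is a simplification of the situation described in the bug. The model does not track whether the moved host is activated in the temporary cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    in_maintenance: bool = False
    runs_engine_vm: bool = False


@dataclass
class Cluster:
    name: str
    cpu_type: str
    hosts: list = field(default_factory=list)


def set_maintenance(host):
    # Simplified rule: the host that runs the engine VM cannot enter maintenance.
    if host.runs_engine_vm:
        raise RuntimeError(f"{host.name} is running the engine VM")
    host.in_maintenance = True


def change_cpu_type(cluster, new_type):
    # Mirrors the error message quoted in the bug description.
    if not all(h.in_maintenance for h in cluster.hosts):
        raise RuntimeError(
            "Cannot change Cluster CPU type unless all Hosts attached "
            "to this Cluster are in Maintenance"
        )
    cluster.cpu_type = new_type


def move_host(host, src, dst):
    # Assumption of this model: a host must be in maintenance to be moved.
    if not host.in_maintenance:
        raise RuntimeError(f"{host.name} must be in maintenance to be moved")
    src.hosts.remove(host)
    dst.hosts.append(host)


def restart_engine_on(target, all_hosts):
    # Stands in for: shut down the engine VM, then run
    # 'hosted-engine --vm-start' on the target host.
    for h in all_hosts:
        h.runs_engine_vm = False
    target.runs_engine_vm = True


def run_workaround(initial, temp, new_type):
    # Global HE maintenance is assumed to be set before this point.
    spare = next(h for h in initial.hosts if not h.runs_engine_vm)
    set_maintenance(spare)
    move_host(spare, initial, temp)
    restart_engine_on(spare, initial.hosts + temp.hosts)
    for h in initial.hosts:
        set_maintenance(h)
    change_cpu_type(initial, new_type)
    restart_engine_on(initial.hosts[0], initial.hosts + temp.hosts)
    move_host(spare, temp, initial)
```

In this model, calling change_cpu_type directly on a cluster whose engine host is still active raises the same error as in the report, while run_workaround succeeds because the engine VM is temporarily parked outside the cluster being changed.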

Comment 2 Michal Skrivanek 2019-02-08 09:12:22 UTC
Why was it using Opteron G3? Was it auto detected in 4.2 as that? Weird...

Comment 3 Juhani Rautiainen 2019-02-08 10:03:51 UTC
It was autodetected as such. That's why I was surprised when 4.2.7 started to warn that the CPU type is going to be deprecated. Then I saw that the 4.3 release notes mentioned support for Epyc. Looking into it now, QEMU seems to be the reason: KVM users have noticed an Epyc->Opteron_G3 switch if QEMU is too old. Maybe it's a fallback in QEMU?

Comment 4 Michal Skrivanek 2019-02-11 12:18:29 UTC
Did you upgrade the hosts first?
What do "virsh -r capabilities" and "vdsm-client Host getCapabilities" return?
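For readers comparing the output of "virsh -r capabilities" across hosts, a small Python sketch like the following can pull out the host CPU model and feature list. The SAMPLE_XML string is an invented, heavily shortened illustration of the libvirt capabilities layout, not the reporter's actual output (the attachment is no longer available), and host_cpu_summary is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; real output contains many more elements.
SAMPLE_XML = """<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Opteron_G3</model>
      <vendor>AMD</vendor>
      <feature name='avx'/>
      <feature name='fma'/>
    </cpu>
  </host>
</capabilities>"""


def host_cpu_summary(xml_text):
    """Return the host CPU model, vendor and feature names from capabilities XML."""
    cpu = ET.fromstring(xml_text).find("./host/cpu")
    return {
        "model": cpu.findtext("model"),
        "vendor": cpu.findtext("vendor"),
        "features": sorted(f.get("name") for f in cpu.findall("feature")),
    }


if __name__ == "__main__":
    print(host_cpu_summary(SAMPLE_XML))
```

On a real host the XML text could be obtained by saving the command output to a file and reading it in; checking whether a flag such as fma4 appears in the feature list is then a simple membership test.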

Comment 5 Juhani Rautiainen 2019-02-11 16:09:02 UTC
Created attachment 1529079 [details]
Virsh capabilities

Comment 6 Juhani Rautiainen 2019-02-11 16:09:42 UTC
Created attachment 1529080 [details]
VDSM host capabilities

Comment 7 Juhani Rautiainen 2019-02-11 16:11:58 UTC
Attached the requested capabilities output from virsh and VDSM. I updated the engine first. I tried to update the nodes from there, but I had to do it from the CLI.

Comment 8 Michal Skrivanek 2019-02-11 16:19:37 UTC
Thanks! That looks...weird. Is that before or after "rm /var/cache/libvirt/qemu/capabilities/*.xml" (as per bug 1674265)? If you haven't done that, could you give it a try and re-run both capability queries?
Also, did you check for any microcode updates for your CPU?
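The cache-clearing step can be sketched in Python as below, with a dry-run default so that nothing is deleted by accident. The directory is the one named in the comment above; the function name clear_capabilities_cache is hypothetical, and on a real host this needs root privileges. Later in this thread the reporter also restarts libvirtd after clearing the cache, which this sketch does not do.

```python
from pathlib import Path

DEFAULT_CACHE_DIR = "/var/cache/libvirt/qemu/capabilities"


def clear_capabilities_cache(cache_dir=DEFAULT_CACHE_DIR, dry_run=True):
    """Remove cached *.xml capability files; return the affected file names.

    With dry_run=True (the default) the files are only listed, not removed.
    """
    affected = []
    for path in sorted(Path(cache_dir).glob("*.xml")):
        if not dry_run:
            path.unlink()
        affected.append(path.name)
    return affected


if __name__ == "__main__":
    # Lists what would be removed without touching anything.
    print(clear_capabilities_cache())
```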

Comment 9 Juhani Rautiainen 2019-02-11 16:41:19 UTC
I didn't clear any capabilities; this is all the result of a 4.2 install and an upgrade to 4.3. I'll try clearing the cache tomorrow. I didn't check for microcode updates, but the BIOS should be the newest available for ProLiant Gen10 servers.

Comment 10 Michal Skrivanek 2019-02-11 17:01:08 UTC
Ok. Please do. Also try removing that cache, rebooting, and rerunning both queries.

Do you happen to have a non-upgraded server with the same hardware?

Comment 11 Juhani Rautiainen 2019-02-11 17:19:21 UTC
I managed to do the test today. I put the nodes into maintenance one after another, cleared the cache and restarted libvirtd. I can upload the files, but they are identical to the ones I already uploaded. Or not totally: the VDSM output differs in the gc_timer lines.

Comment 12 Juhani Rautiainen 2019-02-11 17:34:39 UTC
Just remembered that I did do 'Refresh capabilities' from webadmin after the node updates. Does it do the same operation?

Comment 13 Juhani Rautiainen 2019-02-11 17:45:46 UTC
I forgot to mention that I don't have a spare server to test on.

Comment 14 Michal Skrivanek 2019-02-12 14:54:28 UTC
Hm. It seems the fma4 flag, which was added in G4 and G5, was removed in EPYC, so EPYC processors on a non-EPYC-enabled oVirt get detected as G3. That became a problem when we removed G3. I wonder... it could be that adding G3 back is the easiest solution right now.
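The fallback described here can be illustrated with a short Python sketch that picks the newest CPU model whose required flags are all present on the host. The flag sets are deliberately simplified and only meant to show the mechanism; they are not the real model definitions used by oVirt, libvirt or QEMU, and best_match is a hypothetical name.

```python
# Simplified, illustrative flag sets, ordered oldest to newest.
# Only the fma4 relationship is taken from the comment above.
PRE_EPYC_MODELS = [
    ("Opteron G3", {"sse4a"}),
    ("Opteron G4", {"sse4a", "avx", "fma4"}),
    ("Opteron G5", {"sse4a", "avx", "fma4", "fma"}),
]
EPYC_AWARE_MODELS = PRE_EPYC_MODELS + [
    ("EPYC", {"sse4a", "avx", "fma", "avx2"}),
]

# An EPYC-like host: has the newer flags but lacks fma4.
EPYC_HOST_FLAGS = {"sse4a", "avx", "fma", "avx2"}


def best_match(host_flags, models):
    """Return the newest model whose required flags are a subset of host_flags."""
    match = None
    for name, required in models:
        if required <= host_flags:
            match = name
    return match


if __name__ == "__main__":
    print(best_match(EPYC_HOST_FLAGS, PRE_EPYC_MODELS))
    print(best_match(EPYC_HOST_FLAGS, EPYC_AWARE_MODELS))
```

Because G4 and G5 both require fma4 in this sketch, a host without that flag can only match G3 until an EPYC entry exists in the list, which matches the behaviour reported in this bug.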

Comment 16 Ryan Barry 2019-02-12 16:54:00 UTC
Ugly...

Even with a known workaround, adding G3 back to upstream is probably the nicest suggestion.

It means keeping support for a side-by-side vulnerable CPU type for an entire release, but at least it unblocks upgrades.

Comment 17 Sandro Bonazzola 2019-02-13 08:22:05 UTC
Moving to the Virt team to reintroduce G3.

