
Bug 1512325

Summary: We are missing nested virtualization CPU flag vmx on guest VM.
Product: Red Hat OpenStack
Reporter: Siggy Sigwald <ssigwald>
Component: openstack-nova
Assignee: Eoghan Glynn <eglynn>
Status: CLOSED NOTABUG
QA Contact: Joe H. Rahme <jhakimra>
Severity: high
Priority: high
Docs Contact:
Version: 10.0 (Newton)
CC: awaugama, berrange, dansmith, eglynn, kchamart, mbooth, rbryant, rlondhe, sbauza, sferdjao, sgordon, srevivo, vromanso
Target Milestone: ---
Target Release: ---
Keywords: Reopened
Flags: kchamart: needinfo? (rlondhe)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-03-22 09:04:11 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Siggy Sigwald 2017-11-12 19:46:32 UTC
Description of problem (from customer):
I have noticed that every time we finish adding new compute nodes to our environment, nested virtualization is not enabled. Immediately after deployment finishes and I log in to the compute node I see:

# cat /sys/module/kvm_intel/parameters/nested
N

This is because the kvm_intel module has not been loaded with the parameter nested=1; this happens only after a node restart. The first problem is that, when the deployment finishes, new instances get scheduled on newly added nodes that still need to be restarted. The second issue relates to https://access.redhat.com/solutions/3208881: when libvirtd is started for the first time without nested virt enabled, it creates a capabilities cache that is not updated on subsequent runs, which leaves nested virt effectively disabled until we delete the cache file and restart libvirtd.

Are you planning to fix this? Either the problem with the capabilities cache, or the need to restart the node, or by unloading/reloading the kvm-intel/kvm-amd modules with the nested=1 parameter?
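
As a stop-gap on an affected compute node, nested support can be enabled persistently with a kernel module option file. A sketch using the stock RHEL 7 paths (the file name is illustrative, not from this bug):

    # /etc/modprobe.d/kvm-nested.conf
    # Load the Intel KVM module with nested virtualization enabled
    # (use "kvm_amd" instead of "kvm_intel" on AMD hosts)
    options kvm_intel nested=1

After writing the file, reload the module with no guests running (modprobe -r kvm_intel && modprobe kvm_intel), delete the stale capabilities cache under /var/cache/libvirt/qemu/, and restart libvirtd, along the lines of the linked solution.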

Version-Release number of selected component (if applicable):
openstack-nova-api-14.0.8-2.el7ost.noarch
openstack-nova-cert-14.0.8-2.el7ost.noarch
openstack-nova-common-14.0.8-2.el7ost.noarch
openstack-nova-compute-14.0.8-2.el7ost.noarch
openstack-nova-conductor-14.0.8-2.el7ost.noarch
openstack-nova-console-14.0.8-2.el7ost.noarch
openstack-nova-migration-14.0.8-2.el7ost.noarch
openstack-nova-novncproxy-14.0.8-2.el7ost.noarch
openstack-nova-scheduler-14.0.8-2.el7ost.noarch
puppet-nova-9.6.0-1.el7ost.noarch
python-nova-14.0.8-2.el7ost.noarch
python-novaclient-6.0.1-1.el7ost.noarch

Comment 2 Daniel Berrange 2017-11-13 10:23:03 UTC
(In reply to Siggy Sigwald from comment #0)
> Description of problem (from customer):
> I have noticed that every time we finish addition of new compute nodes to
> our environment, the nested virtualization is not enabled. Immediately after
> deployment finishes and I login to the compute node I see:
> 
> # cat /sys/module/kvm_intel/parameters/nested
> N
> 
> This is because of the fact that module kvm_intel has not been loaded with
> parameter nested=1. This happens just after node restart. First problem with
> this is that when the deployment finishes new instances are being scheduled
> on newly added nodes that need to be restarted. And second issue relates to
> https://access.redhat.com/solutions/3208881. When libvirtd is started for
> the first time without nested virt enabled, it creates a capabilities cache
> that is not updated during second run and this results in nested virt being
> effectively disabled until we deleted the cache file and restart libvirtd.
> 
> Are you planning to fix this? Either the problem with capabilities cache or
> restarting the node or unloading/reloading kvm-intel/kvm-amd modules with
> nested=1 parameter?

Disabling nested KVM is intentional, because this feature is not supported for production use. It currently exists only as a 'tech preview' feature, so it is not appropriate to enable it except for ad-hoc evaluation.

A future RHEL release will make nested KVM a fully supported feature and provide a reliable mechanism to enable it.

Comment 3 Kashyap Chamarthy 2017-11-13 10:43:40 UTC
As an addendum to what Dan said above, take a look at this bug for the historical discussion (most of its comments are private, because the discussion concerned support exceptions):

    https://bugzilla.redhat.com/show_bug.cgi?id=1386822 -- 
    [RFE] Enable KVM nested virtualization by default

Comment 6 Kashyap Chamarthy 2019-03-15 15:00:14 UTC
For RHOS, if you have 'nested' enabled on the bare metal host, you can
request the 'vmx' flag by configuring `cpu_model_extra_flags`:

    [libvirt]
    cpu_mode = custom
    cpu_model = Haswell-noTSX-IBRS
    cpu_model_extra_flags = vmx


               * * *

On the platform side, there are plans to enable nested virt officially 
for RHEL-8:

    https://bugzilla.redhat.com/show_bug.cgi?id=1559845
    [Fast] Nested KVM: support for migration of nested hypervisors -
    Fast Train
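
Inside a guest, you can confirm whether the flag actually made it through by checking the "flags" line of /proc/cpuinfo. A minimal sketch (the helper function is illustrative, not part of Nova):

```python
def has_cpu_flag(cpuinfo_text, flag):
    """Check whether a CPU flag appears in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            # A flags line looks like "flags\t\t: fpu vme de pse vmx ..."
            _, _, flags = line.partition(":")
            if flag in flags.split():
                return True
    return False

# On a real guest: has_cpu_flag(open("/proc/cpuinfo").read(), "vmx")
sample = "processor\t: 0\nflags\t\t: fpu vme de pse vmx ssse3\n"
print(has_cpu_flag(sample, "vmx"))  # True on a nested-capable Intel guest
print(has_cpu_flag(sample, "svm"))  # False; 'svm' is the AMD equivalent
```

Equivalently, a quick `grep -wq vmx /proc/cpuinfo` in the guest answers the same question.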

Comment 7 Kashyap Chamarthy 2019-03-15 15:12:26 UTC
For RHOS, we won't enable 'vmx' by default for a Nova guest.  Operators need to enable it explicitly (see comment #6), as they know their hardware environments.

Hope it is okay to close this bug based on comment#6?

Comment 8 Kashyap Chamarthy 2019-03-15 15:35:53 UTC
(In reply to Kashyap Chamarthy from comment #7)
> For RHOS, we won't enable 'vmx' by default for a Nova guest.  Operators need
> to enable it explicitly (refer comment#6), as they know their hardware
> environments.

I could have been clearer.  Nova guests will get the 'vmx' flag in the
following cases:

(a) If you have 'nested' enabled on the bare metal host, and if you are
    using 'host-model' CPU mode.

(b) If you have 'nested' enabled on the bare metal host, and if you are
    using 'host-passthrough' CPU mode.
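
For comparison with the `custom` mode example in comment #6, the corresponding nova.conf settings for these two cases would look like this (a sketch of the standard [libvirt] options, not quoted from this bug):

    [libvirt]
    # case (a): guest CPU mirrors the host model; vmx is carried over
    # when 'nested' is enabled on the bare metal host
    cpu_mode = host-model

    # case (b): the host CPU is exposed to the guest unchanged
    # cpu_mode = host-passthrough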

Comment 9 Kashyap Chamarthy 2019-03-22 09:04:11 UTC
Closing the bug based on comment #8.