Bug 1684584 - Migration failed when OS loads directly from NFS.
Summary: Migration failed when OS loads directly from NFS.
Keywords:
Status: CLOSED DUPLICATE of bug 1689269
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Virtualization
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: ---
Target Release: 2.0
Assignee: Vladik Romanovsky
QA Contact: Denys Shchedrivyi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-01 15:19 UTC by Denys Shchedrivyi
Modified: 2019-04-04 12:43 UTC
CC: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-02 12:15:30 UTC
Target Upstream Version:


Attachments
pod logs and vmi describe (deleted) - 2019-03-01 15:19 UTC, Denys Shchedrivyi

Description Denys Shchedrivyi 2019-03-01 15:19:20 UTC
Created attachment 1539868 [details]
pod logs and vmi describe

Description of problem:
 A VM with a Fedora or Windows OS image loaded from an NFS server can't be migrated successfully



Version-Release number of selected component (if applicable):
2.0

How reproducible:


Steps to Reproduce:
1. Create VM with shared PVC pointing to OS image on NFS server 
2. Run migration
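
For reference, a minimal sketch of these two steps against the KubeVirt v1alpha3 API (CNV 2.0). The VM/PVC names mirror the qemu command line in comment 7; the manifests themselves are an assumption, not taken from this report, and the PVC must be a ReadWriteMany volume backed by the NFS server:

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: vm-pvc-nfs-cirros
spec:
  running: true
  template:
    spec:
      domain:
        devices:
          disks:
          - name: nfs-pvc-cirros        # boot disk, served from the NFS-backed PVC
            disk:
              bus: virtio
        resources:
          requests:
            memory: 128Mi
      volumes:
      - name: nfs-pvc-cirros
        persistentVolumeClaim:
          claimName: nfs-pvc-cirros     # RWX PVC whose disk.img lives on the NFS server
---
apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachineInstanceMigration
metadata:
  name: migrate-vm-pvc-nfs-cirros       # hypothetical name; virtctl migrate does the same
spec:
  vmiName: vm-pvc-nfs-cirros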

Actual results:
 Migration failed with error:
 "component":"virt-launcher","level":"error","msg":"internal error: unable to execute QEMU command 'migrate': Nested VMX virtualization does not support live migration yet","pos":"qemuMonitorJSONCheckError:395","subcomponent":"libvirt","timestamp":"2019-02-27T21:11:41.934000Z"}

Expected results:
 Migration successfully completed without errors

Additional info:
 logs attached

Comment 1 Denys Shchedrivyi 2019-03-01 15:26:38 UTC
 A VM with a containerDisk and NFS attached as the second drive migrated successfully.
 The issue only happens when NFS backs the main drive (i.e. when the OS is loaded from an image on the NFS server).

Comment 3 Fabian Deutsch 2019-03-04 10:28:04 UTC
Vladik, IIRC you mentioned (in a different thread) that live migration in a nested env is problematic at the moment.

Is this correct?
Do you know when fixes for live-migration on nesting will arrive?

Denys, can you re-test this in a bare-metal env?

Comment 6 Dr. David Alan Gilbert 2019-03-06 09:14:18 UTC
I'm not seeing how the NFS use could cause that problem.

The restriction that was recently added is that a VM with nested VMX enabled can't be migrated.

A common subcase of that is nesting being enabled on the host and then using -cpu host (nesting is disabled by default).

Can you please:
  a) Give us cat /sys/module/kvm_intel/parameters/nested   from the host affected
  b) Give us the qemu command lines being used
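
For anyone gathering this, a rough sketch of where to run both checks; the virt-launcher pod name below is illustrative, not from this report:

  # a) on the node that runs the VMI
  cat /sys/module/kvm_intel/parameters/nested

  # b) the qemu command line, from the node or from inside the virt-launcher pod
  ps aux | grep qemu-system
  oc exec virt-launcher-vm-pvc-nfs-cirros-xxxxx -- ps aux | grep qemu-system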

My guess here is that the host you're using for NFS testing happens to have it enabled and you're using -cpu host.

(The restriction exists because the kernel can't guarantee us correct register state in this case; so while in many cases, i.e. without a nested guest actually running, it might work, we can't tell until we land with a broken mess on the destination.)

Comment 7 Denys Shchedrivyi 2019-03-06 16:42:02 UTC
 Here are the results of running on bare metal.
 

[root@vm-pvc-nfs-cirros /]# cat /sys/module/kvm_intel/parameters/nested
Y

[root@vm-pvc-nfs-cirros /]# ps aux | grep qemu
qemu        165  1.2  0.0 2558060 109444 ?      Sl   15:22   0:05 /usr/bin/qemu-system-x86_64 -name guest=default_vm-pvc-nfs-cirros,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vm-pvc-nfs-c/master-key.aes -machine pc-q35-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu Skylake-Server-IBRS,ss=on,vmx=on,hypervisor=on,tsc_adjust=on,clflushopt=on,pku=on,ssbd=on -m 128 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid 61daed55-7aed-5af9-9525-d30a77c0ae17 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=23,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -drive file=/var/run/kubevirt-private/vmi-disks/nfs-pvc-cirros/disk.img,format=raw,if=none,id=drive-ua-nfs-pvc-cirros,cache=none -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-nfs-pvc-cirros,id=ua-nfs-pvc-cirros,bootindex=1,write-cache=on -netdev tap,fd=25,id=hostua-default,vhost=on,vhostfd=26 -device virtio-net-pci,host_mtu=1450,netdev=hostua-default,id=ua-default,mac=0a:58:0a:81:00:20,bus=pci.1,addr=0x0 -chardev socket,id=charserial0,fd=27,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=28,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/ab5444cd-4023-11e9-8e45-00109b3588a0/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on


 But I've found that if I add "cpu model: Nehalem" to the VM yaml file, migration works well!

qemu command line in this case:

[root@vm-pvc-nfs-cirros /]# ps aux | grep qemu
qemu        162  2.3  0.0 2274312 108484 ?      Sl   15:41   0:03 /usr/bin/qemu-system-x86_64 -name guest=default_vm-pvc-nfs-cirros,debug-threads=on -S -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-default_vm-pvc-nfs-c/master-key.aes -machine pc-q35-3.1,accel=kvm,usb=off,dump-guest-core=off -cpu Nehalem -m 128 -realtime mlock=off -smp 1,sockets=1,cores=1,threads=1 -object iothread,id=iothread1 -uuid 61daed55-7aed-5af9-9525-d30a77c0ae17 -no-user-config -nodefaults -chardev socket,id=charmonitor,fd=23,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -boot strict=on -device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 -device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 -device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 -device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 -device virtio-serial-pci,id=virtio-serial0,bus=pci.2,addr=0x0 -drive file=/var/run/kubevirt-private/vmi-disks/nfs-pvc-cirros/disk.img,format=raw,if=none,id=drive-ua-nfs-pvc-cirros,cache=none -device virtio-blk-pci,scsi=off,bus=pci.3,addr=0x0,drive=drive-ua-nfs-pvc-cirros,id=ua-nfs-pvc-cirros,bootindex=1,write-cache=on -netdev tap,fd=25,id=hostua-default,vhost=on,vhostfd=26 -device virtio-net-pci,host_mtu=1450,netdev=hostua-default,id=ua-default,mac=0a:58:0a:81:00:21,bus=pci.1,addr=0x0 -chardev socket,id=charserial0,fd=27,server,nowait -device isa-serial,chardev=charserial0,id=serial0 -chardev socket,id=charchannel0,fd=28,server,nowait -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 -vnc vnc=unix:/var/run/kubevirt-private/4e35d838-4026-11e9-8e45-00109b3588a0/virt-vnc -device VGA,id=video0,vgamem_mb=16,bus=pcie.0,addr=0x1 -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on


 With this CPU model some of the CPU flags (including vmx=on) are no longer present, and the migration completed successfully.
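
For reference, the override mentioned above corresponds to the cpu.model field of the VM spec; a minimal sketch, assuming the KubeVirt v1alpha3 schema (only the relevant fragment shown):

spec:
  template:
    spec:
      domain:
        cpu:
          model: Nehalem    # explicit model; the vmx=on flag seen with the default CPU config is no longer present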

Comment 8 Dr. David Alan Gilbert 2019-03-06 16:55:24 UTC
OK, so the question is how you got that 'vmx=on'. That's what enables nesting; you need to turn that off and/or turn off the 'nested' module setting (it defaults to off).
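
For completeness, a sketch of turning nesting off on an affected host (standard kvm_intel module handling; the module can only be reloaded while no VMs are running on that host):

  # check the current value (Y/1 means nested VMX is enabled)
  cat /sys/module/kvm_intel/parameters/nested

  # persist nested=0 and reload the module
  echo "options kvm_intel nested=0" > /etc/modprobe.d/kvm_intel.conf
  modprobe -r kvm_intel && modprobe kvm_intel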

Comment 10 Dr. David Alan Gilbert 2019-03-07 20:00:52 UTC
How do CPU types and flags work for CNV?
What's different about the NFS setup here?

Comment 11 Fabian Deutsch 2019-03-08 08:43:51 UTC
Redirecting this question to Vladik.

Comment 12 Fabian Deutsch 2019-04-02 12:15:30 UTC
This bug is not specific to CNV.

It's about the issue that live migration is not supported in a nested environment.

David, is there a bug for qemu/kvm to track fixing this on their side?

Closing as CANTFIX, as we can't fix it here.

Comment 13 Dr. David Alan Gilbert 2019-04-02 17:03:56 UTC
Yes, I believe it's:
https://bugzilla.redhat.com/show_bug.cgi?id=1689269

Comment 14 Fabian Deutsch 2019-04-04 12:43:48 UTC
Thanks

*** This bug has been marked as a duplicate of bug 1689269 ***

