Bug 1360519 - RFE: vhost-user multi-queue and live migration support
Summary: RFE: vhost-user multi-queue and live migration support
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.2
Hardware: All
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Amnon Ilan
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-27 00:54 UTC by bigswitch
Modified: 2017-02-20 15:28 UTC
CC List: 20 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-25 09:31:56 UTC



Description bigswitch 2016-07-27 00:54:39 UTC
Description of problem:
We have a package dependency needed to support Big Switch's DPDK-based NFVSwitch. Please package it in the next RHEL release.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 2 Karen Noel 2016-07-29 13:19:44 UTC
Your request came directly to engineering. Who is the product manager or partner manager you are working with?

What features or fixes in qemu are you looking for? 

The plan is to support RHEL 7.3 compute nodes with qemu-kvm-rhev based on upstream QEMU 2.6. However, Red Hat also backports upstream features and fixes from later QEMU versions. Therefore, we prefer that software does not rely on QEMU version numbers to discover the availability of features/fixes. 
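
For example, rather than parsing the qemu-kvm-rhev RPM version, software can ask the running QEMU what it actually supports. A minimal sketch (the domain name is a placeholder) using libvirt's QMP passthrough:

    virsh qemu-monitor-command <domain> --pretty '{"execute":"query-commands"}'
    virsh qemu-monitor-command <domain> --pretty '{"execute":"query-command-line-options"}'

Both QMP commands report what the running binary provides, independent of its version string.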

Also, there are no plans to rebase the version of qemu-kvm shipped with RHEL. Thanks.

Comment 3 Stephen Gordon 2016-08-02 15:29:07 UTC
(In reply to bigswitch from comment #0)
> Description of problem:
> We have this package dependency to support Big Switch's DPDK based
> NFVswitch. Please package it in next RHEL release

Can you be more specific about exactly which QEMU capabilities the solution requires, ideally with links to commits? Simply requesting a given version number does not guarantee that the functionality you need will be enabled in our build unless you specify what is actually required.

Comment 4 bigswitch 2016-08-02 16:32:45 UTC
Hi,
This is the requirement:

We need vhost-user multiqueue and live migration.

b931bfbf0429 ("vhost-user: add multiple queue support")
f6f56291de87 ("vhost user: add support of live migration")

Plus the bugfixes and other patches required for these features.
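
For reference, a QEMU command line that exercises both features would look roughly like the sketch below. The socket path, IDs, MAC address and queue count are placeholders for illustration only, not anything specific to this request; vhost-user also requires the guest memory to come from a shared hugepage backend, and "vectors" is 2*queues + 2:

    -object memory-backend-file,id=mem0,size=4096M,mem-path=/dev/hugepages,share=on \
    -numa node,memdev=mem0 \
    -chardev socket,id=char0,path=/run/vhost/vhost2,server,nowait \
    -netdev type=vhost-user,id=hostnet0,chardev=char0,queues=4 \
    -device virtio-net-pci,netdev=hostnet0,mac=fa:16:3e:f6:7c:3f,mq=on,vectors=10

Live migration of such a guest is then requested in the usual way (e.g. virsh migrate --live), assuming the vhost-user backend on both hosts supports it.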

Comment 9 bigswitch 2016-08-15 16:34:11 UTC
Multi-queue is needed to get better networking performance when deploying BSN's DPDK-based virtual switch (NFVSwitch) with NFV workloads.

Comment 15 Mike Burns 2016-09-16 17:17:12 UTC
For BigSwitch:

The current state of multi-queue support is that it is included in 7.2.z. It was backported on top of the version shipped in 7.2 (so the fact that the version number is lower than upstream is OK). It is considered Tech Preview in 7.2 and will move to full support in RHEL 7.3.

BigSwitch is going to test this directly and verify.

Comment 17 bigswitch 2016-10-19 00:01:39 UTC
Update on multi-queue support:
We tested the following workflows:

1. Attach a vhostuser interface to the VM when bringing up the VM
This works!
[Output from virsh dumpxml instanceXYZ]

    <interface type='vhostuser'>
      <mac address='fa:16:3e:f6:7c:3f'/>
      <source type='unix' path='/run/vhost/vhost2' mode='server'/>
      <model type='virtio'/>
      <driver name='vhost' queues='4'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>


2. Bring up a VM, and then attach the vhostuser interface to it
This doesn't work.

The following error is observed in /var/log/nova/nova-compute.log:
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0] Traceback (most recent call last):
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 1504, in attach_interface
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     guest.attach_device(cfg, persistent=True, live=live)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/nova/virt/libvirt/guest.py", line 250, in attach_device
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     self._domain.attachDeviceFlags(conf.to_xml(), flags=flags)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 183, in doit
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     result = proxy_call(self._autowrap, f, *args, **kwargs)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 141, in proxy_call
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     rv = execute(f, *args, **kwargs)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 122, in execute
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     six.reraise(c, e, tb)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib/python2.7/site-packages/eventlet/tpool.py", line 80, in tworker
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     rv = meth(*args, **kwargs)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]   File "/usr/lib64/python2.7/site-packages/libvirt.py", line 554, in attachDeviceFlags
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0]     if ret == -1: raise libvirtError ('virDomainAttachDeviceFlags() failed', dom=self)
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0] libvirtError: unsupported configuration: Multiqueue network is not supported for: vhostuser
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0] 
2016-10-18 18:35:53.001 20623 WARNING nova.compute.manager [req-a35c4a76-2299-47ad-af98-0e3f5448638a e8461562c57a48a3a89ef5326c6d70f9 b745cf763efa49cd9fa5b523bf097fa1 - - -] [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0] attach interface failed , try to deallocate port cee4095b-3f91-4118-935a-6b6c930949be, reason: Failed to attach network adapter device to 330532f9-4e36-401f-8f3a-4caaa8ebd4d0

Line of relevance:
2016-10-18 18:35:53.000 20623 ERROR nova.virt.libvirt.driver [instance: 330532f9-4e36-401f-8f3a-4caaa8ebd4d0] libvirtError: unsupported configuration: Multiqueue network is not supported for: vhostuser


So it appears that flow 2 doesn't work, and unfortunately both of these flows are used by customers.
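
For completeness, flow 2 can also be reproduced without Nova, using libvirt directly; a rough sketch (the XML file name and virsh flags are illustrative, not taken from the log above):

    # vhostuser-net0.xml contains the same <interface type='vhostuser'> definition
    # shown in the dumpxml above, including <driver name='vhost' queues='4'/>
    virsh attach-device instanceXYZ vhostuser-net0.xml --live --config

Since the error above is raised by libvirt's attachDeviceFlags() call rather than by Nova itself, the direct attach should fail with the same "unsupported configuration: Multiqueue network is not supported for: vhostuser" message.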

Comment 18 Karen Noel 2016-10-19 01:08:42 UTC
Please provide package versions for the host compute node - libvirt, qemu-kvm-rhev, kernel... Is this still RHEL 7.2? What is the RHOSP version? Thanks.

Comment 19 bigswitch 2016-10-19 01:18:41 UTC
This is with RHOSP 9.

[root@overcloud-compute-0 heat-admin]# libvirtd --version
libvirtd (libvirt) 1.2.17

[root@overcloud-compute-0 heat-admin]# uname -a
Linux overcloud-compute-0.localdomain 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

[root@overcloud-compute-0 heat-admin]# rpm -qa | grep qemu-
libvirt-daemon-driver-qemu-1.2.17-13.el7_2.5.x86_64
qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch
qemu-img-rhev-2.3.0-31.el7_2.21.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7_2.21.x86_64

[root@overcloud-compute-0 heat-admin]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Comment 24 Karen Noel 2016-10-19 23:47:56 UTC
(In reply to bigswitch from comment #19)
> This is with RHOSP9
> 
> [root@overcloud-compute-0 heat-admin]# libvirtd --version
> libvirtd (libvirt) 1.2.17
> 
> [root@overcloud-compute-0 heat-admin]# uname -a
> Linux overcloud-compute-0.localdomain 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri
> Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
> 
> [root@overcloud-compute-0 heat-admin]# rpm -qa | grep qemu-
> libvirt-daemon-driver-qemu-1.2.17-13.el7_2.5.x86_64
> qemu-kvm-rhev-2.3.0-31.el7_2.21.x86_64
> ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch
> qemu-img-rhev-2.3.0-31.el7_2.21.x86_64
> qemu-kvm-common-rhev-2.3.0-31.el7_2.21.x86_64
> 
> [root@overcloud-compute-0 heat-admin]# cat /etc/redhat-release 
> Red Hat Enterprise Linux Server release 7.2 (Maipo)

Versions look good. Thanks!

Studying comment #17 again, this looks like a bug in libvirt vhost-user hot-plug support. We can either move this BZ to libvirt or you can enter a new BZ. I prefer the latter.

This BZ is about basic vhost-user MQ and live migration support. (This BZ is already overloaded with 2 features.) Did you also verify live migration support? If so, I think this BZ is verified. 

Can you please enter a new BZ for libvirt to fix vhost-user MQ hot-plug support? Thanks.

Comment 25 bigswitch 2016-10-19 23:58:29 UTC
> Studying comment #17 again, this looks like a bug in libvirt vhost-user
> hot-plug support. We can either move this BZ to libvirt or you can enter a
> new BZ. I prefer the latter.

Thanks, sounds good. Opened https://bugzilla.redhat.com/show_bug.cgi?id=1386976 to track this separately.

> This BZ is about basic vhost-user MQ and live migration support. (This BZ is
> already overloaded with 2 features.) Did you also verify live migration
> support? If so, I think this BZ is verified. 

We are awaiting a beefy hardware delivery to test out migration. Due to https://bugzilla.redhat.com/show_bug.cgi?id=1381704 we can't test it on differing computes. We shall update this BZ once we test live migration.

Comment 26 bigswitch 2016-11-03 23:04:41 UTC
We tested migration and ran into the issue mentioned here:
https://access.redhat.com/solutions/2191071

Looks like the issue is known and there is no actual resolution:
"Resolution: Currently due to OPEN bugs in nova, we recommend not to (live or cold) migrate instances that are using numa+cpu-pinning."

Comment 27 Amnon Ilan 2016-11-04 00:08:12 UTC
Is it OK to close this specific bz and monitor the remaining issues 
using the relevant libvirt and Nova bzs?

Comment 28 Chegu Vinod 2017-01-10 20:20:59 UTC
Regarding the live migration feature for a VM using virtio backed by vhost-user/OVS-DPDK:

In addition to addressing the pending issues in qemu/libvirt and/or OpenStack needed to allow live VM migration to work at all, customers expect live migration to keep working in the cases where the VM being migrated hosts an actual DPDK-enabled application in the presence of traffic through that application.

Please include the KVM migration performance improvements (e.g. reduced downtime etc) that were pursued as part of the OPNFV KVM subgroup effort.

Comment 31 bigswitch 2017-01-25 01:16:45 UTC
Amnon, it is OK to close this ticket and track the remaining issues with more specific BZs.

Comment 32 Amnon Ilan 2017-01-25 09:31:56 UTC
(In reply to bigswitch from comment #31)
> Amnon, it is ok to close this ticket and track the remaining issues with
> more specific BZs

Thanks, closing this bug.

