Bug 1369362 - [RFE] OVN - port migration
Summary: [RFE] OVN - port migration
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch
Version: 7.5-Alt
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Mark Michelson
QA Contact: haidong li
URL:
Whiteboard:
Depends On:
Blocks: 1366899 1475436
 
Reported: 2016-08-23 08:27 UTC by Marcin Mirecki
Modified: 2018-04-12 12:10 UTC
CC List: 11 users

Fixed In Version: openvswitch-2.8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-19 10:19:14 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2018:0550 | None | None | None | 2018-03-19 10:21:24 UTC

Description Marcin Mirecki 2016-08-23 08:27:26 UTC
OVN must support migration of a port from one host to another. This is a common scenario when a VM is migrated to another host.

To migrate a port in the current implementation, the external-ids:iface-id property must be removed on the source host, and added on the destination host.
This change will then trigger the underlying flows to be changed.
There are a few problems in this scenario:
- synchronization - how do we know when the source port has been unplugged, so that the destination port can be plugged in, and how do we make sure the flow changes propagate in the correct order?
- timing - what happens to packets destined for this port while it is not plugged in anywhere?
- duration - how long does propagation to the flows take? The VM would be left without networking during this time.
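
For reference, the two-step move described above can be sketched as a small shell helper. This only illustrates the current manual procedure and does not address any of the problems listed. The function names, the `run` wrapper, and the interface and port names are hypothetical, and the commands are printed rather than executed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch of the manual iface-id move; all names are placeholders.
# With DRY_RUN=1 (the default) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Run on the source host: detach the OVS interface from the logical port.
# $1 = OVS interface name on the source host
unplug_source() {
    run ovs-vsctl remove Interface "$1" external_ids iface-id
}

# Run on the destination host: attach the OVS interface to the logical port.
# $1 = OVS interface name, $2 = logical switch port name
plug_destination() {
    run ovs-vsctl set Interface "$1" "external_ids:iface-id=$2"
}
```

For example, `unplug_source vnet0` on the source host followed by `plug_destination vnet0 lsp0` on the destination host. Nothing in this sequence tells the caller when the unbind on the source has propagated, which is the synchronization gap described above.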

Some of the possible approaches:
- involve libvirt in this and let it decide when to do the switching
- allow two ports to be active for some time (have two sets of flows active); this would bring some risks, however, such as delivering some packets twice
- the convenient way for us for now (a stopgap) would be to update the source and destination external-ids:iface-id in the north db in one transaction, and leave it up to OVN to handle it synchronously (can this even be done, considering that it has to happen on multiple hosts?)


The conversation that preceded this bug:

Edward Haas

OVN currently detects the association between an OVS port and an OVN logical switch port using the “iface-id” external-id. To make migration work as you expect, you would need to remove this ID from the OVS port on the source and add it to the OVS port at the destination when you’re ready for OVN to change the flows throughout the environment to reflect the new location.

Who is setting the iface-id? libvirt?
Seems to me like this mechanism is very slow: It will take too much time for the change to propagate.
It will be good to have both flow rules in place until one is removed, is that possible?

Russell Bryant

libvirt can probably set iface-id, you can also set it using the "ovs-vsctl" command.

I'm not sure how slow is "too slow". Changes should propagate through the environment in less than a second. Maybe we should do some experimentation here?

It's not possible for a port to live on two hypervisors at the same time right now. I'm not sure what the desired behavior would be. Where would packets destined for that VM be sent?

I'm definitely open to working on changes to make this work better. I'm just trying to explain how it would work with the current state.

Dan Kenigsberg

Russell, even if it is only a second, we still need to know that the change has taken place before we set iface-id on the destination, and let the VM start there.

Do you know what the openstack vif driver is doing in this regard?

It seems that libvirt must be involved, since only it knows when the VM state has migrated and the VM is ready to be started on the destination.

Russell Bryant

This conversation has convinced me that we haven't sorted out live migration properly for OpenStack, either. We need to open a bug to track this one.

Comment 2 Lance Richardson 2017-02-28 16:37:59 UTC
Solution is under discussion upstream:

   https://mail.openvswitch.org/pipermail/ovs-dev/2017-February/329148.html

Comment 3 Lance Richardson 2017-05-17 18:16:26 UTC
Upstream discussion has not progressed; there is no solution at this time.

Comment 4 Yaniv Lavi 2017-05-29 10:49:02 UTC
Any updates?

Comment 5 Lance Richardson 2017-05-29 17:33:20 UTC
Upstream discussion has stalled; I will discuss how to revive it with Russell.

Comment 6 Lance Richardson 2017-07-18 13:58:49 UTC
Current status: high-level proposal exists, implementation does not. At this
point it seems this will be included in upstream 2.9.

Comment 7 Lance Richardson 2017-08-22 20:13:37 UTC
New scheme (discussed in last RHV/OVN monthly meeting) has been posted upstream:

   https://mail.openvswitch.org/pipermail/ovs-dev/2017-August/337648.html

It has since been committed to the master and 2.8 branches.

Comment 8 Lance Richardson 2017-09-06 15:44:53 UTC
The patch for this issue is in upstream master and 2.8 branches, and is
contained in released version 2.8.0. Note that a follow-up enhancement
patch from Russell is needed for neutron integration (not yet committed):

    https://patchwork.ozlabs.org/patch/809039/

Author: Lance Richardson <lrichard@redhat.com>
Date:   Sat Aug 19 16:23:34 2017 -0400

    ovn: support requested-chassis option for logical switch ports
    
    This patch adds support for a "requested-chassis" option for logical
    switch ports. If set, the only chassis that will claim this port is the
    chassis identified by this option; if already bound by another chassis,
    it will be released.
    
    The primary benefit of this enhancement is allowing a CMS to prevent
    "thrashing" in the southbound database during live migration by keeping
    the original chassis from attempting to re-bind a port that is in the
    process of migrating.
    
    This would also allow (with some additional work) RBAC to be applied
    to the Port_Binding table for additional security.
    
    Signed-off-by: Lance Richardson <lrichard@redhat.com>
    Signed-off-by: Russell Bryant <russell@ovn.org>
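
As a rough illustration of how a CMS might use this option during live migration, the sketch below pins a logical switch port to a given chassis and later removes the pin. It uses the generic `ovn-nbctl set` and `remove` database commands so that only the requested-chassis key is written. The function names, the `run` wrapper, and the port and chassis names are placeholders, and the commands are only printed unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: pin a logical switch port to one chassis via requested-chassis.
# Port and chassis names are placeholders. Commands are printed, not
# executed, unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $1 = logical switch port, $2 = name of the chassis that should claim it
request_chassis() {
    run ovn-nbctl set Logical_Switch_Port "$1" "options:requested-chassis=$2"
}

# Drop the pin again so that the port is no longer tied to one chassis.
# $1 = logical switch port
clear_requested_chassis() {
    run ovn-nbctl remove Logical_Switch_Port "$1" options requested-chassis
}
```

A migration flow might call `request_chassis lsp0 hv0` once the destination is ready to take over, so that the original chassis releases the port and does not try to re-bind it.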

Comment 9 Flavio Leitner 2017-10-02 23:01:41 UTC
Hi, 

This is part of branch-2.8:
commit f37dc273243cdc32e74e20a0b97f15c0acebc11e
Author: Lance Richardson <lrichard@redhat.com>
Date:   Sat Aug 19 16:23:34 2017 -0400

    ovn: support requested-chassis option for logical switch ports


I am closing this bug as it's done upstream and will eventually be part of our package when it gets rebased to 2.8 or newer.
If you need it earlier, please re-open, stating when and why it is needed.
Thanks,
fbl

Comment 10 Dan Kenigsberg 2017-10-03 07:12:54 UTC
Flavio, why wouldn't we set a proper target version, and wait for QA to properly test it before it closes?

I prefer doing so in order to have an indication of when downstream RHV can consume the feature.

Comment 11 Yaniv Lavi 2017-11-01 10:38:15 UTC
Reopening to review the request in comment 10.

Comment 12 Flavio Leitner 2017-11-01 13:19:55 UTC
(In reply to Dan Kenigsberg from comment #10)
> Flavio, why wouldn't we set a proper target version, and wait for QA to
> properly test it before it closes?

I was told that this was tracking upstream effort and there was no target release for this to be backported.

> I prefer doing so in order to have an indication when downstream RHV can
> consume the future

2.8 is in fdBeta, so you can already try it.

Comment 13 Yaniv Lavi 2017-11-01 14:09:47 UTC
(In reply to Flavio Leitner from comment #12)
> (In reply to Dan Kenigsberg from comment #10)
> > Flavio, why wouldn't we set a proper target version, and wait for QA to
> > properly test it before it closes?
> 
> I was told that this was tracking upstream effort and there was no target
> release for this to be backported.
> 
> > I prefer doing so in order to have an indication when downstream RHV can
> > consume the future
> 
> 2.8 is in fdBeta, so you can already try it.

RHV has a requirement for this.
Please add it to the QE test plan, so we can get this feature tested and won't hit any integration roadblocks.

Comment 15 haidong li 2018-03-05 14:41:54 UTC
Hi Flavio,
   Is it necessary to configure the "requested-chassis" option before VM migration in an OVN environment, and will the option help make VM migration faster? The migration sometimes takes 4s when I don't use the option, so I hope it will help make it faster. Do I need to use a special version of libvirt? Please help explain, thanks a lot!

Comment 16 Mark Michelson 2018-03-05 15:43:24 UTC
The requested-chassis option is designed to prevent the situation where multiple ovn-controller instances are trying to claim a specific logical port at the same time, resulting in "thrashing".

The idea is that the active ovn-controller instance should be the one in the requested-chassis setting. This way, the standby ovn-controller instance will not attempt to claim the port for itself.

Can you explain the procedure you are using for VM migration? It may be possible that migration can happen faster, but my instinct is that this will not greatly speed up the process.

Comment 17 haidong li 2018-03-06 03:31:09 UTC
Hi Mark,
   Thanks for the explanation. Maybe there is some problem with my environment; I will do more testing on the VM migration.

Comment 18 haidong li 2018-03-06 04:36:42 UTC
This bug is verified on the latest version:
[root@dell-per730-19 ovn]# ip link add name hv1-if0 type veth peer name hv1-if1
[root@dell-per730-19 ovn]# ovn-nbctl ls-add ls0
[root@dell-per730-19 ovn]# ovn-nbctl lsp-add ls0 lsp0
[root@dell-per730-19 ovn]# ovs-vsctl -- add-port br-int hv1-if0
[root@dell-per730-19 ovn]# ovs-vsctl set interface hv1-if0 external-ids:iface-id=lsp0
[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv1
[root@dell-per730-19 ovn]# ovn-sbctl list port_binding
_uuid               : b1da9bcc-5624-45c9-b3c7-118b7e145758
chassis             : 096ece9a-b99f-4064-b0b4-494c9816a8d0
datapath            : 81310ca0-e6d9-4f6c-bdb5-770eeddb3ef0
external_ids        : {}
gateway_chassis     : []
logical_port        : "lsp0"
mac                 : []
nat_addresses       : []
options             : {requested-chassis="hv1"}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
[root@dell-per730-19 ovn]# ovn-sbctl list chassis
_uuid               : 772c8676-3132-40c8-a629-d02394725aa2
encaps              : [d01a3931-dac5-4d75-a6ec-6ed64f76be43]
external_ids        : {datapath-type="", iface-types="geneve,gre,internal,lisp,patch,stt,system,tap,vxlan", ovn-bridge-mappings=""}
hostname            : "dell-per730-49.rhts.eng.pek2.redhat.com"
name                : "hv0"
nb_cfg              : 0
vtep_logical_switches: []

_uuid               : 096ece9a-b99f-4064-b0b4-494c9816a8d0
encaps              : [c5a5171f-8062-49c7-9100-51a9ed1ebfc7]
external_ids        : {datapath-type="", iface-types="geneve,gre,internal,lisp,patch,stt,system,tap,vxlan", ovn-bridge-mappings=""}
hostname            : "dell-per730-19.rhts.eng.pek2.redhat.com"
name                : "hv1"
nb_cfg              : 0
[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv0
[root@dell-per730-19 ovn]# ovn-sbctl list port_binding
_uuid               : b1da9bcc-5624-45c9-b3c7-118b7e145758
chassis             : 772c8676-3132-40c8-a629-d02394725aa2
datapath            : 81310ca0-e6d9-4f6c-bdb5-770eeddb3ef0
external_ids        : {}
gateway_chassis     : []
logical_port        : "lsp0"
mac                 : []
nat_addresses       : []
options             : {requested-chassis="hv0"}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
[root@dell-per730-19 ovn]# ovn-nbctl lsp-set-options lsp0 requested-chassis=hv1
[root@dell-per730-19 ovn]# ovn-sbctl --bare --columns chassis find port_binding logical_port=366f925b-36d6-42e8-b6d2-64e251fe17c9
096ece9a-b99f-4064-b0b4-494c9816a8d0
[root@dell-per730-19 ovn]# 
[root@dell-per730-19 ovn]# ovs-vsctl show
bde05c29-7f7a-4508-b670-f4260ea41772
    Bridge br-int
        fail_mode: secure
        Port "hv1-if0"
            Interface "hv1-if0"
        Port "hv1_vm00_vnet1"
            Interface "hv1_vm00_vnet1"
        Port br-int
            Interface br-int
                type: internal
        Port "ovn-hv0-0"
            Interface "ovn-hv0-0"
                type: geneve
                options: {csum="true", key=flow, remote_ip="20.0.0.26"}
    ovs_version: "2.9.0"
[root@dell-per730-19 ovn]#
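
The manual check in the transcript above (compare the chassis UUID in port_binding with the UUID of the requested chassis) could be scripted roughly as follows. This is a sketch only: the `port_bound_to` function and the `OVN_SBCTL` override variable are hypothetical, and it assumes chassis names are unique in the southbound database.

```shell
#!/bin/sh
# Sketch: check that a logical port is bound to the chassis with a given name.
# OVN_SBCTL is a hypothetical override so the command can be swapped out.
OVN_SBCTL=${OVN_SBCTL:-ovn-sbctl}

# Succeeds if logical port $1 is currently bound to the chassis named $2.
port_bound_to() {
    # UUID of the chassis row with the requested name (empty if not found).
    want=$("$OVN_SBCTL" --bare --columns _uuid find chassis "name=$2")
    # UUID stored in the chassis column of the port's binding.
    have=$("$OVN_SBCTL" --bare --columns chassis find port_binding "logical_port=$1")
    [ -n "$want" ] && [ "$want" = "$have" ]
}
```

With the state shown after the last `lsp-set-options` call, `port_bound_to lsp0 hv1` would be expected to succeed and `port_bound_to lsp0 hv0` to fail.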

Comment 21 errata-xmlrpc 2018-03-19 10:19:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0550

