Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1059532 - ixgbe virtual function driver advertises vlan tagging offload when unavailable due to HV configuration
Summary: ixgbe virtual function driver advertises vlan tagging offload when unavailabl...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.5
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: rc
: ---
Assignee: Nikolay Aleksandrov
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-30 05:18 UTC by David Gibson
Modified: 2018-12-06 15:47 UTC (History)
12 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-08-12 13:50:50 UTC


Attachments (Terms of Use)
ixgbe/ixgbevf: add new reset parameter for vlan capability [test patch] (deleted)
2014-03-10 14:05 UTC, Nikolay Aleksandrov
no flags Details | Diff
Add new rx_add_vid method with check of return value (deleted)
2014-03-17 14:56 UTC, Nikolay Aleksandrov
no flags Details
Add new rx_add_vid method with check of return value v2 (deleted)
2014-03-28 11:26 UTC, Nikolay Aleksandrov
no flags Details
Add new rx_add_vid method with check of return value v3 (deleted)
2014-08-12 13:46 UTC, Nikolay Aleksandrov
no flags Details

Description David Gibson 2014-01-30 05:18:32 UTC
Description of problem:

The ixgbevf driver unconditionally advertises the NETIF_F_HW_VLAN_TX, NETIF_F_HW_VLAN_RX and NETIF_F_HW_VLAN_FILTER features allowing hardware offload of vlan tagging.

However, the host creating the virtual function is itself imposing VLAN tagging on the guest, the VF cannot offload vlan tagging, since it is already being used to apply the hypervisor outer tagging.

Version-Release number of selected component (if applicable):

RHEL 6.5
Still verifying exact kernel version with customer, but I believe this will affect all kernels up to and including kernel-2.6.32-431.2.1.el6

How reproducible:

100%

Steps to Reproduce:
1. On a system with an ixgbe card, configure a libvirt/kvm guest using a passthrough VF of the ixgbe as its NIC
2. In the libvirt configuration specify a vlan tag to be applied to all guest traffic using the <vlan> xml tag (this is the "outer tag").
3. In the guest, configure a vlan interface to apply a different tag (this is the "inner tag").

Actual results:

Only the outer VLAN tag is applied, guest logs error messages because its vlan configuration operations fail on the VF, which won't permit the hypervisor configured tagging behaviour to be overriden by the guest.

Expected results:

Guest applies the inner tags in software, hypervisor applies the outer tags using the ixgbe hardware, leading to QinQ double tagged packets once the guest traffic reaches the physical wire.

Additional info:

Comment 2 David Gibson 2014-01-31 03:06:43 UTC
*** Bug 1059531 has been marked as a duplicate of this bug. ***

Comment 3 David Gibson 2014-02-03 02:52:31 UTC
Actually, having looked at the Intel 82599 datasheet (http://www.intel.com/content/www/us/en/ethernet-controllers/82599-10-gbe-controller-datasheet.html) it appears that when a VF is configured to have a VLAN tag inserted, any attempt by the guest using the VF to set a tag - either by hardware or software - will be rejected.

So, if the hypervisor has set a VLAN tag to be inserted, the ixgbevf driver should advertise NETIF_F_VLAN_CHALLENGED instead of the other VLAN features.

It would be nice if the hardware had a setting to treat tags set by the guest as inner tags, and emit QinQ frames, but it seems it doesn't.

Comment 4 Andy Gospodarek 2014-02-14 14:51:32 UTC
Nik, can you take a look at this for the customer?

Comment 5 Nikolay Aleksandrov 2014-02-17 14:24:38 UTC
I can confirm the issue with upstream kernel as well, the upstream ixgbevf driver
also advertises hw vlan offloading.

With <vlan> specified:
[   51.329686] 8021q: 802.1Q VLAN Support v1.8
[   51.330516] 8021q: adding VLAN 0 to HW filter on device ens6
[   51.347735] ixgbevf 0000:00:06.0: Last Request of type 04 to PF Nacked
[   51.348787] 8021q: adding VLAN 0 to HW filter on device eth1
[   51.366503] ixgbevf 0000:00:06.0: Last Request of type 04 to PF Nacked
ERROR: trying to add VLAN #100 to IF -:ens6:-  error: Permission denied

Without <vlan> specified it works.

Comment 6 Terry Bowling 2014-02-19 22:59:05 UTC
Nikolay,

When you say upstream, do you mean RHEL 7, Fedora, or linux.org?

Could you clarify what driver version of ixgbe and ixgbevf you are seeing for your output of cmt 5?

Comment 7 Nikolay Aleksandrov 2014-02-19 23:47:35 UTC
(In reply to Terry Bowling from comment #6)
> Nikolay,
> 
> When you say upstream, do you mean RHEL 7, Fedora, or linux.org?
> 
> Could you clarify what driver version of ixgbe and ixgbevf you are seeing
> for your output of cmt 5?
I tested with current upstream version (Dave Miller's net tree) which is the
latest kernel with all fixes.
It is actually normal, I could predict that would happen looking at current
ixgbevf code as it advertises the same offloads.
I'll talk to Andy and we'll probably have to contact Intel about this.
I'll let you know as soon as there's some development on the issue.

Comment 10 David Gibson 2014-02-27 23:08:28 UTC
There is no customer specific information in this case, so I'm making it public.

Comment 14 David Gibson 2014-03-03 22:30:10 UTC
>Yes, unfortunately I read the same over the weekend. I think this should be an
>option to be configured on the host part, but it would probably require libvirt
>extension as well to support it after that so it can be enabled on per VM/VF >basis.

Um..? I thought we just concluded from the datasheet that this isn't possible, regardless of what we do in the driver or libvirt.  If the host is inserting tags, then it's ignoring tags from the guest.  All we could do is lobby Intel to add this feature to the firmware.

>I tested this, they don't stack but override, in order to stack a different
>configuration of the host must be done where double tagging is enabled which
>changes a lot of stuff.

As I thought.  For the double tagging are you talking about "Double VLAN mode" as described in §7.4.5 of the datasheet?  My understanding from the description was that this wasn't useful for our purposes, because the outer VLAN it specifies is applied across the entire NIC, so there's no way we could restrict it to a single VM/VF.

>We could, but I don't see what we win since the anti-spoofing is still there in
>this BZ's case, so we won't be able to Tx any different tag than the configured.

Sorry, that was more for the benefit of bug 1069028 than for this one.

>IMO we can close this as NOTABUG. What do you think ?

No, we certainly can't close as NOTABUG - check the original description of this bug.  The firmware may not do what we really want, but we should still advertise our capabilities properly in the driver.  When a host-applied VLAN is specified, we should be advertising NETIF_F_VLAN_CHALLENGED, instead of the vlan offload capabilities.

Comment 15 David Gibson 2014-03-03 22:43:55 UTC
Oh, btw, the customer would also like an actual statement from Intel confirming that guest/host QinQ really isn't possible.

Andy, Nik, do you know who the right contacts would be to try and get that?

Comment 16 Nikolay Aleksandrov 2014-03-03 23:20:11 UTC
(In reply to David Gibson from comment #14)
> >Yes, unfortunately I read the same over the weekend. I think this should be an
> >option to be configured on the host part, but it would probably require libvirt
> >extension as well to support it after that so it can be enabled on per VM/VF >basis.
> 
> Um..? I thought we just concluded from the datasheet that this isn't
> possible, regardless of what we do in the driver or libvirt.  If the host is
> inserting tags, then it's ignoring tags from the guest.  All we could do is
> lobby Intel to add this feature to the firmware.
> 
Yes, it is, but with double VLAN mode it won't ignore them just leave them
as the "outer" tags, and apply the configured VLAN as the "inner" tag according
to the datasheet but that would be global.

> >I tested this, they don't stack but override, in order to stack a different
> >configuration of the host must be done where double tagging is enabled which
> >changes a lot of stuff.
> 
> As I thought.  For the double tagging are you talking about "Double VLAN
> mode" as described in §7.4.5 of the datasheet?  My understanding from the
> description was that this wasn't useful for our purposes, because the outer
> VLAN it specifies is applied across the entire NIC, so there's no way we
> could restrict it to a single VM/VF.
> 
Right, that's why I wrote it was global.

> >We could, but I don't see what we win since the anti-spoofing is still there in
> >this BZ's case, so we won't be able to Tx any different tag than the configured.
> 
> Sorry, that was more for the benefit of bug 1069028 than for this one.
> 
> >IMO we can close this as NOTABUG. What do you think ?
> 
> No, we certainly can't close as NOTABUG - check the original description of
> this bug.  The firmware may not do what we really want, but we should still
> advertise our capabilities properly in the driver.  When a host-applied VLAN
> is specified, we should be advertising NETIF_F_VLAN_CHALLENGED, instead of
> the vlan offload capabilities.
Well, to be fair there's vlan offloading since the VLAN is applied automatically
and stripped automatically.
I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log
is the warning saying that the vlan won't work which should give the user a clue.

Comment 17 David Gibson 2014-03-04 00:33:51 UTC
Yeah, double VLAN mode looks useless to me - it just doesn't match the configuration model that Linux wants.  Nor does it match this customer's configuration - the BZs haven't covered it since it seemed extraneous, but they usually want multiple outer tags as well as multiple inner tags.

> Well, to be fair there's vlan offloading since the VLAN is applied automatically
> and stripped automatically.

Not from the guest's perspective.  When the tag is host imposed, it's invisible to the guest, so it really doesn't care whether there was hardware assistance helping it be that way.

> I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log
> is the warning saying that the vlan won't work which should give the user a clue.

From the guest's perspective, VLANing simply doesn't work, so absolutely NETIF_F_VLAN_CHALLENGED should be advertised - that's what the flag's for.

Comment 18 Nikolay Aleksandrov 2014-03-04 12:46:09 UTC
(In reply to David Gibson from comment #17)
> Yeah, double VLAN mode looks useless to me - it just doesn't match the
> configuration model that Linux wants.  Nor does it match this customer's
> configuration - the BZs haven't covered it since it seemed extraneous, but
> they usually want multiple outer tags as well as multiple inner tags.
> 
> > Well, to be fair there's vlan offloading since the VLAN is applied automatically
> > and stripped automatically.
> 
> Not from the guest's perspective.  When the tag is host imposed, it's
> invisible to the guest, so it really doesn't care whether there was hardware
> assistance helping it be that way.
> 
Right.

> > I don't think NETIF_F_VLAN_CHALLENGED should be advertised, the NACK in the log
> > is the warning saying that the vlan won't work which should give the user a clue.
> 
> From the guest's perspective, VLANing simply doesn't work, so absolutely
> NETIF_F_VLAN_CHALLENGED should be advertised - that's what the flag's for.
Okay, I see your point but there's no reliable way to tell if we're in a host VLAN
in order to advertise that. For now short of adding a new mbox message, I don't
see any other way to tell for certain how the host has configured the guest from
the guest's POV.

Comment 19 David Gibson 2014-03-05 05:51:26 UTC
I think we have to bite the bullet and add the mbox message.  Ugly, but the confusion without it is even uglier.

Comment 20 Nikolay Aleksandrov 2014-03-05 12:16:53 UTC
(In reply to David Gibson from comment #19)
> I think we have to bite the bullet and add the mbox message.  Ugly, but the
> confusion without it is even uglier.

I agree, my plan is this:
create mbox message - IXGBE_VF_GET_PORT_TYPE
with possible values:
IXGBE_VF_PORT_UNTAGGED <- no vlans configured
IXGBE_VF_PORT_ACCESS <- hypervisor vlan
IXGBE_VF_PORT_TRUNK <- trunk mode

In the beginning I'll make PORT_TRUNK and PORT_UNTAGGED the same thing as 
currently the virtualization software out there can't make use of that and we
must be backwards compatible.

I'll make the patch and suggest it to Intel, see what they have to say about the
issue.

When this becomes completely utilized then a VF in UNTAGGED/ACCESS modes
should advertise VLAN_CHALLENGED, but for a start it should do so for the ACCESS
mode, that won't break anything and fix the issue of this BZ.

Comment 21 David Gibson 2014-03-06 05:22:49 UTC
Sounds ok.  I'm not quite clear on the distinction you're making between UNTAGGED and TRUNK.  Can you clarify?

Comment 22 Nikolay Aleksandrov 2014-03-06 11:02:47 UTC
(In reply to David Gibson from comment #21)
> Sounds ok.  I'm not quite clear on the distinction you're making between
> UNTAGGED and TRUNK.  Can you clarify?

UNTAGGED is supposed to be a port which doesn't have the <vlan type='trunk'> or
the <vlan> tag entirely, it should also advertise VLAN_CHALLENGED later on when
support gets added for it in the virtualization software. I think I'm getting 
ahead a bit making this similar to switch ports without any support from the virt 
software, for the purposes of this BZ I'll only leave the ACCESS and TRUNK types.

Comment 23 David Gibson 2014-03-10 06:01:29 UTC
Hmm.. ok.  It doesn't make sense to me to define the inter-kernel communications in terms of the libvirt configuration.  It should be based on the actual semantic differences between the modes and I'm not clear what those are.

AFAICT, with no <vlan> specific in the libvirt config, we'll have soemthing that should act as a trunk port.

Comment 24 Nikolay Aleksandrov 2014-03-10 14:05:52 UTC
Created attachment 872709 [details]
ixgbe/ixgbevf: add new reset parameter for vlan capability [test patch]

This is a test patch for RHEL6 which adds a new parameter to the MAC permaddr
PF -> VF message which denotes if the VF should advertise VLAN_CHALLENGED that is 
if the VF is in a host VLAN/qos then it should advertise VLAN_CHALLENGED. I've
tested this locally on a 82599 card. Any comments and suggestions are very 
welcome.

Thanks,
 Nik

Comment 25 Nikolay Aleksandrov 2014-03-10 14:37:50 UTC
(In reply to Nikolay Aleksandrov from comment #24)
> Created attachment 872709 [details]
> ixgbe/ixgbevf: add new reset parameter for vlan capability [test patch]
> 
> This is a test patch for RHEL6 which adds a new parameter to the MAC permaddr
> PF -> VF message which denotes if the VF should advertise VLAN_CHALLENGED
> that is 
> if the VF is in a host VLAN/qos then it should advertise VLAN_CHALLENGED.
> I've
> tested this locally on a 82599 card. Any comments and suggestions are very 
> welcome.
> 
> Thanks,
>  Nik

One note: the patch is only a test version, I think it'd be nice to bump the
API version so to know if the drivers need to look for the new parameter.
That is to cover the cases old ixgbe - new ixgbevf and vice versa.
If/when the time comes for official posting I'll take care of this.

Comment 26 Nikolay Aleksandrov 2014-03-11 19:12:32 UTC
Taking a look at upstream, this is resolved by checking the return value of
ndo_vlan_rx_add_vid. Since we can't change the netdev structure, I'll make a new
patch which adds a new ndo_vlan_rx_add_vid to the netdev_extended one which
can override the old one, if present. I'll make the vlan code use it and check the return value if it's defined for the interface, otherwise it'll fallback to the old one.

Comment 27 David Gibson 2014-03-12 06:06:37 UTC
Ok.

Do I undestand correctly that that will avoid extending the mailbox protocol?  If so, that sounds like the way forward, although workarounds to preserve kABI will be pretty awkward.

Comment 28 Nikolay Aleksandrov 2014-03-12 10:41:53 UTC
(In reply to David Gibson from comment #27)
> Ok.
> 
> Do I undestand correctly that that will avoid extending the mailbox
> protocol?  If so, that sounds like the way forward, although workarounds to
> preserve kABI will be pretty awkward.
Yes, that's the idea. And I'm not happy about the kABI workarounds at all, they
all look like hacks at best, but I think this is the cleanest way to go.
Also the new mbox message isn't going to be accepted upstream since that is solved there.

Comment 29 David Gibson 2014-03-13 02:42:04 UTC
Yeah.  Sometimes ugly hacks are the best you can do.

Let me know when you have something I can test.

Comment 31 Nikolay Aleksandrov 2014-03-17 12:42:15 UTC
*** Bug 1074564 has been marked as a duplicate of this bug. ***

Comment 32 Nikolay Aleksandrov 2014-03-17 14:56:18 UTC
Created attachment 875517 [details]
Add new rx_add_vid method with check of return value

There're 2 patches in the archive, the first one adds the new ndo_vlan_rx_add_vid_ret to netdev_extended which, if present, is used instead of the other one in the vlan code with the difference that the return value of this method is checked afterwards and if there was an error the vlan won't get created.
Patch 02 adds support for this new method to ixgbevf.

I ran local tests by putting a VM in a hypervisor vlan and it was no longer possible to create vlans on top of the attached ixgbevf interface since they get NACKed by the host and an error is returned by the new rx_add_vid method.

Please give these a try, I'll continue testing them.

Thanks,
 Nik

Comment 33 David Gibson 2014-03-20 03:51:28 UTC
Btw, have we heard from Intel confirming that the hardware/firmware behaves like we thinks it does.

The customer would really like a statement from Intel stating that it's not possible to apply an outer tag from the host and an inner one from the guest.

Comment 36 John Ronciak 2014-03-20 22:19:19 UTC
It is not possible to apply an outer tag from the host and the inner tag from the guest.  So comment #33 is correct.  This cannot be done with out HW.

Comment 37 David Gibson 2014-03-20 23:20:46 UTC
John,

Thanks for the statement, good to have that cleared up.

Any chance of requesting that feature in a future firmware?  Even if it requires that the inner tag be software-applied without the usual vlan hw acceleration, it seems like it could be quite useful for virtualized environments.

Since you're here, would you also care to comment on bug 1067802?  Although we have a workaround, the underlying cause there I'd consider a bug in the firmware, albeit a documented one.

Comment 45 Nikolay Aleksandrov 2014-03-28 11:26:56 UTC
Created attachment 879793 [details]
Add new rx_add_vid method with check of return value v2

This is the updated patch-set, some notes:
after these patches both methods (old and new) should be kept up-to-date and
supported since some devices (e.g. bonding, bridge) call in the old methods directly. If the new method is used it should be kept in mind that vlgrp might
not be available when it's called so it should not be dependent on it.

Comment 48 David Gibson 2014-04-02 04:34:35 UTC
Andy,

I'm unclear are you saying the technical problems extend to v2 of the patchset, or just the original version?

The problem with just issuing a warning to the kernel log is that it's very non-obvious from a user perspective: vconfig appears to have succeded, and there's no indication that the kernel log will contain more information on why things aren't working.

Comment 49 John Ronciak 2014-04-04 20:22:37 UTC
In comment #37 I'm not sure what the feature request is.  Having the HW be able to insert multiple VLAN tags by both the quests and hosts?  Not sure how that would be workable with the Linux stack today.  Also, not sure how that would get managed in the HW as there would seem be lots of edge cases and network configurations that would break.

Please give an exact description of the feature and how you would expect it to work.

Comment 53 David Gibson 2014-04-09 00:08:48 UTC
@Vlad, note that in this case the switch configuration is fine.  The customer is already using QinQ in production via Linux bridge devices on top of the raw (host) ixgbe.  The difficulty is combining that with SR-IOV.  We understand from Intel that it's simply not possible in the hardware, so this bug is now just about it fail in a way that's less cryptic.

Comment 55 David Gibson 2014-04-09 07:13:20 UTC
@John,

> In comment #37 I'm not sure what the feature request is.  Having the HW be able to insert multiple VLAN tags by both the quests and hosts?  Not sure how that would be workable with the Linux stack today.  Also, not sure how that would get managed in the HW as there would seem be lots of edge cases and network configurations that would break.

> Please give an exact description of the feature and how you would expect it to work.

* The idea is that you can add one VLAN tag from host, and one from guest.

* I'd envisage this as a new mode in the PFVMVIR, instructing guest supplied tags to be inner tags to the host supplied outer tag, instead of the guest tag being either discarded or overriding the host tag.

* Two ways it could work
    1) Allow the guest to use the normal tagging features, even when the host also uses them, with guest set tags being applied as inner tags to the host outer tags.

    2) Allowing guest VLAN spoof detection to be configured in such a way that packets given VLAN tags in guest software (rather than using the VF side VLAN tagging acceleration) will be encapsulated as-is in the host's outer VLAN frame, rather than being dropped as a spoof attempt.  Ideally this would also need a guest visible bit so that it knows it must use software tagging rather than the usual hardware accelerated paths.


The situation we have now is that the customer has an existing QinQ setup.  This works correctly, provided that:
    * Both layers of tag are set by the host (one with hardware accel, one in software only)
OR  * (once bug 106928 is fixed) both layers of tag are set in the guest (again, one using the hw accel, one in software)

It seems perverse that there's no way to have the exact same tags set, but with one configured from the host, and one from the guest.

Comment 61 David Gibson 2014-05-29 03:38:02 UTC
No customer confidential information in this case, so switching to public at customer request.

Comment 72 Nikolay Aleksandrov 2014-08-12 13:46:15 UTC
Created attachment 926073 [details]
Add new rx_add_vid method with check of return value v3

Attaching the final patch version.

Comment 73 John Greene 2014-08-12 13:50:50 UTC
Closing this as WONTFIX.  It's been NACK'ed in internal review by a number of developers.  However, proposed patches that do address the problem are posted here by Nik (well done Nik) if you wish to use them.  They would be rejected by upstream since the issue is solved there already later kernels (and RHEL 7 as well) already.  So there are 2 possible solutions.  

Patches for RHEL6
Upgrade to RHEL7

The development team as a whole regrets this, but we find no good solution at this time in RHEL 6.  Should something arise later, I may revisit this or note anything that develops later that may be helpful.


Note You need to log in before you can comment on or make changes to this bug.