Bug 1437113 - PCIe: Allow configuring Generic PCIe Root Ports MMIO Window
Summary: PCIe: Allow configuring Generic PCIe Root Ports MMIO Window
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Marcel Apfelbaum
QA Contact: jingzhao
URL:
Whiteboard:
Depends On:
Blocks: 1344299 1434747
Reported: 2017-03-29 14:05 UTC by Marcel Apfelbaum
Modified: 2018-04-11 00:16 UTC (History)

Fixed In Version: qemu-kvm-rhev-2.10.0-7.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-11 00:16:25 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2018:1104 normal SHIPPED_LIVE Important: qemu-kvm-rhev security, bug fix, and enhancement update 2018-04-10 22:54:38 UTC
Red Hat Bugzilla 1514105 None CLOSED backport edk2 commit 6e3287442774 so that PciBusDxe not over-claim resources 2019-04-15 05:24:28 UTC

Internal Links: 1514105

Description Marcel Apfelbaum 2017-03-29 14:05:16 UTC

Comment 2 Marcel Apfelbaum 2017-03-29 14:11:44 UTC
By default, the Generic PCIe Root Port exposes a 2M MMIO window. When we want to assign a physical device to a VM, that is not enough for modern PCIe devices, which may require more.

Gerd:
The best way to communicate window size hints would be to use a vendor specific pci capability (instead of setting the desired size on reset).  The information will always be available then and we don't run into initialization order issues.

Comment 5 jingzhao 2017-11-14 06:50:36 UTC
Hi Marcel,

Could you share more info about this bz?

As far as QE understands it, I checked the device that is attached to the pcie-root-port in the guest:

 # cat /proc/iomem

  [root@localhost ~]# lspci
  .......
   01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)

  [root@localhost ~]# cat /proc/iomem 
  ..........
    fc200000-fc23ffff : 0000:01:00.0
    fc240000-fc240fff : 0000:01:00.0
  ..............

Am I right?

Thanks
Jing

Comment 6 Marcel Apfelbaum 2017-11-14 12:42:15 UTC
(In reply to jingzhao from comment #5)
> Hi Marcel,
> 
> Could you share more info about this bz?
> 
> As far as QE understands it, I checked the device that is attached to the pcie-root-port in the guest:
> 
>  # cat /proc/iomem
> 
>   [root@localhost ~]# lspci
>   .......
>    01:00.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev 01)
> 
>   [root@localhost ~]# cat /proc/iomem 
>   ..........
>     fc200000-fc23ffff : 0000:01:00.0
>     fc240000-fc240fff : 0000:01:00.0
>   ..............
> 

Hi Jing,

> Am I right?
> 

I am afraid it is a little more complicated. This is about reserving more/less
MMIO or IO than the default values.

You run QEMU with:
   -device pcie-root-port,id=p1,io-reserve=0x2000,mem-reserve=0x400000,pref32-reserve=0x400000
then check the lspci output in the guest (if Linux), or Device Manager in Windows,
and see that the values are passed correctly.

And this is still not enough... you also need an updated firmware
that supports the above hints.

You should use the latest OVMF rebase for RHEL-7.5 (thanks Laszlo!):

* https://brewweb.engineering.redhat.com/brew/buildinfo?buildID=608934
  ovmf-20171011-1.git92d07e48907f.el7


Thanks,
Marcel



> Thanks
> Jing

Comment 12 jingzhao 2017-11-16 05:11:02 UTC
Thanks Marcel and Laszlo

I tried it with ("https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14539582") and ovmf (OVMF-20171011-1.git92d07e48907f.el7.noarch)

-device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \


the test result:

ovmf log:

PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
   Padding: Type = PMem32; Alignment = 0xFFFFFF;        Length = 0x1000000  
                                                        ^^^^^^^^^^^^^^^^(size=16M)

   Padding: Type =  Mem32; Alignment = 0x7FFFFF;        Length = 0x800000

                                                              ^^^^^^^^^(size=8M)
   Padding: Type =     Io; Alignment = 0xFFF;   Length = 0x1000
                                                ^^^^^^^^^^(size=4K)
   BAR[0]: Type =  Mem32; Alignment = 0xFFF;    Length = 0x1000;        Offset = 0x10



PciBus: Resource Map for Bridge [00|03|00]
Type =   Io16; Base = 0x7000;   Length = 0x1000;        Alignment = 0xFFF
   Base = Padding;      Length = 0x1000;        Alignment = 0xFFF
Type =  Mem32; Base = 0x98000000;       Length = 0x2000000;     Alignment = 0xFFFFFF
   Base = Padding;      Length = 0x1000000;     Alignment = 0xFFFFFF
   Base = Padding;      Length = 0x800000;      Alignment = 0x7FFFFF
   Base = 0x98000000;   Length = 0x1000;        Alignment = 0xFFF;      Owner = PCI [01|00|00:14]
Type =  Mem32; Base = 0x9A205000;       Length = 0x1000;        Alignment = 0xFFF

^^^^^^^^^^(confused about above log, how can I check it?)
Type = PMem64; Base = 0x800000000;      Length = 0x100000;      Alignment = 0xFFFFF
   Base = 0x800000000;  Length = 0x4000;        Alignment = 0x3FFF;     Owner = PCI [01|00|00:20]



lspci result:

	I/O behind bridge: 00007000-00007fff  (size=4K)
	Memory behind bridge: 98000000-99ffffff (size = 64M) ? confused, didn't fixed it or other issues?
	Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size = 16M)


Marcel, could you help to confirm it?

Thanks
Jing

Comment 13 Laszlo Ersek 2017-11-16 13:40:06 UTC
Hi Jing,

here's an explanation for the log snippets you see:

(In reply to jingzhao from comment #12)

> Thanks Marcel and Laszlo
>
> I tried it with
> ("https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=14539582")
> and ovmf (OVMF-20171011-1.git92d07e48907f.el7.noarch)
>
> -device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \
>
>
> the test result:
>
> ovmf log:
>
> PciBus: Discovered PPB @ [00|03|00]
> GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
> GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
> GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF

These lines show that the PCI capability with the resource reservation hints
has been parsed correctly.
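
To double-check that claim, the size suffixes from the QEMU command line can be converted to the hexadecimal values that OVMF logs. The following is a minimal Python sketch, not part of QEMU or OVMF: the helper name `parse_size` is hypothetical, and it assumes that the K/M/G suffixes are 1024-based.

```python
# Hypothetical helper (not a QEMU or OVMF API): convert a size string such as
# "4K", "8M" or "0x2000" to an integer. Assumes 1024-based suffixes.
SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    text = text.strip()
    suffix = text[-1].upper()
    if suffix in SUFFIXES:
        return int(text[:-1], 0) * SUFFIXES[suffix]
    return int(text, 0)  # plain decimal or 0x-prefixed hex


# Hints from the command line in comment 12.
hints = {"io-reserve": "4K", "mem-reserve": "8M", "pref32-reserve": "16M"}
# Values copied from the GetResourcePadding lines of the OVMF log.
logged = {"io-reserve": 0x1000, "mem-reserve": 0x800000, "pref32-reserve": 0x1000000}

for name, text in hints.items():
    value = parse_size(text)
    status = "matches" if value == logged[name] else "differs from"
    print(f"{name}={text} -> {value:#x} ({status} the OVMF log)")
```

All three hints should be reported as matching the log.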


>    Padding: Type = PMem32; Alignment = 0xFFFFFF;        Length = 0x1000000
>                                                         ^^^^^^^^^^^^^^^^(size=16M)
>
>    Padding: Type =  Mem32; Alignment = 0x7FFFFF;        Length = 0x800000
>
>                                                               ^^^^^^^^^(size=8M)
>    Padding: Type =     Io; Alignment = 0xFFF;   Length = 0x1000
>                                                 ^^^^^^^^^^(size=4K)
>    BAR[0]: Type =  Mem32; Alignment = 0xFFF;    Length = 0x1000;        Offset = 0x10

So this part basically lists the resources needed by the root port (its BARs).
The hints from the PCI capability are correctly turned into "Padding"
pseudo-resources.

The BAR#0 resource is a real one: IIRC, it is from the SHPC (Standard
HotPlug Controller) BAR for the root port. "Offset" means the address of the
BAR (base address register) itself in the config space of the device.

> PciBus: Resource Map for Bridge [00|03|00]

OK, this is printed after the enumeration and resource assignment have
completed. This part will list the actual addresses allocated.

> Type =   Io16; Base = 0x7000;   Length = 0x1000;        Alignment = 0xFFF
>    Base = Padding;      Length = 0x1000;        Alignment = 0xFFF

This corresponds to io-reserve=4K; the reservation has been allocated at IO
port 0x7000, for 0x1000 ports.

> Type =  Mem32; Base = 0x98000000;       Length = 0x2000000;     Alignment = 0xFFFFFF
>    Base = Padding;      Length = 0x1000000;     Alignment = 0xFFFFFF
>    Base = Padding;      Length = 0x800000;      Alignment = 0x7FFFFF
>    Base = 0x98000000;   Length = 0x1000;        Alignment = 0xFFF;      Owner = PCI [01|00|00:14]
> Type =  Mem32; Base = 0x9A205000;       Length = 0x1000;        Alignment = 0xFFF
>
> ^^^^^^^^^^(confused about above log, how can I check it?)
> Type = PMem64; Base = 0x800000000;      Length = 0x100000;      Alignment = 0xFFFFF
>    Base = 0x800000000;  Length = 0x4000;        Alignment = 0x3FFF;     Owner = PCI [01|00|00:20]

So this is more tricky, and indeed I see a bug here (in the firmware).

First, OVMF's PciHostBridgeLib passes the
EFI_PCI_HOST_BRIDGE_COMBINE_MEM_PMEM flag to PciHostBridgeDxe. This means
that the same root complex-level MMIO aperture will be used for allocating
both prefetchable and non-prefetchable MMIO. This is why you see the
Paddings for both mem-reserve=8M and pref32-reserve=16M under "Type =
Mem32" -- PciBusDxe degrades the PMem32 resource request to Mem32.

However, this degradation does not mean that the *maximum* of both will be
taken, for resource reservation. The maximum *would* be taken if the
resource requests were *originally* of the same type. Because they are
originally different types, here they are handled separately after
degradation, so they are added (rather than taking their maximum). IOW,
we'll have a summed padding (reservation) for 32-bit MMIO of 24MB
(0x180_0000).
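
The arithmetic behind that statement, as a minimal Python sketch. It only restates the sum-versus-maximum reasoning with the values from this test; it does not model PciBusDxe itself.

```python
MiB = 1 << 20

mem_reserve = 8 * MiB      # mem-reserve=8M (originally Mem32)
pref32_reserve = 16 * MiB  # pref32-reserve=16M (originally PMem32)

# If both requests had originally been of the same type, the larger would win.
same_type = max(mem_reserve, pref32_reserve)

# PMem32 is degraded to Mem32, but the two paddings are still handled
# separately, so they add up.
degraded = mem_reserve + pref32_reserve

print(f"maximum: {same_type:#x} ({same_type // MiB} MB)")
print(f"sum:     {degraded:#x} ({degraded // MiB} MB)")
```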


Second, there's an allocation at 0x98000000. This is for the sake of BAR#1
(offset 0x14) of the device that is plugged into the root port. This is
marked as "Owner = PCI [01|00|00:14]": bus 1, slot 0 (slot is guaranteed
to be 0 for devices plugged into pcie root ports), function 0, offset
0x14 (i.e. BAR#1). Note that this allocation is not in *addition* to the
24MB described above; instead it is *within* either the 16MB or the 8MB
reservation (I can't tell without seeing more of the log, but this is a side
topic anyway.)
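
For reading such "Owner" tags in bulk, here is a small Python sketch. The function name `decode_owner` is hypothetical, and the sketch assumes that all four fields in the tag are hexadecimal. The BAR index follows from the standard config space layout, where BAR#0 sits at offset 0x10 and each BAR register is 4 bytes wide.

```python
import re


def decode_owner(tag):
    """Decode a 'PCI [bus|slot|function:offset]' tag from the OVMF log.

    Assumption: every field is printed in hexadecimal.
    """
    m = re.fullmatch(r"PCI \[(\w+)\|(\w+)\|(\w+):(\w+)\]", tag)
    if m is None:
        raise ValueError(f"unrecognized owner tag: {tag!r}")
    bus, slot, function, offset = (int(field, 16) for field in m.groups())
    # BAR registers are 4 bytes each and start at config space offset 0x10.
    bar = (offset - 0x10) // 4
    return {"bus": bus, "slot": slot, "function": function,
            "offset": offset, "bar": bar}


for tag in ("PCI [01|00|00:14]", "PCI [01|00|00:20]"):
    print(tag, "->", decode_owner(tag))
```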


Third, you see an allocation at 0x9A205000 for the root port's own SHPC BAR.
This is separate from the reservations, for good reason: resources needed by
the port itself are accounted for in the aperture of the port's *parent* bus
(i.e., the root complex in this case).


Fourth, you see a 64-bit MMIO allocation at 0x8_0000_0000, namely for BAR#4
(= offset 0x20) of the same PCI device that is plugged into this pcie root
port ("Owner = PCI [01|00|00:...]").

Notice "Length = 0x100000" near "Type = PMem64". While it doesn't seem to
follow from the BAR size (0x4000) of the device behind the root port, this
is in fact an expected rounding-up. Section

    3.2.5.9. Prefetchable Memory Base Register and Prefetchable Memory Limit
    Register

in the

    PCI-to-PCI Bridge Architecture Specification

says,

    Thus, the bottom of the defined prefetchable memory address range will
    be aligned to a 1 MB boundary and the top of the defined memory address
    range will be one less than a 1 MB boundary.
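
The rounding can be reproduced with a few lines of Python. This is only a sketch of the 1 MB alignment rule quoted above, not the firmware's actual allocation code; the function name `bridge_window` is hypothetical.

```python
MiB = 1 << 20


def bridge_window(base, length, granularity=MiB):
    """Return (base, limit) of the smallest aligned window covering the range."""
    start = base & ~(granularity - 1)                                # round down
    end = (base + length + granularity - 1) & ~(granularity - 1)     # round up
    return start, end - 1


# BAR#4 of the device behind the root port: 0x4000 bytes at 0x8_0000_0000.
start, limit = bridge_window(0x800000000, 0x4000)
print(f"{start:#x}-{limit:#x}, size {limit + 1 - start:#x}")
```

With these inputs, the window spans 0x800000000 to 0x8000fffff, that is, 0x100000 bytes.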


OK, so what about the bug I mentioned above: if you look at the top line, it
says "Length = 0x2000000". That's wrong, it means 32MB, but it should be
24MB ("Length = 0x1800000"). This is fixed by the following upstream edk2
commit -- I just tested it --, which I will have to backport:

* 6e3287442774 ("MdeModulePkg/PciBus: Fix bug that PCI BUS claims too much
                resource", 2017-10-20)

> lspci result:

Here you mis-calculated a few values, but your basic question is right:


> 	I/O behind bridge: 00007000-00007fff  (size=4K)

Correct calculation on your part, and the result is correct too.

> 	Memory behind bridge: 98000000-99ffffff (size = 64M) ? confused, didn't fixed it or other issues?

In fact (0x99ffffff + 1 - 0x98000000) equals 0x200_0000, i.e., 32MB, not
64MB.

Here lspci should report 24MB. This issue is a consequence of the
above-mentioned firmware bug (where the firmware should set "Length =
0x1800000").

> 	Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size = 16M)

Another mis-calculation on your part; (0x8000fffff + 1 - 0x800000000) equals
0x10_0000; that is, 1MB.

Again, this is an expected result; see the rounding I mentioned above, from
section 3.2.5.9. of the PCI bridge spec.
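
To avoid such mis-calculations, the window sizes can be computed directly from the lspci lines as (limit + 1 - base). A minimal Python sketch follows; the function name `window_sizes` is hypothetical, and the sample text is the lspci output from comment 12 with the hand-written size annotations removed.

```python
import re

LSPCI = """\
I/O behind bridge: 00007000-00007fff
Memory behind bridge: 98000000-99ffffff
Prefetchable memory behind bridge: 0000000800000000-00000008000fffff
"""


def window_sizes(text):
    """Map each '... behind bridge' line to the size of its window in bytes."""
    sizes = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+ behind bridge): ([0-9a-f]+)-([0-9a-f]+)", line)
        if m:
            base, limit = int(m.group(2), 16), int(m.group(3), 16)
            sizes[m.group(1)] = limit + 1 - base
    return sizes


for name, size in window_sizes(LSPCI).items():
    print(f"{name}: {size:#x} ({size // 1024}K)")
```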


> Marcel, could you help to confirm it?

I won't clear the NEEDINFO just yet so Marcel can agree with or dispute my
above analysis.

Thanks,
Laszlo

Comment 14 Laszlo Ersek 2017-11-16 16:34:13 UTC
(In reply to Laszlo Ersek from comment #13)

> OK, so what about the bug I mentioned above: if you look at the top line, it
> says "Length = 0x2000000". That's wrong, it means 32MB, but it should be
> 24MB ("Length = 0x1800000"). This is fixed by the following upstream edk2
> commit -- I just tested it --, which I will have to backport:
> 
> * 6e3287442774 ("MdeModulePkg/PciBus: Fix bug that PCI BUS claims too much
>                 resource", 2017-10-20)

Now tracked by bug 1514105.

Comment 18 Miroslav Rezanina 2017-11-22 15:20:43 UTC
Fix included in qemu-kvm-rhev-2.10.0-7.el7

Comment 20 jingzhao 2017-12-12 06:54:00 UTC
1. Tested it with kernel-3.10.0-820.el7.x86_64 & OVMF-20171011-4.git92d07e48907f.el7.noarch & qemu-kvm-rhev-2.10.0-12.el7.x86_64

checked lspci in guest and ovmf log

ovmf log:

PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
   Padding: Type = PMem32; Alignment = 0xFFFFFF;        Length = 0x1000000
   Padding: Type =  Mem32; Alignment = 0x7FFFFF;        Length = 0x800000
   Padding: Type =     Io; Alignment = 0xFFF;   Length = 0x1000

lspci result:

	I/O behind bridge: 00007000-00007fff  (size=4K)
	Memory behind bridge: 98000000-997fffff  (size=24M)
	Prefetchable memory behind bridge: 0000000800000000-00000008000fffff (size=1M)

According to comment 13, it is the expected result

2. Tested it with kernel-3.10.0-820.el7.x86_64 & seabios-1.11.0-1.el7.x86_64 & qemu-kvm-rhev-2.10.0-12.el7.x86_64


checked lspci result in guest:

	I/O behind bridge: 0000c000-0000cfff  (size = 4K)
	Memory behind bridge: fc000000-fc7fffff  (size = 8M)
	Prefetchable memory behind bridge: 00000000fd800000-00000000fe7fffff (size = 16M)

Is it the expected behavior with seabios?

Could you help to check it?

The key part of the qemu command line follows:

-device pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-reserve=16M \
-device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root0 -netdev tap,id=tap10 \


Thanks
Jing

Comment 21 Marcel Apfelbaum 2017-12-12 09:36:12 UTC
(In reply to jingzhao from comment #20)
> 1. Tested it with kernel-3.10.0-820.el7.x86_64 &
> OVMF-20171011-4.git92d07e48907f.el7.noarch &
> qemu-kvm-rhev-2.10.0-12.el7.x86_64
> 
> checked lspci in guest and ovmf log
> 
> ovmf log:
> 
> PciBus: Discovered PPB @ [00|03|00]
> GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
> GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x1000
> NonPrefetchable32BitMmio=0x800000
> GetResourcePadding: Prefetchable32BitMmio=0x1000000
> Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
>    Padding: Type = PMem32; Alignment = 0xFFFFFF;        Length = 0x1000000
>    Padding: Type =  Mem32; Alignment = 0x7FFFFF;        Length = 0x800000
>    Padding: Type =     Io; Alignment = 0xFFF;   Length = 0x1000
> 
> lspci result:
> 
> 	I/O behind bridge: 00007000-00007fff  (size=4K)
> 	Memory behind bridge: 98000000-997fffff  (size=24M)
> 	Prefetchable memory behind bridge: 0000000800000000-00000008000fffff
> (size=1M)
> 
> According to comment 13, it is the expected result
> 
> 2. Tested it with kernel-3.10.0-820.el7.x86_64 & seabios-1.11.0-1.el7.x86_64
> & qemu-kvm-rhev-2.10.0-12.el7.x86_64
> 
> 
> checked lspci result in guest:
> 
> 	I/O behind bridge: 0000c000-0000cfff  (size = 4K)
> 	Memory behind bridge: fc000000-fc7fffff  (size = 8M)
> 	Prefetchable memory behind bridge: 00000000fd800000-00000000fe7fffff (size =
> 16M)
> 
> Is it the expected behavior with seabios?
> 
> Could you help to check it?
> 
> The key part of the qemu command line follows:
> 
> -device
> pcie-root-port,bus=pcie.0,id=root0,io-reserve=4K,mem-reserve=8M,pref32-
> reserve=16M \
> -device virtio-net-pci,netdev=tap10,mac=9a:6a:6b:6c:6d:6e,bus=root0 -netdev
> tap,id=tap10 \
> 

Exactly as expected, thanks.
One small thing about io-reserve=4K: this is the default size, so you should use 8K to check it.


> 
> Thanks
> Jing

Comment 22 jingzhao 2017-12-13 01:58:17 UTC
Tested again with "io-reserve=8K,mem-reserve=8M,pref32-reserve=16M" with kernel-3.10.0-820.el7.x86_64 & OVMF-20171011-4.git92d07e48907f.el7.noarch & qemu-kvm-rhev-2.10.0-12.el7.x86_64 & seabios-1.11.0-1.el7.x86_64


1. test result of ovmf & q35 machine type:

lspci result:

I/O behind bridge: 00006000-00007fff  (size=8k)
Memory behind bridge: 98000000-997fffff
Prefetchable memory behind bridge: 0000000800000000-00000008000fffff


ovmf log:
PciBus: Discovered PPB @ [00|03|00]
GetResourcePadding: Address=00:03.0 DevicePath=PciRoot(0x0)/Pci(0x3,0x0)
GetResourcePadding: BusNumbers=0xFFFFFFFF Io=0x2000 NonPrefetchable32BitMmio=0x800000
GetResourcePadding: Prefetchable32BitMmio=0x1000000 Prefetchable64BitMmio=0xFFFFFFFFFFFFFFFF
   Padding: Type = PMem32; Alignment = 0xFFFFFF;        Length = 0x1000000
   Padding: Type =  Mem32; Alignment = 0x7FFFFF;        Length = 0x800000
   Padding: Type =     Io; Alignment = 0x1FFF;  Length = 0x2000   (size=8k)
   BAR[0]: Type =  Mem32; Alignment = 0xFFF;    Length = 0x1000;        Offset = 0x1


According to comment 13, it is the expected behavior


2. test result of q35 & seabios:

lspci result:

I/O behind bridge: 0000c000-0000dfff  (size=8k)
Memory behind bridge: fc000000-fc7fffff
Prefetchable memory behind bridge: 00000000fd800000-00000000fe7fffff


According to comment 21, it is the expected behavior


Thanks
Jing

Comment 23 jingzhao 2017-12-13 01:59:25 UTC
According to comments 13, 20, 21, and 22, the issue is verified.

Changed to VERIFIED status.

Thanks
Jing

Comment 25 errata-xmlrpc 2018-04-11 00:16:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1104

