Bug 1366953 - virtio balloon can not work with pci-bridge
Summary: virtio balloon can not work with pci-bridge
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: SLOF
Version: 7.3
Hardware: ppc64le
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.4
Assignee: Thomas Huth
QA Contact: xianwang
URL:
Whiteboard:
Depends On: 1392055
Blocks:
 
Reported: 2016-08-15 06:34 UTC by mazhang
Modified: 2017-08-01 22:33 UTC (History)

Fixed In Version: SLOF-20161019
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 22:33:27 UTC


Attachments
Guest dmesg (deleted)
2016-08-15 06:35 UTC, mazhang


Links
System ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata RHBA-2017:2093 | normal | SHIPPED_LIVE | SLOF bug fix and enhancement update | 2017-08-01 19:35:59 UTC

Description mazhang 2016-08-15 06:34:45 UTC
Description of problem:


Version-Release number of selected component (if applicable):

Host:
3.10.0-489.el7.ppc64le
qemu-kvm-rhev-2.6.0-20.el7.ppc64le

Guest:
3.10.0-492.el7.ppc64

How reproducible:
100%

Steps to Reproduce:
1. Start qemu-kvm with the following command line:
/usr/libexec/qemu-kvm \
    -S  \
    -name 'avocado-vt-vm1'  \
    -sandbox off  \
    -M pseries-rhel7.3.0 \
    -nodefaults  \
    -vga std  \
    -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/avocado_OvZeQU/monitor-qmpmonitor1-20160814-230506-kzmj1nQF,server,nowait \
    -mon chardev=qmp_id_qmpmonitor1,mode=control  \
    -chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/avocado_OvZeQU/monitor-catch_monitor-20160814-230506-kzmj1nQF,server,nowait \
    -mon chardev=qmp_id_catch_monitor,mode=control  \
    -chardev socket,id=serial_id_serial0,path=/var/tmp/avocado_OvZeQU/serial-serial0-20160814-230506-kzmj1nQF,server,nowait \
    -device spapr-vty,reg=0x30000000,chardev=serial_id_serial0 \
    -device pci-ohci,id=usb1,bus=pci.0,addr=03 \
    -device virtio-scsi-pci,id=virtio_scsi_pci0,bus=pci.0,addr=04,disable-legacy=off,disable-modern=on \
    -drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/RHEL-Server-7.3-ppc64-virtio-scsi.qcow2 \
    -device scsi-hd,id=image1,drive=drive_image1 \
    -device virtio-net-pci,mac=9a:a7:a8:a9:aa:ab,id=idOhBgeM,vectors=4,netdev=idNI3ooU,bus=pci.0,addr=05,disable-legacy=off,disable-modern=on  \
    -netdev tap,id=idNI3ooU,vhost=on \
    -m 8192  \
    -smp 8,maxcpus=8,cores=4,threads=1,sockets=2 \
    -device usb-tablet,id=usb-tablet1,bus=usb1.0,port=1  \
    -vnc :10  \
    -rtc base=utc,clock=host  \
    -boot order=cdn,once=c,menu=off,strict=off \
    -enable-kvm \
    -monitor stdio \
    -device pci-bridge,bus=pci.0,id=bridge1,chassis_nr=1,addr=0x6 \
    -device virtio-balloon-pci,id=balloon0,bus=bridge1,addr=0x7

2. Test virtio balloon device.

Actual results:
The virtio balloon device does not work.

HMP:
(qemu)  info balloon 
balloon: actual=8192
(qemu) balloon 4096
(qemu)  info balloon 
balloon: actual=8192

Guest:
# dmesg |grep balloon
[    6.172708] virtio_balloon: probe of virtio2 failed with error -22


Expected results:
The virtio balloon works well.

Additional info:
1. Virtio balloon attached directly to the PCI bus (no bridge) works well.
2. The x86 platform works well.

Comment 1 mazhang 2016-08-15 06:35:41 UTC
Created attachment 1190796 [details]
Guest dmesg

Comment 3 Thomas Huth 2016-08-15 15:53:15 UTC
Observation: It seems like it is not failing with a different PCI slot address. This is working for me:

sudo /usr/libexec/qemu-kvm -enable-kvm -nographic -vga none -hda /path/to/image.qcow2 -m 8192 -smp 2 -device pci-bridge,bus=pci.0,id=bridge1,chassis_nr=1,addr=0x6 -device virtio-balloon-pci,id=balloon0,bus=bridge1,addr=2

But as soon as I change the "addr=2" to a higher value, it is failing.

Also not sure if it is related, but for the working cases, "lspci -vv | grep Interrupt" shows "pin A routed to IRQ 18", while in the non-working cases, I get "pin A routed to IRQ 0" instead.

Comment 4 David Gibson 2016-08-18 00:50:32 UTC
I think the IRQ thing is almost certainly related to this.  Even with the various virtual irq number mappings, I'm pretty sure IRQ 0 is not a plausible value for a PCI irq.

My guess would be something is wrong with our irq swizzling across bridges.

Comment 5 David Gibson 2016-09-06 06:10:12 UTC
I'm pretty sure this is due to an error in the interrupt-map property in the device tree node for the PCI bridge.  In other words, a SLOF bug, probably.

# hexdump -C interrupt-map
00000000  00 00 18 00 00 00 00 00  00 00 00 00 00 00 00 01  |................|
00000010  1e 52 29 60 00 00 30 00  00 00 00 00 00 00 00 00  |.R)`..0.........|
00000020  00 00 00 00 00 00 18 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 02 1e 52 29 60  00 00 30 00 00 00 00 00  |.....R)`..0.....|
00000040  00 00 00 00 00 00 00 01  00 00 18 00 00 00 00 00  |................|
00000050  00 00 00 00 00 00 00 03  1e 52 29 60 00 00 30 00  |.........R)`..0.|
00000060  00 00 00 00 00 00 00 00  00 00 00 02 00 00 18 00  |................|
00000070  00 00 00 00 00 00 00 00  00 00 00 04 1e 52 29 60  |.............R)`|
00000080  00 00 30 00 00 00 00 00  00 00 00 00 00 00 00 03  |..0.............|
00000090

The normal PCI interrupt specifier should be either 1, 2, 3 or 4, mapping to pins A, B, C and D.  These are generally swizzled on bridges so that, depending on which slot the device is in, those pins are rotated to different pins on the bridge itself.

However, this particular interrupt map appears to be mapping:
     1->0, 2->1, 3->2, 4->3
Pin A, the interrupt actually used by the balloon, is mapped to the invalid irq specifier '0'.

Thomas, can you look into this?

Comment 6 David Gibson 2016-09-06 06:11:50 UTC
Notes for interpreting the dump above:
    * 00 00 18 00  00 00 00 00  00 00 00 00
      is the config address (1st reg entry) of the balloon device
    * 1e 52 29 60
      is the phandle of the PCI host bridge device node (i.e. the parent of this bridge).
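
For anyone who wants to re-check the reading of the dump, here is a minimal Python sketch that decodes it. It assumes each interrupt-map entry is 9 big-endian 32-bit cells (3 child address cells, 1 child interrupt cell, the parent phandle, 3 parent address cells, 1 parent interrupt cell); that layout fits the 0x90-byte length and the notes above, but it is an assumption and is not stated explicitly in this report. The function and key names are made up for illustration.

```python
import struct

# Bytes copied from the "hexdump -C interrupt-map" output in comment 5.
DUMP_HEX = """
00 00 18 00 00 00 00 00  00 00 00 00 00 00 00 01
1e 52 29 60 00 00 30 00  00 00 00 00 00 00 00 00
00 00 00 00 00 00 18 00  00 00 00 00 00 00 00 00
00 00 00 02 1e 52 29 60  00 00 30 00 00 00 00 00
00 00 00 00 00 00 00 01  00 00 18 00 00 00 00 00
00 00 00 00 00 00 00 03  1e 52 29 60 00 00 30 00
00 00 00 00 00 00 00 00  00 00 00 02 00 00 18 00
00 00 00 00 00 00 00 00  00 00 00 04 1e 52 29 60
00 00 30 00 00 00 00 00  00 00 00 00 00 00 00 03
"""

ENTRY_CELLS = 9                # assumed layout, see the text above
ENTRY_BYTES = ENTRY_CELLS * 4


def pci_slot(phys_hi):
    """Device (slot) number: bits 15..11 of the first PCI address cell."""
    return (phys_hi >> 11) & 0x1F


def parse_interrupt_map(raw):
    """Split the raw property into entries of 9 big-endian cells."""
    entries = []
    for offset in range(0, len(raw), ENTRY_BYTES):
        cells = struct.unpack_from(">9I", raw, offset)
        entries.append({
            "child_slot": pci_slot(cells[0]),
            "child_pin": cells[3],
            "parent_phandle": cells[4],
            "parent_slot": pci_slot(cells[5]),
            "parent_pin": cells[8],
        })
    return entries


raw = bytes.fromhex("".join(DUMP_HEX.split()))
entries = parse_interrupt_map(raw)
for e in entries:
    print("slot %d pin %d -> phandle 0x%08x, slot %d pin %d" % (
        e["child_slot"], e["child_pin"], e["parent_phandle"],
        e["parent_slot"], e["parent_pin"]))
```

Under that assumed layout the dump decodes to four entries for slot 3, with child pins 1 to 4 mapped to parent pins 0 to 3 on slot 6 (the bridge at addr=0x6), which is the off-by-one mapping described in comment 5.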

Comment 7 Thomas Huth 2016-09-06 11:36:52 UTC
I think you're right, this looks very suspicious. There is likely a bug in the pci bridge setup code of SLOF that prepares the interrupt-map property. I can make it work when I apply the following patch:

diff --git a/board-qemu/slof/pci-interrupts.fs b/board-qemu/slof/pci-interrupts.fs
--- a/board-qemu/slof/pci-interrupts.fs
+++ b/board-qemu/slof/pci-interrupts.fs
@@ -1,6 +1,7 @@
 
 : pci-gen-irq-map-one ( prop-addr prop-len slot pin -- prop-addr prop-len )
         2dup + 4 mod                ( prop-addr prop-len slot pin parentpin )
+        dup 0= IF drop 4 THEN
         >r >r                       ( prop-addr prop-len slot R: swizzledpin pin )
 
         \ Child slot#

Not sure yet whether this is the 100% correct solution, so I need to do some more tests with it before I can send a patch...
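
To illustrate what the added line changes, here is a small Python sketch of the arithmetic only (it is not SLOF code, and the function names are hypothetical). It models `2dup + 4 mod` as `(slot + pin) % 4`, applies the `dup 0= IF drop 4 THEN` fix, and compares the result with the conventional PCI bridge swizzle `((pin - 1 + slot) % 4) + 1`.

```python
def parent_pin_unpatched(slot, pin):
    """Models '2dup + 4 mod': can yield 0, which is not a valid pin."""
    return (slot + pin) % 4


def parent_pin_patched(slot, pin):
    """Models the added 'dup 0= IF drop 4 THEN': 0 becomes 4 (pin D)."""
    parent = (slot + pin) % 4
    return 4 if parent == 0 else parent


def parent_pin_conventional(slot, pin):
    """Conventional bridge swizzle with pins numbered 1..4 (A..D)."""
    return ((pin - 1 + slot) % 4) + 1


if __name__ == "__main__":
    pin_a = 1
    for slot in (2, 3, 7):
        print("slot %d: unpatched=%d patched=%d conventional=%d" % (
            slot,
            parent_pin_unpatched(slot, pin_a),
            parent_pin_patched(slot, pin_a),
            parent_pin_conventional(slot, pin_a)))
```

For pin A, slots 3 and 7 give 0 without the fix and 4 with it, while slot 2 gives 3 either way, which is consistent with addr=2 working in comment 3 and addr=0x7 failing in the description. The patched variant agrees with the conventional formula for every slot and pin, at least as far as this arithmetic goes.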

Comment 8 Thomas Huth 2016-09-07 08:37:59 UTC
Moving this BZ to 7.4 since it is IMHO not a blocker bug (there is a workaround: you can move the balloon device to a different location on the bus to make it work again).

Comment 9 David Gibson 2016-09-07 23:02:19 UTC
Michal,

This bug will affect most guest PCI devices (emulated or VFIO) behind a PCI to PCI bridge, which I think will be automatically created if you have enough PCI devices attached.

Under RHEL, is there anything the user can do to influence the order in which PCI devices are added to the guest, and therefore whether they'll be behind a P2P bridge or not?

This is to assess whether this bug is urgent enough to push into 7.3 or not.

(Note that I'm about 95% confident that bug 1370026 is a dupe of this one).

Comment 10 Michal Skrivanek 2016-09-08 08:36:37 UTC
(In reply to David Gibson from comment #9)
> Michal,
> 
> This bug will affect most guest PCI devices (emulated or VFIO) behind a PCI
> to PCI bridge, which I think will be automatically created if you have
> enough PCI devices attached.

We don't add any bridge ourselves, nor do we use that many devices, so we should be good here.

Comment 11 Thomas Huth 2016-09-08 09:10:12 UTC
I've now sent two patches upstream that should fix this issue:

http://patchwork.ozlabs.org/patch/667393/
http://patchwork.ozlabs.org/patch/667394/

Comment 17 Miroslav Rezanina 2017-03-14 13:51:46 UTC
Fixed by rebase

Comment 19 Yongxue Hong 2017-03-23 09:22:14 UTC
The verification steps are as follows:

1. Versions:
Host: 3.10.0-623.el7.ppc64le
Qemu: qemu-kvm-rhev-2.9.0-0.el7.mrezanin201703210848
SLOF: SLOF.noarch 20170303-1.git66d250e.el7

2. Steps to verify:
Same as in the Description above.

3. Actual results:
(qemu) info balloon
balloon: actual=8192
(qemu) balloon 4096
(qemu) info balloon 
balloon: actual=4096
(qemu) balloon 1024
(qemu) info balloon 
balloon: actual=1024

The virtio balloon works well.

This bug is fixed; changing the status to VERIFIED.

Comment 20 errata-xmlrpc 2017-08-01 22:33:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2093

