
Bug 1690832

Summary: Virtual disk hotplugged to a vm in booting process will be lost with Q35
Product: Red Hat Enterprise Linux 8
Component: qemu-kvm
Version: 8.0
Hardware: x86_64
OS: Linux
Status: NEW
Severity: high
Priority: high
Reporter: yisun
Assignee: Amnon Ilan <ailan>
QA Contact: CongLi <coli>
CC: coli, fjin, hhan, imammedo, jinzhao, juzhang, lmen, meili, michen, pkrempa, rbalakri, ribarry, slopezpa, virt-maint, xuzhang, yafu, yalzhang, yduan
Target Milestone: rc
Target Release: 8.0
Keywords: Automation
Type: Bug

Attachments: libvirtd-debug log

Description yisun 2019-03-20 10:28:51 UTC
Description:
Virtual disk hotplugged to a vm in booting process will be lost

How reproducible:
100% and it's a REGRESSION, not reproduced on rhel7.6
 
Version:
kernel-4.18.0-80.el8.x86_64
libvirt-5.0.0-7.module+el8+2887+effa3c42.x86_64
qemu-kvm-3.1.0-20.module+el8+2888+cdc893a8.x86_64

Steps:
1. Have a shutoff vm
[root@dell-per730-66 ~]# virsh list --all
 Id   Name             State
---------------------------------
 -    avocado-vt-vm1   shut off


2. Have a script as follows:
[root@dell-per730-66 ~]# cat reproduce.sh
#!/bin/sh
VM="avocado-vt-vm1"
virsh start $VM
qemu-img create -f qcow2 /tmp/img.qcow2 1G
virsh attach-disk --domain $VM --source /tmp/img.qcow2 --target vdf --subdriver qcow2
for i in {1..10}
        do
                echo Round:$i
                virsh domblklist $VM
                sleep 2
        done

3. Execute the script; after a short period, the attached disk is gone
[root@dell-per730-66 ~]# sh reproduce.sh
Domain avocado-vt-vm1 started

Formatting '/tmp/img.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
Disk attached successfully

Round:1
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
 vdf      /tmp/img.qcow2

...

...

Round:4
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
 vdf      /tmp/img.qcow2

Round:5
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
<======== HERE THE VDF GONE

...

Round:10
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2

4. When logging in to the vm, the second disk is not there
[root@dell-per730-66 ~]# virsh console avocado-vt-vm1
Connected to domain avocado-vt-vm1
Escape character is ^]

Red Hat Enterprise Linux 8.0 (Ootpa)
Kernel 4.18.0-80.el8.x86_64 on an x86_64

localhost login: root
Password:
[root@localhost ~]# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda           252:0    0  10G  0 disk
├─vda1        252:1    0   1G  0 part /boot
└─vda2        252:2    0   9G  0 part
  ├─rhel-root 253:0    0   8G  0 lvm  /
  └─rhel-swap 253:1    0   1G  0 lvm  [SWAP]
<===== no attached disk


5. If we attach another disk now, it won't take effect either
[root@dell-per730-66 ~]# qemu-img create -f qcow2 /tmp/new.img 1G
Formatting '/tmp/new.img', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16

[root@dell-per730-66 ~]# virsh attach-disk --domain avocado-vt-vm1 --source /tmp/new.img --target vdb --subdriver qcow2
Disk attached successfully

[root@dell-per730-66 ~]# virsh domblklist avocado-vt-vm1
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-7.6-x86_64-latest.qcow2
 vdb      /tmp/new.img
<===== there is a new vdb in the list, but not working in vm

[root@dell-per730-66 ~]# virsh console avocado-vt-vm1
Connected to domain avocado-vt-vm1
Escape character is ^]

[root@localhost ~]# lsblk
NAME          MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda           252:0    0  10G  0 disk
├─vda1        252:1    0   1G  0 part /boot
└─vda2        252:2    0   9G  0 part
  ├─rhel-root 253:0    0   8G  0 lvm  /
  └─rhel-swap 253:1    0   1G  0 lvm  [SWAP]
<====== in vm, the disk not displayed


Additional info:
1. If vm booting is done, the attach works.
2. With the same vm img, the issue is not reproduced on a rhel7.6 host (kernel-3.10.0-957.10.1.el7.x86_64, libvirt-4.5.0-10.virtcov.el7_6.6.x86_64, qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64), so it seems not to be a vm os issue (vm os info: kernel-4.18.0-80.el8.x86_64)

Expected result:
Attached disk should be working after the vm has booted

Actual result:
Attached disk gone and newly attached disk after boot won't work either.

Comment 2 Peter Krempa 2019-03-20 10:37:00 UTC
Please also attach the debug log.

Comment 4 yisun 2019-03-20 10:55:03 UTC
Created attachment 1546005 [details]
libvirtd-debug log

Comment 5 Peter Krempa 2019-03-20 11:25:20 UTC
I've extracted just the qemu monitor interactions happening in the log file using:

grep 'QEMU_MONITOR_[SEND|RECV]' libvirtd-debug.log
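A side note on that command: `[SEND|RECV]` is a bracket expression (a character class), so it matches here only by coincidence; explicit alternation needs `grep -E`. Wrapped as a small reusable sketch (function name is ours, not from the bug):

```shell
#!/bin/sh
# Extract only the qemu monitor interactions from a libvirtd debug log.
# grep -E gives true alternation; the bracket form above works only because
# SEND_MSG and RECV happen to start with characters inside the class.
monitor_lines() {
    grep -E 'QEMU_MONITOR_(SEND_MSG|RECV_REPLY|RECV_EVENT)' "$1"
}
```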

I've also trimmed the log starting from the time when the disk was plugged in:


2019-03-20 10:43:13.145+0000: 9574: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f067002f7d0 msg={"execute":"human-monitor-command","arguments":{"command-line":"drive_add dummy file=/tmp/img.qcow2,format=qcow2,if=none,id=drive-virtio-disk5"},"id":"libvirt-21"}
2019-03-20 10:43:13.147+0000: 9570: info : qemuMonitorJSONIOProcessLine:216 : QEMU_MONITOR_RECV_REPLY: mon=0x7f067002f7d0 reply={"return": "OK\r\n", "id": "libvirt-21"}
2019-03-20 10:43:13.148+0000: 9574: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f067002f7d0 msg={"execute":"device_add","arguments":{"driver":"virtio-blk-pci","scsi":"off","bus":"pci.1","addr":"0x0","drive":"drive-virtio-disk5","id":"virtio-disk5"},"id":"libvirt-22"}
2019-03-20 10:43:13.152+0000: 9570: info : qemuMonitorJSONIOProcessLine:216 : QEMU_MONITOR_RECV_REPLY: mon=0x7f067002f7d0 reply={"return": {}, "id": "libvirt-22"}
2019-03-20 10:43:13.152+0000: 9574: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f067002f7d0 msg={"execute":"qom-list","arguments":{"path":"/machine/peripheral"},"id":"libvirt-23"}
2019-03-20 10:43:13.154+0000: 9570: info : qemuMonitorJSONIOProcessLine:216 : QEMU_MONITOR_RECV_REPLY: mon=0x7f067002f7d0 reply={"return": [{"name": "type", "type": "string"}, {"name": "video0", "type": "child<qxl-vga>"}, {"name": "virtio-serial0", "type": "child<virtio-serial-pci>"}, {"name": "pci.8", "type": "child<pcie-pci-bridge>"}, {"name": "pci.7", "type": "child<pcie-root-port>"}, {"name": "pci.6", "type": "child<pcie-root-port>"}, {"name": "pci.5", "type": "child<pcie-root-port>"}, {"name": "pci.4", "type": "child<pcie-root-port>"}, {"name": "pci.3", "type": "child<pcie-root-port>"}, {"name": "pci.2", "type": "child<pcie-root-port>"}, {"name": "pci.1", "type": "child<pcie-root-port>"}, {"name": "channel1", "type": "child<virtserialport>"}, {"name": "channel0", "type": "child<virtserialport>"}, {"name": "balloon0", "type": "child<virtio-balloon-pci>"}, {"name": "virtio-disk5", "type": "child<virtio-blk-pci>"}, {"name": "input0", "type": "child<usb-tablet>"}, {"name": "redir1", "type": "child<usb-redir>"}, {"name": "redir0", "type": "child<usb-redir>"}, {"name": "usb", "type": "child<qemu-xhci>"}, {"name": "sound0-codec0", "type": "child<hda-duplex>"}, {"name": "rng0", "type": "child<virtio-rng-pci>"}, {"name": "sound0", "type": "child<ich9-intel-hda>"}, {"name": "serial0", "type": "child<isa-serial>"}, {"name": "virtio-disk0", "type": "child<virtio-blk-pci>"}], "id": "libvirt-23"}
2019-03-20 10:43:21.012+0000: 9570: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f067002f7d0 event={"timestamp": {"seconds": 1553078601, "microseconds": 11751}, "event": "DEVICE_DELETED", "data": {"path": "/machine/peripheral/virtio-disk5/virtio-backend"}}
2019-03-20 10:43:21.020+0000: 9570: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f067002f7d0 event={"timestamp": {"seconds": 1553078601, "microseconds": 20526}, "event": "DEVICE_DELETED", "data": {"device": "virtio-disk5", "path": "/machine/peripheral/virtio-disk5"}}
2019-03-20 10:43:21.021+0000: 9697: info : qemuMonitorSend:1081 : QEMU_MONITOR_SEND_MSG: mon=0x7f067002f7d0 msg={"execute":"human-monitor-command","arguments":{"command-line":"drive_del drive-virtio-disk5"},"id":"libvirt-24"}
2019-03-20 10:43:21.022+0000: 9570: info : qemuMonitorJSONIOProcessLine:216 : QEMU_MONITOR_RECV_REPLY: mon=0x7f067002f7d0 reply={"return": "Device 'drive-virtio-disk5' not found\r\n", "id": "libvirt-24"}
2019-03-20 10:43:25.812+0000: 9570: info : qemuMonitorJSONIOProcessLine:211 : QEMU_MONITOR_RECV_EVENT: mon=0x7f067002f7d0 event={"timestamp": {"seconds": 1553078605, "microseconds": 812152}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}

As you can see, after starting the VM the disk is attached successfully (according to the returned value). After 8 seconds qemu emits DEVICE_DELETED (without being asked by libvirt) for the disk that was just attached, and libvirt then removes the backend. After 4 more seconds, something opens the virtio channel0.

This means that libvirt correctly responded to the DEVICE_DELETED event after it was delivered. As the device detach does not necessarily have to be initiated from the host side, it's possible that either the guest OS (or firmware) initiated the unplug or there is a bug in qemu.

If you are sure that the guest OS does not initiate the unplug, please move this bug to qemu-kvm-rhev component.
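The key observation above — DEVICE_DELETED arriving although libvirt never sent device_del — can be checked mechanically over the monitor lines. A sketch (a simplification: it tracks a single device_del flag instead of matching per-device ids):

```shell
#!/bin/sh
# Report DEVICE_DELETED events that no prior device_del in the log requested.
# If qemu reports a deletion libvirt never asked for, the unplug came from
# the guest, the firmware, or a qemu bug.
unsolicited_deletes() {
    awk '/QEMU_MONITOR_SEND_MSG/ && /"device_del"/ { sent = 1 }
         /DEVICE_DELETED/ && !sent { print }' "$1"
}
```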

Comment 6 yisun 2019-03-21 02:01:23 UTC
(In reply to Peter Krempa from comment #5)
[...]

> If you are sure that the guest OS does not initiate the unplug, please move
> this bug to qemu-kvm-rhev component.

Thanks Peter.
The case failed before anything was operated in the vm: just boot, attach, failure. So nothing was executed in the vm os intentionally. And the same steps passed with the same vm img on a rhel7.6 host, so I suspect something is wrong on the host side.

Comment 8 Han Han 2019-03-21 02:28:45 UTC
(In reply to yisun from comment #0)

[...]

> Actual result:
> Attached disk gone and newly attached disk after boot won't work either.

I think you are likely hitting this issue:
https://www.redhat.com/archives/libvirt-users/2019-March/msg00008.html

Comment 9 yisun 2019-03-21 02:37:34 UTC
(In reply to Han Han from comment #8)
> (In reply to yisun from comment #0)
> 
> [...]
> 
> > Actual result:
> > Attached disk gone and newly attached disk after boot won't work either.
> 
> I think you are likely hitting this issue:
> https://www.redhat.com/archives/libvirt-users/2019-March/msg00008.html

Yes, same behavior, thx! And when the problem happens, attaching another disk after vm booting won't take effect either, which causes more trouble.

Comment 10 Peter Krempa 2019-03-21 15:17:42 UTC
(In reply to yisun from comment #9)

[...]

> > > Actual result:
> > > Attached disk gone and newly attached disk after boot won't work either.
> > 
> > I think your are likely to hit this issue:
> > https://www.redhat.com/archives/libvirt-users/2019-March/msg00008.html
> 
> Yes, same behavior, thx! And when the problem happens, attaching another disk
> after vm booting won't take effect either, which causes more trouble.

I don't think it's the same behaviour. According to that email, the disk is still visible in 'virsh domblklist' and attempting to detach it times out while the disk is still in the XML. The description and log file of this bug hint at a problem where a non-libvirt-initiated DEVICE_DELETED event is delivered to libvirt and the disk is thus detached.

I'm moving this to qemu per comment 6/7.

Comment 12 CongLi 2019-03-22 03:31:38 UTC
Hi yisun,

Could you please specify the exact time the device is hotplugged when booting the guest?
There would be different behaviors at different times during boot.

Thanks.

Comment 13 yisun 2019-03-22 06:44:21 UTC
(In reply to CongLi from comment #12)
> Hi yisun,
> 
> Could you please specify the exact time the device is hotplugged when booting
> the guest?
> There would be different behaviors at different times during boot.
> 
> Thanks.

Changed the script to log the time between vm start and disk attachment.
During the test, it shows the attachment was issued 239 ms after vm start.

[root@localhost ~]# cat reproduce.sh
#!/bin/sh
VM="avocado-vt-vm1"
time_start_vm=$[$(date +%s%N)/1000000]
virsh start $VM
qemu-img create -f qcow2 /tmp/img.qcow2 1G
time_attach_disk=$[$(date +%s%N)/1000000]
spend_time=$[$time_attach_disk-$time_start_vm]
echo "* $spend_time * ms passed between vm start and disk attachment"
virsh attach-disk --domain $VM --source /tmp/img.qcow2 --target vdf --subdriver qcow2
for i in {1..2}
        do
                echo Round:$i
                sleep 5
                virsh domblklist $VM
        done


[root@localhost ~]# sh reproduce.sh

Domain avocado-vt-vm1 started

Formatting '/tmp/img.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16

* 239 * ms passed between vm start and disk attachment

Disk attached successfully

Round:1
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-8.0-x86_64-latest.qcow2
 vdf      /tmp/img.qcow2

Round:2
 Target   Source
----------------------------------------------------------------
 vda      /var/lib/libvirt/images/RHEL-8.0-x86_64-latest.qcow2
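A formatting note on the timing: the `$[...]` expansion used in the script is an obsolete arithmetic form; the equivalent with standard `$(( ))` can be sketched as follows (assuming GNU `date`, whose `%N` prints nanoseconds — the function names are ours):

```shell
#!/bin/sh
# Current wall-clock time in whole milliseconds (GNU date: %N = nanoseconds).
now_ms() {
    echo $(( $(date +%s%N) / 1000000 ))
}

# Elapsed milliseconds between two now_ms samples: $1 = start, $2 = end.
elapsed_ms() {
    echo $(( $2 - $1 ))
}
```

Usage would look like `t0=$(now_ms); virsh start "$VM"; elapsed_ms "$t0" "$(now_ms)"`.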

Comment 18 John Ferlan 2019-03-25 21:55:32 UTC
Not sure if related; however, looking for other instances of hotplug issues while booting, I see bz 1678529 has a fairly similar footprint: while booting, a DEVICE_DELETE event occurs unexpectedly.

Not my area of expertise, but I do see somewhat recent changes to hotplug handling during realize, so I figured I'd ask if there's any relationship to this bz before possibly assigning it elsewhere. I also see bz 1684022 has a different/worse result from what appears to be a similar action.

Comment 19 Igor Mammedov 2019-03-26 13:02:58 UTC
(In reply to John Ferlan from comment #18)
> Not sure if related; however, looking for other instances of hotplug issues
> while booting, I see bz 1678529 has a fairly similar footprint: while booting,
> a DEVICE_DELETE event occurs unexpectedly.

From the logs above it seems that the issue reproduces only on the Q35 machine type, which should use native PCIe hotplug; it's hard to guess what could be the reason for the DEVICE_DELETE event.

I'd suggest debugging where exactly the device-deleted event comes from (call stack) in QEMU.
Alternatively, one can just bisect upstream qemu to find the offending commit.
 
> Not my area of expertise, but I do see somewhat recent changes to hotplug
> handling during realize, so I figured I'd ask if there's any
Most of the changes were quite recent, so I'd say it's unlikely that they made their way into RHEL versions.

> relationship to this bz before possibly assigning it elsewhere. I also see
> bz 1684022 has a different/worse result from what appears to be a similar
> action.
That's totally unrelated; so far my educated guess is that it's a guest kernel mm issue and in no way related to disk hotplug.

Comment 20 yisun 2019-03-28 09:20:03 UTC
Just recalled that our rhel7 auto jobs used pc-i440fx-rhel7.6.0 as the default machine type, but the rhel8 jobs use pc-q35-rhel7.6.0.
So I tried this scenario with the q35 machine type on a rhel7.6.z environment and it also reproduced, so it's not a regression but a q35-specific issue. I'll remove the keyword "regression" accordingly.

[root@ibm-x3250m5-04 images]# cat reproduce.sh
#!/bin/sh
VM="avocado-vt-vm1"
time_start_vm=$[$(date +%s%N)/1000000]
virsh start $VM
qemu-img create -f qcow2 /tmp/img.qcow2 1G
time_attach_disk=$[$(date +%s%N)/1000000]
spend_time=$[$time_attach_disk-$time_start_vm]
echo "* $spend_time * ms passed between vm start and disk attachment"
virsh attach-disk --domain $VM --source /tmp/img.qcow2 --target vdf --subdriver qcow2
for i in {1..2}
        do
                echo Round:$i
                sleep 5
                virsh domblklist $VM
        done


[root@ibm-x3250m5-04 images]# rpm -qa | egrep "libvirt-4|qemu-kvm-rhev"
libvirt-4.5.0-10.el7_6.7.x86_64
qemu-kvm-rhev-2.12.0-18.el7_6.3.x86_64


[root@ibm-x3250m5-04 images]# virsh dumpxml avocado-vt-vm1 | grep q35
    <type arch='x86_64' machine='pc-q35-rhel7.6.0'>hvm</type>


[root@ibm-x3250m5-04 images]# sh reproduce.sh
Domain avocado-vt-vm1 started

Formatting '/tmp/img.qcow2', fmt=qcow2 size=1073741824 cluster_size=65536 lazy_refcounts=off refcount_bits=16
* 204 * ms passed between vm start and disk attachment
Disk attached successfully

Round:1
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-8.0-x86_64-latest.qcow2
vdf        /tmp/img.qcow2

Round:2
Target     Source
------------------------------------------------
vda        /var/lib/libvirt/images/RHEL-8.0-x86_64-latest.qcow2

Comment 21 Sergio Lopez 2019-04-09 11:18:49 UTC
I've been digging into this issue. Both "device_add" and "device_del" rely on the hotplug infrastructure. In the case of PCIe devices, the hotplug is expected to be done in coordination with the Guest OS, but in this test the addition of the new device is executed before Linux has loaded and initialized the PCIe root devices, so the ABP (Attention Button Press) event is lost. When Linux initializes the devices, QEMU sees a populated slot with a powered-off device, and removes it, triggering the DEVICE_DEL event.

For PCI devices, hotplug works quite differently, having been hacked up into ACPI.

I'm not quite familiar with the PCIe Hotplug Specification, but what QEMU does here seems quite reasonable to me. If we really need this to work for PCIe the same way it did for PCI, we're probably going to need to justify it with strong arguments.
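Consistent with that analysis, a host-side mitigation is to delay the attach until the guest has brought up its PCIe ports. A generic retry sketch (the probe command is an assumption — e.g. `virsh qemu-agent-command "$VM" '{"execute":"guest-ping"}'` would require qemu-guest-agent in the guest; the helper name is ours):

```shell
#!/bin/sh
# Run a readiness probe up to $2 times, one second apart, and succeed as
# soon as the probe does; return failure if every attempt fails.
wait_ready() {
    probe=$1
    tries=$2
    i=0
    while [ "$i" -lt "$tries" ]; do
        if $probe; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

A caller would then do something like `wait_ready "<guest probe>" 30 && virsh attach-disk ...` instead of attaching immediately after `virsh start`.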