Bug 1360971 - Offline migration failed during installing packages stage
Summary: Offline migration failed during installing packages stage
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: All
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Laurent Vivier
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-28 05:18 UTC by Min Deng
Modified: 2017-02-17 09:25 UTC (History)
7 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-17 09:25:18 UTC



Description Min Deng 2016-07-28 05:18:27 UTC
Description of problem:
Offline migration failed during installing packages stage
Version-Release number of selected component (if applicable):
kernel-3.10.0-478.el7.ppc64le
qemu-kvm-rhev-2.6.0-15.el7.ppc64le
qemu-kvm-rhev-debuginfo-2.6.0-15.el7.ppc64le
SLOF-20160223-5.gitdbbfda4.el7.noarch
How reproducible:
3 times
Steps to Reproduce:
1. Start installing a new guest OS with the following command line (the migration is done while packages are being installed):
  /usr/libexec/qemu-kvm -m 8G -smp 16 -name vocado-vt-vm1 -sandbox off -M pseries-rhel7.3.0 -nodefaults -vga std -chardev socket,id=serial_id_serial0,path=/tmp/tt,server,nowait -device spapr-vty,chardev=serial_id_serial0 -device virtio-scsi-pci,id=scsi0 -device scsi-hd,id=disk,bus=scsi0.0,bootindex=1,drive=drive-disk0 -drive file=RHEL68BE.qcow2,format=qcow2,if=none,id=drive-disk0,werror=stop,rerror=stop -vnc :1 -enable-kvm -monitor stdio -uuid cbf8e8f5-6bb7-4d73-9581-d29b43aab22a -device nec-usb-xhci,id=usb1,bus=pci.0,addr=0x4 -device usb-kbd,id=input0,bus=usb1.0 -device usb-mouse,id=mouse1,bus=usb1.0 -qmp tcp:0:4444,server,nowait -device nec-usb-xhci,id=controller3 -device usb-mouse,id=usbmouse,bus=controller3.0 -device usb-kbd,id=usbkbd,bus=controller3.0 -device usb-tablet,id=usbtablet,bus=controller3.0 -device virtio-net-pci,mac=9a:d4:d5:d6:d7:d9,id=idkdMjSW,vectors=4,netdev=hostnet0,disable-legacy=off,disable-modern=on,bootindex=2 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown -device virtio-scsi-pci,id=scsi1 -drive if=none,id=drive-scsi0-0-1-0,readonly=on,file=RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso -device scsi-cd,bus=scsi1.0,drive=drive-scsi0-0-1-0,id=scsi0-0-1-0,bootindex=0

2. While packages are being installed, run in the HMP monitor:
  migrate -d "exec:gzip -c > tt.gz"

3. Wait for the migration to complete.

Actual results:
The migration eventually failed:
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: active
total time: 288771 milliseconds
expected downtime: 679 milliseconds
setup: 27 milliseconds
transferred ram: 6474166 kbytes
throughput: 253.29 mbps
remaining ram: 230124 kbytes
total ram: 8405312 kbytes
duplicate: 2672163 pages
skipped: 0 pages
normal: 1609522 pages
normal bytes: 6438088 kbytes
dirty sync count: 7
dirty pages rate: 5250 pages
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds
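
For anyone scripting this check, here is a minimal sketch of how the `info migrate` output above could be parsed to detect the failed state. The helper names `parse_info_migrate` and `migration_failed` are made up for illustration and are not part of QEMU; the sample text is copied from the output in this report.

```python
def parse_info_migrate(text):
    """Turn HMP `info migrate` output into a dict of field name -> raw string value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the echoed prompt and the capabilities line, which packs
        # several "name: value" pairs into a single line.
        if not line or line.startswith("(qemu)") or line.startswith("capabilities:"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def migration_failed(text):
    """Return True when the parsed output reports 'Migration status: failed'."""
    return parse_info_migrate(text).get("Migration status") == "failed"


# Sample copied from the HMP output shown in this report.
sample = """(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off
Migration status: failed
total time: 0 milliseconds
"""

print(parse_info_migrate(sample))
print(migration_failed(sample))
```

Values are kept as raw strings (for example "0 milliseconds") so that nothing is lost if a field carries a unit.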

Expected results:
The migration completes successfully.

Additional info:

Comment 3 juzhang 2016-08-01 03:01:44 UTC
Hi Wei,

Could you check this on x86?

Best Regards,
Junyi

Comment 4 weliao 2016-08-09 05:17:32 UTC
QE tested on x86 and can reproduce it, but not 100% of the time.
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds


Host versions:
3.10.0-481.el7.x86_64
qemu-kvm-rhev-2.6.0-17.el7.x86_64

Comment 5 Amit Shah 2016-08-11 07:05:32 UTC
How does migration fail?  Is there any message logged?

Is the failure similar on ppc and x86?

Is it always reproducible on ppc?  And just sometimes on x86?

Comment 6 Min Deng 2016-08-11 09:05:05 UTC
(In reply to Amit Shah from comment #5)
> How does migration fail? Is there any message logged?
  QE saw it directly in the HMP output:
  (qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds 
> 
> Is the failure similar on ppc and x86?
  Yes, please see the output in the description and in comment 4.
> Is it always reproducible on ppc?  And just sometimes on x86?
  About 50%.

Comment 8 Laurent Vivier 2017-02-15 13:04:05 UTC
Tried 5 times on ppc64le, never happened.

Guest:
RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso

Host:
qemu-kvm-rhev-2.6.0-28.el7_3.6.ppc64le
kernel-3.10.0-560.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch

Did you check you have enough space in your directory to store all the files?

For "Basic" installation:

RHEL68BE.qcow2                            (2.3 GB)
tt.gz                                     (2.3 GB)
RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso (3.3 GB)

But the sizes can depend on which packages you are installing (For instance, with a "Desktop" install the size increases to 4GB for each qcow2 and gz)

Could you try to reproduce it with the latest packages on the host and then check the available space on the host disk when it fails?
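
A minimal sketch of the free-space check asked for above, assuming the test files live in the current directory. The function names and the 10% safety margin are illustrative choices, not something taken from this report; `shutil.disk_usage` is the standard-library call that reports free space.

```python
import shutil

GB = 1000 ** 3


def enough_space(free_bytes, needed_bytes, margin=0.10):
    """Return True if free_bytes covers needed_bytes plus a safety margin."""
    return free_bytes >= needed_bytes * (1 + margin)


def check_directory(path, needed_bytes):
    """Check the filesystem holding `path` against the expected file sizes."""
    usage = shutil.disk_usage(path)
    return enough_space(usage.free, needed_bytes)


# Rough upper bound from the sizes quoted above for a "Desktop" install:
# about 4 GB for the qcow2 image plus about 4 GB for the gzip'ed migration stream.
needed = 4 * GB + 4 * GB
print(check_directory(".", needed))
```

Running this at the moment the migration reports "failed" would show whether the output file could have run out of room.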

Comment 9 Min Deng 2017-02-16 07:58:38 UTC
(In reply to Laurent Vivier from comment #8)
> Tried 5 times on ppc64le, never happened.
> 
> Guest:
> RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso
> 
> Host:
> qemu-kvm-rhev-2.6.0-28.el7_3.6.ppc64le
> kernel-3.10.0-560.el7.ppc64le
> SLOF-20160223-6.gitdbbfda4.el7.noarch
> 
> Did you check you have enough space in your directory to store all the files?
  We install a fresh host every time, so there should be enough space for testing.
> For "Basic" installation:
> 
> RHEL68BE.qcow2                            (2.3 GB)
> tt.gz                                     (2.3 GB)
> RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso (3.3 GB)
  
> But the sizes can depend on which packages you are installing (For instance,
> with a "Desktop" install the size increases to 4GB for each qcow2 and gz)
> 
> Could you try to reproduce it with the latest packages on the host and then
> check the available space on the host disk when it fails?
  QE tried about 10 times manually on the following build but could not reproduce the issue:
kernel-3.10.0-563.el7.ppc64le
qemu-kvm-common-rhev-2.8.0-4.el7.ppc64le
SLOF-20160223-6.gitdbbfda4.el7.noarch
  QE also tried about 5 times on the following build and could not reproduce it:
qemu-kvm-rhev-2.6.0-27.el7.ppc64le.rpm

Hi Laurent,
  But it was reproducible on the old builds by more than one QE on two different platforms, although the reproduction rate was not 100% either. What do you think about QE preparing a host with the old packages for your further investigation?

Thanks 
Min

Comment 10 Laurent Vivier 2017-02-16 08:18:25 UTC
(In reply to dengmin from comment #9)
>   But it was reproducible on the old builds by more than one QE on two
> different platforms.And the reproducible rate was not 100% either.How do you
> think if QE prepare a host with the old packages for your investigation
> further ?

Thank you Min.

I'd like you to try to reproduce the problem and check that there is enough space on the disk to store the file.
You can also try to migrate to /dev/null to avoid this kind of problem:
    migrate -d "exec:gzip -c > /dev/null"

If the problem is not with the disk space, I will try to reproduce it on my system with the old packages.

Thanks

Comment 11 Min Deng 2017-02-16 10:35:59 UTC
QE can reproduce it on the old packages.
Build info:
kernel-3.10.0-418.el7.ppc64le
qemu-kvm-rhev-2.6.0-22.el7.ppc64le
SLOF-20160223-5.gitdbbfda4.el7.noarch

Steps: please refer to the description.
Hostname: ibm-p8-kvm-01-qe.rhts.eng.bos.redhat.com
[root@ibm-p8-kvm-01-qe home]# ls -rlth RHEL68BE.qcow2
-rw-r--r--. 1 root root 5.6G Feb 16 05:03 RHEL68BE.qcow2
[root@ibm-p8-kvm-01-qe home]# ls -rlth tt.gz
-rw-r--r--. 1 root root 2.2G Feb 16 04:58 tt.gz
[root@ibm-p8-kvm-01-qe home]# ls -rlth RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso
-rw-r--r--. 1 root root 3.3G Feb 16 04:50 RHEL-6.8-20160414.0-Server-ppc64-dvd1.iso
Actual results:
dirty sync count: 9
dirty pages rate: 418 pages
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds
(qemu) info migrate
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: off postcopy-ram: off 
Migration status: failed
total time: 0 milliseconds

Host disk status:
Filesystem                                  1K-blocks     Used  Available Use% Mounted on
/dev/mapper/rhel_ibm--p8--kvm--01--qe-root   52403200  2665888   49737312   6% /
devtmpfs                                     61460352        0   61460352   0% /dev
tmpfs                                        64832320        0   64832320   0% /dev/shm
tmpfs                                        64832320    30592   64801728   1% /run
tmpfs                                        64832320        0   64832320   0% /sys/fs/cgroup
/dev/sdc2                                     1038336   235744     802592  23% /boot
/dev/mapper/rhel_ibm--p8--kvm--01--qe-home 3266401996 11583376 3254818620   1% /home
tmpfs                                        12966528        0   12966528   0% /run/user/0

Comment 12 Min Deng 2017-02-17 09:11:17 UTC
Per Laurent's request, QE narrowed down the issue build by build today.
It can be reproduced with builds qemu-kvm-rhev-2.6.0-22.el7.ppc64le and qemu-kvm-rhev-2.6.0-25.el7.ppc64le.
It can no longer be reproduced with build qemu-kvm-rhev-2.6.0-27.el7.ppc64le.

Thanks
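
The build-by-build narrowing described in this comment can be sketched as a search over an ordered build list. This is only an illustration: `reproduces` is a hypothetical stand-in for the manual test runs, and the search assumes that every build after the fix stays fixed. Because the failure showed up only about 50% of the time, `miss_probability` estimates how likely it is that a still-broken build passes every run by chance.

```python
def miss_probability(failure_rate, runs):
    """Chance that `runs` independent attempts all pass on a build that still has the bug."""
    return (1.0 - failure_rate) ** runs


def first_clean_build(builds, reproduces):
    """Return the first build (oldest to newest) for which reproduces(build) is False.

    Returns None if every build in the list still reproduces the problem.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(builds[mid]):
            lo = mid + 1   # still broken: the fix is in a later build
        else:
            hi = mid       # clean: the fix is in this build or an earlier one
    return builds[lo] if lo < len(builds) else None


# Builds tested in this report, oldest first.
builds = [
    "qemu-kvm-rhev-2.6.0-22.el7",
    "qemu-kvm-rhev-2.6.0-25.el7",
    "qemu-kvm-rhev-2.6.0-27.el7",
]
# Hypothetical predicate encoding the results reported in this comment.
known_bad = {"qemu-kvm-rhev-2.6.0-22.el7", "qemu-kvm-rhev-2.6.0-25.el7"}

print(first_clean_build(builds, lambda build: build in known_bad))
print(miss_probability(0.5, 5))
```

With an intermittent failure, a build should only be marked clean after enough runs that the miss probability is acceptably small.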

Comment 13 Laurent Vivier 2017-02-17 09:25:18 UTC
Thank you Min.

According to the changelog, I think this was fixed in qemu-kvm-rhev-2.6.0-26.el7 by commit 42539f0 ("qemu: use bdrv_flush_all for vm_stop et al"), since the installation process ejects the CD-ROM at the end of the installation:
    
    Reimplement bdrv_flush_all for vm_stop. In contrast to blk_flush_all,
    bdrv_flush_all does not have device model restrictions. This allows
    us to flush and halt unconditionally without error.
    
    This allows us to do things like migrate when we have a device with
    an open tray, but has a node that may need to be flushed, or nodes
    that aren't currently attached to any device and need to be flushed.
    
    Specifically, this allows us to migrate when we have a CDROM with
    an open tray.

This commit is also in 2.8.0 master.

So I am closing this BZ.
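
To confirm the open-tray condition on an affected build, one option is to look at the `tray_open` field that QMP `query-block` reports for removable devices. The sketch below only filters an already-parsed reply. The reply shown is illustrative, trimmed to the fields used, and was not captured from the failing guest; the device names are the drive ids from the command line in the description. The helper name `open_tray_devices` is an assumption made for this example.

```python
import json


def open_tray_devices(query_block_reply):
    """List device names whose tray is reported open in a QMP `query-block` reply."""
    return [
        entry.get("device", "")
        for entry in query_block_reply.get("return", [])
        # `tray_open` is only present for removable devices.
        if entry.get("tray_open") is True
    ]


# Illustrative reply, not real output from the guest in this report.
reply = json.loads("""
{"return": [
  {"device": "drive-disk0", "removable": false, "locked": false},
  {"device": "drive-scsi0-0-1-0", "removable": true, "locked": false, "tray_open": true}
]}
""")

print(open_tray_devices(reply))
```

On the guest from the description, a real reply could be obtained from the QMP socket on TCP port 4444 after sending `qmp_capabilities`.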

