Bug 1355683 - qemu core dump when do postcopy migration again after canceling a migration in postcopy phase
Summary: qemu core dump when do postcopy migration again after canceling a migration in postcopy phase
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Dr. David Alan Gilbert
QA Contact: Qianqian Zhu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-12 08:51 UTC by Qianqian Zhu
Modified: 2016-11-07 21:23 UTC
CC List: 6 users

Fixed In Version: qemu-kvm-rhev-2.6.0-17.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-07 21:23:14 UTC


Attachments: none


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2673 normal SHIPPED_LIVE qemu-kvm-rhev bug fix and enhancement update 2016-11-08 01:06:13 UTC

Description Qianqian Zhu 2016-07-12 08:51:13 UTC
Description of problem:
QEMU core dumps when doing a postcopy migration again after canceling a migration in the postcopy phase.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-12.el7.x86_64
kernel-3.10.0-461.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Launch src guest:
gdb /usr/libexec/qemu-kvm
(gdb) run -name linux -cpu Westmere,check -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/nfsmount/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=writeback,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3

2. Launch guest on dest host with same cmd
3. Start postcopy migration, then cancel it immediately:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
(qemu) migrate_cancel

4. Launch guest on dest host again.
5. Start postcopy migration again:
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
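The HMP commands in the steps above have QMP equivalents. As a minimal sketch, the same cancel-and-retry sequence could be driven over a QMP socket; the snippet below only composes the QMP JSON lines (the socket plumbing is omitted, and a QMP monitor is an assumption since the reported command line only uses the stdio HMP monitor):

```python
import json

def qmp_cmd(execute, arguments=None):
    """Build one QMP command as a JSON line, ready to write to a QMP monitor socket."""
    cmd = {"execute": execute}
    if arguments is not None:
        cmd["arguments"] = arguments
    return json.dumps(cmd)

URI = "tcp:10.73.72.55:1234"  # destination from the report
sequence = [
    # HMP: migrate_set_capability postcopy-ram on
    qmp_cmd("migrate-set-capabilities",
            {"capabilities": [{"capability": "postcopy-ram", "state": True}]}),
    qmp_cmd("migrate", {"uri": URI}),   # HMP: migrate -d tcp:...
    qmp_cmd("migrate-start-postcopy"),  # HMP: migrate_start_postcopy
    qmp_cmd("migrate_cancel"),          # HMP: migrate_cancel
    # ...relaunch the destination QEMU here, then retry; before the
    # fix this second attempt aborted the source:
    qmp_cmd("migrate", {"uri": URI}),
    qmp_cmd("migrate-start-postcopy"),
]
```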

Actual results:
Qemu core dump:
(qemu) 2016-07-12T08:42:34.819057Z qemu-kvm: invalid runstate transition: 'finish-migrate' -> 'finish-migrate'

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff567fe700 (LWP 28314)]
0x00007fffec5041d7 in raise () from /lib64/libc.so.6
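The abort comes from QEMU's runstate machine, which only allows state changes listed in a transition table and aborts on anything else. The cancelled migration leaves the VM in 'finish-migrate', so the second migration's attempt to enter 'finish-migrate' again is rejected. A simplified model of that check (the transition set here is illustrative, not QEMU's actual table):

```python
# Illustrative subset of valid (from, to) runstate transitions;
# QEMU's real table is larger and lives in its runstate code.
VALID_TRANSITIONS = {
    ("running", "finish-migrate"),
    ("paused", "finish-migrate"),
    ("finish-migrate", "running"),
    ("finish-migrate", "postmigrate"),
}

class InvalidTransition(Exception):
    pass

def runstate_set(current, new):
    """Return the new state, or raise -- modeling QEMU's abort()."""
    if (current, new) not in VALID_TRANSITIONS:
        raise InvalidTransition(
            f"invalid runstate transition: '{current}' -> '{new}'")
    return new

state = "running"
state = runstate_set(state, "finish-migrate")  # first migration, then cancelled
try:
    # Second migration attempt: the state was never reset to 'running',
    # so finish-migrate -> finish-migrate trips the check.
    state = runstate_set(state, "finish-migrate")
except InvalidTransition as e:
    message = str(e)
```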

Expected results:
Postcopy migration succeeds.

Additional info:

Comment 2 Dr. David Alan Gilbert 2016-07-15 10:12:36 UTC
Yes, I can recreate this.

It should be an unusual circumstance in practice; cancelling after postcopy has started is unsafe unless you control the destination.  If the destination hasn't started running it's OK to restart the source and try again, so libvirt could potentially do that - however, it would issue a continue to the source before retrying the migration so wouldn't hit this case.

I'll look into it.

Comment 4 Qianqian Zhu 2016-07-20 08:18:16 UTC
Test with:
qemu-kvm-rhev-2.6.0-13.el7.1355683a.x86_64
kernel-3.10.0-461.el7.x86_64

Steps:
1. Launch src guest
2. Launch guest on dest host with same cmd
3. Start postcopy migration, then cancel it immediately:
(qemu) migrate_set_capability postcopy-ram on
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy
(qemu) migrate_cancel

4. Launch guest on dest host again.
5. Start postcopy migration again:
(qemu) migrate -d tcp:10.73.72.55:1234
(qemu) migrate_start_postcopy

Results:
No core dump; postcopy migration succeeds and the guest works well after step 5.


Cancelling a normal (precopy) migration succeeds, but with the error below:
(qemu) migrate_cancel 
(qemu) 2016-07-20T08:14:09.855908Z qemu-kvm: socket_writev_buffer: Got err=32 for (73885/18446744073709551615)

Cancelling in postcopy phase:
(qemu) 2016-07-20T08:06:34.581064Z qemu-kvm: socket_writev_buffer: Got err=32 for (131337/18446744073709551615)
2016-07-20T08:06:34.581090Z qemu-kvm: RP: Received invalid message 0x0000 length 0x0000
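The err=32 in those socket_writev_buffer messages is an errno value: on Linux, 32 is EPIPE ("Broken pipe"), which is the expected result of the source writing to a migration socket whose peer went away after the cancel. A quick sanity check:

```python
import errno
import os

# errno 32 on Linux is EPIPE, matching the "Got err=32" lines above.
assert errno.EPIPE == 32
description = os.strerror(errno.EPIPE)  # "Broken pipe" on Linux
```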

Comment 6 Miroslav Rezanina 2016-07-29 09:12:12 UTC
Fix included in qemu-kvm-rhev-2.6.0-17.el7

Comment 8 Qianqian Zhu 2016-08-23 05:44:48 UTC
Verified with:
qemu-kvm-rhev-2.6.0-20.el7.x86_64
kernel-3.10.0-491.el7.x86_64

Steps same as comment 4.
cli:
/usr/libexec/qemu-kvm -name linux -cpu SandyBridge -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 7bef3814-631a-48bb-bae8-2b1de75f7a13 -nodefaults -monitor stdio -rtc base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=discard -global PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot order=c,menu=on -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x6 -drive file=/mntnfs/RHEL-Server-7.3-64-virtio.qcow2,if=none,cache=none,id=drive-virtio-disk0,format=qcow2 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x8 -msg timestamp=on -spice port=5901,disable-ticketing -vga qxl -global qxl-vga.revision=3 -netdev tap,id=hostnet0,vhost=on -device virtio-net-pci,netdev=hostnet0,id=net0,mac=3C:D9:2B:09:AB:44,bus=pci.0,addr=0x3 -qmp tcp::5555,server,nowait

Result:
Postcopy migration succeeds and the guest works well.
Cancelling gives the same warning as in comment 4:
2016-07-20T08:06:34.581064Z qemu-kvm: socket_writev_buffer: Got err=32 for (131337/18446744073709551615)
2016-07-20T08:06:34.581090Z qemu-kvm: RP: Received invalid message 0x0000 length 0x0000

Comment 9 Qianqian Zhu 2016-08-23 05:45:38 UTC
Moving to VERIFIED as per comment 8

Comment 11 errata-xmlrpc 2016-11-07 21:23:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2673.html

