Bug 1299875 - system_reset should clear pending request for error (IDE)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm
Version: 7.2
Hardware: x86_64
OS: Windows
Priority: high
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: John Snow
QA Contact: aihua liang
URL:
Whiteboard:
Depends On: 1281713
Blocks: 1299876 1393042
Reported: 2016-01-19 13:05 UTC by Ademar Reis
Modified: 2017-08-01 17:46 UTC (History)

Fixed In Version: qemu-kvm-1.5.3-137.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1281713
Clones: 1299876 1393042
Environment:
Last Closed: 2017-08-01 17:46:48 UTC
Target Upstream Version:


Links
System: Red Hat Product Errata
ID: RHSA-2017:1856
Priority: normal
Status: SHIPPED_LIVE
Summary: Moderate: qemu-kvm security, bug fix, and enhancement update
Last Updated: 2017-08-01 18:03:36 UTC

Description Ademar Reis 2016-01-19 13:05:40 UTC
+++ This bug was initially created as a clone of Bug #1281713 +++

Description of problem:
qemu-kvm quits with a segmentation fault when system_reset is executed while no space is left on the host.

Version-Release number of selected component (if applicable):
qemu-img-0.12.1.2-2.481.el6.x86_64
qemu-kvm-tools-0.12.1.2-2.481.el6.x86_64
qemu-kvm-0.12.1.2-2.481.el6.x86_64
qemu-guest-agent-0.12.1.2-2.481.el6.x86_64
qemu-kvm-debuginfo-0.12.1.2-2.481.el6.x86_64
2.6.32-583.el6.x86_64

How reproducible:
70%

Steps to Reproduce:
1. Create a 25G win2012.qcow2 image and install a Windows 2012 R2 guest.
2. Fill the host filesystem where the guest image is located by copying the guest image several times until "No space left on device" is reported. Then launch the guest with the following qemu-kvm command:
/usr/libexec/qemu-kvm -name win2012 -m 2048 \
	-cpu Opteron_G4 \
	-smp 1,cores=1,threads=2,sockets=2,maxcpus=4 \
	-vga qxl \
	-serial unix:/tmp/m,server,nowait \
	-drive file=win2012-64r2-virtio-scsi.qcow2,if=none,id=drive-scsi-disk0,format=qcow2,cache=none,werror=stop,rerror=stop -device virtio-scsi-pci,id=scsi0 -device scsi-hd,drive=drive-scsi-disk0,bus=scsi0.0,scsi-id=0,lun=0,id=scsi-disk0,bootindex=1 \
	-monitor stdio \
	-usb -device usb-kbd,id=input0 \
	-vnc :1

3. Interact with the guest (for example, browse the internet) until the qemu-kvm monitor prints "block I/O error in device 'ide0-hd0': No space left on device (28)". This usually happens within 5 minutes. Then enter system_reset in the qemu monitor. The segmentation fault follows.
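
For anyone scripting step 3, the monitor message can be detected with a small parser. This is only a sketch in Python: the helper name `parse_block_io_error` and the regular expression are assumptions inferred from the message lines quoted in this report, not taken from QEMU's source.

```python
import re

# Pattern inferred from the messages quoted in this report (an assumption,
# not taken from QEMU's source).
_BLOCK_IO_ERROR = re.compile(
    r"block I/O error in device '(?P<device>[^']+)': "
    r"(?P<reason>.+) \((?P<errno>\d+)\)"
)


def parse_block_io_error(line):
    """Return (device, reason, errno) for a matching monitor line, else None."""
    match = _BLOCK_IO_ERROR.search(line)
    if match is None:
        return None
    return (
        match.group("device"),
        match.group("reason"),
        int(match.group("errno")),
    )
```

A test script could poll the monitor output line by line and issue system_reset once this helper returns a tuple whose last element is 28 (ENOSPC on Linux).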

Actual results:
qemu-kvm quits with a segmentation fault after system_reset is executed.

Expected results:
The qemu-kvm process should stay alive and the guest system should reset without error.

Additional info:
Stack info:
Core was generated by `/usr/libexec/qemu-kvm -name win2012 -m 2048 -cpu SandyBridge -smp 2,cores=1,thr'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007f85d1b04a90 in ?? ()
(gdb) bt
#0  0x00007f85d1b04a90 in ?? ()
#1  0x00007f85d03f5aee in bdrv_aio_cancel (acb=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#2  0x00007f85d052d46a in ide_dma_cancel (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#3  0x00007f85d052d499 in ide_dma_reset (bm=0x7f85d26e1160)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#4  0x00007f85d05335ad in piix3_reset (opaque=0x7f85d26e0010)
    at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#5  0x00007f85d03b71d2 in qemu_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#6  0x00007f85d03dd050 in qemu_kvm_system_reset (report=true)
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#7  0x00007f85d03dd253 in kvm_main_loop ()
    at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#8  0x00007f85d03be317 in main_loop (argc=<value optimized out>, 
    argv=<value optimized out>, envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#9  main (argc=<value optimized out>, argv=<value optimized out>, 
    envp=<value optimized out>)
    at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

On an Opteron_G5 host, qemu-kvm does not quit with a segmentation fault, but the Windows guest cannot be reset after system_reset.


--- Additional comment from Markus Armbruster on 2015-11-24 14:37:15 BRST ---

Can you reproduce this with a qemu-kvm built with --enable-debug?

--- Additional comment from Guo, Zhiyi on 2015-11-27 00:00:55 BRST ---

Hi,
I guess you want to see the function call shown as ?? and the argument values that were optimized out.
The stack trace is still the same as reported in the description.
I enabled the --enable-debug option and rebuilt qemu-kvm. The -g option has been added to the compile procedure by the configure file:
+ ../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu
Install prefix    /usr
BIOS directory    /usr/share/qemu
binary directory  /usr/bin
local state directory   /var
Manual directory  /usr/share/man
ELF interp prefix /usr/gnemul/qemu-%M
Source path       /root/rpmbuild/BUILD/qemu-kvm-0.12.1.2
C compiler        gcc
Host C compiler   gcc
CFLAGS            -O2 -g

BR/
Zhiyi

--- Additional comment from Markus Armbruster on 2015-11-27 05:17:53 BRST ---

I can't see --enable-debug in your configure line.  I can see -O2.  You need to get one roughly like this:

../configure --target-list=x86_64-softmmu '--extra-ldflags=-Wl,--build-id -pie -Wl,-z,relro -Wl,-z,now' '--extra-cflags=-g -pipe -Wall -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -fPIE -DPIE' --with-pkgversion=qemu-kvm-0.12.1.2-2.479.el6.2 --prefix=/usr --localstatedir=/var --sysconfdir=/etc --disable-strip --disable-xen --block-drv-rw-whitelist=qcow2,raw,file,host_device,host_cdrom,qed,gluster,rbd --block-drv-ro-whitelist=vmdk,vpc --disable-debug-tcg --disable-sparse --disable-sdl --disable-curses --disable-curl --disable-check-utests --disable-bluez --enable-docs --disable-vde --disable-spice --trace-backend=nop --enable-smartcard --disable-smartcard-nss --enable-mixemu

Please try again :)

--- Additional comment from Guo, Zhiyi on 2015-11-27 09:40:59 BRST ---

(In reply to Markus Armbruster from comment #4)

Stack trace with non-optimized code:
(gdb) bt
#0  0x00007f1e571cfb10 in ?? ()
#1  0x00007f1e55ebd5ed in bdrv_aio_cancel_async (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3876
#2  0x00007f1e55ebd499 in bdrv_aio_cancel (acb=0x7f1e571cfc10) at /usr/src/debug/qemu-kvm-0.12.1.2/block.c:3842
#3  0x00007f1e56008f37 in ide_dma_cancel (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2395
#4  0x00007f1e56008f5d in ide_dma_reset (bm=0x7f1e57dac160) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/core.c:2408
#5  0x00007f1e5600c755 in piix3_reset (opaque=0x7f1e57dab010) at /usr/src/debug/qemu-kvm-0.12.1.2/hw/ide/piix.c:124
#6  0x00007f1e55e6765b in qemu_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:3417
#7  0x00007f1e55e990ad in qemu_kvm_system_reset (report=true) at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:1992
#8  0x00007f1e55e9997d in kvm_main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/qemu-kvm.c:2272
#9  0x00007f1e55e683ba in main_loop () at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:4273
#10 0x00007f1e55e6d451 in main (argc=24, argv=0x7fff7f33ac18, envp=0x7fff7f33ace0) at /usr/src/debug/qemu-kvm-0.12.1.2/vl.c:6731

--- Additional comment from Markus Armbruster on 2015-11-27 11:17:42 BRST ---

Aha!  acb->aiocb_info->cancel_async seems to be garbage.  Hunch: use after free?  Chimes with your report that Opteron_G5 fails differently...  Please reproduce with your debug build of qemu-kvm under valgrind, and capture valgrind's report.
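
To make the hunch above concrete, here is a toy model in Python of how a request handle that survives an I/O error could be cancelled again on reset. It is only a sketch under assumptions: the classes `Request` and `ToyIDE` and the `clear_on_reset` switch are hypothetical, they do not mirror QEMU's data structures, and this is not the patch that shipped. A raised exception stands in for the crash.

```python
class Request:
    """Toy stand-in for an in-flight block request (not a QEMU type)."""

    def __init__(self, name):
        self.name = name
        self.released = False

    def cancel(self):
        # Cancelling a request that was already released models the
        # use-after-free suspected in the comment above.
        if self.released:
            raise RuntimeError("cancel() on released request %r" % self.name)
        self.released = True


class ToyIDE:
    """Hypothetical device model that remembers a failed request for retry."""

    def __init__(self, clear_on_reset):
        self.clear_on_reset = clear_on_reset
        self.in_flight = None
        self.retry_pending = False

    def submit(self, name):
        self.in_flight = Request(name)
        self.retry_pending = False

    def fail_with_enospc(self):
        # The lower layer is finished with the request, but the device
        # keeps its handle so the request can be retried later.
        self.in_flight.released = True
        self.retry_pending = True

    def reset(self):
        if self.clear_on_reset and self.retry_pending:
            # Drop the stale handle instead of cancelling it.
            self.in_flight = None
            self.retry_pending = False
        if self.in_flight is not None:
            self.in_flight.cancel()
            self.in_flight = None
```

With `clear_on_reset=False`, calling `reset()` after `fail_with_enospc()` raises; with `clear_on_reset=True`, the stale handle is dropped and the reset completes.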

--- Additional comment from Guo, Zhiyi on 2015-12-01 06:32 BRST ---

Log generated with Valgrind 3.11.0. Valgrind 3.8.1 core dumps when following the same steps.

--- Additional comment from Guo, Zhiyi on 2015-12-01 07:18 BRST ---



--- Additional comment from Markus Armbruster on 2015-12-01 08:01:26 BRST ---

valgrind is reporting a huge number of unrelated issues, probably in part because we lack upstream patches to suppress false positives.  It hits a cutoff and stops reporting some time before the crash.  Please try again with --error-limit=no.

Additional question: is qemu-kvm-rhev affected as well?

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:09 BRST ---

The issue can also be reproduced on a rhel7.2 Intel Skylake host with rhev:
kernel:3.10.0-334.el7.x86_64
qemu-kvm-rhev-2.3.0-31.el7.x86_64
qemu-kvm-rhev-debuginfo-2.3.0-31.el7.x86_64
qemu-img-rhev-2.3.0-31.el7.x86_64
qemu-kvm-tools-rhev-2.3.0-31.el7.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7.x86_64

The attachment includes valgrind logs reproduced on rhel6.7 and rhel7.2. The rhev packages were compiled with -g and without -O2 optimization. The valgrind logs were generated with the option --error-limit=no.

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:12:38 BRST ---

Command used to reproduce the issue and capture valgrind log:
valgrind --log-file=valgrind.txt --error-limit=no /usr/libexec/qemu-kvm -name win2012 -m 2048 -smp 4 -cpu host -vga qxl -vnc :1 -monitor stdio -hda win2012.qcow2
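
If the run above needs to be repeated across several hosts, the wrapper command can be assembled programmatically. The sketch below is in Python and uses only the two valgrind options quoted in this thread; the function name `valgrind_argv` is hypothetical.

```python
import shlex


def valgrind_argv(qemu_cmdline, log_file="valgrind.txt"):
    """Prefix a qemu-kvm command line with the valgrind options used above."""
    return [
        "valgrind",
        "--log-file=" + log_file,
        "--error-limit=no",  # keep reporting past valgrind's error cutoff
    ] + shlex.split(qemu_cmdline)
```

The returned list can be passed directly to `subprocess.run()`.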

--- Additional comment from Guo, Zhiyi on 2015-12-02 06:27 BRST ---

I attached the wrong valgrind log for rhel6.7. Please ignore the attachment in comment 10 and use the log in this comment.

Comment 2 John Snow 2016-09-20 18:52:25 UTC
Moving back to ASSIGNED as we decided to delay this to 7.4, at least for now. See comment 5 on #1299876

--js

Comment 4 Ademar Reis 2016-09-28 01:49:02 UTC
For reference, this is the cluster of BZ related to this issue: bug 1281713, bug 1299876, bug 1299875, bug 1361487, bug 1361490, bug 1361488, bug 1375520

Comment 6 aihua liang 2017-03-30 02:56:08 UTC
The issue still exists in RHEL7.4-3.10.0-618+qemu-kvm-1.5.3-134.

Comment 8 John Snow 2017-04-07 19:19:43 UTC
I'm having trouble with our build root at the moment, so I cannot re-post the patch currently.

Moving back to ASSIGNED so I can re-post the patch once the build root problem is addressed.

Thanks.

Comment 9 John Snow 2017-04-26 23:49:39 UTC
There.

Comment 10 Miroslav Rezanina 2017-04-28 04:17:37 UTC
Fix included in qemu-kvm-1.5.3-137.el7

Comment 12 aihua liang 2017-05-04 03:17:50 UTC
Verified. The problem has been resolved, so I am changing the status to "Verified".

Test Version:
 kernel version:3.10.0-657.el7.x86_64
 qemu-kvm version:qemu-kvm-1.5.3-137.el7.x86_64

Test Steps:
 1. Fill the host disk completely.

 2. Start the guest with the qemu command below:
/usr/libexec/qemu-kvm \
-name 'avocado-vt-vm1'  \
-sandbox off  \
-machine pc  \
-nodefaults  \
-vga std  \
-chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/monitor-qmpmonitor1-20161219-042734-6fVMWCMz,server,nowait \
-mon chardev=qmp_id_qmpmonitor1,mode=control  \
-chardev socket,id=qmp_id_catch_monitor,path=/var/tmp/monitor-catch_monitor-20161219-042734-6fVMWCMz,server,nowait \
-mon chardev=qmp_id_catch_monitor,mode=control \
-drive id=drive_image1,if=none,snapshot=off,aio=native,cache=none,format=qcow2,file=/home/rhel74-64-virtio.qcow2 \
-device ide-hd,id=image1,drive=drive_image1,bootindex=0,bus=ide.0 \
-device virtio-net-pci,mac=9a:f2:f3:f4:f5:f6,id=id30uvBS,vectors=4,netdev=idADyVP5,bus=pci.0,addr=04  \
-netdev tap,id=idADyVP5,vhost=on \
-m 2048  \
-smp 16,maxcpus=16,cores=8,threads=1,sockets=2  \
-cpu host \
-vnc :0  \
-rtc base=localtime,clock=host,driftfix=slew  \
-boot order=cdn,once=d,menu=off,strict=off \
-enable-kvm \
-spice port=3000,ipv4,disable-ticketing \
-monitor stdio

 3. Copy files to the guest until qemu reports an error:
  (qemu) block I/O error in device 'drive_image1': No space left on device (28)

 4. Check the VM status:
  (qemu) info status  --> VM status: paused (io-error)

 5. Reset the VM and continue it:
   (qemu) system_reset
   (qemu) c            --> VM restarts successfully.
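
The command line in step 2 also opens QMP monitors (mode=control), so steps 4 and 5 could be driven from a script instead of the human monitor. The Python sketch below only builds the JSON messages and checks a status reply; the helper names are hypothetical, and connecting to the monitor socket and reading QEMU's greeting are left out. `qmp_capabilities`, `query-status`, `system_reset` and `cont` are the QMP commands assumed to correspond to the monitor steps.

```python
import json


def qmp_command(name, arguments=None):
    """Serialize one QMP command as a JSON line."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return json.dumps(message)


def reset_sequence():
    """Messages for: negotiate, check status, reset, continue."""
    names = ("qmp_capabilities", "query-status", "system_reset", "cont")
    return [qmp_command(name) for name in names]


def is_paused_on_io_error(reply_line):
    """True if a query-status reply shows the VM stopped with io-error."""
    status = json.loads(reply_line).get("return", {})
    return status.get("running") is False and status.get("status") == "io-error"
```

A harness would send each line from `reset_sequence()` in order and call `is_paused_on_io_error()` on the reply to `query-status` before issuing the reset.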

Comment 13 errata-xmlrpc 2017-08-01 17:46:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:1856

