Bug 1519071 - Fail to rebuild the reference count tables of qcow2 image on the iscsi backend
Summary: Fail to rebuild the reference count tables of qcow2 image on the iscsi backend
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: qemu-kvm
Version: ---
Hardware: All
OS: Linux
medium
low
Target Milestone: rc
: 8.1
Assignee: John Snow
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-11-30 03:27 UTC by yilzhang
Modified: 2019-02-22 22:11 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug



Description yilzhang 2017-11-30 03:27:25 UTC
Description of problem:
Create a qcow2 image on an iSCSI LUN with lazy_refcounts=on and install a guest on it with cache=writethrough. After the installation finishes, write a file inside the guest and kill the qemu process. Checking the image afterwards with "qemu-img check -r all" reports many errors, and the repair itself fails.

Version-Release number of selected component (if applicable):
Host kernel:   4.14.0-4.el7a.ppc64le
qemu-kvm-rhev: qemu-kvm-rhev-2.10.0-9.el7

How reproducible: 100%


Steps to Reproduce:
1. Create a qcow2_v3 image with lazy_refcounts=on
[Host]# qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on     iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0   20G
2. Install the guest with cache=writethrough
3. After the guest installation finishes, write a file with dd inside the guest, get its md5 value, and sync
# dd if=/dev/urandom of=file1 conv=fsync bs=1M count=512 ; md5sum file1 ; sync

4. Kill qemu-kvm on the host
[Host]# kill -9 `pidof qemu-kvm`
5. Rebuild the reference count tables
[Host]# qemu-img check -r all  iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0


Actual results:
ERROR cluster 69574 refcount=0 reference=1
ERROR cluster 69575 refcount=0 reference=1
ERROR cluster 69576 refcount=0 reference=1
ERROR cluster 69577 refcount=0 reference=1
ERROR cluster 69578 refcount=0 reference=1
ERROR cluster 69579 refcount=0 reference=1
ERROR cluster 69580 refcount=0 reference=1
ERROR cluster 69581 refcount=0 reference=1
ERROR cluster 69582 refcount=0 reference=1
ERROR cluster 69583 refcount=0 reference=1
ERROR cluster 69584 refcount=0 reference=1
ERROR cluster 69585 refcount=0 reference=1
Rebuilding refcount structure
qemu-img: iSCSI Failure: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:LBA_OUT_OF_RANGE(0x2100)
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
[Host]# echo $?
1
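The per-cluster lines above all share one shape, so a long check log can be condensed before it is pasted into a report. The following is a minimal sketch, assuming only the "ERROR cluster N refcount=A reference=B" line format shown in this report; the function name and the sample string are hypothetical and are not part of qemu-img.

```python
import re

# Line format copied from the "qemu-img check" output in this report.
ERROR_RE = re.compile(r"^ERROR cluster (\d+) refcount=(\d+) reference=(\d+)$")


def summarize_refcount_errors(output):
    """Return (count, first_cluster, last_cluster) for refcount mismatch lines."""
    clusters = []
    for line in output.splitlines():
        match = ERROR_RE.match(line.strip())
        if match:
            clusters.append(int(match.group(1)))
    if not clusters:
        return (0, None, None)
    return (len(clusters), min(clusters), max(clusters))


# Hypothetical excerpt built from lines in the report above.
sample = """ERROR cluster 69574 refcount=0 reference=1
ERROR cluster 69585 refcount=0 reference=1
Rebuilding refcount structure
ERROR writing refblock: No space left on device"""

print(summarize_refcount_errors(sample))
```

Lines that do not match the per-cluster pattern, such as "ERROR writing refblock", are ignored by the summary and should still be quoted in full.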


Expected results:
After step 5: no errors are found on the image, and the check reports allocation, fragmentation, and compressed-cluster statistics. For example:
# qemu-img check base.qcow2
No errors were found on the image.
29877/29902 = 99.92% allocated, 0.00% fragmented, 0.00% compressed clusters
Image end offset: 1959657472



Additional info:
/usr/libexec/qemu-kvm \
-smp 8,sockets=2,cores=4,threads=1 -m 8192 \
-serial unix:/tmp/1-serial.log,server,nowait \
-nodefaults \
 -rtc base=localtime,clock=host \
 -boot menu=on \
 -monitor stdio \
\
 -device pci-bridge,id=bridge1,chassis_nr=1,bus=pci.0 \
 -device virtio-scsi-pci,bus=bridge1,addr=0x1f,id=scsi0 \
-drive file=iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:t1/0,media=disk,if=none,cache=writethrough,id=drive_sysdisk,format=qcow2,werror=stop,rerror=stop \
-device scsi-hd,drive=drive_sysdisk,bus=scsi0.0,id=sysdisk,bootindex=0 \
\
-drive file=/home/yilzhang/backup/RHEL-ALT-7.4-20171030.0-Server-ppc64le-dvd1.iso,if=none,id=scsi-cd-dr0,readonly=on,format=raw,cache=none \
-device scsi-cd,id=scsi-cd0,drive=scsi-cd-dr0,bus=scsi0.0,bootindex=1 \
\
-netdev tap,id=net0,script=/etc/qemu-ifup,downscript=/etc/qemu-ifdown,vhost=on \
-device virtio-net-pci,netdev=net0,id=nic0,mac=52:54:00:c3:e7:4f,bus=bridge1,addr=0x1e

Comment 2 Ping Li 2017-11-30 05:38:51 UTC
Could you reproduce the issue on x86?

Comment 3 yilzhang 2017-11-30 05:47:20 UTC
1. NFS backend image doesn't have this issue
2. x86 and PPC both have this issue.
Version of components for x86 platform:
Host kernel:       3.10.0-799.el7.x86_64
Guest install iso: RHEL-7.5-20171107.1-Server-x86_64-dvd1.iso
qemu-kvm-rhev:     qemu-kvm-rhev-2.10.0-9.el7

Comment 4 Longxiang Lyu 2017-11-30 09:23:37 UTC
rep

Comment 5 Longxiang Lyu 2017-11-30 09:24:56 UTC
reproduced in:
kernel-3.10.0-798.el7.x86_64
qemu-kvm-rhev-2.10.0-7.el7

set qa-ack+.

Comment 6 Ademar Reis 2017-12-18 17:17:22 UTC
(In reply to yilzhang from comment #0)
> Actual results:
> ERROR cluster 69574 refcount=0 reference=1
> ERROR cluster 69575 refcount=0 reference=1
> ERROR cluster 69576 refcount=0 reference=1
> ERROR cluster 69577 refcount=0 reference=1
> ERROR cluster 69578 refcount=0 reference=1
> ERROR cluster 69579 refcount=0 reference=1
> ERROR cluster 69580 refcount=0 reference=1
> ERROR cluster 69581 refcount=0 reference=1
> ERROR cluster 69582 refcount=0 reference=1
> ERROR cluster 69583 refcount=0 reference=1
> ERROR cluster 69584 refcount=0 reference=1
> ERROR cluster 69585 refcount=0 reference=1
> Rebuilding refcount structure
> qemu-img: iSCSI Failure: SENSE KEY:ILLEGAL_REQUEST(5)
> ASCQ:LBA_OUT_OF_RANGE(0x2100)
> ERROR writing refblock: No space left on device
> qemu-img: Check failed: No space left on device
> [Host]# echo $?
> 1

I think the refcount errors are expected with lazy_refcounts in this scenario, but the "ENOSPC" error is suspicious. I don't understand how iscsi works in this case, though. Fam should know.

Comment 7 Fam Zheng 2017-12-20 07:00:04 UTC
;-)

Comment 8 Fam Zheng 2017-12-20 07:17:49 UTC
As shown above, the image is probably fully written, and the iscsi LUN is therefore also full, hence the ENOSPC error. (Refcount rebuilding needs to allocate new clusters.)

Please test again with a much larger iscsi LUN (e.g 10G larger than the qcow2 image size).
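To get a feel for how much extra space a rebuild might need, here is a rough sketch of the refcount block arithmetic. It assumes the qcow2 defaults (cluster_size=65536, refcount_bits=16) and a completely written 20G image; the function names are hypothetical. The result is only a lower bound, since it ignores the refcount table and the clusters that the new structures occupy themselves.

```python
def clusters_per_refblock(cluster_size, refcount_bits):
    """Number of refcount entries that fit in one refcount block (one cluster)."""
    return cluster_size * 8 // refcount_bits


def refblocks_needed(image_end_offset, cluster_size=65536, refcount_bits=16):
    """Lower bound on refcount blocks needed to cover the image up to its end offset."""
    clusters = -(-image_end_offset // cluster_size)       # ceiling division
    per_block = clusters_per_refblock(cluster_size, refcount_bits)
    return -(-clusters // per_block)                      # ceiling division


# Assumption: a 20 GiB image that has been written in full.
full_image = 20 * 2**30
blocks = refblocks_needed(full_image)
print(clusters_per_refblock(65536, 16), blocks, blocks * 65536)
```

Under these assumptions each refcount block covers 32768 clusters (2 GiB of image), so the new refcount blocks are small compared with the suggested 10G of headroom.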

Comment 9 yilzhang 2017-12-20 08:47:01 UTC
(In reply to Fam Zheng from comment #8)
> As shown above, the image is probably fully written, and the iscsi LUN is
> therefore also full, hence the ENOSPC error. (Refcount rebuilding needs to
> allocate new clusters.)
> 
> Please test again with a much larger iscsi LUN (e.g 10G larger than the
> qcow2 image size).


Result of re-testing:
The iSCSI LUN created on the iSCSI target side is 31G, and I created only a 20G qcow2 image on it (that is, the iSCSI LUN is 11G larger than the qcow2 image).
The problem in this bug still exists.


************  Host and qemu info:  ************
Host: power9 with kernel 4.14.0-18.el7a.ppc64le
qemu-kvm: qemu-kvm-rhev-2.10.0-13.el7


Step1:
[Host]# qemu-img info  iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:libiscsi/2
image: json:{"driver": "raw", "file": {"lun": "2", "portal": "10.0.0.7", "driver": "iscsi", "transport": "tcp", "target": "iqn.2017-08.com.yilzhang:libiscsi"}}
file format: raw
virtual size: 31G (33285996544 bytes)
disk size: unavailable

[Host]# qemu-img create -f qcow2 -o compat=1.1,lazy_refcounts=on   iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:libiscsi/2   20G
Formatting 'iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:libiscsi/2', fmt=qcow2 size=21474836480 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16

... ...
Step5:
[Host]#  qemu-img check -r all  iscsi://10.0.0.7/iqn.2017-08.com.yilzhang:libiscsi/2
ERROR cluster 65536 refcount=0 reference=1
... ...
ERROR cluster 72685 refcount=0 reference=1
ERROR cluster 72686 refcount=0 reference=1
ERROR cluster 72687 refcount=0 reference=1
Rebuilding refcount structure
qemu-img: iSCSI Failure: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:LBA_OUT_OF_RANGE(0x2100)
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
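As a sanity check on the "LUN is full" theory, the numbers in this comment can be compared directly. This is a sketch only: it takes the LUN size from the qemu-img info output, the cluster size from the qemu-img create output, and the highest cluster number from the last ERROR line, and it assumes that cluster is close to the end of the data in use, which the log does not prove. The function name is hypothetical.

```python
CLUSTER_SIZE = 65536             # cluster_size from the qemu-img create output
LUN_BYTES = 33285996544          # virtual size from the qemu-img info output
HIGHEST_ERROR_CLUSTER = 72687    # last "ERROR cluster" line in the check output


def headroom_after_cluster(cluster, cluster_size, device_bytes):
    """Return (end offset of the cluster, bytes left on the device after it)."""
    end_offset = (cluster + 1) * cluster_size
    return end_offset, device_bytes - end_offset


end_offset, headroom = headroom_after_cluster(
    HIGHEST_ERROR_CLUSTER, CLUSTER_SIZE, LUN_BYTES
)
print(end_offset, headroom)
```

If the assumption holds, the highest reported cluster ends at roughly 4.4 GiB on a 31G LUN, so the ENOSPC does not look like a simple lack of free space on the device.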

Comment 10 Tingting Mao 2018-08-13 11:49:26 UTC
Reproduced this issue with the qemu-kvm-rhev-2.12.0-9.el7 package.

Steps:
1. Prepare the iSCSI LUN on the server
# targetcli
/backstores/fileio> create lun1 /home/iscsi/lun1.img 30G
/backstores/fileio> cd /iscsi/iqn.2018-07.com.example:t1/tpg1/luns/
/iscsi/iqn.20...:t1/tpg1/luns> create /backstores/fileio/lun1

2. Operate on the iSCSI LUN from the client
2.1 Create a qcow2 image on the iSCSI backend with lazy_refcounts=on
# qemu-img create -f qcow2 -o lazy_refcounts=on,compat=1.1 iscsi://10.66.11.19/iqn.2018-07.com.example:t1/1 30G
Formatting 'iscsi://10.66.11.19/iqn.2018-07.com.example:t1/1', fmt=qcow2 size=32212254720 compat=1.1 cluster_size=65536 lazy_refcounts=on refcount_bits=16

2.2 Install RHEL 7.6 with cache=writethrough
/usr/libexec/qemu-kvm \
        -name 'rhel7.6' \
        -machine q35 \
        -nodefaults \
        -vga qxl \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=writethrough,format=qcow2,file=$1 \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pcie.0,addr=05 \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=$2 \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -monitor stdio \
        -vnc :1 \
        -m 8192 \
        -smp 8 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b5:b5,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pcie.0,addr=0x9  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control

2.3 After the installation, write a file with dd in the guest
# dd if=/dev/urandom of=/home/ftest bs=1M count=2048

2.4 Shut down the guest immediately after the dd finishes

2.5 Check the image
# qemu-img check iscsi://10.66.11.19/iqn.2018-07.com.example:t1/1
ERROR cluster 41455 refcount=0 reference=1
ERROR cluster 41456 refcount=0 reference=1
ERROR cluster 41457 refcount=0 reference=1
ERROR cluster 41458 refcount=0 reference=1
ERROR cluster 41459 refcount=0 reference=1
... ...
ERROR OFLAG_COPIED data cluster: l2_entry=80000000b5400000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=80000000b5410000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=80000000b5420000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=80000000b5430000 refcount=0
ERROR OFLAG_COPIED data cluster: l2_entry=80000000b5440000 refcount=0
9900 errors were found on the image.
Data may be corrupted, or further writes to the image may corrupt it.
46388/491520 = 9.44% allocated, 6.63% fragmented, 0.00% compressed clusters
Image end offset: 3041198080

2.6 Repair the image
# qemu-img check -r all iscsi://10.66.11.19/iqn.2018-07.com.example:t1/1
ERROR cluster 41455 refcount=0 reference=1
ERROR cluster 41456 refcount=0 reference=1
ERROR cluster 41457 refcount=0 reference=1
ERROR cluster 41458 refcount=0 reference=1
ERROR cluster 41459 refcount=0 reference=1
... ...
ERROR cluster 46401 refcount=0 reference=1
ERROR cluster 46402 refcount=0 reference=1
ERROR cluster 46403 refcount=0 reference=1
ERROR cluster 46404 refcount=0 reference=1
Rebuilding refcount structure
qemu-img: iSCSI WRITE10/16 failed at lba 62914688: SENSE KEY:ILLEGAL_REQUEST(5) ASCQ:LBA_OUT_OF_RANGE(0x2100)
ERROR writing refblock: No space left on device
qemu-img: Check failed: No space left on device
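The failing LBA in the last error can be converted to a byte offset and compared with the sizes given earlier in this comment. The sketch below assumes a 512-byte logical block size on the LUN and that the "30G" given to targetcli means 30 GiB (which matches size=32212254720 in the create output); neither is confirmed by the logs. The helper name is hypothetical.

```python
BLOCK_SIZE = 512                 # assumption: 512-byte logical blocks on the LUN
LUN_BYTES = 30 * 2**30           # assumption: "30G" in targetcli is 30 GiB
FAILED_LBA = 62914688            # from the WRITE10/16 failure message
IMAGE_END_OFFSET = 3041198080    # from the "qemu-img check" output in step 2.5


def lba_to_offset(lba, block_size=BLOCK_SIZE):
    """Convert a logical block address to a byte offset on the device."""
    return lba * block_size


failed_offset = lba_to_offset(FAILED_LBA)
past_end = failed_offset - LUN_BYTES
print(failed_offset, past_end, failed_offset - IMAGE_END_OFFSET)
```

Under these assumptions the write starts 65536 bytes (one cluster) beyond the end of the LUN, while the image end offset reported by the check is under 3 GiB. That suggests the rebuild is trying to write past the device rather than running out of unused space, but this reading depends on the block size and LUN size assumed above.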

