Bug 1685388 - Failed to create an internal snapshot with an NBD network disk
Summary: Failed to create an internal snapshot with an NBD network disk
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: rc
: 8.0
Assignee: Eric Blake
QA Contact: Tingting Mao
URL:
Whiteboard:
Depends On: 1695888
Blocks:
Reported: 2019-03-05 05:42 UTC by gaojianan
Modified: 2019-04-04 05:42 UTC
12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:


Attachments
This is the qemu-log for this bug. (deleted)
2019-03-05 05:42 UTC, gaojianan

Description gaojianan 2019-03-05 05:42:34 UTC
Created attachment 1540850
This is the qemu-log for this bug.

Description of problem:
Failed to create a snapshot for a domain with an NBD network disk.

Version-Release number of selected component (if applicable):
libvirt-5.0.0-4.virtcov.el8.x86_64
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up an NBD server:
# qemu-nbd -t -p 10809  --format=raw /tmp/scsi

2. Prepare a guest XML with an NBD image:
# virsh dumpxml rhel8.0|grep 'disk t' -A8
    <disk type='network' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source protocol='nbd' tls='no'>
        <host name='localhost' port='10809'/>
      </source>
      <target dev='sda' bus='scsi'/>
      <alias name='scsi0-0-0-0'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>

3. Start the guest and take a snapshot:
# virsh start rhel8.0 
Domain rhel8.0 started

# virsh qemu-monitor-command rhel8.0 --hmp savevm s1
Error while writing VM state: Invalid argument


Actual results:
Creating a snapshot of the guest fails with an error.

Expected results:
The snapshot is created successfully.

Additional info:

Comment 1 Eric Blake 2019-03-06 16:32:44 UTC
(In reply to gaojianan from comment #0)

> 3.Start the guest and take a snapshot
> # virsh start rhel8.0 
> Domain rhel8.0 started
> 
> # virsh qemu-monitor-command rhel8.0 --hmp savevm s1

qemu-monitor-command is an unsupported libvirt backdoor. Can you reproduce the problem using 'virsh snapshot-create[-as]'? If not, customers shouldn't be hitting this case.

Comment 2 gaojianan 2019-03-07 01:59:04 UTC
(In reply to Eric Blake from comment #1)
> (In reply to gaojianan from comment #0)
> 
> > 3.Start the guest and take a snapshot
> > # virsh start rhel8.0 
> > Domain rhel8.0 started
> > 
> > # virsh qemu-monitor-command rhel8.0 --hmp savevm s1
> 
> qemu-monitor-command is an unsupported libvirt backdoor. Can you reproduce
> the problem using 'virsh snapshot-create[-as]'? If not, customers shouldn't
> be hitting this case.

OK, I verified it with snapshot-create[-as] and got the same error:
# virsh start rhel8.0
Domain rhel8.0 started

# virsh snapshot-create-as rhel8.0 s1
error: operation failed: Failed to take snapshot: Error while writing VM state: Invalid argument

So I think customers can hit this case.

Comment 3 Eric Blake 2019-03-07 05:11:37 UTC
(In reply to gaojianan from comment #0)

> Steps to Reproduce:
> 1.Setup a nbd server:
> # qemu-nbd -t -p 10809  --format=raw /tmp/scsi

You are exposing raw bytes over NBD,

> 
> 2.prepare a guest xml with nbd image:
> # virsh dumpxml rhel8.0|grep 'disk t' -A8
>     <disk type='network' device='disk'>
>       <driver name='qemu' type='qcow2'/>

...and asking the guest to interpret those bytes as qcow2. Are you ABSOLUTELY sure that /tmp/scsi is sized large enough? NBD does not support resizing, and the attempt to store an internal snapshot may end up requiring quite a lot of space.

> # virsh qemu-monitor-command rhel8.0 --hmp savevm s1
> Error while writing VM state: Invalid argument

Internal snapshots are not something we recommend downstream, but I suspect that the root cause here is that you are running out of space in /tmp/scsi to hold the full internal snapshot, and that the problem would be fixed once NBD gains resize support. (Upstream has mentioned the idea, but no one has submitted a formal design or patches).

Comment 4 Eric Blake 2019-03-07 05:14:03 UTC
(In reply to Eric Blake from comment #3)
> (In reply to gaojianan from comment #0)
> 
> > Steps to Reproduce:
> > 1.Setup a nbd server:
> > # qemu-nbd -t -p 10809  --format=raw /tmp/scsi
> 
> You are exposing raw bytes over NBD,
> 
> > 
> > 2.prepare a guest xml with nbd image:
> > # virsh dumpxml rhel8.0|grep 'disk t' -A8
> >     <disk type='network' device='disk'>
> >       <driver name='qemu' type='qcow2'/>
> 
> ...and asking the guest to interpret those bytes as qcow2.

Generally, until NBD gains resize, we recommend that you do the opposite:

# qemu-nbd -t -p 10809 --format=qcow2 /tmp/scsi
$ virsh ...
<disk type='network' device='disk'>
  <driver name='qemu' type='raw'/>

(that is, let qemu-nbd parse the qcow2 metadata, since it CAN resize locally, and expose only the guest-visible bytes over the network so that libvirt only deals with a raw image instead of a qcow2 image).
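A minimal Python sketch of the layout recommended above, assuming the port and path from the original reproducer. The helpers qemu_nbd_argv() and libvirt_nbd_disk_xml() are hypothetical: they only assemble the qemu-nbd command line and the matching libvirt <disk> element and do not run anything. The point is the inversion: the format is declared to qemu-nbd, and libvirt sees raw.

```python
import xml.etree.ElementTree as ET


def qemu_nbd_argv(image_path, port, image_format="qcow2"):
    """Hypothetical helper: qemu-nbd arguments for the recommended layout.

    qemu-nbd opens the image with its real format (qcow2), so it owns the
    qcow2 metadata and can grow the local file; only guest-visible bytes
    are exported over NBD.
    """
    return ["qemu-nbd", "-t", "-p", str(port), f"--format={image_format}", image_path]


def libvirt_nbd_disk_xml(port, host="localhost", target_dev="sda"):
    """Hypothetical helper: the matching libvirt <disk> element.

    Because the export already carries plain guest bytes, the driver
    type is 'raw' rather than 'qcow2'.
    """
    disk = ET.Element("disk", type="network", device="disk")
    ET.SubElement(disk, "driver", name="qemu", type="raw")
    source = ET.SubElement(disk, "source", protocol="nbd")
    ET.SubElement(source, "host", name=host, port=str(port))
    ET.SubElement(disk, "target", dev=target_dev, bus="scsi")
    return ET.tostring(disk, encoding="unicode")


print(" ".join(qemu_nbd_argv("/tmp/scsi", 10809)))
print(libvirt_nbd_disk_xml(10809))
```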

Comment 5 Eric Blake 2019-03-07 05:18:33 UTC
> Generally, until NBD gains resize, we recommend that you do the opposite:
> 
> # qemu-nbd -t -p 10809 --format=qcow2 /tmp/scsi
> $ virsh ...
> <disk type='network' device='disk'>
>   <driver name='qemu' type='raw'/>
> 
> (that is, let qemu-nbd parse the qcow2 metadata, since it CAN resize
> locally, and expose only the guest-visible bytes over the network so that
> libvirt only deals with a raw image instead of a qcow2 image).

Of course, if you do that, you CAN'T take an internal snapshot (since internal snapshots require qcow2 disks).

Comment 6 gaojianan 2019-03-07 05:42:41 UTC
(In reply to Eric Blake from comment #3)
> (In reply to gaojianan from comment #0)
> 
> > Steps to Reproduce:
> > 1.Setup a nbd server:
> > # qemu-nbd -t -p 10809  --format=raw /tmp/scsi
> 
> You are exposing raw bytes over NBD,
> 
> > 
> > 2.prepare a guest xml with nbd image:
> > # virsh dumpxml rhel8.0|grep 'disk t' -A8
> >     <disk type='network' device='disk'>
> >       <driver name='qemu' type='qcow2'/>
> 
> ...and asking the guest to interpret those bytes as qcow2. Are you
> ABSOLUTELY sure that /tmp/scsi is sized large enough? NBD does not support
> resizing, and the attempt to store an internal snapshot may end up requiring
> quite a lot of space.
> 
> > # virsh qemu-monitor-command rhel8.0 --hmp savevm s1
> > Error while writing VM state: Invalid argument
> 
> Internal snapshots are not something we recommend downstream, but I suspect
> that the root cause here is that you are running out of space in /tmp/scsi
> to hold the full internal snapshot, and that the problem would be fixed once
> NBD gains resize support. (Upstream has mentioned the idea, but no one has
> submitted a formal design or patches).


I think there is enough space for the internal snapshot:
[root@nssguest ~]# qemu-img info /tmp/scsi 
image: /tmp/scsi
file format: qcow2
virtual size: 10G (10737418240 bytes)
disk size: 256K
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

So maybe it's a different problem.

Comment 7 Eric Blake 2019-03-07 06:01:43 UTC
(In reply to gaojianan from comment #6)

> > You are exposing raw bytes over NBD,

> > ...and asking the guest to interpret those bytes as qcow2. Are you
> > ABSOLUTELY sure that /tmp/scsi is sized large enough? NBD does not support
> > resizing, and the attempt to store an internal snapshot may end up requiring
> > quite a lot of space.

> I think the space is enough for the internal snapshot:
> [root@nssguest ~]# qemu-img info /tmp/scsi 
> image: /tmp/scsi
> file format: qcow2
> virtual size: 10G (10737418240 bytes)
> disk size: 256K

256k? How much memory did you allocate to your guest? I'm QUITE certain that is not enough space - you are telling the guest that it has 10G of space to play with, but telling qemu that it can only use 256k of space to store BOTH the guest data AND the internal snapshot.

Even using 'qemu-img create -f qcow2 --preallocation=falloc, you would have enough space for one copy of the guest data, but probably not enough for the internal snapshot (that would create an image with a disk size slightly larger than 10G - enough for the 10G the guest will eventually write and the qcow2 metadata pointing to that 10G allocation - but the internal snapshot needs ADDITIONAL space for the domain's live memory capture, at least as large as the amount of memory you allocated to your guest + the overhead of qemu's migration stream).  And that's with just one internal snapshot - if you are creating multiple internal snapshots, you start having multiple copies of guest data; it's very easy to come up with a qcow2 file with multiple internal snapshots where the guest sees only 10G but the qcow2 file consumes more than 40G.  There might be situations where you get lucky with just a 10G disk size (if the guest hasn't fully used the disk, then your live snapshot can use those clusters) - but then your guest risks an ENOSPC error later.  In short, internal snapshots are space-hungry; on regular files, qcow2 can grow the file to match, but without support for file resize in the NBD protocol, qemu is very likely to run out of space on the FIXED size that NBD advertised.
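The sizing argument above can be turned into a back-of-the-envelope formula. The Python sketch below is an illustrative worst case, not something qemu computes: it assumes the guest eventually rewrites every cluster after each snapshot (so each snapshot pins a full copy of guest data) and that each saved vmstate is about the size of guest RAM. It deliberately ignores qcow2 metadata and migration-stream overhead, so the real requirement is larger. The function name is hypothetical.

```python
GiB = 1024 ** 3


def worst_case_qcow2_bytes(virtual_size, guest_ram, n_snapshots):
    """Illustrative worst-case space for a qcow2 file with internal snapshots.

    Ignores qcow2 metadata and migration stream overhead, so the true
    figure is larger than what this returns.
    """
    if n_snapshots < 0:
        raise ValueError("n_snapshots must be non-negative")
    guest_data = (n_snapshots + 1) * virtual_size  # active layer + one copy per snapshot
    vmstate = n_snapshots * guest_ram              # live memory captured by each savevm
    return guest_data + vmstate


# The 10G disk from comment 6, one snapshot, hypothetical 1 GiB of guest RAM.
need = worst_case_qcow2_bytes(10 * GiB, 1 * GiB, 1)
print(f"{need / GiB:.0f} GiB needed vs. {256 * 1024} bytes advertised")
```

For that example the estimate is 21 GiB, far beyond the roughly 256K that comment 7 suspects NBD was advertising.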

If you have qemu.git's qemu-nbd available (it will be formally released in the upcoming qemu 4.0), you can use 'qemu-nbd --list' between step 1 and step 2 to see the file size that you are forcing qemu to stay within; but I suspect it would be the same 256k as the disk size that qemu-img info is reporting.

Comment 8 Eric Blake 2019-03-07 06:10:30 UTC
(In reply to Eric Blake from comment #7)

> Even using 'qemu-img create -f qcow2 --preallocation=falloc, you would have

I typo'd that line; it is probably more like:

qemu-img create -f qcow2 -o preallocation=falloc /tmp/scsi 10G

But based on qemu-img reporting a disk size of 256k, I'm guessing you created /tmp/scsi without any preallocation, and didn't even try to truncate it to be larger. Perhaps you can get away with:

truncate --size=20G /tmp/scsi

prior to step 2, and then retry your internal snapshot with much more space available for the qcow2 file to expand into (because NBD will now advertise 20G instead of 256k as the fixed file size that qemu has to live within).

Comment 9 Eric Blake 2019-03-07 06:18:58 UTC
Actually, the truncate is needed prior to step 1. I just checked: qemu-nbd remembers the size the file had when it was first open()ed, and does not advertise a larger size to clients, even if a client connects after another process has resized the file in the meantime.
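Comments 8 and 9 together imply an order of operations: grow the file first, then start qemu-nbd. A minimal Python sketch of the first step, for anyone scripting the setup; grow_image_file() is a hypothetical helper built on os.truncate(), intended to have the same effect as `truncate --size=...` while refusing to shrink the image by accident.

```python
import os


def grow_image_file(path, new_size):
    """Hypothetical helper: extend `path` to `new_size` bytes, never shrinking it.

    Same effect as `truncate --size=<new_size> <path>`; on most
    filesystems the new tail is sparse and reads back as zeroes.
    """
    current = os.path.getsize(path)
    if new_size < current:
        raise ValueError(f"refusing to shrink {path} from {current} to {new_size} bytes")
    os.truncate(path, new_size)
    return os.path.getsize(path)
```

Call it before step 1, e.g. grow_image_file("/tmp/scsi", 20 * 1024**3), and only then start qemu-nbd; growing the file while qemu-nbd is already serving it does not change the size it advertises.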

Comment 10 Tingting Mao 2019-03-07 07:20:25 UTC
Hi Eric,

I tried at the QEMU layer and still hit this issue.


Tested with:
qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2
kernel-4.18.0-67.el8


Steps:

1. Create a qcow2 base file with preallocation=falloc
# qemu-img create -f qcow2 -o preallocation=falloc base.qcow2 20G
Formatting 'base.qcow2', fmt=qcow2 size=21474836480 cluster_size=65536 preallocation=falloc lazy_refcounts=off refcount_bits=16

2. Expose this image with qemu-nbd
# qemu-nbd -f raw base.qcow2 -p 9000 -t

3. Install a guest onto this image
# /usr/libexec/qemu-kvm \
        -name 'guest' \
        -machine pc \
        -nodefaults \
        -vga qxl \
        -drive id=drive_cd1,if=none,snapshot=off,aio=threads,cache=unsafe,media=cdrom,file=RHEL8.0-BaseOS-x86_64.iso \
        -device ide-cd,id=cd1,drive=drive_cd1,bus=ide.0,unit=0 \
        -drive id=drive_image1,if=none,snapshot=off,aio=threads,cache=unsafe,format=qcow2,file=nbd:localhost:9000 \
        -device virtio-blk-pci,id=virtio_blk_pci0,drive=drive_image1,bus=pci.0,addr=05,bootindex=0 \
        -vnc :0 \
        -monitor stdio \
        -m 8192 \
        -smp 8 \
        -device virtio-net-pci,mac=9a:b5:b6:b1:b2:b3,id=idMmq1jH,vectors=4,netdev=idxgXAlm,bus=pci.0,addr=0x9  \
        -netdev tap,id=idxgXAlm \
        -chardev socket,id=qmp_id_qmpmonitor1,path=/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI,server,nowait \
        -mon chardev=qmp_id_qmpmonitor1,mode=control

4. After installation, create an internal snapshot with QMP
# nc -U monitor-qmpmonitor1-20180220-094308-h9I6hRsI
{"QMP": {"version": {"qemu": {"micro": 0, "minor": 1, "major": 3}, "package": "qemu-kvm-3.1.0-18.module+el8+2834+fa8bb6e2"}, "capabilities": []}}
{"execute": "qmp_capabilities"}
{"return": {}}

{"execute":"human-monitor-command","arguments":{"command-line":"savevm sn1"}}
{"timestamp": {"seconds": 1551942369, "microseconds": 437925}, "event": "STOP"}
{"timestamp": {"seconds": 1551942369, "microseconds": 527404}, "event": "RESUME"}
{"return": "Error while writing VM state: Invalid argument\r\n"} --------------------------------- Hit the issue!

5. Check image info
# qemu-img info base.qcow2 -U
image: base.qcow2
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 20G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
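Step 4 drives savevm through QMP by hand with nc. A minimal Python sketch of the same exchange, assuming the UNIX socket path from the -chardev option above: it reads the greeting, sends qmp_capabilities, wraps savevm in human-monitor-command, and skips asynchronous event lines such as the STOP/RESUME pair seen in the transcript. As the transcript shows, the HMP failure comes back as text inside "return" rather than as a QMP "error", so the sketch treats any non-empty return string as a failure (assuming, as I understand it, that a successful savevm prints nothing). The function names are hypothetical.

```python
import json
import socket


def qmp_command(execute, arguments=None):
    """Encode one QMP command as a newline-terminated JSON line."""
    msg = {"execute": execute}
    if arguments is not None:
        msg["arguments"] = arguments
    return (json.dumps(msg) + "\n").encode("utf-8")


def read_reply(stream):
    """Return the next command reply, skipping asynchronous events."""
    for line in stream:
        if not line.strip():
            continue
        msg = json.loads(line)
        # Events such as STOP/RESUME carry an "event" key and can arrive
        # between a command and its reply.
        if "return" in msg or "error" in msg:
            return msg
    raise ConnectionError("QMP stream closed before a reply arrived")


def hmp_savevm(socket_path, snapshot_name):
    """Run HMP 'savevm <name>' over a QMP UNIX socket; raise on failure."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        with sock.makefile("r", encoding="utf-8") as stream:
            greeting = json.loads(stream.readline())
            if "QMP" not in greeting:
                raise ConnectionError(f"unexpected greeting: {greeting}")
            sock.sendall(qmp_command("qmp_capabilities"))
            reply = read_reply(stream)
            if "error" not in reply:
                sock.sendall(qmp_command(
                    "human-monitor-command",
                    {"command-line": f"savevm {snapshot_name}"}))
                reply = read_reply(stream)
    if "error" in reply:
        raise RuntimeError(f"QMP error: {reply['error']}")
    # HMP reports failures as text in "return"; empty output is taken as success.
    output = reply["return"].strip()
    if output:
        raise RuntimeError(output)
```

With the setup above, hmp_savevm("/var/tmp/timao/monitor-qmpmonitor1-20180220-094308-h9I6hRsI", "sn1") would be expected to raise RuntimeError carrying "Error while writing VM state: Invalid argument".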

Comment 11 Eric Blake 2019-03-07 15:37:34 UTC
(In reply to Tingting Mao from comment #10)

> 1. Create qcow2 base file with preallocation=falloc
> # qemu-img create -f qcow2 -o preallocation=falloc base.qcow2 20G
> Formatting 'base.qcow2', fmt=qcow2 size=21474836480 cluster_size=65536
> preallocation=falloc lazy_refcounts=off refcount_bits=16

Big enough for the guest's data, but not necessarily big enough for the guest's data AND additional metadata. Assuming the guest is sparse, this is better than before, but it may still be hitting ENOSPC. Let's investigate:

> {"execute":"human-monitor-command","arguments":{"command-line":"savevm sn1"}}
> {"timestamp": {"seconds": 1551942369, "microseconds": 437925}, "event":
> "STOP"}
> {"timestamp": {"seconds": 1551942369, "microseconds": 527404}, "event":
> "RESUME"}
> {"return": "Error while writing VM state: Invalid argument\r\n"}
> --------------------------------- Hit the issue!


That error message is printed only by migration/savevm.c in qemu_savevm_state(), if an earlier qemu_file_get_error() returns non-zero, and that appears to happen only when qemu_file_set_error() is called.
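The error reporting described above follows a "first error sticks" pattern. The toy Python model below is only an illustration with hypothetical names (StickyErrorFile, savevm_error_message); it mirrors, as I understand it, how qemu_file_set_error() records only the first failure and how later writes are skipped once an error is set, so a single EINVAL from the block layer is what finally surfaces in the savevm message.

```python
import errno
import os


class StickyErrorFile:
    """Toy model (hypothetical name) of a stream whose first error sticks."""

    def __init__(self, writer):
        self._writer = writer
        self.last_error = 0  # 0 means no error; otherwise a negative errno

    def set_error(self, ret):
        # Only the first error is recorded; later ones are ignored.
        if self.last_error == 0:
            self.last_error = ret

    def get_error(self):
        return self.last_error

    def write(self, data):
        # Once an error is recorded, further writes are silently skipped.
        if self.last_error:
            return
        try:
            self._writer(data)
        except OSError as exc:
            self.set_error(-exc.errno)


def savevm_error_message(f):
    """Format the failure the way the savevm report in this bug reads."""
    ret = f.get_error()
    if ret == 0:
        return None
    return f"Error while writing VM state: {os.strerror(-ret)}"
```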

The next obvious thing is to trace the earlier calls: savevm.c registers block_writev_buffer() as its callback function in qemu_fopen_bdrv() at the start of save_snapshot(). As soon as the migration process tries to write to the underlying file, it triggers a call to block_writev_buffer(), which calls bdrv_writev_vmstate(), then bdrv_rw_vmstate(), then bdrv_co_rw_vmstate(), which in turn checks whether drv->bdrv_save_vmstate() exists.

We are using a qcow2 file, and qcow2.c DOES have a .bdrv_save_vmstate() callback; that function in turn calls bs->drv->bdrv_co_pwritev(), which wants to write to the NBD layer. But note that it uses qcow2_vm_state_offset() as its starting offset, which refers to the variable l1_vm_state_index. Looking at qcow2_do_open(), this variable is initialized based on header.size, which stores the guest-visible size. That is, the internal snapshot WANTS to store internal snapshot information AS IF it resided at a guest-visible address BEYOND the end of the last actual possible guest-accessible address.

Then, when qemu goes to actually write data at that "beyond-guest-visible" offset, it has to find a free cluster. With preallocation=falloc, ALL guest-visible addresses have already had clusters allocated, so qcow2 HAS to allocate a NEW cluster; but allocating a new cluster requires that NBD support resizes, which it does not. Since the resize fails, block_writev_buffer() ends up calling qemu_file_set_error().
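The offset arithmetic is easy to check by hand. This Python sketch is modeled on qemu's size_to_l1() and qcow2_vm_state_offset() helpers as I understand them, assuming standard 8-byte L2 entries so that l2_bits = cluster_bits - 3. With the default 64 KiB clusters, each L1 entry covers 512 MiB, and the vmstate starts at the guest-visible size rounded up to that boundary.

```python
def size_to_l1(size, cluster_bits):
    """Number of L1 entries needed to map `size` guest bytes (rounded up)."""
    l2_bits = cluster_bits - 3          # an L2 table holds cluster_size / 8 entries
    shift = cluster_bits + l2_bits      # bytes covered by one L1 entry = 1 << shift
    return (size + (1 << shift) - 1) >> shift


def vm_state_offset(virtual_size, cluster_bits=16):
    """Guest-address offset at which the vmstate is written.

    This is the first L1 boundary at or after the guest-visible size,
    i.e. past the last address the guest itself can reach.
    """
    l1_vm_state_index = size_to_l1(virtual_size, cluster_bits)
    return l1_vm_state_index << (cluster_bits + cluster_bits - 3)


GiB = 1024 ** 3
print(vm_state_offset(20 * GiB) / GiB)
```

For the 20G image in comment 10, the vmstate begins at exactly 20G, the first byte past the guest-visible disk. None of the preallocated clusters map that range, so every vmstate cluster is a fresh allocation.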

Short of using truncate --size= to make the qcow2 image LARGER than anything the guest can allocate (and thus giving qemu somewhere to allocate the clusters that hold the internal snapshot, in addition to all the clusters it allocated during preallocation=falloc), you will keep failing: NBD does not permit resizes, and your request for an internal snapshot won't fit in the size currently allocated.
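A deliberately simplified Python model of why the truncate helps. FixedSizeExport and allocate_cluster() are hypothetical; refcounts holds one entry per host cluster in the export, and real qcow2 allocation (refcount blocks, metadata updates) is far more involved. With every cluster already in use, the next allocation lands past the end of the export and fails with EINVAL, the errno seen in this bug. After the file has been grown, the new tail clusters are unused and can be handed out instead.

```python
import errno


class FixedSizeExport:
    """Hypothetical stand-in for an NBD export whose size is fixed at open time."""

    def __init__(self, size):
        self.size = size

    def write(self, offset, length):
        if offset + length > self.size:
            # NBD cannot grow the export; this bug surfaced the failure as EINVAL.
            raise OSError(errno.EINVAL, "write beyond end of export")


def allocate_cluster(export, refcounts, cluster_size=65536):
    """Simplified allocator: reuse a free cluster, else append past the end."""
    for index, refcount in enumerate(refcounts):
        if refcount == 0:
            refcounts[index] = 1
            return index * cluster_size
    # No free cluster inside the export: the new one must go past its end.
    offset = len(refcounts) * cluster_size
    export.write(offset, cluster_size)  # fails on a fixed-size NBD export
    refcounts.append(1)
    return offset
```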

Meanwhile, as this is using internal snapshots, which we do not recommend to customers, improving the situation is NOT a high priority. Really, the only things that could be done here are improving the error message to make it obvious that you've hit ENOSPC, or fixing your setup to give qemu a large enough file so that it doesn't run out of space, since NBD can't resize.

