Bug 1516505 - savevm/internal snapshot: Hang/infinite disk usage with some sizes
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Kevin Wolf
QA Contact: Ping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-22 19:02 UTC by Dr. David Alan Gilbert
Modified: 2017-12-01 11:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-01 11:42:50 UTC



Description Dr. David Alan Gilbert 2017-11-22 19:02:30 UTC
Description of problem:
(This problem is fixed in 7.5 and doesn't seem to happen in 1.5.3 for reasons we don't quite understand).
dyasny originally hit this while taking a snapshot of a large VM, but it only happens for some VM sizes.

Version-Release number of selected component (if applicable):
2.9.*

How reproducible:
100% ish?

Steps to Reproduce:
1. Create a Linux guest on a qcow2 image
2. Boot a VM with /usr/libexec/qemu-kvm -M pc --enable-kvm -m 3800M -monitor stdio -vnc :0   [Note: the RAM size must be one that truncates to a negative value, i.e. between 2 and 4 G, exclusive]
3. Install 'stress'
4. Run stress --vm-bytes 3300M --vm 1 --vm-keep &
5. Let it think for a minute or two
6. At the qemu prompt, run    savevm "boo"
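
As a rough illustration of the note in step 2, here is a minimal Python sketch of which sizes end up negative when a byte count is squeezed into a signed 32-bit integer. It assumes the saved VM state is about as large as guest RAM, which is only an approximation, and the function names are made up for this example; none of this is QEMU code.

```python
MIB = 1024 * 1024


def to_int32(value):
    """Keep the low 32 bits of value and interpret them as a signed integer."""
    low = value & 0xFFFFFFFF
    return low - (1 << 32) if low >= (1 << 31) else low


def truncates_negative(ram_mib):
    """True if ram_mib MiB, expressed in bytes, is negative as a signed 32-bit int."""
    return to_int32(ram_mib * MIB) < 0


if __name__ == "__main__":
    # 3800 is the -m value from step 2; the other sizes are for comparison.
    for ram_mib in (1024, 3800, 5000):
        print(ram_mib, to_int32(ram_mib * MIB), truncates_negative(ram_mib))
```

Under these assumptions 3800M falls in the negative range while 1024M and 5000M do not, which fits the observation that only some VM sizes trigger the hang.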

Actual results:
It sits there forever with the qcow2 growing; gdb shows strange, huge offsets in the qcow2 code.

Expected results:
A complete snapshot in finite time (but it takes a while to save 3G)

Additional info:
Probably worth closing since 7.5 and 1.5.3 work, but I thought it best to record it anyway.

Comment 2 Kevin Wolf 2017-11-22 19:10:07 UTC
To add some technical detail:

The problem seems to be a 32 bit truncation when qcow2_snapshot_create() calls
qcow2_discard_clusters(). The latter function takes a parameter int nb_sectors
and converts it to a signed 32 bit byte count, but the passed sn->vm_state_size
can be much larger than just 2 GB.

This is fixed in upstream and qemu-kvm-rhev in 7.5 because all calculations are
done in byte granularity and 64 bit variables now.

1.5.3 is not quite clear, but it seems that additional 32 bit truncations where
it uses unsigned int instead of uint64_t might prevent the catastrophic
behaviour. It's still likely that the result isn't completely as intended.
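
To make the difference between the three cases concrete, here is a small Python model of converting a sector count back into a byte count at different integer widths. It is a sketch under stated assumptions, not the QEMU source: `byte_count_variants` is a hypothetical helper, a 512-byte sector is assumed, and ctypes is used only to imitate fixed-width C types.

```python
import ctypes

SECTOR_SIZE = 512  # assumed sector size in bytes


def byte_count_variants(vm_state_size):
    """Return the byte count as it would look in three C integer types."""
    # The sector count itself still fits comfortably in 32 bits.
    nb_sectors = vm_state_size // SECTOR_SIZE
    # Multiplying back to bytes is where the width of the type matters.
    byte_count = nb_sectors * SECTOR_SIZE
    return {
        "int": ctypes.c_int32(byte_count).value,            # signed 32 bit
        "unsigned int": ctypes.c_uint32(byte_count).value,  # unsigned 32 bit
        "uint64_t": ctypes.c_uint64(byte_count).value,      # 64 bit
    }


if __name__ == "__main__":
    mib = 1024 * 1024
    for size_mib in (1024, 3800, 5000):
        print(size_mib, byte_count_variants(size_mib * mib))
```

In this model a state size between 2 GiB and 4 GiB becomes negative as a signed 32-bit value but stays correct as an unsigned one. Above 4 GiB the unsigned value wraps to a smaller, non-negative number, which is wrong but not negative.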

Comment 4 Ademar Reis 2017-11-28 13:56:24 UTC
(In reply to Kevin Wolf from comment #2)
> To add some technical detail:
> 
> The problem seems to be a 32 bit truncation when qcow2_snapshot_create() calls
> qcow2_discard_clusters(). The latter function takes a parameter int nb_sectors
> and converts it to a signed 32 bit byte count, but the passed sn->vm_state_size
> can be much larger than just 2 GB.
> 
> This is fixed in upstream and qemu-kvm-rhev in 7.5 because all calculations are
> done in byte granularity and 64 bit variables now.
> 
> 1.5.3 is not quite clear, but it seems that additional 32 bit truncations where
> it uses unsigned int instead of uint64_t might prevent the catastrophic
> behaviour. It's still likely that the result isn't completely as intended.

I suggest closing it as CURRENTRELEASE given it's fixed in 7.5/upstream.

Besides, we discourage the use of savevm in the RHEL documentation and RHV and OSP don't make use of it.

Comment 5 Kevin Wolf 2017-11-30 18:14:42 UTC
I agree that it's probably not worth fixing in the 7.4 qemu-kvm-rhev.

Before I close it, just to be sure, you also think the scenario is not relevant for base RHEL qemu-kvm because savevm is discouraged?

Comment 6 Dr. David Alan Gilbert 2017-11-30 18:32:39 UTC
(In reply to Kevin Wolf from comment #5)
> I agree that it's probably not worth fixing in the 7.4 qemu-kvm-rhev.
> 
> Before I close it, just to be sure, you also think the scenario is not
> relevant for base RHEL qemu-kvm because savevm is discouraged?

I think it is relevant on base qemu-kvm because savevm seems to be more common there with virt-manager than RHEV - HOWEVER - I never managed to trigger it on 1.5.3 so it doesn't seem to be a problem.

Comment 7 Dan Yasny 2017-11-30 18:42:51 UTC
I ran into this bug because I was using virt-manager and took a snapshot via the GUI, and my host, being subscribed to the RHOS channels, uses qemu-kvm-rhev

Comment 8 Ademar Reis 2017-11-30 18:46:20 UTC
(In reply to Kevin Wolf from comment #5)
> I agree that it's probably not worth fixing in the 7.4 qemu-kvm-rhev.
> 
> Before I close it, just to be sure, you also think the scenario is not
> relevant for base RHEL qemu-kvm because savevm is discouraged?

Given we haven't seen any reports of this problem in RHEL, I would say it's OK to close it. If users complain (or QE reports it), then we can re-evaluate.
