Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1517055 - balloon stat returned value is 18446744073709551615 rather than -1 when guest balloon service is not running
Summary: balloon stat returned value is 18446744073709551615 rather than -1 when guest...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.5
Hardware: Unspecified
OS: Unspecified
medium
medium
Target Milestone: rc
: ---
Assignee: Markus Armbruster
QA Contact: Yumei Huang
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2017-11-24 05:59 UTC by lijin
Modified: 2017-12-22 16:34 UTC (History)
10 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-22 16:34:52 UTC


Attachments (Terms of Use)

Description lijin 2017-11-24 05:59:05 UTC
Description of problem:


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.10.0-7.el7.x86_64.rpm
kernel-3.10.0-785.el7.x86_64


How reproducible:
100%

Steps to Reproduce:
1.boot win2016 guest with balloon device:
/usr/libexec/qemu-kvm \
-M q35 \
-drive file=ovmf/OVMF_CODE.secboot.verbose.fd,if=pflash,format=raw,unit=0,readonly=on \
-drive file=ovmf/OVMF_VARS.fd,if=pflash,format=raw,unit=1 \
-debugcon file:./ovmf.log \
-drive file=ovmf/UefiShell.iso,if=none,cache=none,snapshot=off,aio=native,media=cdrom,id=cdrom1 \
-device ide-cd,drive=cdrom1,id=ide-cd1 \
-drive file=en_windows_server_2016_x64_dvd_9718492.iso,if=none,cache=none,aio=native,media=cdrom,id=cdrom2 \
-device ide-cd,drive=cdrom2,id=ide-cd2,bus=ide.1,unit=0 \
-cpu SandyBridge,hv_time,hv_relaxed,hv_vapic,hv_spinlocks=0x1fff \
-enable-kvm \
-m 6G \
-smp 4 \
-nodefconfig \
-rtc base=localtime,driftfix=slew \
-drive file=win2016-during.qcow2,if=none,format=qcow2,cache=none,werror=stop,rerror=stop,id=drive-disk0,aio=native \
-device ide-drive,drive=drive-disk0,id=virtio-disk0,bus=ide.0,unit=0,bootindex=1 \
-device piix3-usb-uhci,id=usb -device usb-tablet,id=tablet0 \
-cdrom virtio-win-prewhql-0.1-143.iso \
-fda virtio-win-prewhql-0.1-143_amd64.vfd \
-vnc 0.0.0.0:0 \
-k en-us \
-vga std \
-monitor stdio \
-qmp tcp:0:4445,server,nowait \
-boot menu=on \
-device ioh3420,bus=pcie.0,id=root1.0,slot=1 \
-netdev tap,script=/etc/qemu-ifup,id=hostnet0,vhost=on,queues=64 \
-device e1000e,netdev=hostnet0,id=net0,mac=00:52:4c:20:4d:68,bus=root1.0 \
-device ioh3420,bus=pcie.0,id=root2.0,slot=2 \
-device virtio-balloon-pci,id=balloon0,bus=root2.0 \

2.do NOT install balloon service in guest so guest memory statistics will not be returned.

3.check balloon stat from qmp:
{"execute":"qmp_capabilities"} 
{"execute": "qom-set","arguments": { "path": "/machine/peripheral/balloon0","property": "guest-stats-polling-interval", "value": 2 } }
{ "execute": "qom-get","arguments": { "path": "/machine/peripheral/balloon0","property": "guest-stats" } }

Actual results:
all balloon stat is 18446744073709551615
{"return": {"stats": {"stat-swap-out": 18446744073709551615, "stat-available-memory": 18446744073709551615, "stat-free-memory": 18446744073709551615, "stat-minor-faults": 18446744073709551615, "stat-major-faults": 18446744073709551615, "stat-total-memory": 18446744073709551615, "stat-swap-in": 18446744073709551615}, "last-update": 1511501782}}

Expected results:
all balloon stat should be -1:
{"return": {"stats": {"stat-swap-out": -1, "stat-available-memory": -1, "stat-free-memory": -1, "stat-minor-faults": -1, "stat-major-faults": -1, "stat-total-memory": -1, "stat-swap-in": -1}, "last-update": 1511502235}}


Additional info:
Can NOT reproduce with rhel7.4 qemu-kvm-rhev-2.9.0-14.el7.x86_64.rpm,balloon stat is -1 when 
{ "execute": "qom-get","arguments": { "path": "/machine/peripheral/balloon0","property": "guest-stats" } }
{"return": {"stats": {"stat-swap-out": -1, "stat-available-memory": -1, "stat-free-memory": -1, "stat-minor-faults": -1, "stat-major-faults": -1, "stat-total-memory": -1, "stat-swap-in": -1}, "last-update": 1511502235}}

Comment 4 Ladi Prosek 2017-11-27 12:42:55 UTC
This bisects to:

  commit 5923f85fb82df7c8c60a89458a5ae856045e5ab1
  Author: Marc-André Lureau <marcandre.lureau@redhat.com>
  Date:   Wed Jun 7 20:36:03 2017 +0400

      qapi: update the qobject visitor to use QNUM_U64
    
      Switch to use QNum/uint where appropriate to remove i64 limitation.
    
      The input visitor will cast i64 input to u64 for compatibility
      reasons (existing json QMP client already use negative i64 for large
      u64, and expect an implicit cast in qemu).
    
      Note: before the patch, uint64_t values above INT64_MAX are sent over
      json QMP as negative values, e.g. UINT64_MAX is sent as -1. After the
      patch, they are sent unmodified.  Clearly a bug fix, but we have to
      consider compatibility issues anyway.  libvirt should cope fine,
      because its parsing of unsigned integers accepts negative values
      modulo 2^64.  There's hope that other clients will, too.
    
      Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
      Reviewed-by: Markus Armbruster <armbru@redhat.com>
      Message-Id: <20170607163635.17635-12-marcandre.lureau@redhat.com>
      [check_native_list() tweaked for consistency with signed case]
      Signed-off-by: Markus Armbruster <armbru@redhat.com>


In particular, this change:

--- a/qapi/qobject-output-visitor.c
+++ b/qapi/qobject-output-visitor.c
@@ -150,9 +150,8 @@ static void qobject_output_type_int64(Visitor *v, const char *name,
 static void qobject_output_type_uint64(Visitor *v, const char *name,
                                        uint64_t *obj, Error **errp)
 {
-    /* FIXME values larger than INT64_MAX become negative */
     QObjectOutputVisitor *qov = to_qov(v);
-    qobject_output_add(qov, name, qnum_from_int(*obj));
+    qobject_output_add(qov, name, qnum_from_uint(*obj));
 }


The balloon device has always exposed the stats as unsigned 64-bit integers. Prior to the change above they were (incorrectly!?) printed as signed numbers.

So this looks like a by design change, although potentially breaking. Adding Marc-André to CC.

-1 (or 18446744073709551615 when represented as a uint64) is a special value used by the device to mark stats that have not been provided by the guest. It is also used by the Windows driver for a similar purpose, as the driver internally gets the stats from a user space service which may or may not be running. The -1 has no support in the spec, it's just a value that the implementation had chosen to use.

If reverting 5923f85fb is not desirable, I'd suggest to fix the balloon driver like so:

--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -130,7 +130,9 @@ static void balloon_stats_get_all(Object *obj, Visitor *v, const char *name,
         goto out_end;
     }
     for (i = 0; i < VIRTIO_BALLOON_S_NR; i++) {
-        visit_type_uint64(v, balloon_stat_names[i], &s->stats[i], &err);
+        if (s->stats[i] != -1) {
+            visit_type_uint64(v, balloon_stat_names[i], &s->stats[i], &err);
+        }
         if (err) {
             goto out_nested;
         }

This way the magic -1 (or whatever other marker it chooses to use; might be a separate flag) stays internal to the implementation. It's also potentially breaking but the result is much less broken than what we have now ;)

Comment 5 Ladi Prosek 2017-11-27 12:44:15 UTC
(In reply to Ladi Prosek from comment #4)
> I'd suggest to fix the balloon driver like so:

Correction: balloon *device*

Comment 6 Marc-Andre Lureau 2017-11-27 22:23:15 UTC
(In reply to Ladi Prosek from comment #4)
> This bisects to:
> 
>   commit 5923f85fb82df7c8c60a89458a5ae856045e5ab1
>   Author: Marc-André Lureau <marcandre.lureau@redhat.com>
>   Date:   Wed Jun 7 20:36:03 2017 +0400
> 
>       qapi: update the qobject visitor to use QNUM_U64
>     
>       Switch to use QNum/uint where appropriate to remove i64 limitation.
>     
>       The input visitor will cast i64 input to u64 for compatibility
>       reasons (existing json QMP client already use negative i64 for large
>       u64, and expect an implicit cast in qemu).
>     
>       Note: before the patch, uint64_t values above INT64_MAX are sent over
>       json QMP as negative values, e.g. UINT64_MAX is sent as -1. After the
>       patch, they are sent unmodified.  Clearly a bug fix, but we have to
>       consider compatibility issues anyway.  libvirt should cope fine,
>       because its parsing of unsigned integers accepts negative values
>       modulo 2^64.  There's hope that other clients will, too.
>     
>       Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>       Reviewed-by: Markus Armbruster <armbru@redhat.com>
>       Message-Id: <20170607163635.17635-12-marcandre.lureau@redhat.com>
>       [check_native_list() tweaked for consistency with signed case]
>       Signed-off-by: Markus Armbruster <armbru@redhat.com>
> 


Adding Markus, maintainer & reviewer

> In particular, this change:
> 
> --- a/qapi/qobject-output-visitor.c
> +++ b/qapi/qobject-output-visitor.c
> @@ -150,9 +150,8 @@ static void qobject_output_type_int64(Visitor *v, const
> char *name,
>  static void qobject_output_type_uint64(Visitor *v, const char *name,
>                                         uint64_t *obj, Error **errp)
>  {
> -    /* FIXME values larger than INT64_MAX become negative */
>      QObjectOutputVisitor *qov = to_qov(v);
> -    qobject_output_add(qov, name, qnum_from_int(*obj));
> +    qobject_output_add(qov, name, qnum_from_uint(*obj));
>  }
> 
> 
> The balloon device has always exposed the stats as unsigned 64-bit integers.
> Prior to the change above they were (incorrectly!?) printed as signed
> numbers.
> 
> So this looks like a by design change, although potentially breaking. Adding
> Marc-André to CC.
> 
> -1 (or 18446744073709551615 when represented as a uint64) is a special value
> used by the device to mark stats that have not been provided by the guest.
> It is also used by the Windows driver for a similar purpose, as the driver
> internally gets the stats from a user space service which may or may not be
> running. The -1 has no support in the spec, it's just a value that the
> implementation had chosen to use.

> 
> If reverting 5923f85fb is not desirable, I'd suggest to fix the balloon
> driver like so:
> 
> --- a/hw/virtio/virtio-balloon.c
> +++ b/hw/virtio/virtio-balloon.c
> @@ -130,7 +130,9 @@ static void balloon_stats_get_all(Object *obj, Visitor
> *v, const char *name,
>          goto out_end;
>      }
>      for (i = 0; i < VIRTIO_BALLOON_S_NR; i++) {
> -        visit_type_uint64(v, balloon_stat_names[i], &s->stats[i], &err);
> +        if (s->stats[i] != -1) {
> +            visit_type_uint64(v, balloon_stat_names[i], &s->stats[i], &err);
> +        }
>          if (err) {
>              goto out_nested;
>          }
> 
> This way the magic -1 (or whatever other marker it chooses to use; might be
> a separate flag) stays internal to the implementation. It's also potentially
> breaking but the result is much less broken than what we have now ;)

Why not switch the property & visitor to a i64 instead if you need -1? Obviously, this will limit the max value to int64_max, which may or may not be a problem..

Comment 7 Ladi Prosek 2017-11-28 07:18:03 UTC
(In reply to Marc-Andre Lureau from comment #6)
> Why not switch the property & visitor to a i64 instead if you need -1?
> Obviously, this will limit the max value to int64_max, which may or may not
> be a problem..

That would definitely be the safest fix, 9 exabytes ought to be enough for anybody.

Not sure if it would change anything else, aside from the int->string conversion. Does QAPI have a formal schema where this would be discoverable? If so, do clients tend to depend on it? Looking forward to Markus's take on this.

Comment 8 Markus Armbruster 2017-11-28 07:32:08 UTC
Two perspectives, QMP interface and virtio-balloon device interface.

Until upstream commit 5923f85, QMP sent large positive integers as
negative ones.  Long-standing bug, which commit 5923f85 finally fixed.
The commit message discusses compatibility ramifications:

    Note: before the patch, uint64_t values above INT64_MAX are sent over
    json QMP as negative values, e.g. UINT64_MAX is sent as -1. After the
    patch, they are sent unmodified.  Clearly a bug fix, but we have to
    consider compatibility issues anyway.  libvirt should cope fine,
    because its parsing of unsigned integers accepts negative values
    modulo 2^64.  There's hope that other clients will, too.

The hope is based on the fact that the "obvious" way to convert
decimal integers to int64_t is strtoll(), which silently maps large
positive values to negative ones, masking the change.

This change affects all QMP interfaces that return unsigned integers
>= 2^63.

The specific case here is qom-get of virtio-balloon's property
"guest-stats".  QOM properties are not a stable interface.  The only
documentation we seem to have for the balloon stats is the virtio
specification.  More on that below.

Of course, that doesn't mean we can just go ahead and break stuff with
abandon.  It does mean changes should only be considered regressions
when they actually break something.

Question: what exactly does this break besides failing our own QE
tests?

Ladi suggested to special-case UINT64_MAX, so QMP delivers it as -1
instead of 18446744073709551615, as it did before commit 5923f85.

Marc-André suggested to make the stats values signed, so QMP delivers
all large positive values as negative ones, as it did before commit
5923f85.  Makes more sense to me than special-casing -1.  However, it
kind of clashes with the virtio specification.

"Virtual I/O Device (VIRTIO) Version 1.0"[*], section 5.5.6.3 Memory
Statistics defines the stats values as unsigned 64 bit integers.  It
does not provide an error value.  In particular, UINT64_MAX is *not*
special in any way.

QEMU's virtio-balloon.c initializes stats to UINT64_MAX on device
reset, in reset_stats().  Initial UINT64_MAX get overwritten when the
device pushes stat values, in virtio_balloon_receive_stats().

The QOM interface is poorly designed: if the device ever pushes a
UINT64_MAX value, the QOM interface incorrectly signals "device hasn't
provided" instead of "device provided 18446744073709551615".

Options:

1. Do nothing.  Stats values >= 2^63 remain fixed, i.e. QMP delivers
them as positive values.

2. Ladi's hack.  Stats values >= 2^63 remain fixed, except UINT64_MAX
becomes -1.  Kind of right when UINT64_MAX genuinely means "device
hasn't provided a value".  Wrong when the device has provided a
UINT64_MAX value, but then we're screwed anyway.

3. Marc-André's partial revert.  Stats values >= 2^63 become broken
again.

I vote for 1. unless we can reasonably expect actual breakage outside
our own tests.



[*] http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html

Comment 9 Ademar Reis 2017-11-28 12:48:55 UTC
(In reply to Markus Armbruster from comment #8)
> Marc-André suggested to make the stats values signed, so QMP delivers
> all large positive values as negative ones, as it did before commit
> 5923f85.  Makes more sense to me than special-casing -1.  However, it
> kind of clashes with the virtio specification.
> 
> "Virtual I/O Device (VIRTIO) Version 1.0"[*], section 5.5.6.3 Memory
> Statistics defines the stats values as unsigned 64 bit integers.  It
> does not provide an error value.  In particular, UINT64_MAX is *not*
> special in any way.
> 
> QEMU's virtio-balloon.c initializes stats to UINT64_MAX on device
> reset, in reset_stats().  Initial UINT64_MAX get overwritten when the
> device pushes stat values, in virtio_balloon_receive_stats().
> 
> The QOM interface is poorly designed: if the device ever pushes a
> UINT64_MAX value, the QOM interface incorrectly signals "device hasn't
> provided" instead of "device provided 18446744073709551615".
> 
> Options:
> 
> 1. Do nothing.  Stats values >= 2^63 remain fixed, i.e. QMP delivers
> them as positive values.
> 
> 2. Ladi's hack.  Stats values >= 2^63 remain fixed, except UINT64_MAX
> becomes -1.  Kind of right when UINT64_MAX genuinely means "device
> hasn't provided a value".  Wrong when the device has provided a
> UINT64_MAX value, but then we're screwed anyway.
> 
> 3. Marc-André's partial revert.  Stats values >= 2^63 become broken
> again.
> 
> I vote for 1. unless we can reasonably expect actual breakage outside
> our own tests.

Unless Marc-André or Ladi have strong opnions about this, we can probably turn your vote into a decision, as I'm reassigning the BZ to you. :-)

Comment 11 Markus Armbruster 2017-12-22 16:34:52 UTC
As explained in comment#8, the change is actually a bug fix, and the changed interface is not stable.  The change was caught by QE's tests (good).  Nothing else is known to have been affected.  As long as that's the case, we don't have to unfix the bug.  Closing out accordingly.  Please reopen if you can point to actual breakage.


Note You need to log in before you can comment on or make changes to this bug.