
Bug 1061875

Summary: MOM not working properly with multiple VMs
Product: Red Hat Enterprise Virtualization Manager Reporter: Lukas Svaty <lsvaty>
Component: mom Assignee: Martin Sivák <msivak>
Status: CLOSED INSUFFICIENT_DATA QA Contact: Lukas Svaty <lsvaty>
Severity: urgent Docs Contact: Cheryn Tan <chetan>
Priority: urgent    
Version: 3.3.0 CC: dfediuck, gklein, iheim, mavital, rlandman, sherold, yeylon
Target Milestone: --- Keywords: Triaged
Target Release: 3.3.3   
Hardware: Unspecified   
OS: Linux   
Whiteboard: sla
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2014-02-13 08:40:50 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: SLA RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1078909, 1142926    
Attachments:
Description Flags
MOM log of started VMs none
VDSM log of actions none

Description Lukas Svaty 2014-02-05 19:42:45 UTC
Created attachment 859827 [details]
MOM log of started VMs

Description of problem:
MOM does not collect statistics from all VMs when a larger number of VMs is running on the system. In my tests, the statistics show only the first 13 VMs running.

Version-Release number of selected component (if applicable):
is32
mom-0.3.2-8.el6ev.noarch
vdsm-4.13.2-0.7.el6ev.x86_64
libvirt-0.10.2-29.el6_5.2.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Enable Balloon optimization on cluster
2. Create 16 or more VMs
3. Get statistics from mom.getStatistics() via XMLRPC
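
Step 3 can be scripted. The following is a minimal sketch, assuming MOM's XML-RPC interface is reachable at a URL you supply and that getStatistics() returns the {'guests': ..., 'host': ...} structure shown under Additional info. The helper names fetch_statistics and reported_guest_names are hypothetical, and the sketch uses Python 3's xmlrpc.client (on Python 2 the equivalent module is xmlrpclib).

```python
import xmlrpc.client


def fetch_statistics(url):
    """Call getStatistics() on an XML-RPC endpoint.

    The URL is an assumption: pass whatever address the MOM RPC
    server is configured to listen on in your environment.
    """
    proxy = xmlrpc.client.ServerProxy(url, allow_none=True)
    return proxy.getStatistics()


def reported_guest_names(stats):
    """Return the sorted names of guests present in a statistics dict."""
    return sorted(stats.get("guests", {}))
```

Calling len(reported_guest_names(fetch_statistics(url))) after starting the VMs gives the number of guests MOM is reporting, which can then be compared with the number of VMs actually running.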


Actual results:
MOM stops collecting statistics for VMs started after the 13th VM.

Expected results:
MOM should collect statistics for all VMs on the system. Otherwise, when the system is overloaded, the balloons of some VMs won't be inflated/deflated, or KSM won't work properly with all VMs.

Additional info:
XMLRPC output from MOM:
{'guests': {'balloon-1': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 6,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402664,
                          'mem_unused': 334888,
                          'minor_fault': 124,
                          'rss': 76286,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-10': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 3,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403436,
                           'mem_unused': 338616,
                           'minor_fault': 131,
                           'rss': 73302,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-11': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 0,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403428,
                           'mem_unused': 338608,
                           'minor_fault': 131,
                           'rss': 73303,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-12': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 9,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 401440,
                           'mem_unused': 336624,
                           'minor_fault': 13,
                           'rss': 73842,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-13': {'balloon_cur': 524288,
                           'balloon_max': 524288,
                           'balloon_min': 262144,
                           'host_major_faults': 0,
                           'host_minor_faults': 4,
                           'major_fault': 0,
                           'mem_available': 502256,
                           'mem_free': 403384,
                           'mem_unused': 338680,
                           'minor_fault': 13,
                           'rss': 73456,
                           'swap_in': 0,
                           'swap_out': 0,
                           'swap_total': 2097144,
                           'swap_usage': 0},
            'balloon-2': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 3,
                          'host_minor_faults': 21,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402792,
                          'mem_unused': 336076,
                          'minor_fault': 137,
                          'rss': 75057,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-3': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 398856,
                          'mem_unused': 332044,
                          'minor_fault': 137,
                          'rss': 76621,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-4': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 0,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400344,
                          'mem_unused': 333524,
                          'minor_fault': 137,
                          'rss': 75570,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-5': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 31,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400884,
                          'mem_unused': 334076,
                          'minor_fault': 138,
                          'rss': 74121,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-6': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 9,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 400732,
                          'mem_unused': 334020,
                          'minor_fault': 137,
                          'rss': 76436,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-7': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 6,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402732,
                          'mem_unused': 335936,
                          'minor_fault': 14,
                          'rss': 74988,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-8': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 0,
                          'host_minor_faults': 2,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402508,
                          'mem_unused': 335696,
                          'minor_fault': 13,
                          'rss': 76063,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0},
            'balloon-9': {'balloon_cur': 524288,
                          'balloon_max': 524288,
                          'balloon_min': 262144,
                          'host_major_faults': 1,
                          'host_minor_faults': 13,
                          'major_fault': 0,
                          'mem_available': 502256,
                          'mem_free': 402652,
                          'mem_unused': 335936,
                          'minor_fault': 2525,
                          'rss': 75218,
                          'swap_in': 0,
                          'swap_out': 0,
                          'swap_total': 2097144,
                          'swap_usage': 0}},
 'host': {'anon_pages': 7429488,
          'ksm_full_scans': 2029,
          'ksm_pages_shared': 44387,
          'ksm_pages_sharing': 564636,
          'ksm_pages_to_scan': 200,
          'ksm_pages_unshared': 218736,
          'ksm_pages_volatile': 56585,
          'ksm_run': 1,
          'ksm_shareable': 33204704,
          'ksm_sleep_millisecs': 20,
          'ksmd_cpu_usage': 3,
          'mem_available': 8030296,
          'mem_free': 259812,
          'mem_unused': 147340,
          'swap_in': 87,
          'swap_out': 0,
          'swap_total': 2097144,
          'swap_usage': 63792}}
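
To see at a glance which VMs are absent from output like the above, the reported guest names can be compared against the VMs that were started. This is a minimal sketch, not part of MOM. It assumes the VMs are named balloon-1 through balloon-16: only balloon-1 to balloon-13 appear in the output, so the remaining three names are an assumption based on the naming pattern. The function names are hypothetical.

```python
import re


def natural_key(name):
    """Sort key that orders 'balloon-2' before 'balloon-10'."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


def missing_guests(stats, expected_names):
    """Return expected VM names that have no entry under stats['guests']."""
    reported = set(stats.get("guests", {}))
    return sorted(set(expected_names) - reported, key=natural_key)


if __name__ == "__main__":
    # Stand-in for the real output: 13 guests reported, 16 VMs expected.
    # The names balloon-14..balloon-16 are assumed, not taken from the logs.
    stats = {"guests": {"balloon-%d" % i: {} for i in range(1, 14)}, "host": {}}
    expected = ["balloon-%d" % i for i in range(1, 17)]
    print(missing_guests(stats, expected))
```

Comparing by name rather than by count also shows whether the same VMs are missing on every run or whether the set changes.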

Comment 1 Lukas Svaty 2014-02-05 19:43:32 UTC
Created attachment 859828 [details]
VDSM log of actions

Comment 2 Lukas Svaty 2014-02-06 11:15:59 UTC
Tried this with 16 smaller VMs, 256MB/128MB (memory/guaranteed memory), and it seems to be working fine. However, the problem with 512MB/256MB VMs still persists.

Comment 3 Martin Sivák 2014-02-10 16:12:27 UTC
Can you attach the full mom.log? I suspect that some of your VMs are not running the guest agent.

Comment 4 Lukas Svaty 2014-02-13 08:40:50 UTC
After installing a new environment (setup and hosts) for this bug, it seems to be working now. Since I can't provide any new logs, I'm closing this as INSUFFICIENT_DATA. It must have been a misconfiguration, as Martin suggested. If the bug appears again, I'll reopen this with the appropriate logs.