Bug 1362557 - [RFE] Use the improved VM overhead calculation in VM scheduling
Summary: [RFE] Use the improved VM overhead calculation in VM scheduling
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Backend.Core
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.2.0
Target Release: 4.2.0
Assignee: Martin Sivák
QA Contact: Artyom
URL:
Whiteboard: FutureFeature
Depends On: 1304346
Blocks:
 
Reported: 2016-08-02 13:43 UTC by Martin Sivák
Modified: 2017-12-22 06:50 UTC (History)
9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The accuracy of the algorithm that estimates the overhead of QEMU and VM devices has been improved. The old algorithm allowed the system to run in a hidden overcommit mode. Now that VM memory requirements are properly tracked, the cluster may appear not to support as many VMs as in the past. It is possible to get the same behavior (properly tracked) by enabling a slight memory overcommit using the REST API or the UI.
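The Doc Text mentions that a slight memory overcommit can be enabled via the REST API. For reference, this is typically done by updating the cluster's memory policy; the request body below is an illustrative sketch only (the 110% value and the exact endpoint path are assumptions, not values taken from this bug):

```xml
<!-- PUT /ovirt-engine/api/clusters/{cluster_id}
     Illustrative sketch; 110% is an assumed example value. -->
<cluster>
  <memory_policy>
    <over_commit>
      <percent>110</percent>
    </over_commit>
  </memory_policy>
</cluster>
```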
Clone Of:
Environment:
Last Closed: 2017-12-22 06:50:45 UTC
oVirt Team: SLA
rule-engine: ovirt-4.2+
gklein: testing_plan_complete-
mgoldboi: planning_ack+
msivak: devel_ack+
mavital: testing_ack+


Attachments (Terms of Use)


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 57390 master MERGED Use the new VmOverheadCalculator in scheduler 2017-10-04 13:45:26 UTC
oVirt gerrit 72141 master MERGED Use the overhead calculator to get commited memory in HostMonitoring 2017-10-04 15:35:56 UTC

Description Martin Sivák 2016-08-02 13:43:38 UTC
Description of problem:

The virt team added a new method of computing VM memory overhead (the extra memory needed by QEMU and the host on top of the VM's configured memory).

We should start utilizing it.


This requires some QA and maybe even community involvement to make sure we are not rejecting VMs when there is still plenty of RAM.
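To illustrate the idea, here is a minimal sketch of an overhead-aware scheduling check. The real logic lives in ovirt-engine's VmOverheadCalculator; every constant, field name, and function below is an assumption for illustration, not the engine's actual model.

```python
# Hedged sketch: constants are illustrative assumptions, not the
# values used by ovirt-engine's VmOverheadCalculator.

def estimated_overhead_mib(vm_mem_mib, num_disks, num_displays,
                           max_mem_mib, cpu_count):
    """Rough per-VM QEMU overhead estimate (assumed model)."""
    fixed = 64                     # static QEMU footprint (assumed)
    per_disk = 8                   # per attached disk (assumed)
    per_display = 16               # per display device (assumed)
    per_cpu = 8                    # per vCPU (assumed)
    # Memory hotplug headroom: a larger maximum memory means bigger
    # page tables and device structures (assumed ratio).
    hotplug = (max_mem_mib - vm_mem_mib) // 512
    return (fixed + num_disks * per_disk + num_displays * per_display
            + cpu_count * per_cpu + hotplug)

def fits_on_host(host_free_mib, vm_mem_mib, **vm_config):
    """Scheduling check: require configured memory plus overhead,
    instead of the configured memory alone (the old behaviour)."""
    need = vm_mem_mib + estimated_overhead_mib(vm_mem_mib, **vm_config)
    return need <= host_free_mib
```

With a model like this, a 4 GiB VM with many disks and displays needs noticeably more than 4 GiB of free host memory to be scheduled, which is exactly the "hidden overcommit" the old algorithm allowed.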

Comment 1 Martin Sivák 2016-12-20 08:52:56 UTC
We have the patch ready (well it will probably have to be rebased), but we need to see whether we have QE capacity for normal and scale testing of the updated behaviour.

Comment 4 eberman 2017-01-05 13:24:36 UTC
Before I can ack, I need to understand what exactly we need to do here.

For example:

- What should the VM template hold?
- What are the VM template parameters?
- What exactly should I monitor?
- What limit is expected on which host hardware?

etc.

Thanks

Comment 5 Martin Sivák 2017-01-05 13:29:32 UTC
Test scenario (both before and after the fix):

- Disable host swap (optional)
- Disable overcommit, KSM and balloon
- Start as many VMs (1) as possible to fill up the cluster
- Consume all possible memory inside many (or all) of the VMs
- Check for host memory and swap consumption


(1) Make the VM configuration slightly non-trivial:
- add multiple disks (can be small, the count is more important than size)
- configure multiple displays and/or high resolution display device
- enable memory hotplug (especially on PPC!)


Now, the assumption is that all VMs can use their memory fully when there is no overcommit (memory utilization 100%, no KSM and no balloon). I believe it won't be possible before this bug is fixed, because the Qemu overhead will consume some of that memory. The swap will probably be used if it is enabled.

You (probably) won't be able to start as many VMs once this is fixed, but all of those VMs should be able to utilize their full memory allocation. There should be no significant swapping in this case.
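A quick back-of-the-envelope example of why fewer VMs may start after the fix (all numbers are assumed for illustration):

```python
# Assumed figures, for illustration only.
host_mib = 64 * 1024            # 64 GiB host scheduling memory
vm_mib = 4 * 1024               # 4 GiB configured per VM
overhead_mib = 320              # assumed per-VM QEMU/device overhead

# Old behaviour: overhead ignored, so it silently ate into VM memory.
old_density = host_mib // vm_mib

# New behaviour: overhead is reserved up front, so fewer VMs fit,
# but each one can actually use its full allocation without swapping.
new_density = host_mib // (vm_mib + overhead_mib)
```

Under these assumptions the host goes from 16 schedulable VMs to 14, which matches the expectation above that density drops but the remaining VMs no longer swap.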

Comment 6 Moran Goldboim 2017-06-26 11:07:20 UTC
This implementation is very relevant for vGPU use cases.

Comment 7 Ilan Zuckerman 2017-08-29 10:39:36 UTC
(In reply to Martin Sivák from comment #5)
> Test scenario (both before and after the fix):
> 
> - Disable host swap (optional)
> - Disable overcommit, KSM and balloon
> - Start as many VMs (1) as possible to fill up the cluster
> - Consume all possible memory inside many (or all) of the VMs
> - Check for host memory and swap consumption
> 
> 
> (1) Make the VM configuration slightly non trivial
> - add multiple disks (can be small, the count is more important than size)
> - configure multiple displays and/or high resolution display device
> - enable memory hotplug (especially on PPC!)
> 
> 
> Now, the assumption is that all VMs can use their memory fully when there is
> no overcommit (memory utilization 100%, no KSM and no balloon). I believe it
> won't be possible before this bug is fixed, because the Qemu overhead will
> consume some of that memory. The swap will probably be used if it is enabled.
> 
> You (probably) won't be able to start as many VMs once this is fixed, but
> all of those VMs should be able to utilize their full memory allocation.
> There should be no significant swapping in this case.

Hi Martin, just a few questions regarding this issue:
1. On which two engine versions does this need to be validated (before and after the fix)?
2. Do we need to enable the fix manually, or is it applied automatically on the fixed engine version?

Comment 8 Martin Sivák 2017-08-29 12:18:08 UTC
The patches are not merged yet.

Comment 9 Ilan Zuckerman 2017-10-03 13:05:45 UTC
(In reply to Martin Sivák from comment #8)
> The patches are not merged yet.

Hi Martin, can you please respond to the questions I asked in comment 7?
We need this information to validate the scenario, and we cannot plan testing without it.

Comment 10 Martin Sivák 2017-10-04 08:38:40 UTC
Ilan, I can't give you the engine version (except saying it will be 4.2.x), because it is not merged yet (it is ready, but we want to do some preliminary tests first). The second answer is yes, it is automatically applied and does not need to be enabled.

Comment 11 Ilan Zuckerman 2017-10-16 14:41:35 UTC
(In reply to Martin Sivák from comment #5)
Hi Martin, please see my questions inline.
> 
> (1) Make the VM configuration slightly non trivial
> - add multiple disks (can be small, the count is more important than size)
Which disks? size? provision (iscsi/nfs/thin)?

> - configure multiple displays and/or high resolution display device
Can you elaborate? I am not familiar with this feature.

> - enable memory hotplug (especially on PPC!)
Same thing here. Please elaborate / refer me.


Currently I have the following environment up and ready. Please tell me whether it is suitable:
- 50 nested hosts
- 200 vms on one actual host. So overall 250 vms (including nested hosts)
- Each vm has one Thin disk 100GB

Comment 13 Martin Sivák 2017-10-17 13:43:58 UTC
> > - add multiple disks (can be small, the count is more important than size)
> Which disks? size? provision (iscsi/nfs/thin)?

Not important, actually. The number of attached disks can influence memory consumption; the type and size do not, as I already said.

> > - configure multiple displays and/or high resolution display device
> Can you elaborate? i am not familiar with this feature.

Just configure the number of displays for the VMs to be greater than 1.

> > - enable memory hotplug (especially on PPC!)
> Same thing here. Please elaborate / refer me.

Define maximum memory to be 4x the configured memory for a VM (do not touch minimum guaranteed). This is probably even the default.

> Currently i have the following environment up and ready. please tell me
> whether it is suitable:
> - 50 nested hosts
> - 200 vms on one actual host. So overall 250 vms (including nested hosts)
> - Each vm has one Thin disk 100GB

The goal is to find out the VMs-per-host density we can achieve and compare the numbers before and after the change (both with memory overcommit disabled).

Comment 16 Martin Sivák 2017-10-30 12:17:54 UTC
Some additional info I got from libvirt folks:

For example, the QEMU process will consume some memory for every active VNC connection, and we can't really tell how many users will be connected to a VM when we start it. But it increases the overhead.

Comment 17 Artyom 2017-11-09 07:03:59 UTC
Can you please provide feature page?

Comment 18 Artyom 2017-12-21 05:56:09 UTC
Verified on rhvm-4.2.0.2-0.1.el7.noarch

1) Create 4 VMs, where at least one VM has 10 additional disks and 4 monitors, and the VMs' total memory consumes all of the host's scheduling memory.
2) Start a memory load on all VMs (VM memory consumption near 100%; you can use memtester for this purpose).

4.1 - one of the VMs crashed after 1-2 minutes.
4.2 - I ran memtester for 6 hours while the host was at 97% memory consumption; no VMs crashed.
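For reproducing this kind of in-guest memory pressure without memtester, a minimal sketch is to allocate buffers and touch every page so the kernel actually backs them with RAM (untouched allocations may stay unbacked). This is a generic illustration, not the procedure Artyom used:

```python
def consume_memory(mib, chunk_mib=64):
    """Allocate and touch `mib` MiB so the pages become resident,
    roughly mimicking a guest memory load. Returns the buffers so
    the caller keeps them alive."""
    chunks = []
    remaining = mib
    while remaining > 0:
        size = min(chunk_mib, remaining) * 1024 * 1024
        buf = bytearray(size)
        # Touch one byte per 4 KiB page so the memory is really used.
        for offset in range(0, size, 4096):
            buf[offset] = 1
        chunks.append(buf)
        remaining -= min(chunk_mib, remaining)
    return chunks
```

Running something like `consume_memory(vm_mib - 200)` inside each guest approximates the "memory utilization near 100%" condition from the test scenario.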

Comment 19 Sandro Bonazzola 2017-12-22 06:50:45 UTC
This bugzilla is included in the oVirt 4.2.0 release, published on Dec 20th 2017.

Since the problem described in this bug report should be resolved in that release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

