Bug 1080521 - Violating hard constraint positive Affinity rule can prevent fixing the violated rule forever
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.0
Assignee: Gilad Chaplik
QA Contact: Artyom
URL:
Whiteboard: sla
Depends On: 1080515
Blocks: 1122446 rhev3.5beta 1156165
Reported: 2014-03-25 15:29 UTC by Gilad Chaplik
Modified: 2016-02-10 20:17 UTC (History)

Fixed In Version: ovirt-3.5.0-beta2
Doc Type: Bug Fix
Doc Text:
Clone Of: 1080515
: 1122446 (view as bug list)
Environment:
Last Closed: 2015-02-17 17:07:14 UTC
oVirt Team: SLA
Target Upstream Version:


Links
System | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 26619 | master | MERGED | core: Change how Affinity behaves if the assumptions are invalid | Never
oVirt gerrit | 30627 | ovirt-engine-3.4 | MERGED | core: Change how Affinity behaves if the assumptions are invalid | Never

Description Gilad Chaplik 2014-03-25 15:29:29 UTC
+++ This bug was initially created as a clone of Bug #1080515 +++

Description of problem:

I found that it is possible to reach a state in which an affinity group violation cannot be fixed.

Take a look at the following code snippets:

// Group all hosts for VMs with positive affinity
for (Guid id : allVmIdsPositive) {
    VM runVm = runningVMsMap.get(id);
    if (runVm != null && runVm.getRunOnVds() != null) {
         acceptableHosts.add(runVm.getRunOnVds());
    }
}

In the above snippet, allVmIdsPositive holds the list of VMs that are supposed to run on the same host (positive affinity).

The acceptableHosts set then ends up containing every host that runs a VM from the allVmIdsPositive list. The assumption here is that this is either a single host, or an empty set if no other VM from the affinity group is running.

The following snippet checks that:

if (acceptableHosts.isEmpty()) {
    acceptableHosts.addAll(hostMap.keySet());
} else if (acceptableHosts.size() == 1 &&  
           hostMap.containsKey(acceptableHosts.iterator().next())) {
    hasPositiveConstraint = true;
    // Only one host is allowed for positive affinity, i.e. if the VM
    // contained in a positive affinity group he must run on the host
    // that all the other members are running, if the VMs spread across
    // hosts, the affinity rule isn't applied.
} else {
    ...
    return null;
}

Now focus on the last else clause. If for any reason there are VMs that

1) belong to the same positive affinity group
2) run on different hosts

then the filter returns null, meaning that no host can be used to run the VM currently being scheduled.
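
The dead end described above can be reproduced with a minimal, self-contained sketch. It is not the engine code: host and VM ids are plain strings instead of Guid/VM/VDS objects, the class and method names are hypothetical, the vmToHost map stands in for runningVMsMap plus getRunOnVds(), and the elided parts of the original snippet ("...") are simply omitted.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for the positive-affinity part of the filter.
class PositiveAffinityCurrentSketch {

    /**
     * @param groupVmIds     VMs in the positive affinity group (allVmIdsPositive)
     * @param vmToHost       running VM id -> host id (assumed replacement for runningVMsMap)
     * @param candidateHosts hosts offered to the filter (hostMap.keySet())
     * @return the acceptable hosts, or null if the group is already spread over several hosts
     */
    static Set<String> filter(List<String> groupVmIds,
                              Map<String, String> vmToHost,
                              Set<String> candidateHosts) {
        // Group all hosts for VMs with positive affinity
        Set<String> acceptableHosts = new HashSet<>();
        for (String vmId : groupVmIds) {
            String host = vmToHost.get(vmId);
            if (host != null) {
                acceptableHosts.add(host);
            }
        }

        if (acceptableHosts.isEmpty()) {
            // No group member is running: any candidate host is fine
            acceptableHosts.addAll(candidateHosts);
            return acceptableHosts;
        } else if (acceptableHosts.size() == 1
                && candidateHosts.contains(acceptableHosts.iterator().next())) {
            // Exactly one host runs group members: only that host is allowed
            return acceptableHosts;
        } else {
            // Group members run on several hosts: every host is rejected
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> vmToHost = new HashMap<>();
        vmToHost.put("vm1", "hostA");
        vmToHost.put("vm2", "hostB"); // same group, different host
        Set<String> hosts = new HashSet<>(Arrays.asList("hostA", "hostB"));
        List<String> group = Arrays.asList("vm1", "vm2", "vm3");

        // Scheduling vm3, or migrating vm1 or vm2, gets no host at all
        System.out.println("Acceptable hosts: " + filter(group, vmToHost, hosts));
    }
}
```

With vm1 and vm2 on different hosts, the call returns null for every member of the group, so neither a start nor a migration can get the group back onto one host.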

The same scheduling algorithm is used when the user starts a new VM, when the user migrates a VM manually, and when the load balancing job tries to rebalance the cluster. In all of those cases, every VM belonging to the affinity group is prevented from running or migrating.

Now, how can this happen?

The user is free to change cluster policies, so the Affinity Filter module can be disabled while the VMs are started and enabled only afterwards.

Version-Release number of selected component (if applicable):

ovirt-engine master as of 25th of Mar 2014, 16:13 CET

Steps to Reproduce:
1. Disable affinity modules from cluster policy
2. Create at least 2 VMs
3. Add all VMs from step 2 to hard constraint positive affinity group
4. Start the VMs on different hosts
5. Enable the affinity modules in cluster policy
6. Try to fix the issue or watch the cluster as it tries to rebalance

Actual results:

The VMs are stuck on their hosts and no VM from the affinity group can be started.

Expected results:

The VMs are automatically rebalanced to run on a single host.

Additional info:

I believe the logic for hard constraint positive affinity should be changed to:

1) use any host if there is no VM from that group running (already there)
2) leave only hosts that already have VMs from that group running (instead of filtering out all hosts)
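
A minimal sketch of the two rules proposed above, under the same simplifying assumptions as the description (string ids instead of Guid/VM/VDS objects, a hypothetical vmToHost map in place of runningVMsMap). The class and method names are made up for illustration, and this is one reading of the proposal, not the change that was merged in gerrit.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the proposed hard positive affinity rule.
class PositiveAffinityProposedSketch {

    static Set<String> filter(List<String> groupVmIds,
                              Map<String, String> vmToHost,
                              Set<String> candidateHosts) {
        // Collect the hosts that already run a VM from the group
        Set<String> hostsWithGroupVms = new HashSet<>();
        for (String vmId : groupVmIds) {
            String host = vmToHost.get(vmId);
            if (host != null) {
                hostsWithGroupVms.add(host);
            }
        }

        // Rule 1: no group member is running, so any candidate host may be used
        if (hostsWithGroupVms.isEmpty()) {
            return new HashSet<>(candidateHosts);
        }

        // Rule 2: keep only the candidate hosts that already run group members,
        // even if there is more than one such host
        Set<String> acceptableHosts = new HashSet<>(candidateHosts);
        acceptableHosts.retainAll(hostsWithGroupVms);
        return acceptableHosts;
    }

    public static void main(String[] args) {
        Map<String, String> vmToHost = new HashMap<>();
        vmToHost.put("vm1", "hostA");
        vmToHost.put("vm2", "hostB");
        Set<String> hosts = new HashSet<>(Arrays.asList("hostA", "hostB", "hostC"));
        List<String> group = Arrays.asList("vm1", "vm2", "vm3");

        // Group is spread over hostA and hostB: both stay acceptable, hostC does not
        System.out.println(new TreeSet<>(filter(group, vmToHost, hosts)));

        // After vm2 is migrated to hostA, only hostA remains acceptable
        vmToHost.put("vm2", "hostA");
        System.out.println(new TreeSet<>(filter(group, vmToHost, hosts)));
    }
}
```

Under this rule a violated group can still shrink towards one host: each migration onto a host that already runs group members is allowed, and once all members share a host, the filter is back to the single-host case.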

Comment 2 Artyom 2014-07-27 08:50:11 UTC
Verified on ovirt-engine-3.5.0-0.0.master.20140722232058.git8e1babc.el6.noarch
If I have two VMs in a hard positive affinity group running on different hosts (started without the affinity policy) and start a third VM (in the same affinity group), the third VM is started on one of the two hosts where the group's VMs are running, chosen at random.
If I then manually migrate the VMs to one host, the engine also starts the other VMs from the affinity group on this host.

Comment 3 Eyal Edri 2015-02-17 17:07:14 UTC
rhev 3.5.0 was released. closing.

