Bug 1601387 - Preferred anti-affinity not scheduling pods when the anti-affinity criteria can't be satisfied
Summary: Preferred anti-affinity not scheduling pods when the anti-affinity criteria can't be satisfied
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.9.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Avesh Agarwal
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-16 09:13 UTC by Sam Marland
Modified: 2018-08-13 11:24 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-25 13:16:02 UTC
Target Upstream Version:


Attachments: none

Description Sam Marland 2018-07-16 09:13:21 UTC
Description of problem:

With 2 application nodes and two pods deployed using preferredDuringSchedulingIgnoredDuringExecution anti-affinity, no more pods can be scheduled; they fail with the following error: 0/10 nodes are available: 2 ExistingPodsAntiAffinityRulesNotMatch, 2 MatchInterPodAffinity, 8 MatchNodeSelector.
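
For reference, a sketch of where this message typically surfaces: the pending pod's events. The pod name below is hypothetical, not taken from the report.

oc get pods                              # the new pod stays in Pending
oc describe pod dotnet-example-2-abcde   # hypothetical pod name
# Events show:
#   Warning  FailedScheduling  0/10 nodes are available: 2 ExistingPodsAntiAffinityRulesNotMatch, 2 MatchInterPodAffinity, 8 MatchNodeSelector.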


Version-Release number of selected component (if applicable):
3.9.31


How reproducible:
Create a basic DC from one of the built-in templates (.NET Core). Scale it to equal the number of schedulable nodes. Try to roll out a new deployment with the rolling strategy; it will fail because it tries to scale up before scaling down.


Steps to Reproduce:
1. Create a basic app from a template, such as .NET Core
2. Patch the DC with the following anti-affinity stanza (a command-level sketch of steps 2-5 follows the list)

podAntiAffinity:
  preferredDuringSchedulingIgnoredDuringExecution:
  - weight: 1
    podAffinityTerm:
      labelSelector:
        matchExpressions:
        - key: affinity
          operator: In
          values:
          - anti
      topologyKey: kubernetes.io/hostname

3. Scale up the DC to match the number of app nodes
4. Check that the pods have been scheduled on different nodes
5. Roll out a new version of the DC; the deployment will fail because the scheduler can't find a suitable place for the new pod.
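
For illustration, a command-level sketch of steps 2-5, assuming a DC named dotnet-example and two schedulable app nodes (the name and node count are assumptions, not from the report):

# Step 2: apply the anti-affinity stanza above as a strategic merge patch (JSON form)
oc patch dc/dotnet-example -p '{
  "spec": {"template": {"spec": {"affinity": {"podAntiAffinity": {
    "preferredDuringSchedulingIgnoredDuringExecution": [{
      "weight": 1,
      "podAffinityTerm": {
        "labelSelector": {"matchExpressions": [
          {"key": "affinity", "operator": "In", "values": ["anti"]}
        ]},
        "topologyKey": "kubernetes.io/hostname"
      }
    }]
  }}}}}
}'

# Step 3: scale to the number of schedulable app nodes (2 in this report)
oc scale dc/dotnet-example --replicas=2

# Step 4: confirm the pods landed on different nodes
oc get pods -o wide

# Step 5: trigger a new rollout; because the anti-affinity is only preferred,
# the extra pod created by the rolling strategy should still schedule
oc rollout latest dc/dotnet-example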

Actual results:
Rollout of a new DC with the rolling strategy fails because the new pod can't be scheduled.


Expected results:
Should be able to schedule a new pod: since we're using preferredDuringSchedulingIgnoredDuringExecution, the 'preferred' part implies that if the scheduler can't find a host satisfying the rule, it will put the pod somewhere else anyway.
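
For contrast, a hard rule uses requiredDuringSchedulingIgnoredDuringExecution, which does block scheduling when no node satisfies it. A sketch of the equivalent required form (same label selector as the stanza above) that would legitimately produce the observed behaviour:

podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: affinity
        operator: In
        values:
        - anti
    topologyKey: kubernetes.io/hostname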


Additional info:

Comment 3 Avesh Agarwal 2018-07-23 21:14:48 UTC
I have finally set up a 3.9 environment. I will try to reproduce this issue and get back with my findings.

Comment 4 Avesh Agarwal 2018-07-24 03:24:14 UTC
Hi Sam,

Today I created a cluster on AWS and tried various things related to pod anti-affinity, but I cannot reproduce the issue. Everything seems to be working as expected. With preferred podAntiAffinity, pods were always able to schedule.

I will keep my cluster running for a couple of days, so if you have some time tomorrow, we could set up a BlueJeans session and I would be happy to show the steps I took to try to reproduce the issue and how everything is working as expected.


In case you also have your cluster running, I would be happy to see why preferred pod anti-affinity is not working on your cluster.

Thanks
Avesh

Comment 5 Avesh Agarwal 2018-07-25 13:15:42 UTC
I am closing this bug after discussion with Sam, as the issue cannot be reproduced in 3.9.31, in versions up to 3.9.37, or in the latest HEAD of 3.9.

