Bug 1694929 - [cloud] "--max-nodes-total" doesn't work in 3.11
Summary: [cloud] "--max-nodes-total" doesn't work in 3.11
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 3.11.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.11.z
Assignee: Jan Chaloupka
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-02 05:57 UTC by sunzhaohua
Modified: 2019-04-02 07:02 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
autoscaler log (deleted), 2019-04-02 05:57 UTC, sunzhaohua

Description sunzhaohua 2019-04-02 05:57:59 UTC
Created attachment 1550872 [details]
autoscaler log

Description of problem:
"--max-nodes-total" doesn't work in 3.11, autoscaler can scale up nodes to a number greater than the intended value. This is similar with https://bugzilla.redhat.com/show_bug.cgi?id=1670695.


Version-Release number of selected component (if applicable):
openshift v3.11.100
kubernetes v1.11.0+d4cacc0

How reproducible:
Always

Steps to Reproduce:
1. Deploy the autoscaler in a 3.11 AWS cluster and set "--max-nodes-total=7":
    spec:
      containers:
      - args:
        - /bin/cluster-autoscaler
        - --alsologtostderr
        - --v=4
        - --skip-nodes-with-local-storage=False
        - --leader-elect-resource-lock=configmaps
        - --namespace=cluster-autoscaler
        - --cloud-provider=aws
        - --nodes=0:3:zhsun-ASG6
        - --nodes=0:3:zhsun-ASG7
        - --scale-down-delay-after-failure=10s
        - --scale-down-unneeded-time=10s
        - --scale-down-delay-after-add=10s
        - --max-nodes-total=7
2. Create pods to trigger a scale-up of the cluster.
3. Check the autoscaler logs and the node count.
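Step 2 can be done with a workload like the following (a sketch; the name, image, replica count, and resource requests are hypothetical, and anything that produces enough pending pods to overflow existing capacity works):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-up-test        # hypothetical name
spec:
  replicas: 20               # enough pending pods to force new nodes
  selector:
    matchLabels:
      app: scale-up-test
  template:
    metadata:
      labels:
        app: scale-up-test
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["sleep", "3600"]
        resources:
          requests:          # the requests, not the image, drive the scale-up
            cpu: 500m
            memory: 512Mi
```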

Actual results:
The node count exceeds the configured "--max-nodes-total" limit.

$ oc get node
NAME                                           STATUS    ROLES     AGE       VERSION
ip-172-31-107-134.us-east-2.compute.internal   Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-130-73.us-east-2.compute.internal    Ready     compute   1h        v1.11.0+d4cacc0
ip-172-31-138-220.us-east-2.compute.internal   Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-18-71.us-east-2.compute.internal     Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-201-207.us-east-2.compute.internal   Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-35-186.us-east-2.compute.internal    Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-63-168.us-east-2.compute.internal    Ready     compute   1m        v1.11.0+d4cacc0
ip-172-31-79-172.us-east-2.compute.internal    Ready     infra     1h        v1.11.0+d4cacc0
ip-172-31-95-223.us-east-2.compute.internal    Ready     master    1h        v1.11.0+d4cacc0
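
That is 7 compute nodes plus the infra and master nodes, 9 in total, above the cap of 7. The counts can be recomputed from the listing above (a sketch that operates on the pasted output rather than a live cluster; against a cluster, `oc get node --no-headers` would supply the same lines):

```shell
# Count compute nodes and total nodes from the pasted 'oc get node'
# listing (header line excluded, trailing AGE/VERSION columns trimmed).
listing='ip-172-31-107-134.us-east-2.compute.internal   Ready     compute
ip-172-31-130-73.us-east-2.compute.internal    Ready     compute
ip-172-31-138-220.us-east-2.compute.internal   Ready     compute
ip-172-31-18-71.us-east-2.compute.internal     Ready     compute
ip-172-31-201-207.us-east-2.compute.internal   Ready     compute
ip-172-31-35-186.us-east-2.compute.internal    Ready     compute
ip-172-31-63-168.us-east-2.compute.internal    Ready     compute
ip-172-31-79-172.us-east-2.compute.internal    Ready     infra
ip-172-31-95-223.us-east-2.compute.internal    Ready     master'
compute=$(printf '%s\n' "$listing" | grep -c ' compute$')
total=$(printf '%s\n' "$listing" | wc -l)
echo "$compute compute nodes, $total nodes in total"
```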

Expected results:
The total node count never exceeds the "--max-nodes-total" value (7).

Additional info:

