Bug 1360794 - RFE: Enable features to make OpenShift more resilient to resource constraint
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
unspecified
low
Target Milestone: ---
: ---
Assignee: Vikram Goyal
QA Contact: Vikram Goyal
Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-27 13:44 UTC by Miheer Salunke
Modified: 2017-12-26 12:18 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-26 12:18:05 UTC



Comment 1 Miheer Salunke 2016-07-27 13:48:01 UTC
1. Title of the request
 Enable features to make OpenShift more resilient to resource constraint

3. What is the nature and description of the request?  
The customer is testing OSE in a POC environment with 3 minimally sized nodes. In testing, they tried to scale an app to 100 pods, which caused the application nodes in the cluster to become unresponsive. We currently recommend setting aside resources for system processes as described in the documentation, and setting max-pods to a more sensible value based on their node sizing. It would be good if these sorts of safety measures were enabled in the product by default.
  
4. Why does the customer need this? (List the business requirements here)  
Having these settings enabled by default would make OpenShift much more resilient in load testing scenarios. It would be better to see the cluster fail to scale an application up due to resource constraints than to see application nodes go down because they tried to take on more load than they can handle.

5. How would the customer like to achieve this? (List the functional requirements here)  
1.  The Ansible installer sets a value for the max-pods option in node-config.yaml based on the node's memory/CPU parameters.
2.  A certain amount of system resources is reserved by default for system processes, as documented in https://docs.openshift.com/enterprise/3.1/admin_guide/overcommit.html
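
As a rough illustration of requirement 1, the sketch below derives a max-pods value from a node's memory and CPU after setting aside a system reservation. This is not the installer's logic: the function name, the reservation sizes, and the per-pod budget are all hypothetical numbers chosen for the example, and real values would have to come from sizing guidance.

```python
def suggest_max_pods(memory_mib, cpu_millicores,
                     reserved_memory_mib=1024, reserved_cpu_millicores=500,
                     per_pod_memory_mib=256, per_pod_cpu_millicores=100):
    """Suggest a max-pods value for one node.

    All defaults are assumptions for illustration only:
    - reserved_*: resources held back for system processes
    - per_pod_*: the budget assumed for each pod
    """
    # Whatever is left after the system reservation is usable by pods.
    usable_memory = max(memory_mib - reserved_memory_mib, 0)
    usable_cpu = max(cpu_millicores - reserved_cpu_millicores, 0)

    # The scarcer resource determines how many pods fit.
    by_memory = usable_memory // per_pod_memory_mib
    by_cpu = usable_cpu // per_pod_cpu_millicores
    return min(by_memory, by_cpu)


if __name__ == "__main__":
    # A small node with 8 GiB of memory and 2 cores.
    print(suggest_max_pods(memory_mib=8192, cpu_millicores=2000))
```

With these assumed budgets the node is CPU-bound, so the suggestion stays well below what memory alone would allow.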
  
6. For each functional requirement listed, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.  
With these settings enabled, set an application to a replica count far beyond what the nodes should be able to handle, and verify that the application nodes stay up and responsive.
  
7. Is there already an existing RFE upstream or in Red Hat Bugzilla? Not that I have found

8. Does the customer have any specific timeline dependencies and which release would they like to target (i.e. RHEL5, RHEL6)?  
N/A - Currently testing on 3.1, but will likely go to 3.2 soon.
  
  
10. List any affected packages or components.  
atomic-openshift-node, openshift-ansible installer
  
11. Would the customer be able to assist in testing this functionality if implemented?  
Yes

Comment 3 Dan McPherson 2016-07-28 11:29:10 UTC
max-pods is not the typical way to control how many pods run on a node. The better way is to set request and/or limit values on the pods you are deploying (in addition to setting the allocatable space on a node). Is that happening in this case? More details:

https://docs.openshift.org/latest/admin_guide/allocating_node_resources.html
https://docs.openshift.org/latest/admin_guide/overcommit.html
https://docs.openshift.org/latest/admin_guide/limits.html
https://docs.openshift.org/latest/dev_guide/compute_resources.html
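
To make the request-based approach concrete, here is a simplified sketch of how requests bound the number of pods placed on each node, so that excess replicas stay pending instead of overloading the nodes. It is a first-fit model under assumed numbers, not the actual scheduler: the function name, the dictionary keys, and the node and pod sizes are hypothetical.

```python
def place_replicas(nodes, pod_request, replicas):
    """Place replicas on nodes by resource requests (simplified first-fit).

    nodes: list of dicts with allocatable "cpu_m" (millicores) and
           "memory_mib" (hypothetical keys for this sketch).
    pod_request: dict with the same keys, the per-pod request.
    Returns (pods placed per node, replicas left pending).
    """
    placed = []
    remaining = replicas
    for node in nodes:
        # A node accepts pods only while their summed requests fit
        # within its allocatable CPU and memory.
        fit_cpu = node["cpu_m"] // pod_request["cpu_m"]
        fit_mem = node["memory_mib"] // pod_request["memory_mib"]
        count = min(fit_cpu, fit_mem, remaining)
        placed.append(count)
        remaining -= count
    return placed, remaining


if __name__ == "__main__":
    # Three small nodes, each with 1500m CPU and 3 GiB allocatable (assumed).
    nodes = [{"cpu_m": 1500, "memory_mib": 3072} for _ in range(3)]
    request = {"cpu_m": 100, "memory_mib": 256}
    placed, pending = place_replicas(nodes, request, replicas=100)
    print(placed, pending)
```

In this model, scaling to 100 replicas places only as many pods as the requests allow and leaves the rest pending, which matches the outcome the report asks for.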

