Bug 1361559 - 300 node install OOMs on a system with 64GB RAM
Keywords:
Status: CLOSED DUPLICATE of bug 1360440
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Andrew Butcher
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-29 12:15 UTC by Mike Fiedler
Modified: 2016-08-04 18:48 UTC (History)
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-04 18:48:15 UTC


Attachments:
Ansible log (deleted), 2016-07-29 12:15 UTC, Mike Fiedler

Description Mike Fiedler 2016-07-29 12:15:16 UTC
Created attachment 1185501: Ansible log

Description of problem:

3.3.0.10

Running the node scaleup playbook to install 300 nodes runs for 8.5 hours and then fails with an out-of-memory (OOM) error. I had vmstat running during the install, and it shows the OOM was real.


Version-Release number of selected component (if applicable):

3.3.0.10


How reproducible:  1 occurrence in 1 attempt


Steps to Reproduce:
1.  Installed a core cluster of 3x master, 3x etcd, 2x registry/router, 1x master load balancer, 2x test nodes. The install was successful, and conformance tests verified the cluster was operational.
2.  Created 298 new node instances and added them to the inventory.
3.  Set Ansible forks=100, timeout=30. This configuration previously installed 300 nodes successfully in about 2.5 hours with Ansible 1.9.4.
4.  Ran the byo/openshift-nodes/scaleup.yml playbook on the inventory.
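
For reference, steps 3 and 4 could be sketched as a single invocation. This is only an illustration: it assumes the forks and timeout values are passed through the `--forks` and `--timeout` command-line options of `ansible-playbook` (they may equally have been set in ansible.cfg), and the inventory path `hosts` is a hypothetical placeholder. The playbook path is copied from step 4 as written.

```python
import shlex


def scaleup_command(inventory, playbook, forks=100, timeout=30):
    """Build the ansible-playbook argument list for the node scaleup run."""
    return [
        "ansible-playbook",
        "-i", inventory,            # inventory containing the new nodes
        "--forks", str(forks),      # number of parallel worker processes
        "--timeout", str(timeout),  # connection timeout in seconds
        playbook,
    ]


# "hosts" is a placeholder inventory path, not taken from the report.
cmd = scaleup_command("hosts", "byo/openshift-nodes/scaleup.yml")
print(" ".join(shlex.quote(part) for part in cmd))
```

The list form can be handed directly to `subprocess.run(cmd)` if the run is driven from a script.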

Actual results:

The install runs for 8.5 hours on a system with 16 vCPU/64GB RAM and then dies with the message: ERROR! Unexpected Exception: [Errno 12] Cannot allocate memory

vmstat confirms the system was low on memory
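
For anyone trying to reproduce this without vmstat, a rough equivalent is to sample /proc/meminfo on the Ansible control host while the playbook runs. The sketch below is an assumption-laden alternative, not what was used for this report: the function names, the 60-second interval, and the choice of the MemFree and SwapFree fields are all illustrative, and it only works on Linux.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def log_memory(path="/proc/meminfo", interval=60, samples=3):
    """Print a timestamped MemFree/SwapFree sample every `interval` seconds."""
    for _ in range(samples):
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(time.strftime("%H:%M:%S"),
              "MemFree kB:", info.get("MemFree"),
              "SwapFree kB:", info.get("SwapFree"))
        time.sleep(interval)
```

Calling `log_memory(samples=600)` in a separate terminal would cover roughly ten hours at the default interval.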

Expected results:

Successful install in a timeframe similar to 3.2.

Additional info:

Ansible log attached

Comment 1 Mike Fiedler 2016-07-29 12:37:00 UTC
Possible duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1360440, but filing it for the sake of completeness.

Comment 2 Scott Dodson 2016-08-04 18:48:15 UTC

*** This bug has been marked as a duplicate of bug 1360440 ***

