Bug 1364193 - Ceilometer Collector grows in memory with gnocchi or mongo as backend
Summary: Ceilometer Collector grows in memory with gnocchi or mongo as backend
Status: CLOSED DUPLICATE of bug 1336664
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ceilometer
Version: 9.0 (Mitaka)
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
: ---
Assignee: Julien Danjou
QA Contact: Yurii Prokulevych
Depends On:
Reported: 2016-08-04 16:39 UTC by Alex Krzos
Modified: 2017-04-03 07:32 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2016-09-02 14:51:45 UTC
Target Upstream Version:

Attachments (Terms of Use)
System resource graphs (deleted)
2016-08-04 16:45 UTC, Alex Krzos
no flags Details

System ID Priority Status Summary Last Updated
Launchpad 1551667 None None None 2016-08-28 07:38:48 UTC

Description Alex Krzos 2016-08-04 16:39:46 UTC
Description of problem:
Ceilometer-collector grows in memory rapidly with mongo or gnocchi as a backend

Version-Release number of selected component (if applicable):
Openstack Mitaka (OSPd deployed overcloud)


How reproducible:
With large enough environment/deployment (200 instances)

Steps to Reproduce:
1. Deploy ha-overcloud (3 controllers) with 2 compute nodes
2. Tune nova allocation ratios to allow for more overcommitting of the compute nodes (if needed for your hardware)
3. Tune ceilometer for backend gnocchi (If desired to see ceilometer-collector memory growth with gnocchi)
4. Tune ceilometer to poll more often (default polling is 600s, I have tested 5s, 10s, 60s)
5. Boot small instances on the overcloud at a rate of 20 every 1200s or so until you have 200 total instances
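
The ramp-up in step 5 can be sketched as a simple schedule. This is only an illustration, not the tooling used for these tests: `boot_schedule` and `run_schedule` are hypothetical helper names, and `boot_batch` stands in for whatever actually boots instances in your environment (the report does not say how they were booted).

```python
import time
from typing import Callable, List, Tuple


def boot_schedule(total: int = 200, batch: int = 20,
                  interval_s: int = 1200) -> List[Tuple[int, int]]:
    """Return (offset_seconds, instances_to_boot) pairs for a staged ramp-up.

    Defaults mirror step 5: 20 instances every 1200s until 200 exist.
    """
    if total < 0 or batch <= 0 or interval_s < 0:
        raise ValueError("total and interval_s must be >= 0, batch must be > 0")
    schedule = []
    booted = 0
    step = 0
    while booted < total:
        count = min(batch, total - booted)  # last batch may be smaller
        schedule.append((step * interval_s, count))
        booted += count
        step += 1
    return schedule


def run_schedule(schedule: List[Tuple[int, int]],
                 boot_batch: Callable[[int], None],
                 sleep: Callable[[float], None] = time.sleep) -> None:
    """Walk the schedule, waiting between batches.

    boot_batch is a placeholder callback (an assumption, not a real API);
    time spent inside boot_batch is not subtracted from the wait.
    """
    previous = 0
    for offset, count in schedule:
        sleep(offset - previous)
        boot_batch(count)
        previous = offset
```

With the defaults this yields 10 batches, the last one starting 10800s (3 hours) after the first.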

Actual results:
Ceilometer collector was observed growing in memory from ~100MiB to over 5GiB, and as high as 65GiB. Eventually this leads to the entire cloud collapsing: there is no swap space for relief on either the controllers or the computes, so the Linux OOM killer kills processes, which causes pacemaker to restart services.
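
For anyone trying to confirm the growth without the attached graphs, one rough way to sample a process's resident memory on Linux is to read the `VmRSS` line from `/proc/<pid>/status`. The sketch below assumes that approach; it is not how the numbers above were collected, and the helper names are hypothetical.

```python
import re
from typing import Optional

# /proc/<pid>/status reports resident set size as e.g. "VmRSS:    102400 kB"
_VMRSS = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_kib(status_text: str) -> Optional[int]:
    """Extract VmRSS in KiB from the text of /proc/<pid>/status."""
    match = _VMRSS.search(status_text)
    return int(match.group(1)) if match else None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the current RSS of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB for comparison with the figures in this report."""
    return kib / (1024 * 1024)
```

Sampling `read_rss_kib` for the collector PID at a fixed interval and logging the result is enough to see whether RSS keeps climbing as instances are added.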

Expected results:
Ceilometer collector to not spike in memory usage.

Additional info:

I understand that mongo as a backend is going away, however this behavior is witnessed with both ceilometer backends (mongo and gnocchi).

See the attached screenshots of graphs from these tests:

Test 1: 1 OSPd, 3 Controllers, 2 computes - mongo ceilometer backend, 10s interval, 200 instances booted
Test 2: 1 OSPd, 3 Controllers, 2 computes - gnocchi ceilometer backend, 10s interval, 200 instances booted
Test 3: 1 OSPd, 3 Controllers, 2 computes - gnocchi ceilometer backend, 60s interval, 200 instances booted

Comment 2 Alex Krzos 2016-08-04 16:45:28 UTC
Created attachment 1187576 [details]
System resource graphs

System Resource Graphs of Overcloud with 200 instances and ceilometer with mongo and then gnocchi as a backend

Comment 10 Julien Danjou 2016-09-02 14:51:45 UTC

*** This bug has been marked as a duplicate of bug 1336664 ***
