Bug 1057183 - Backport HAproxy auto-scaling enhancements
Summary: Backport HAproxy auto-scaling enhancements
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Image
Version: 2.0.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Brenton Leanhardt
QA Contact: libra bugs
URL:
Whiteboard:
Duplicates: 1056700
Depends On:
Blocks: 990500 1036728
 
Reported: 2014-01-23 15:24 UTC by Luke Meyer
Modified: 2017-03-08 17:36 UTC
CC List: 5 users

Fixed In Version: openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op
Doc Type: Enhancement
Doc Text:
Previously, the way the HAProxy cartridge determined when to scale an application was not optimal because it checked the number of connections against a fixed threshold, which could impact stability or performance. This enhancement improves the HAProxy cartridge so that it uses a moving average of the number of current connections and provides a configurable threshold. The following command must be run after applying this fix:

# oo-admin-upgrade upgrade-node --version=2.0.3

See the Solution section in the errata advisory for full details.
Clone Of:
Environment:
Last Closed: 2014-02-25 15:43:47 UTC




Links
Red Hat Product Errata RHBA-2014:0209 (normal, SHIPPED_LIVE): Red Hat OpenShift Enterprise 2.0.3 bugfix and enhancement update, last updated 2014-02-25 20:40:32 UTC

Description Luke Meyer 2014-01-23 15:24:49 UTC
Description of problem:
HAProxy's current method of determining when to scale is suboptimal: it scales based on a spot check of the number of current connections, with unconfigurable thresholds.

This should be improved by backporting upstream PR https://github.com/openshift/origin-server/pull/4438, which introduces a moving average of the current connections (much more stable than a spot check) and makes the thresholds configurable.
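
For a sense of the algorithm, here is a minimal Ruby sketch of moving-average scaling; the class name, window size, and defaults are hypothetical stand-ins, not the actual haproxy_ctld implementation from the PR:

# A minimal sketch, NOT the actual haproxy_ctld code; names and defaults
# here are made up for illustration.
class ScalingSketch
  def initialize(window: 12, sessions_per_gear: 16.0,
                 up_thresh: 0.90, remove_thresh: 0.315)
    @window = window                       # samples in the moving average
    @sessions_per_gear = sessions_per_gear # configurable capacity per gear
    @up_thresh = up_thresh                 # scale up above this utilization
    @remove_thresh = remove_thresh         # scale down below this utilization
    @samples = []
  end

  # Feed in the latest session count; returns :add_gear, :remove_gear, or :hold.
  def decide(current_sessions, gear_count)
    @samples << current_sessions
    @samples.shift if @samples.size > @window
    avg = @samples.sum.to_f / @samples.size            # moving average
    capacity = avg / (gear_count * @sessions_per_gear) # 0.5 == 50% in the logs
    return :add_gear    if capacity > @up_thresh
    return :remove_gear if capacity < @remove_thresh
    :hold
  end
end

Averaging over a window means a one-off connection spike no longer triggers an immediate add-gear, which is the stability gain described above.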

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 8 21:24:58 2014 -0500

    Make sessions per gear configurable and use moving average for num sessions

Comment 1 Luke Meyer 2014-01-23 15:26:44 UTC

*** This bug has been marked as a duplicate of bug 1056700 ***

Comment 2 Luke Meyer 2014-01-23 19:32:13 UTC
Moving notes from bug 1056700 here so they are publicly available.

-------------------------------------------------------

This includes the following upstream commits:

commit 47428027c64d7e282e1b91b682e1d05ea11fceb3
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 8 21:24:58 2014 -0500

    Make sessions per gear configurable and use moving average for num sessions

commit 05d52c0b06301d6e24b5ac1a31cd107ff16e31c2
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Fri Jan 10 14:14:20 2014 -0500

    Bug 1051446

commit f116af85558db377d849246c089824c670215c15
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Mon Jan 13 13:09:12 2014 -0500

    Bisect the scale up/down threshold more evenly for lower scale numbers

commit 8dd6002c5578930efce2d19a0fe486f8d309940a
Author: Dan McPherson <dmcphers@redhat.com>
Date:   Wed Jan 22 11:06:35 2014 -0500

    Bug 1056483 - Better error messaging with direct usage of haproxy_ctld

commit c362b17efe2037ee6b5a84e0dabe5aa6b447362c
Author: Ben Parees <bparees@redhat.com>
Date:   Fri Nov 15 17:11:46 2013 -0500

    Bug 1029679: handle connection refused error with clean error message



----------------------------------------------------------------

PR: https://github.com/openshift/enterprise-server/pull/204


After applying this update and restarting mcollective, admins will need to run the following _on the broker_:

rm -rf /tmp/oo-upgrade
oo-admin-upgrade upgrade-node --version 2.0.3
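
(Presumably the rm clears per-gear upgrade state that oo-admin-upgrade caches under /tmp/oo-upgrade, so the node upgrade starts fresh; that reading is an assumption, not something stated in this bug.)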

Comment 3 Luke Meyer 2014-01-23 19:33:11 UTC
*** Bug 1056700 has been marked as a duplicate of this bug. ***

Comment 6 John W. Lamb 2014-01-23 21:55:34 UTC
Just FYI, no puddle has been created with this bug yet, even though it is marked ON_QA. Will create the new puddle tomorrow.

Comment 7 Gaoyun Pei 2014-01-26 04:07:31 UTC
Verified this bug with package openshift-origin-cartridge-haproxy-1.17.3.2-1.el6op.noarch.

Auto scaling up/down works well with the moving average algorithm.

...
D, [2014-01-25T22:18:38.029551 #30094] DEBUG -- : Local sessions 4
D, [2014-01-25T22:18:38.029656 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030687 #30094] DEBUG -- : Local sessions 8
D, [2014-01-25T22:18:38.030748 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:38.030830 #30094] DEBUG -- : GEAR_INFO - capacity: 50.0% gear_count: 1 sessions: 8 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:43.041988 #30094] DEBUG -- : Local sessions 12
D, [2014-01-25T22:18:43.042125 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052249 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:43.052371 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:18:43.052460 #30094] DEBUG -- : GEAR_INFO - capacity: 106.25% gear_count: 1 sessions: 17 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 0/20
D, [2014-01-25T22:18:48.053394 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:18:48.053535 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:18:48.053630 #30094]  INFO -- : add-gear - capacity: 106.25% gear_count: 1 sessions: 17 up_thresh: 90.0%
I, [2014-01-25T22:19:38.080141 #30094]  INFO -- : add-gear - exit_code: 0  output:
D, [2014-01-25T22:19:38.081117 #30094] DEBUG -- : Local sessions 17
D, [2014-01-25T22:19:38.081172 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:19:38.081255 #30094] DEBUG -- : GEAR_INFO - capacity: 53.125% gear_count: 2 sessions: 17 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 550 gear_remove_thresh: 0/20
...
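
The capacity figures in the log are consistent with sessions / (gear_count * sessions-per-gear) with a per-gear limit of 16; that limit is inferred from the log output, not confirmed from the source. A quick Ruby check:

# Reproducing the GEAR_INFO capacity figures, assuming 16 sessions per gear
# (inferred from the log above).
sessions_per_gear = 16.0
[[8, 1], [17, 1], [17, 2]].each do |sessions, gears|
  capacity = 100.0 * sessions / (gears * sessions_per_gear)
  puts "sessions=#{sessions} gear_count=#{gears} capacity=#{capacity}%"
end
# Prints 50.0%, 106.25%, 53.125% -- matching the three GEAR_INFO lines above.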

----------------------------------------------------------------------
...
D, [2014-01-25T22:28:48.559652 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 2 sessions: 0 up/remove_thresh: 90.0%/31.5% sec_left_til_remove: 0 gear_remove_thresh: 20/20
D, [2014-01-25T22:28:53.560616 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:28:53.560736 #30094] DEBUG -- : Got stats from 0 remote proxies.
I, [2014-01-25T22:28:53.560882 #30094]  INFO -- : remove-gear - capacity: 0.0% gear_count: 2 sessions: 0 remove_thresh: 31.5%
I, [2014-01-25T22:29:04.925786 #30094]  INFO -- : remove-gear - exit_code: 0  output:
D, [2014-01-25T22:29:04.927707 #30094] DEBUG -- : Local sessions 0
D, [2014-01-25T22:29:04.927767 #30094] DEBUG -- : Got stats from 0 remote proxies.
D, [2014-01-25T22:29:04.927839 #30094] DEBUG -- : GEAR_INFO - capacity: 0.0% gear_count: 1 sessions: 0 up/remove_thresh: 90.0%/1.0% sec_left_til_remove: 0 gear_remove_thresh: 20/20
...
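
Note how gear_remove_thresh climbs to 20/20 before remove-gear fires: removal appears to be debounced over 20 consecutive low-capacity checks, so a brief idle period does not immediately tear down a gear. A hypothetical Ruby sketch of that logic:

# Hypothetical debounce suggested by the "gear_remove_thresh: N/20" counter
# in the log; not the actual haproxy_ctld code.
class RemoveDebounce
  CHECKS_NEEDED = 20

  def initialize
    @below_count = 0
  end

  def check(capacity, remove_thresh)
    if capacity < remove_thresh
      @below_count += 1   # one more consecutive low-capacity sample
    else
      @below_count = 0    # any busy sample resets the countdown
    end
    @below_count >= CHECKS_NEEDED ? :remove_gear : :hold
  end
end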

And the issue mentioned in BZ#1051446 did not appear.


While the app is stopped:
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
An error occurred; try again later: Could not connect to the application.  Check if the application is stopped.


[root@broker openshift]# rhc cartridge scale -c python-2.7 -a app3 --min 2 --max 2
This operation will run until the application is at the minimum scale and may take several minutes.
Setting scale range for python-2.7 ... done

[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -d
Cannot remove gear because min limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> haproxy_ctld -u
Cannot add gear because max limit '2' reached.
[app3-yes.ose-0116.com 52e47aa60ca874772c000007]\> echo $?
1
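
For reference, the limit checks exercised above behave like the following guard; the error messages are copied from the transcript, but the surrounding Ruby is illustrative, not the real haproxy_ctld source:

# Illustrative guard only; the real haproxy_ctld add/remove paths differ.
def add_gear(gear_count, max_limit)
  if gear_count >= max_limit
    warn "Cannot add gear because max limit '#{max_limit}' reached."
    exit 1   # the non-zero status checked with `echo $?` above
  end
  # ...otherwise request a new gear from the broker...
end

def remove_gear(gear_count, min_limit)
  if gear_count <= min_limit
    warn "Cannot remove gear because min limit '#{min_limit}' reached."
    exit 1
  end
  # ...otherwise remove a gear...
end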

So, moving this bug to VERIFIED.

Comment 9 errata-xmlrpc 2014-02-25 15:43:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0209.html

