Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1059374 - nodejs cartridge control script should sanity check pkill argument
Summary: nodejs cartridge control script should sanity check pkill argument
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Image
Version: 2.x
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: ---
Assignee: Michal Fojtik
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2014-01-29 18:02 UTC by Andy Grimm
Modified: 2016-11-08 03:47 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-26 19:10:46 UTC


Attachments (Terms of Use)

Description Andy Grimm 2014-01-29 18:02:08 UTC
Description of problem:

If the scl command in lib/nodejs_context fails (such as due to hitting the gear's nproc limit), it is possible that this pkill command in the control script's "stop" function will be passed an empty string:

pkill -f "$(cartridge_bin)" 2>&1 || :

An empty string with -f matches _all_ processes, so this line could kill all processes in a gear, not just those belonging to the nodejs cartridge.

Version-Release number of selected component (if applicable):

openshift-origin-cartridge-nodejs-1.19.3-1.el6oso.noarch

How reproducible:

uncertain

Steps to Reproduce:
1. Run a number of processes/threads close to the ulimit (the soft limit in openshift online is 250, so maybe 247?), some of which are not nodejs threads
2. Attempt to run "gear restart"
3. Verify that all processes in the gear were killed, not just the nodejs processes.

Additional info:

I have not attempted to reproduce this yet.  This report comes from an error we saw in production where a process appears not to have been properly cgrouped, and so the restart script attempted to kill hundreds of processes on the system.  It got "Operation not permitted" errors, of course.  This bug report is my best guess as to why it was trying to kill non-nodejs processes.

Comment 1 Michal Fojtik 2014-01-29 20:07:04 UTC
I will add this to the control script revamp I'm currently working on.

Comment 2 openshift-github-bot 2014-01-30 16:08:49 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/ba738de4ec275ab6108bb867a2ad1b8313e3daff
Bug 1059374 - Sanity check pkill in nodejs control script

Comment 3 Andy Grimm 2014-02-03 14:13:22 UTC
The above commit does not solve the problem.  You can't call $(cartridge_bin) again after you check the value; that defeats the purpose.  You need to call it once and save the result.

Comment 4 Michal Fojtik 2014-02-03 19:48:22 UTC
Fixed here:

https://github.com/openshift/origin-server/commit/56f78eb38f6182afd2a302badffae084a65b29f0

Somehow this commit must sneak out from the original commit.

Comment 6 Michal Fojtik 2014-02-10 10:40:23 UTC
Yan Du: I reworked this completely, so this code is not longer valid. The solution is now this:

https://github.com/openshift/origin-server/commit/57ae7d0db14a115c5bb2935bedb562af7243d8e9

Comment 7 Andy Grimm 2014-02-10 13:14:36 UTC
Michal, how does this code address the original issue?  I still see things like:

if is_supervisor_running; then
    pkill -f "$(supervisor_bin)" 2>&1 || :
...

The point of this bug is that you must first execute the command (such as $(supervisor_bin) ), store the result in a variable, check the variable, and _then_ pkill.  This is the only way that you can be sure pkill won't be called with an empty string.  So, you can do:

if is_supervisor_running; then
    supervisor=$(supervisor_bin)
    if [ -n "$supervisor" ]; then
        pkill -f "$supervisor" 2>&1 || :
    fi
elif is_node_running; then
    node=$(node_bin)
    if [ -n "$node" ]; then
        pkill -f "$node" 2>&1 || :
    fi
fi

Comment 8 openshift-github-bot 2014-02-10 15:21:59 UTC
Commit pushed to master at https://github.com/openshift/origin-server

https://github.com/openshift/origin-server/commit/94226013a1edfdc425bec5755dee03e9d74d37ff
Bug 1059374 - Sanitize supervisor_bin/node_bin before pkill

Comment 11 Yan Du 2014-02-12 06:42:07 UTC
Test on latest devenv, issue can't be reproduced, and the app works normally, then move to verified.


Note You need to log in before you can comment on or make changes to this bug.