Bug 1064570 - Cannot change the starting UID for gears to a high number
Summary: Cannot change the starting UID for gears to a high number
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Containers
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Assignee: Jhon Honce
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks: 1064631
Reported: 2014-02-12 21:02 UTC by Luke Meyer
Modified: 2015-05-14 23:34 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1064631 (view as bug list)
Environment:
Last Closed: 2014-03-24 18:45:17 UTC
Target Upstream Version:


Attachments
platform-trace.log segment from trying to create undistricted gear (deleted)
2014-02-12 21:02 UTC, Luke Meyer

Description Luke Meyer 2014-02-12 21:02:59 UTC
Created attachment 862535
platform-trace.log segment from trying to create undistricted gear

Description of problem:
I have a policy where UIDs below 10,000,000 on my hosts need to be reserved for corporate logins. So, I would like gear UIDs on my nodes to start at 10000000. This apparently cannot be done.

Steps to Reproduce:
1. On my one-node system without districts, set the following values in /etc/openshift/node.conf:
GEAR_MIN_UID=1000000
UID_BEGIN=1000000     # note: setting both, see bug 1051015
GEAR_MAX_UID=1005999  # range is still the same size
2. service ruby193-mcollective restart
3. Try to create a scaled php+mysql app. It fails.
4. Create a district and put the node in it.
5. Try to create a scaled php+mysql app.

Results:
3. App creation fails with node execution error. There isn't much in the logs to indicate what went wrong, but I suspect it has to do with the port calculation coming up with something bogus. The platform-trace.log for this operation is attached.
5. The app creation succeeds! However, the gears have UIDs in the normal range, not in the range I want.

Expected results:
Both cases should work: each should end up with gear UIDs in the specified range (with district UIDs mapped into that range), and everything should route properly between gears via the external ports.

Additional info:
By the way, once this is working, if I already have nodes with gears in the usual range, I expect to be able to move those gears to new nodes with the higher range so that I can get rid of the old nodes.

Comment 1 Luke Meyer 2014-02-12 21:27:39 UTC
1M or 10M, either way it fails.

And I meant to note that at step 3, it does actually try to create the gear user with UID_BEGIN. So the node is reading the values.

Actually, I just tried the districted case again and it fails much the same. It may have succeeded on my first try only because I actually had another "normal" node in the district, which is where the PHP cart landed, and only the mysql gear landed on the test node.

And I should note the log was actually from an Enterprise server. But I don't think this works in Origin either.

Comment 2 Luke Meyer 2014-02-13 01:53:47 UTC
The reason the second try at the districted case "failed the same" is because I failed to actually put the node in the district. Never mind that bit.

Comment 3 Luke Meyer 2014-02-13 21:14:25 UTC
Just to further confuse things, I noticed that there's a parameter /etc/openshift/plugins.d/openshift-origin-msg-broker-mcollective.conf:DISTRICTS_FIRST_UID that needs to be set. I set this to match the 1000000 that's on the node, regenerated the district, and tried this again. I also lowered the mcollective server log level to "info" so I wouldn't be drowned in debug.

The gear create still failed. However there's a useful error message in the mcollective log from the app-create action:

INFO -- : openshift.rb:134:in `execute_action' Executing action [app-create] using method oo_app_create with args [{"--with-app-uuid"=>"52fd2e4c215df2ccd300002c", "--with-app-name"=>"phps", "--with-container-uuid"=>"52fd2e4c215df2ccd3000030", "--with-container-name"=>"52fd2e4c215df2ccd3000030", "--with-namespace"=>"demo", "--with-uid"=>1002766, [...] "--cart-name"=>"openshift-origin-node"}]
INFO -- : openshift.rb:340:in `rescue in oo_app_create' Argument resolved to a UID too large for MCS set parameters: 1002766
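
For reference, a rough sketch of where this message seems to come from. The bound is the expression from selinux.rb quoted in Comment 5, uid_offset + set_size * (set_size - 1) / 2, which is the number of distinct unordered pairs that can be drawn from set_size categories. The values set_size = 1024 and uid_offset = 0 are assumptions, inferred only from the 523,776 figure in that comment; the method names are made up for illustration and are not the upstream API.

```ruby
# Assumed defaults (inferred from the 523,776 figure, not read from upstream).
ASSUMED_SET_SIZE   = 1024
ASSUMED_UID_OFFSET = 0

# Largest UID the bound allows: offset plus the number of unordered
# category pairs, set_size * (set_size - 1) / 2.
def max_mcs_uid(set_size = ASSUMED_SET_SIZE, uid_offset = ASSUMED_UID_OFFSET)
  uid_offset + set_size * (set_size - 1) / 2
end

# Hypothetical helper that mirrors the error text seen in the log above.
def check_mcs_uid(uid, set_size = ASSUMED_SET_SIZE, uid_offset = ASSUMED_UID_OFFSET)
  if uid > max_mcs_uid(set_size, uid_offset)
    raise ArgumentError, "Argument resolved to a UID too large for MCS set parameters: #{uid}"
  end
  uid
end

puts "upper bound: #{max_mcs_uid}"
[6000, 1_002_766].each do |uid|
  begin
    check_mcs_uid(uid)
    puts "#{uid}: within bound"
  rescue ArgumentError => e
    puts "#{uid}: #{e.message}"
  end
end
```

Under these assumptions the --with-uid value of 1002766 from the log is well above the bound, which is consistent with the failure, while a UID in the usual low range passes.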

Comment 4 Josep 'Pep' Turro Mauri 2014-03-14 12:12:55 UTC
Besides the SELinux context limit, there seems to be a [far lower] limit for UIDs in the cgroups code:

node/lib/openshift-origin-node/utils/cgroups/libcgroup.rb:

# Compute the network class id
# Major = 1
# Minor = UID
# Caveat: 0 <= Minor <= 0xFFFF (65535)
def net_cls
  major = 1
  if (uid < 1) or (uid > 0xFFFF)
    raise RuntimeError, "Cannot assign network class id for: #{uid}"
  end
  (major << 16) + uid
end

With GEAR_MIN_UID=70000 this results in:

I, [2014-03-10T18:38:53.791053 #1298]  INFO -- : openshift.rb:150:in `execute_action' Finished executing action [app-create] (1)
I, [2014-03-10T18:38:53.807180 #1298]  INFO -- : openshift.rb:114:in `cartridge_do_action' cartridge_do_action failed (1)
------
Cannot assign network class id for: 70000
------)
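
A minimal standalone sketch of the check quoted above, to show why any UID over 65535 is rejected. Only the range check and the (major << 16) + uid arithmetic are taken from the libcgroup.rb excerpt; the names net_cls_for, classid_hex and MAX_MINOR are hypothetical and exist only for this illustration.

```ruby
# The minor part of the class id is a 16-bit field.
MAX_MINOR = 0xFFFF

# Same arithmetic as the quoted net_cls method, with uid passed in
# explicitly so the snippet runs on its own.
def net_cls_for(uid, major = 1)
  if uid < 1 || uid > MAX_MINOR
    raise RuntimeError, "Cannot assign network class id for: #{uid}"
  end
  (major << 16) + uid
end

# Render a class id as hexadecimal major:minor.
def classid_hex(classid)
  format('%x:%x', classid >> 16, classid & MAX_MINOR)
end

[1000, 65_535, 70_000, 1_002_766].each do |uid|
  begin
    puts "#{uid} -> #{classid_hex(net_cls_for(uid))}"
  rescue RuntimeError => e
    puts "#{uid} -> #{e.message}"
  end
end
```

With the UID used directly as the minor number, both GEAR_MIN_UID=70000 and the 1000000 range from the original report fall outside what a single major can address.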

Comment 5 Brenton Leanhardt 2014-03-17 20:32:23 UTC
I spent some time looking at this upstream.  See:
https://github.com/openshift/origin-server/blob/4d64ff89098486ff14eb6fedcf65ccf5ccc623b7/node/lib/openshift-origin-node/utils/selinux.rb#L116

The bound "uid_offset + set_size * ( set_size - 1) / 2" means that, by default, MCS labels are only computed for UIDs up to 523,776.  However, I lowered my UID range from 1M to just below that number and immediately hit the problem Pep mentions in Comment #4.

After digging into tc, as best I can tell this is not technically a limitation of tc itself (although some mention it being a bad API: http://permalink.gmane.org/gmane.linux.network/204978).

Take a look at 9.5.2.1 @ http://www.lartc.org/lartc.html#AEN882.  Our implementation today seems to have all child rules directly under root with the classid == the uid of the gear.  Since classid major and minor number are both limited to 16 bits we have an artificial limit.  I think the correct way to solve this would be restructuring the tc hierarchy.

I tried setting my max uid under 16k but I'm still hitting an selinux error where the gear can't access a port it's trying to bind to.  I think there may be an additional mcs labeling problem or port calculation problem.  I'll have to continue debugging.

Comment 7 zhaozhanqi 2014-03-19 06:31:10 UTC
This bug can be reproduced on devenv_4533, so it is also an Online issue.

Comment 9 Jhon Honce 2014-03-24 18:45:17 UTC
Please follow https://trello.com/c/8ZF7nWXE for tracking this issue.

