Bug 1056712 - [new relic] ssh_authorized_keys#modify has race condition during gear delete
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Online
Classification: Red Hat
Component: Pod
Version: 2.x
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Abhishek Gupta
QA Contact: libra bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-22 18:36 UTC by Jhon Honce
Modified: 2015-05-15 00:24 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-02-26 19:10:07 UTC



Description Jhon Honce 2014-01-22 18:36:49 UTC
Description of problem:
 Errno::ENOENT: No such file or directory - /var/lib/openshift/52df8f166cec0ef18900001f/.ssh/authorized_keys 

…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  36:in `initialize'
…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  36:in `open'
…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  36:in `open'
…model/application_container_ext/ssh_authorized_keys.rb: 260:in `block (2 levels) in modify'
…1.18.9/lib/openshift-origin-common/utils/path_utils.rb:  93:in `block in flock'
…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  38:in `block in open'
…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  36:in `open'
…9/lib/openshift-origin-common/utils/file_needs_sync.rb:  36:in `open'
…1.18.9/lib/openshift-origin-common/utils/path_utils.rb:  88:in `flock'
…model/application_container_ext/ssh_authorized_keys.rb: 259:in `block in modify'
…model/application_container_ext/ssh_authorized_keys.rb: 258:in `modify'
…model/application_container_ext/ssh_authorized_keys.rb: 135:in `remove_keys'
…in-node/model/application_container_ext/environment.rb: 226:in `remove_ssh_keys'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 458:in `block in oo_authorized_ssh_key_batch_remove'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 318:in `with_container_from_args'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 457:in `oo_authorized_ssh_key_batch_remove'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 144:in `execute_action'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 201:in `block in execute_parallel_action'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 194:in `each'
…usr/libexec/mcollective/mcollective/agent/openshift.rb: 194:in `execute_parallel_action'
…h/ruby193/root/usr/share/ruby/mcollective/rpc/agent.rb:  86:in `handlemsg'
…t/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb: 126:in `block (2 levels) in dispatch'
         /opt/rh/ruby193/root/usr/share/ruby/timeout.rb:  69:in `timeout'
…t/rh/ruby193/root/usr/share/ruby/mcollective/agents.rb: 125:in `block in dispatch'

Version-Release number of selected component (if applicable):


How reproducible:
hard


Steps to Reproduce:
1. 
2.
3.

Actual results:
Node reports exception

Expected results:
Log issue, no exception returned

Additional info:

Comment 2 Lili Nader 2014-01-23 02:38:12 UTC
agupta, do you think this is happening because the pending op that removes the key is being executed after the gear is deleted? In that case, should the code check for the existence of the authorized_keys file and skip the removal if it is not there?
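
A minimal sketch of the guard suggested above, not the actual `ssh_authorized_keys#modify` code. The helper name `remove_key_lines` and its arguments are hypothetical. It rescues `Errno::ENOENT` instead of checking `File.exist?` first, on the assumption that the gear directory could still disappear between a check and the open:

```ruby
require 'logger'

# Hypothetical helper, not part of openshift-origin-node: removes matching
# key lines from an authorized_keys file. If the file is already gone (for
# example, because the gear was deleted), it logs a warning and returns
# false instead of raising.
def remove_key_lines(path, keys, logger = Logger.new($stderr))
  # File::RDWR without File::CREAT raises Errno::ENOENT for a missing file.
  File.open(path, File::RDWR) do |file|
    file.flock(File::LOCK_EX) # serialize concurrent modifications
    kept = file.readlines.reject { |line| keys.any? { |key| line.include?(key) } }
    file.rewind
    file.write(kept.join)
    file.flush
    file.truncate(file.pos)   # drop leftover bytes from the old content
  end
  true
rescue Errno::ENOENT
  logger.warn("authorized_keys missing, skipping key removal: #{path}")
  false
end
```

Called with a path under a gear that no longer exists, this returns `false` and writes a warning, which is in line with the expected result of logging the issue without returning an exception.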

Comment 3 Dan McPherson 2014-01-23 16:06:57 UTC
Jhon,

  It looks like you changed these perms.  Was it related to the bug or did it just seem like the right thing to do?

-Dan

Comment 4 Dan McPherson 2014-01-23 17:24:03 UTC
Jhon,

  Ignore the last comment.  Wrong bug.


-Dan

Comment 5 Abhishek Gupta 2014-01-23 17:56:43 UTC
Since we have changed the functionality so that ssh keys are no longer created for each gear, this issue will not occur going forward. However, for existing scalable applications with ssh keys for each gear, we will need to either fix this bug or clean out the ssh keys for the non-haproxy gears.

Lowering the severity since this does not bubble up to the user and is just confined to the logs.

Comment 6 Abhishek Gupta 2014-02-05 21:33:53 UTC
This was caused by the gear's server_identity not being updated after a gear move. As a result, subsequent gear operations went to the wrong node and hit this error. We have fixed the gears that had the incorrect server_identity, will monitor for this error, and will revisit the move code if it happens again.

Comment 7 Jianwei Hou 2014-02-07 05:07:43 UTC
Marking this bug as verified according to the last comment.

