Bug 454456 - Unable to start carod immediately after stopped
Summary: Unable to start carod immediately after stopped
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: grid
Version: 1.0
Hardware: All
OS: Linux
Priority: medium
Severity: high
Target Milestone: 1.1
Assignee: Robert Rati
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks: 454430
Reported: 2008-07-08 16:19 UTC by Robert Rati
Modified: 2009-02-04 16:06 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-02-04 16:06:04 UTC
Target Upstream Version:




Links
System: Red Hat Product Errata
ID: RHBA-2009:0036
Priority: normal
Status: SHIPPED_LIVE
Summary: Red Hat Enterprise MRG Grid 1.1 Release
Last Updated: 2009-02-04 16:03:49 UTC

Description Robert Rati 2008-07-08 16:19:13 UTC
Description of problem:
Starting carod immediately after stopping it fails. The port that carod listens
on does not seem to close cleanly and goes into the CLOSE_WAIT state. The code
attempts to shut down all sockets cleanly, but something must be missed.
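
To make the symptom concrete, here is a minimal Python sketch that reproduces "Address already in use" on a quick restart. It assumes Linux TCP semantics and is not carod's code; the helper name restart_without_reuseaddr is hypothetical. Note that it reaches the failure through TIME_WAIT (the server closes its end of a connection first, and that connection stays bound to the listening port for a while) rather than through the CLOSE_WAIT state observed above; both leave a socket holding the port.

```python
import errno
import os
import socket
from typing import Optional


def restart_without_reuseaddr(host: str = "127.0.0.1") -> Optional[int]:
    """Stop and immediately restart a toy listener on the same port.

    Returns the errno raised by the second bind(), or None if it succeeded.
    Hypothetical helper for illustration only; it is not carod's code.
    """
    # First run: listen on a port chosen by the OS, without SO_REUSEADDR.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    # One client connects and is accepted.
    client = socket.create_connection((host, port))
    conn, _ = listener.accept()

    # Shut down: the server side closes first, so the accepted socket
    # lingers (FIN_WAIT_2 / TIME_WAIT) while still bound to `port`.
    conn.close()
    client.close()
    listener.close()

    # Second run: try to bind the same port right away.
    restarted = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        restarted.bind((host, port))
        return None
    except OSError as exc:
        return exc.errno
    finally:
        restarted.close()


if __name__ == "__main__":
    err = restart_without_reuseaddr()
    if err is None:
        print("restart succeeded")
    elif err == errno.EADDRINUSE:
        print(f"socket error {err}: {os.strerror(err)}")  # same failure as the report
    else:
        print(f"unexpected error {err}: {os.strerror(err)}")
```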


Comment 1 Robert Rati 2008-09-19 13:40:12 UTC
A socket was found that was not being closed. Closing it, along with daemonizing carod, seems to have solved the problem. I have been able to stop and restart carod many times back-to-back.
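
One way to avoid missing a socket at shutdown is to keep a record of every socket the daemon opens and close them all in one place. The sketch below only illustrates that bookkeeping; TrackedServer and its methods are hypothetical names, not taken from carod, and it does not cover daemonizing.

```python
import socket


class TrackedServer:
    """Hypothetical listener that records every socket it opens so that
    shutdown() can close all of them, not only the listening socket."""

    def __init__(self, host: str = "127.0.0.1", port: int = 0) -> None:
        self.listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.listener.bind((host, port))
        self.listener.listen(5)
        self.connections: set[socket.socket] = set()

    @property
    def address(self) -> tuple:
        return self.listener.getsockname()

    def accept(self) -> socket.socket:
        conn, _ = self.listener.accept()
        self.connections.add(conn)  # remember it so shutdown() can close it
        return conn

    def close_connection(self, conn: socket.socket) -> None:
        """Close one connection early, e.g. once its client has disconnected.
        Leaving it open after the peer closes is what leaves it in CLOSE_WAIT."""
        self.connections.discard(conn)
        conn.close()

    def shutdown(self) -> None:
        # Close every accepted connection first, then the listener itself.
        for conn in list(self.connections):
            try:
                conn.shutdown(socket.SHUT_RDWR)
            except OSError:
                pass  # peer already gone, e.g. ENOTCONN
            conn.close()
        self.connections.clear()
        self.listener.close()
```

Funneling every accept() through one object means a forgotten socket shows up as a leftover entry in `connections` rather than as a port stuck in CLOSE_WAIT.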

Comment 3 Jeff Needle 2008-12-05 21:38:06 UTC
Dec  5 15:37:34 north-11 carod: socket error 98: Address already in use
Dec  5 15:37:34 north-11 carod: Failed to listen on 127.0.0.1:10000
Dec  5 15:37:37 north-11 hook_fetch_work.py: socket error 107: Transport endpoint is not connected
Dec  5 15:37:38 north-11 hook_fetch_work.py: socket error 107: Transport endpoint is not connected
Dec  5 15:37:38 north-11 carod: socket error 98: Address already in use
Dec  5 15:37:38 north-11 carod: Failed to listen on 127.0.0.1:10000

Still happening with -8.
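
On Linux, errno 98 is EADDRINUSE (bind() found the port already taken) and errno 107 is ENOTCONN (the socket has no connected peer). The sketch below deliberately triggers both and renders them the way the log lines above read. The "socket error N: text" format is an assumption based on those lines, and describe_socket_error and provoke_errors are hypothetical names.

```python
import errno
import socket


def describe_socket_error(exc: OSError) -> str:
    """Render an OSError like the log lines above: 'socket error N: text'.

    Hypothetical formatter; carod's real format string is not shown in this report.
    """
    return f"socket error {exc.errno}: {exc.strerror}"


def provoke_errors(host: str = "127.0.0.1") -> dict[str, str]:
    """Trigger the two errors seen in the log and return
    {symbolic errno name: formatted message}."""
    results: dict[str, str] = {}

    # EADDRINUSE (98 on Linux): bind a second socket to a port already in use.
    holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    holder.bind((host, 0))
    holder.listen(1)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        second.bind(holder.getsockname())
    except OSError as exc:
        results[errno.errorcode[exc.errno]] = describe_socket_error(exc)
    finally:
        second.close()
        holder.close()

    # ENOTCONN (107 on Linux): ask a never-connected socket for its peer.
    unconnected = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        unconnected.getpeername()
    except OSError as exc:
        results[errno.errorcode[exc.errno]] = describe_socket_error(exc)
    finally:
        unconnected.close()

    return results
```

`errno.errorcode` maps the numeric value back to its symbolic name, which is handy when reading raw numbers out of syslog.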

Comment 4 Robert Rati 2008-12-05 23:08:29 UTC
Needed to set SO_REUSEADDR on the listening socket that carod uses. Fixed in:
condor-job-hooks-1.0-4
condor-low-latency-1.0-5
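
The standard socket option for this restart symptom is SO_REUSEADDR, which must be set on the listening socket before bind(). Below is a minimal Python sketch, assuming Linux semantics; make_listen_socket and simulate_restart are hypothetical helpers, not carod's code.

```python
import socket


def make_listen_socket(host: str, port: int, backlog: int = 5) -> socket.socket:
    """Create a TCP listening socket that can be re-bound right after a restart.

    Hypothetical helper; carod's real code is not shown in this report.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(): lets bind() succeed while connections from
    # the previous run still hold the port in TIME_WAIT.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock


def simulate_restart(host: str = "127.0.0.1") -> bool:
    """Serve one connection, stop, then start again on the same port at once."""
    first = make_listen_socket(host, 0)
    port = first.getsockname()[1]
    client = socket.create_connection((host, port))
    conn, _ = first.accept()
    conn.close()  # server closes first, so its end lingers in TIME_WAIT
    client.close()
    first.close()

    # Without SO_REUSEADDR this bind() would fail with EADDRINUSE on Linux.
    second = make_listen_socket(host, port)
    try:
        return second.getsockname()[1] == port
    finally:
        second.close()


if __name__ == "__main__":
    print("restart ok:", simulate_restart())
```

Python's socketserver.TCPServer sets the same option when its allow_reuse_address class attribute is True. On Linux, SO_REUSEADDR does not let a second instance listen while the first is still running; that bind() still fails with EADDRINUSE, so the option only relaxes the restart case.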

Comment 6 errata-xmlrpc 2009-02-04 16:06:04 UTC
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0036.html

