Bug 162078 - ccsd performance problems
Summary: ccsd performance problems
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: ccs
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-06-29 18:44 UTC by Lon Hohberger
Modified: 2009-04-16 20:17 UTC
CC List: 6 users

Fixed In Version: RHEL4 U2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-10-04 17:33:48 UTC


Attachments
ccsd local socket patch (deleted)
2005-06-29 23:00 UTC, Lon Hohberger

Description Lon Hohberger 2005-06-29 18:44:37 UTC
Description of problem:

ccsd uses reserved ports to authenticate that the local user is, in fact, root.
This is good for security purposes.

A client handshake / set of gets operates like this:

        int foo;
        char *response;

        /* each ccs_* call binds its own reserved port and connects to ccsd (see below) */
        foo = ccs_connect();
        while (ccs_get(foo, "query", &response) == 0) {
                handle_response(response);
        }
        ccs_disconnect(foo);

For large numbers of queries, however, the connect() sometimes waits a long
time -- several seconds.  My guess is that this is because each ccs_connect(),
ccs_disconnect(), and ccs_get() call binds to a reserved port and subsequently
connect()s to ccsd.  My simple cluster configuration does 531 connect() calls
on reserved ports when starting up, and it pauses every few seconds.  During
that time, the setup_socket_ipv6() call hangs several times for around 3
seconds.
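
For reference, the reserved-port handshake described above looks roughly like
the sketch below.  This is not the actual libccs code: the downward port scan,
the loopback addresses, and the function name connect_reserved_ipv6() are
assumptions used only to illustrate where a flood of short-lived connections
can stall in bind()/connect().

        #include <string.h>
        #include <unistd.h>
        #include <stdint.h>
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>

        static int connect_reserved_ipv6(uint16_t ccsd_port)
        {
                struct sockaddr_in6 local, remote;
                int sock, port, one = 1;

                if ((sock = socket(PF_INET6, SOCK_STREAM, 0)) < 0)
                        return -1;
                setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));

                memset(&local, 0, sizeof(local));
                local.sin6_family = AF_INET6;
                local.sin6_addr = in6addr_loopback;

                /* scan the reserved range for a source port we can bind */
                for (port = 1023; port > 0; port--) {
                        local.sin6_port = htons(port);
                        if (bind(sock, (struct sockaddr *)&local,
                                 sizeof(local)) == 0)
                                break;
                }
                if (port == 0) {
                        /* reserved ports exhausted or stuck in TIME_WAIT */
                        close(sock);
                        return -1;
                }

                memset(&remote, 0, sizeof(remote));
                remote.sin6_family = AF_INET6;
                remote.sin6_addr = in6addr_loopback;
                remote.sin6_port = htons(ccsd_port);

                if (connect(sock, (struct sockaddr *)&remote,
                            sizeof(remote)) < 0) {
                        close(sock);
                        return -1;
                }
                return sock;
        }

Repeating something like this for every ccs_get() adds up to the 531
bind()/connect() cycles mentioned above, which fits the guess that the pauses
come from reserved-port pressure rather than from ccsd itself.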

Version-Release number of selected component (if applicable): RHEL4 GA


How reproducible: Sometimes.

Steps to Reproduce:
1. Create a cluster with lots of services.
2. Start rgmanager with "clurgmgrd -fd".  Sometimes, it can take whole minutes
to "build resource trees".  In this instance, it's simply querying ccsd for
information in a systematic fashion.
  
Actual results:
rgmanager (and probably other apps) take a long time to read the configuration
information from ccsd.

Expected results:
Fast response time from ccsd.


Known workarounds:

* This does not happen with "ccsd -4".  Rgmanager starts up *very* quickly with
the -4 option.


Additional info:

* There's no specific pattern to how frequently the connect code hangs.
Sometimes it's after 20 connections, sometimes after 300.  I suspect it's
related to running out of reserved ports.
* This might be a case of the socket getting SO_REUSEADDR in libccs for IPv4,
but not IPv6.

Comment 1 Lon Hohberger 2005-06-29 18:46:41 UTC
Correction: SO_REUSEADDR is set, but the way we do port selection might not be
appropriate.
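
As a side note on the port-selection point: one alternative for the IPv4 path
is glibc's bindresvport(), which picks a free privileged port itself instead of
the caller scanning for one.  It does not cover AF_INET6, and it is not what
the eventual fix uses (see the local-socket patch below); the sketch, including
the function name connect_reserved_ipv4(), is only illustrative.

        #include <string.h>
        #include <unistd.h>
        #include <stdint.h>
        #include <arpa/inet.h>
        #include <netinet/in.h>
        #include <sys/socket.h>

        static int connect_reserved_ipv4(uint16_t ccsd_port)
        {
                struct sockaddr_in local, remote;
                int sock;

                if ((sock = socket(PF_INET, SOCK_STREAM, 0)) < 0)
                        return -1;

                /* bindresvport() binds to an unused privileged port (IPv4 only) */
                memset(&local, 0, sizeof(local));
                local.sin_family = AF_INET;
                if (bindresvport(sock, &local) < 0) {
                        close(sock);
                        return -1;
                }

                memset(&remote, 0, sizeof(remote));
                remote.sin_family = AF_INET;
                remote.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
                remote.sin_port = htons(ccsd_port);

                if (connect(sock, (struct sockaddr *)&remote,
                            sizeof(remote)) < 0) {
                        close(sock);
                        return -1;
                }
                return sock;
        }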

Comment 3 Lon Hohberger 2005-06-29 23:00:29 UTC
Created attachment 116155 [details]
ccsd local socket patch

This patch allows libccs/ccsd to use local (UNIX domain) sockets for
communication, which avoids the TIME_WAIT and limited-port problems we have
with the IP protocols.  The socket is created with its permissions masked by
~077, so only root should be allowed to communicate over it.

This patch is compatible with existing installations:

* All applications built statically against the older libccs.a (which only uses
IP for communication) are forward-compatible with the new ccsd.
* All apps built against the new libccs (with UNIX domain socket support) will
fall back to IPv6/IPv4 if local socket communication with ccsd is unavailable.
* Administrators may disable ccsd's use of UNIX domain sockets by running it
with the new -I option.
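
For illustration, the server side of the local-socket approach looks roughly
like the sketch below.  The path /var/run/cluster/ccsd.sock and the umask
handling are assumptions, not the patch itself; the point is that creating the
socket file with its mode masked by ~077 leaves it accessible to the owner
(root) only, and no reserved ports are consumed at all.

        #include <string.h>
        #include <unistd.h>
        #include <sys/socket.h>
        #include <sys/stat.h>
        #include <sys/un.h>

        /* placeholder path; the real name is whatever the patch defines */
        #define CCSD_SOCK_PATH "/var/run/cluster/ccsd.sock"

        static int ccsd_local_listen(void)
        {
                struct sockaddr_un addr;
                mode_t old_umask;
                int sock;

                if ((sock = socket(PF_LOCAL, SOCK_STREAM, 0)) < 0)
                        return -1;

                memset(&addr, 0, sizeof(addr));
                addr.sun_family = AF_LOCAL;
                strncpy(addr.sun_path, CCSD_SOCK_PATH,
                        sizeof(addr.sun_path) - 1);
                unlink(CCSD_SOCK_PATH);

                /* socket file created with mode & ~077: owner-only access */
                old_umask = umask(077);
                if (bind(sock, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                        umask(old_umask);
                        close(sock);
                        return -1;
                }
                umask(old_umask);

                if (listen(sock, 5) < 0) {
                        close(sock);
                        return -1;
                }
                return sock;
        }

A client simply connect()s to the same path with a PF_LOCAL socket, which is
why the reserved-port and TIME_WAIT pressure described in the original report
goes away.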

Comment 4 Lon Hohberger 2005-06-29 23:03:13 UTC
Note: Existing users of linux-cluster will only benefit from this patch after a
rebuild of each affected application, as most are (currently) statically built
against libccs.

Comment 6 Jonathan Earl Brassow 2005-10-04 17:33:48 UTC
In RHEL4 U2

