Bug 230830 - 2Node cluster - dlm_recvd consumes resources, one node has bigger load than another
Status: CLOSED DUPLICATE of bug 212634
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Depends On:
Reported: 2007-03-03 09:52 UTC by Tomasz Jaszowski
Modified: 2009-04-16 20:22 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-03-05 16:40:07 UTC


Description Tomasz Jaszowski 2007-03-03 09:52:28 UTC
Description of problem:

We've observed that dlm_recvd consumes a lot of CPU time on only one node.

First 2-node cluster:
pro1a::> ps axo pid,start,time,args|egrep "clurgmgrd|dlm_recvd"
11740   Feb 19 00:00:00 clurgmgrd
11741   Feb 19 1-07:32:39 clurgmgrd
13915   Feb 19 1-02:22:35 [dlm_recvd]
20510 09:42:36 00:00:00 egrep clurgmgrd|dlm_recvd

pro2a::~> ps axo pid,start,time,args|egrep "clurgmgrd|dlm_recvd"
 9207   Feb 25 00:09:00 clurgmgrd -d
11100   Feb 25 00:15:02 [dlm_recvd]
32249 09:42:39 00:00:00 egrep clurgmgrd|dlm_recvd
Sat Mar  3 09:42:39 CET 2007

Second 2-node cluster:
pro1b::~> ps axo pid,start,time,args|egrep "clurgmgrd|dlm_recvd"
 9310   Feb 14 00:00:00 clurgmgrd
 9312   Feb 14 2-14:12:38 clurgmgrd
11589   Feb 14 2-05:39:19 [dlm_recvd]
26396 09:42:41 00:00:00 egrep clurgmgrd|dlm_recvd
Sat Mar  3 09:42:41 CET 2007

pro2b::~> ps axo pid,start,time,args|egrep "clurgmgrd|dlm_recvd"
27767   Feb 22 00:00:00 clurgmgrd
27768   Feb 22 00:15:23 clurgmgrd
29929   Feb 22 00:01:59 [dlm_recvd]
 6200 09:42:42 00:00:00 egrep clurgmgrd|dlm_recvd
Sat Mar  3 09:42:42 CET 2007
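
To put numbers on the difference, the TIME column can be converted to seconds. This is only a rough sketch: it assumes the `[[dd-]hh:]mm:ss` cumulative CPU time format that ps prints, the helper name `ps_time_to_seconds` is made up for illustration, and the values are copied by hand from the output above. The processes were started on different dates, so the ratios are not normalized for uptime.

```python
def ps_time_to_seconds(value: str) -> int:
    """Convert a ps cumulative CPU time ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in value:
        # Leading "dd-" part holds whole days of CPU time.
        day_part, value = value.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in value.split(":")]
    # Pad missing leading fields (e.g. "mm:ss") with zeros.
    while len(parts) < 3:
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# dlm_recvd TIME values copied from the ps output in this report.
dlm_recvd_cpu = {
    "pro1a": "1-02:22:35",
    "pro2a": "00:15:02",
    "pro1b": "2-05:39:19",
    "pro2b": "00:01:59",
}

for node1, node2 in (("pro1a", "pro2a"), ("pro1b", "pro2b")):
    t1 = ps_time_to_seconds(dlm_recvd_cpu[node1])
    t2 = ps_time_to_seconds(dlm_recvd_cpu[node2])
    print(f"{node1}: {t1} s, {node2}: {t2} s, ratio {t1 / t2:.1f}")
```

With these figures dlm_recvd has used 94955 s versus 902 s on the first cluster and 193159 s versus 119 s on the second.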

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
Actual results:
Much higher load on node1 than on node2

Expected results:
dlm_recvd should consume less CPU time...

Please explain this disproportion in dlm_recvd CPU usage between node1 and node2.

Additional info:
(Regarding both clusters) In our configuration these nodes work as a
load-balanced service, so the load should not differ too much, but we are observing
a load of 10 on node1 and a load of <1 on node2 ...

We have checked on a non-production cluster (services enabled, but no data being
processed) and on node1 we saw a load of 10 (max), while node2 had a load of ~0.5 (usually less).

We have 4 GFS and 12 ext3 filesystems mounted from SAN storage. This cluster acts as a
load-balanced smtp/pop3/smb/mysql(ndbd) server. Balancing is done by an external
device. (Resource usage of the services is similar on both nodes.)

Comment 1 Lon Hohberger 2007-03-05 16:40:07 UTC
If you aren't using them yet, please install the ones here:

Or the ones from bug #212634.

Ok, that out of the way, there will always be more CPU time consumed on a given
dlm_recvd simply due to the architecture of the DLM.  However, it shouldn't be a
large amount - the amount you're seeing is probably related to bug #212634 -
what happens is that in some cases, rgmanager will (obviously incorrectly) leak
DLM locks.  On the node mastering the locks, this will cause dlm_recvd to have
to traverse a longer list of locks - ending up with more and more system time
being used.
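
As a rough illustration of why a leak shows up as steadily growing CPU time on one node, here is a toy model. It is not DLM or rgmanager code: the function name, the idea that every request scans the full list, and the numbers are all assumptions made for the sketch. The point is only that scanning a list that keeps growing costs more on each pass.

```python
def entries_scanned(requests: int, baseline_locks: int, leaked_per_request: int) -> int:
    """Toy model: total list entries visited while serving `requests` requests.

    Each request walks the whole list of held locks once, then
    `leaked_per_request` locks are left behind instead of being released.
    """
    held = baseline_locks
    scanned = 0
    for _ in range(requests):
        scanned += held              # linear walk over every held lock
        held += leaked_per_request   # leaked locks stay on the list
    return scanned


no_leak = entries_scanned(requests=1000, baseline_locks=10, leaked_per_request=0)
with_leak = entries_scanned(requests=1000, baseline_locks=10, leaked_per_request=1)
print(no_leak, with_leak)
```

In this model the work without a leak grows linearly with the number of requests, while leaking even one lock per request makes it grow quadratically, and only on the node that holds the list.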

*** This bug has been marked as a duplicate of 212634 ***

Comment 2 Lon Hohberger 2007-03-05 16:43:58 UTC
The packages in #228823 should also fix this.
