Bug 212634 - rgmanager times out when using clustat
Summary: rgmanager times out when using clustat
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: rgmanager
Version: 4
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Lon Hohberger
QA Contact: Cluster QE
Duplicates: 230830
Depends On:
Reported: 2006-10-27 19:50 UTC by Lenny Maiorani
Modified: 2009-04-16 20:21 UTC (History)

Fixed In Version: RHBA-2007-0149
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2007-05-10 21:19:27 UTC

Attachments
dlm debug and lock info from /proc (deleted)
2006-10-27 19:50 UTC, Lenny Maiorani
/proc/cluster/dlm_debug (deleted)
2006-11-27 18:32 UTC, Lenny Maiorani
/proc/meminfo (deleted)
2006-11-27 18:32 UTC, Lenny Maiorani
ps -auwwx (deleted)
2006-11-27 18:33 UTC, Lenny Maiorani
/proc/slabinfo (deleted)
2006-11-27 18:33 UTC, Lenny Maiorani
Fixes subtle dlm lock leak created by rgmanager (deleted)
2006-12-12 20:40 UTC, Lon Hohberger
Source RPM with this patch + patch for 213312 (deleted)
2006-12-12 20:45 UTC, Lon Hohberger

System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0149 | normal | SHIPPED_LIVE | rgmanager bug fix update | 2007-05-10 21:16:41 UTC

Description Lenny Maiorani 2006-10-27 19:50:49 UTC
Description of problem:
rgmanager times out when attempting to get the service list via clustat. Locks
are also in an odd state.

Also, 'cat /proc/cluster/dlm_locks' reports "Cannot allocate memory" and node03
has dlm_recvd using about 50-95% of the CPU.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. unknown
Additional info:

Will attach /proc/cluster/dlm_debug and /proc/cluster/dlm_locks (Magma) info.

Comment 1 Lenny Maiorani 2006-10-27 19:50:49 UTC
Created attachment 139610 [details]
dlm debug and lock info from /proc

Comment 2 Lon Hohberger 2006-11-03 16:14:32 UTC
Lenny, when it can't allocate memory, is it userspace? E.g., is there any
process obviously soaking up all memory on the system?

Comment 3 Lenny Maiorani 2006-11-03 16:36:04 UTC
Memory usage was normal. Not sure if it is the 'cat' complaining about memory or
the /proc fs.

Comment 4 Lon Hohberger 2006-11-03 16:41:56 UTC
Can you get /proc/slabinfo from the nodes, and if possible, 'ps -auwwx'?

Comment 5 Lenny Maiorani 2006-11-03 16:51:19 UTC
We do not have a way of reproducing this, but if it comes up again I will get
this info.

Comment 6 Lenny Maiorani 2006-11-27 18:31:56 UTC
Lon, I am seeing this now on several clusters. They are all complaining in
/proc/cluster/dlm_debug from clvmd.

I will attach some logs...

Comment 7 Lenny Maiorani 2006-11-27 18:32:28 UTC
Created attachment 142198 [details]
/proc/cluster/dlm_debug

Comment 8 Lenny Maiorani 2006-11-27 18:32:59 UTC
Created attachment 142199 [details]
/proc/meminfo

Comment 9 Lenny Maiorani 2006-11-27 18:33:31 UTC
Created attachment 142200 [details]
ps -auwwx

Comment 10 Lenny Maiorani 2006-11-27 18:33:58 UTC
Created attachment 142201 [details]
/proc/slabinfo

Comment 11 Lon Hohberger 2006-12-11 20:02:38 UTC
Lenny, I am pretty sure this is a bug in rgmanager which is caused by the
clu_lock_verbose() function.

I'll have a build ready soon.

Comment 12 Lon Hohberger 2006-12-11 20:08:17 UTC
Since the clu_lock_verbose() function does nothing useful, I'm removing it from
RHCS4 (it's already been removed in RHCS5).

Comment 13 Lon Hohberger 2006-12-12 20:40:56 UTC
Created attachment 143442 [details]
Fixes subtle dlm lock leak created by rgmanager

Comment 14 Lon Hohberger 2006-12-12 20:45:51 UTC
Created attachment 143443 [details]
Source RPM with this patch + patch for 213312

Comment 16 Lon Hohberger 2006-12-13 18:21:00 UTC
Fixes in CVS.

Comment 17 Lenny Maiorani 2007-01-03 17:27:02 UTC
Ok, I am running with this now. Let me get some bake time on it before declaring
this the fix.

Comment 18 Lon Hohberger 2007-01-09 21:14:10 UTC
Same fix(es), based on the 1.9.54 errata (exactly the same as .53, except it
includes an NFS fix).

Comment 23 Lon Hohberger 2007-03-05 16:40:32 UTC
*** Bug 230830 has been marked as a duplicate of this bug. ***

Comment 27 Lon Hohberger 2007-03-21 18:43:30 UTC
Alternatively, we will be calling it 'beta' pretty soon.

Comment 34 Katriel Traum 2007-04-17 06:07:59 UTC
Can you specify what counts as "bad" values or increments for dlm_lkb in /proc/slabinfo?
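
The visible comments do not give a threshold for dlm_lkb. As a rough aid for anyone watching for the leak, the sketch below reads the dlm_lkb object counts out of /proc/slabinfo text so they can be sampled over time. It assumes the 2.x slabinfo layout, where each data line starts with the cache name followed by active_objs and num_objs. The function names slab_counts and grew_monotonically are hypothetical helpers, not part of rgmanager or any Red Hat tool, and "keeps growing across samples" is only a heuristic, not a documented criterion.

```python
def slab_counts(slabinfo_text, cache_name="dlm_lkb"):
    """Return (active_objs, num_objs) for cache_name, or None if absent.

    Assumes slabinfo 2.x lines: name active_objs num_objs objsize ...
    """
    for line in slabinfo_text.splitlines():
        fields = line.split()
        # Skip blank lines, the version banner, the "# name ..." header,
        # and every cache other than the one requested.
        if len(fields) < 3 or fields[0] != cache_name:
            continue
        return int(fields[1]), int(fields[2])
    return None


def grew_monotonically(samples):
    """True if every sample is strictly larger than the one before it."""
    return len(samples) > 1 and all(b > a for a, b in zip(samples, samples[1:]))


def read_dlm_lkb(path="/proc/slabinfo"):
    """Read the live counts; usually needs root on the cluster node."""
    with open(path) as handle:
        return slab_counts(handle.read())
```

Calling read_dlm_lkb() at intervals under a steady workload and passing the collected active_objs values to grew_monotonically() gives a crude signal. A count that never levels off would be consistent with the lock leak described in this bug, but it does not prove one.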

Comment 39 Red Hat Bugzilla 2007-05-10 21:19:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
