Bug 454355 - dlm: 3 nodes looking for a lock which does not exist?
Summary: dlm: 3 nodes looking for a lock which does not exist?
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: low
Severity: low
Target Milestone: rc
Target Release: 5.5
Assignee: David Teigland
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-07-07 21:57 UTC by Lon Hohberger
Modified: 2018-10-20 03:18 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 21:13:09 UTC
Target Upstream Version:


Attachments
Backtrace of rgmanager (deleted)
2008-07-07 21:58 UTC, Lon Hohberger
debugfs DLM information on the rgmanager lockspace (deleted)
2008-07-07 21:59 UTC, Lon Hohberger

Description Lon Hohberger 2008-07-07 21:57:13 UTC
Description of problem:

I helped a community user on #linux-cluster with what initially seemed like an
rgmanager problem but ended up looking very much like a DLM bug.

* rgmanager-2.0.38-2.el5_2.1
* kernel-2.6.18-92.1.1.el5xen in Xen domU 

(All cluster nodes are domU)

Clustat (the rgmanager utility that reports the status of running services) was
timing out. In the past, this symptom has had a number of different causes.

Comment 1 Lon Hohberger 2008-07-07 21:58:12 UTC
Created attachment 311207 [details]
Backtrace of rgmanager

Thread 8 is stuck waiting for a reply from the DLM.

Comment 2 Lon Hohberger 2008-07-07 21:59:18 UTC
Created attachment 311208 [details]
debugfs DLM information on the rgmanager lockspace

This is from all 4 nodes. Several nodes are looking up the master of the
"usrm::vf" lock, but none of them reports being the master.
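To make the cross-node comparison in these debugfs dumps repeatable, here is a minimal Python sketch that flags lock resources which at least one node is still looking up while no node reports holding the master copy. The line formats it matches ("Resource ... Name (len=N) "...", "Master Copy", "Local Copy, ...", "Looking up master ...") are an assumption modeled on the kernel's fs/dlm debugfs dump and should be checked against the actual attachment; the function names classify and orphaned_lookups are hypothetical.

```python
import re
from typing import Dict, Optional, Set

# ASSUMPTION: per-resource lines in the debugfs dump look roughly like
#   Resource ffff8100... Name (len=8) "usrm::vf"
#   Master Copy                     (this node is the master)
#   Local Copy, Master is node 3    (another node is the master)
#   Looking up master (lkid 10001)  (master not yet known)
RESOURCE_RE = re.compile(r'^Resource \S+ Name \(len=\d+\) "(.*)"')


def classify(dump: str) -> Dict[str, str]:
    """Map each resource name in one node's dump to 'master', 'local' or 'lookup'."""
    states: Dict[str, str] = {}
    current: Optional[str] = None
    for raw in dump.splitlines():
        line = raw.strip()
        m = RESOURCE_RE.match(line)
        if m:
            current = m.group(1)
            continue
        if current is None:
            continue
        if line.startswith("Master Copy"):
            states[current] = "master"
        elif line.startswith("Local Copy"):
            states[current] = "local"
        elif line.startswith("Looking up master"):
            states[current] = "lookup"
        else:
            continue
        current = None  # only the first state line after a header counts
    return states


def orphaned_lookups(dumps: Dict[str, str]) -> Dict[str, Set[str]]:
    """Return {resource: nodes looking it up} for resources no node masters."""
    waiting: Dict[str, Set[str]] = {}
    mastered: Set[str] = set()
    for node, text in dumps.items():
        for name, state in classify(text).items():
            if state == "lookup":
                waiting.setdefault(name, set()).add(node)
            elif state == "master":
                mastered.add(name)
    return {name: nodes for name, nodes in waiting.items() if name not in mastered}
```

Feed it one dump per node (for example the contents of the rgmanager* files under /sys/kernel/debug/dlm/, keyed by node name); a non-empty result corresponds to the situation described in this comment.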

Comment 3 Chris St. Pierre 2008-07-07 22:03:44 UTC
As requested, here is the output of group_tool -v from all nodes.

Since it sounds like we'll be doing some detailed troubleshooting on this, I might
as well use the actual node names.

# Node: Chico.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Zeppo.  Status: Functional.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Harpo.  Status: Functional.  /sys/kernel/debug/dlm/rgmanager* was empty
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]

# Node: Groucho.  Status: rgmanager hosed.
type             level name       id       state node id local_done
fence            0     default    00010002 none
[1 2 3 4]
dlm              1     rgmanager  00010001 none
[1 2 3 4]
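The four group_tool outputs above can be compared mechanically rather than by eye. Below is a minimal Python sketch, assuming the column layout shown here (a group line followed by a bracketed member list) and treating any state other than "none" as a group with an event still in progress (an assumption about groupd's state names). The function names parse_group_tool and check_consistency are hypothetical. It reports groups whose id, membership, or state differ between nodes.

```python
from typing import Dict, List, Optional, Tuple

Group = Tuple[str, str]  # (type, name), e.g. ("dlm", "rgmanager")


def parse_group_tool(text: str) -> Dict[Group, dict]:
    """Parse the group_tool -v layout shown above into per-group records."""
    groups: Dict[Group, dict] = {}
    last: Optional[Group] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("type"):
            continue  # skip blank lines and the column header
        if line.startswith("["):
            if last is not None:  # member list belongs to the preceding group line
                members = line.strip("[]").split()
                groups[last]["members"] = tuple(sorted(int(m) for m in members))
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # not a group line in the assumed layout
        last = (fields[0], fields[2])
        groups[last] = {"id": fields[3], "state": fields[4], "members": ()}
    return groups


def check_consistency(outputs: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means all nodes agree."""
    problems: List[str] = []
    parsed = {node: parse_group_tool(text) for node, text in outputs.items()}
    ref_node, reference = next(iter(parsed.items()))  # compare against the first node
    for node, groups in parsed.items():
        if groups.keys() != reference.keys():
            problems.append(f"{node}: group set differs from {ref_node}")
        for (gtype, gname), rec in groups.items():
            if rec["state"] != "none":  # ASSUMPTION: "none" means no event in progress
                problems.append(f"{node}: {gtype}/{gname} in state {rec['state']}")
            ref = reference.get((gtype, gname))
            if ref is not None and (rec["id"], rec["members"]) != (ref["id"], ref["members"]):
                problems.append(f"{node}: {gtype}/{gname} differs from {ref_node}")
    return problems
```

Run against the four outputs above, it reports no differences: every node lists the same fence and dlm groups with the same ids, members [1 2 3 4], and state none.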


