Bug 235675 - LSPP: INFO: possible recursive locking detected
Summary: LSPP: INFO: possible recursive locking detected
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Eric Sandeen
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks: RHEL5LSPPCertTracker
 
Reported: 2007-04-09 15:07 UTC by Linda Knippers
Modified: 2009-06-19 16:35 UTC (History)
CC List: 6 users

Fixed In Version: RHBA-2007-0959
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-07 19:46:26 UTC
Target Upstream Version:




Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2007:0959 | normal | SHIPPED_LIVE | Updated kernel packages for Red Hat Enterprise Linux 5 Update 1 | 2007-11-08 00:47:37 UTC
Linux Kernel | 8130 | None | None | None | Never

Description Linda Knippers 2007-04-09 15:07:26 UTC
Description of problem:
I had several systems configured for lspp running the lspp .72 kernel
sitting idle over the weekend.  Two of them had 
"INFO: possible recursive locking detected" 
messages on the console at about 4 AM Saturday morning.

Version-Release number of selected component (if applicable):
2.6.18-8.1.1.lspp.72.el5 kernel with selinux .50 policy and
the other packages from the lspp repo.

How reproducible:
I'm not sure it is reproducible.

Steps to Reproduce:
1.
2.
3.
  
Actual results:

These are the messages from each system's messages file.

This system is an x86_64 system that was freshly installed on
Friday using the latest ks.

Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: =============================================
Apr  7 04:04:41 cert-e3 kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:04:41 cert-e3 kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:04:41 cert-e3 init: Trying to re-exec init
Apr  7 04:04:41 cert-e3 kernel: ---------------------------------------------
Apr  7 04:04:41 cert-e3 kernel: do_mq_unlink/15758 is trying to acquire lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: but task is already holding lock:
Apr  7 04:04:41 cert-e3 kernel:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: other info that might help us debug this:
Apr  7 04:04:41 cert-e3 kernel: 1 lock held by do_mq_unlink/15758:
Apr  7 04:04:41 cert-e3 kernel:  #0:  (&inode->i_mutex){--..}, at: [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: stack backtrace:
Apr  7 04:04:41 cert-e3 kernel:
Apr  7 04:04:41 cert-e3 kernel: Call Trace:
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800a9dea>] __lock_acquire+0x135/0x9d9
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff800aac31>] lock_acquire+0x4b/0x69
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8006792c>] __mutex_lock_slowpath+0xe5/0x261
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80067ad2>] mutex_lock+0x2a/0x2e
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff8004ca97>] vfs_unlink+0x8c/0x114
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff801258ea>] sys_mq_unlink+0xb9/0x103
Apr  7 04:04:41 cert-e3 kernel:  [<ffffffff80060dda>] tracesys+0xd1/0xdb
Apr  7 04:04:41 cert-e3 kernel:


This system is an i386 box freshly installed on Thursday.

Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel: security:  5 users, 12 roles, 1794 types, 93 bools, 16 sens, 1024 cats
Apr  7 04:05:09 kipper kernel: security:  59 classes, 169756 rules
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper kernel: =============================================
Apr  7 04:05:09 kipper kernel: [ INFO: possible recursive locking detected ]
Apr  7 04:05:09 kipper kernel: 2.6.18-8.1.1.lspp.72.el5 #1
Apr  7 04:05:09 kipper kernel: ---------------------------------------------
Apr  7 04:05:09 kipper kernel: do_mq_unlink/14933 is trying to acquire lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:09 kipper kernel:
Apr  7 04:05:09 kipper init: Trying to re-exec init
Apr  7 04:05:09 kipper kernel: but task is already holding lock:
Apr  7 04:05:09 kipper kernel:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: other info that might help us debug this:
Apr  7 04:05:10 kipper kernel: 1 lock held by do_mq_unlink/14933:
Apr  7 04:05:10 kipper kernel:  #0:  (&inode->i_mutex){--..}, at: [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:
Apr  7 04:05:10 kipper kernel: stack backtrace:
Apr  7 04:05:10 kipper kernel:  [<c04051ff>] show_trace_log_lvl+0x12/0x25
Apr  7 04:05:10 kipper kernel:  [<c040570d>] show_trace+0xd/0x10
Apr  7 04:05:10 kipper kernel:  [<c0405826>] dump_stack+0x19/0x1b
Apr  7 04:05:10 kipper kernel:  [<c043bc4d>] __lock_acquire+0x6ea/0x90d
Apr  7 04:05:10 kipper kernel:  [<c043c3e1>] lock_acquire+0x4b/0x6a
Apr  7 04:05:10 kipper kernel:  [<c0612a0e>] __mutex_lock_slowpath+0xbc/0x20a
Apr  7 04:05:10 kipper kernel:  [<c0612b7d>] mutex_lock+0x21/0x24
Apr  7 04:05:10 kipper kernel:  [<c047f96c>] vfs_unlink+0x73/0xe3
Apr  7 04:05:10 kipper kernel:  [<c04bdb72>] sys_mq_unlink+0x7b/0xd8
Apr  7 04:05:10 kipper kernel:  [<c0403fd7>] syscall_call+0x7/0xb
Apr  7 04:05:10 kipper kernel:  =======================

I have a third system which is an ia64 box that hasn't shown
this.  The ia64 box was installed a long time ago but is running
the same kernel.

Expected results:
No messages.

Additional info:

Since both systems showed the warning at about the same time I looked
to see what is running out of cron about then.  I'm guessing it's
related to the prelink cron job running since it will signal init,
and both systems had a "Trying to re-exec init" message.  Another
clue is that the ia64 box, which didn't show the messages, isn't
configured for prelink.

On the two systems that did, I've seen the "Trying to re-exec init"
messages at about the same time on other days without the recursive
lock warning.

Comment 1 George C. Wilson 2007-04-09 20:50:46 UTC
eparis:  I will have somebody look at this.

Comment 2 Eric Sandeen 2007-04-09 21:51:09 UTC
Odds are this locking annotation will fix it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=7a434814c7a6500b08bf4419ba8712b152d08d08

--- a/ipc/mqueue.c
+++ b/ipc/mqueue.c
@@ -731,7 +731,8 @@ asmlinkage long sys_mq_unlink(const char __user *u_name)
 	if (IS_ERR(name))
 		return PTR_ERR(name);
 
-	mutex_lock(&mqueue_mnt->mnt_root->d_inode->i_mutex);
+	mutex_lock_nested(&mqueue_mnt->mnt_root->d_inode->i_mutex,
+			I_MUTEX_PARENT);
 	dentry = lookup_one_len(name, mqueue_mnt->mnt_root, strlen(name));
 	if (IS_ERR(dentry)) {
 		err = PTR_ERR(dentry);

Comment 3 Eric Sandeen 2007-04-09 21:52:16 UTC
See also
http://bugzilla.kernel.org/show_bug.cgi?id=8130

Comment 4 George C. Wilson 2007-04-12 00:03:43 UTC
Linda, any way you can verify that this is fixed in the updated package?

Comment 5 Linda Knippers 2007-04-12 00:24:52 UTC
Did this get pulled into the lspp kernel?  I see references to the upstream
bug fix but I wasn't sure it was built into one of our kernels.

This problem wasn't reproducible so I'm not sure I can verify it but I'll
run the latest kernel and see what happens.  The upstream bug report looks
to be a good match.

Comment 6 Steve Grubb 2007-04-12 01:10:46 UTC
Yes, this was put into lspp.73.

Comment 8 Steve Grubb 2007-04-16 20:24:45 UTC
Removing lspp tracker. Seems fixed. Thanks.

Comment 9 RHEL Product and Program Management 2007-04-17 19:42:40 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 10 Don Zickus 2007-07-19 21:12:13 UTC
in kernel-2.6.18-16.el5

Comment 12 John Poelstra 2007-08-14 19:41:16 UTC
A fix for this issue has been included in the packages contained in the beta
(RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1.  Please
verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

Comment 13 John Poelstra 2007-08-14 20:52:12 UTC
A fix for this issue has been included in the packages contained in the beta
(RHN channel) or most recent snapshot (partners.redhat.com) for RHEL5.1.  Please
verify that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

Comment 14 Linda Knippers 2007-08-14 22:29:40 UTC
The problem seems to be fixed in the 5.1 beta.

Comment 16 errata-xmlrpc 2007-11-07 19:46:26 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0959.html


