Bug 154262 - slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects when clvmd exits
Summary: slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects ...
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Cluster Suite
Classification: Retired
Component: dlm
Version: 4
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-04-08 20:02 UTC by Dean Jansa
Modified: 2009-04-16 19:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-29 21:55:48 UTC



Description Dean Jansa 2005-04-08 20:02:40 UTC
Description of problem:

At times while stopping clvmd I hit:
slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects
 [<c0142cff>] kmem_cache_destroy+0x99/0x132
 [<f8a9934b>] lowcomms_stop+0xd4/0xdb [dlm]
 [<f8a9704e>] threads_stop+0x5/0xa [dlm]
 [<f8a97147>] dlm_release+0x83/0xa0 [dlm]
 [<f8a97983>] release_lockspace+0x199/0x1cf [dlm]
 [<f8a912f3>] unregister_lockspace+0xa/0x5c [dlm]
 [<f8a91a9c>] do_user_remove_lockspace+0x7d/0x94 [dlm]
 [<f8a92574>] dlm_write+0x169/0x1ae [dlm]
 [<c01561a8>] vfs_write+0xb6/0xe2
 [<c0156272>] sys_write+0x3c/0x62
 [<c02c746b>] syscall_call+0x7/0xb
kmem_cache_create: duplicate cache dlm_conn
------------[ cut here ]------------
kernel BUG at mm/slab.c:1453!
invalid operand: 0000 [#1]
SMP
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc button battery ac uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0142a8e>]    Not tainted VLI
EFLAGS: 00010202   (2.6.9-6.37.ELsmp)
EIP is at kmem_cache_create+0x4b3/0x526


Upon restart I hit the duplicate cache BUG (which is reasonable, seeing as we
didn't clean it up above, but I thought the stack might help at any rate):
Mar 30 14:33:08 morph-04 kernel: kmem_cache_create: duplicate cache dlm_conn
Mar 30 14:33:08 morph-04 kernel: ------------[ cut here ]------------
Mar 30 14:33:08 morph-04 kernel: kernel BUG at mm/slab.c:1453!
Mar 30 14:33:08 morph-04 kernel: invalid operand: 0000 [#1]
Mar 30 14:33:08 morph-04 kernel: SMP
Mar 30 14:33:08 morph-04 kernel: Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 sunrpc button battery ac uhci_hcd hw_random e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc sd_mod scsi_mod
Mar 30 14:33:08 morph-04 kernel: CPU:    1
Mar 30 14:33:08 morph-04 kernel: EIP:    0060:[<c0142a8e>]    Not tainted VLI
Mar 30 14:33:08 morph-04 kernel: EFLAGS: 00010202   (2.6.9-6.37.ELsmp)
Mar 30 14:33:08 morph-04 kernel: EIP is at kmem_cache_create+0x4b3/0x526
Mar 30 14:33:08 morph-04 kernel: eax: 0000002c   ebx: f4554a74   ecx: c042530c   edx: c02dad97
Mar 30 14:33:08 morph-04 kernel: esi: f8aa282c   edi: f8aa2835   ebp: f4554880   esp: f2e4bec8
Mar 30 14:33:08 morph-04 kernel: ds: 007b   es: 007b   ss: 0068
Mar 30 14:33:08 morph-04 kernel: Process clvmd (pid: 14464, threadinfo=f2e4b000 task=f43f7730)
Mar 30 14:33:08 morph-04 kernel: Stack: c201cd60 c0000000 00000000 f8aa282c 00000050 00000000 fffffff4 f5f2ba00
Mar 30 14:33:08 morph-04 kernel:        f5f2b118 f8a994a1 00000000 00000000 00000000 00000000 f5f2b900 00000005
Mar 30 14:33:08 morph-04 kernel:        f8a9702b f8aab850 f8a9706a f8a97721 f5f2b900 00000000 f5f2b11e ffffffff
Mar 30 14:33:08 morph-04 kernel: Call Trace:
Mar 30 14:33:08 morph-04 kernel:  [<f8a994a1>] lowcomms_start+0x14f/0x1f6 [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a9702b>] threads_start+0x20/0x3e [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a9706a>] init_internal+0x17/0x30 [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a97721>] dlm_new_lockspace+0x39/0x61 [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a91242>] register_lockspace+0xa3/0x14a [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a91a0e>] do_user_create_lockspace+0x21/0x32 [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<f8a92561>] dlm_write+0x156/0x1ae [dlm]
Mar 30 14:33:08 morph-04 kernel:  [<c01561a8>] vfs_write+0xb6/0xe2
Mar 30 14:33:08 morph-04 kernel:  [<c0156272>] sys_write+0x3c/0x62
Mar 30 14:33:08 morph-04 kernel:  [<c02c746b>] syscall_call+0x7/0xb
Mar 30 14:33:09 morph-04 kernel: Code: 04 19 c0 0c 01 85 c0 75 2a ff 74 24 0c 68 97 ad 2d c0 e8 50 ef fd ff 59 b9 0c 53 42 c0 5e f0 ff 05 0c 53 42 c0 0f 8e eb 14 00 00 <0f> 0b ad 05 14 ad 2d c0 8b 1b eb 84 8b 54 24 04 b8 00 f0 ff ff
Mar 30 14:33:09 morph-04 kernel:  <0>Fatal exception: panic in 5 seconds
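
The two traces fit together as follows: the failed kmem_cache_destroy() leaves
the `dlm_conn' cache registered, so the next kmem_cache_create() with the same
name trips the duplicate check. Below is a toy Python model of that sequence,
not kernel code. The SlabRegistry class and its methods are hypothetical names
invented for illustration, and the behaviour is a simplified reading of the
messages in the logs above.

```python
class SlabRegistry:
    """Toy model of a named-cache registry (hypothetical, not the kernel API)."""

    def __init__(self):
        # Maps cache name -> number of objects still allocated from it.
        self._live_objects = {}

    def create(self, name):
        # Mirrors the "duplicate cache" check: a name may be registered once.
        if name in self._live_objects:
            raise RuntimeError(f"kmem_cache_create: duplicate cache {name}")
        self._live_objects[name] = 0

    def alloc(self, name):
        self._live_objects[name] += 1

    def free(self, name):
        self._live_objects[name] -= 1

    def destroy(self, name):
        # If objects are still outstanding the destroy fails and the cache
        # stays registered ("Can't free all objects").
        if self._live_objects[name] > 0:
            return False
        del self._live_objects[name]
        return True


registry = SlabRegistry()
registry.create("dlm_conn")
registry.alloc("dlm_conn")   # first connection to a node
registry.alloc("dlm_conn")   # second connection to the same node (the leak)
registry.free("dlm_conn")    # only one of the two is ever freed

destroyed = registry.destroy("dlm_conn")   # fails: one object is still live
try:
    registry.create("dlm_conn")            # restart: name is still registered
    duplicate_detected = False
except RuntimeError:
    duplicate_detected = True
```

In this model a single leaked object is enough to make the destroy fail, which
is why the restart hits the duplicate-name check.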




Version-Release number of selected component (if applicable):

2.6.9-6.37.ELsmp

DLM 2.6.9-30.1 (built Mar 29 2005 18:29:33) installed
Lock_DLM (built Mar 29 2005 18:33:25) installed

How reproducible:

Sometimes


Steps to Reproduce:
1. start clvmd
2. create vols
3. tear down vols
4. stop clvmd

Comment 1 Christine Caulfield 2005-04-11 13:32:29 UTC
Grief, the locking in nodeid2con is well broken: there's a read lock protecting
a write! That explains why two connections to the same node can be created at
the same time. Of course, only one of them gets freed, hence this bug.

Changed the rw_semaphore, which was upped and downed all over the place within
the one routine, into a simple semaphore that protects the whole operation.
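
For readers unfamiliar with this kind of race, here is a minimal userspace
sketch in Python of the general pattern. It is not the lowcomms.c code. The
ConnectionTable class, the allocated counter and the hammer() helper are
hypothetical names. The racy variant shows an analogous check-then-create gap
rather than the exact read-lock misuse described above. The fixed variant
holds one plain lock across the whole lookup-or-create, which is the shape of
the change described in this comment.

```python
import threading


class Connection:
    """Stand-in for a per-node connection object (hypothetical)."""

    def __init__(self, nodeid):
        self.nodeid = nodeid


class ConnectionTable:
    def __init__(self):
        self._lock = threading.Lock()  # one plain lock, no reader/writer split
        self._by_node = {}
        self.allocated = 0             # counts every Connection ever created

    def nodeid2con_racy(self, nodeid):
        # Broken pattern: lookup and insert are separate critical sections,
        # so two threads can both miss and both create a connection.
        with self._lock:
            con = self._by_node.get(nodeid)
        if con is None:
            con = Connection(nodeid)
            with self._lock:
                self.allocated += 1
                self._by_node[nodeid] = con  # may orphan another thread's connection
        return con

    def nodeid2con(self, nodeid):
        # Fixed pattern: the lock covers the whole lookup-or-create.
        with self._lock:
            con = self._by_node.get(nodeid)
            if con is None:
                con = Connection(nodeid)
                self.allocated += 1
                self._by_node[nodeid] = con
            return con


def hammer(lookup, nodeids, threads_per_node=8):
    """Call lookup(nodeid) from many threads released at the same moment."""
    start = threading.Barrier(len(nodeids) * threads_per_node)

    def worker(nodeid):
        start.wait()
        lookup(nodeid)

    threads = [threading.Thread(target=worker, args=(n,))
               for n in nodeids for _ in range(threads_per_node)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


table = ConnectionTable()
hammer(table.nodeid2con, [1, 2, 3])
```

With the fixed lookup, table.allocated always equals the number of distinct
node ids. With nodeid2con_racy it can be higher, depending on timing, and each
extra object corresponds to a connection that would never be freed.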

Checking in lowcomms.c;
/cvs/cluster/cluster/dlm-kernel/src/lowcomms.c,v  <--  lowcomms.c
new revision: 1.22.2.8; previous revision: 1.22.2.7
done


Comment 2 Dean Jansa 2005-11-29 21:55:48 UTC
Have not seen this after the fix went in.

