Bug 155447 - dm-multipath oopses when the last path fails
Summary: dm-multipath oopses when the last path fails
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Fedora
Classification: Fedora
Component: device-mapper-multipath
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-04-20 13:22 UTC by Lars Marowsky-Bree
Modified: 2007-11-30 22:11 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-04-21 10:25:44 UTC



Links
System: Novell   ID: 78986   Priority: None   Status: None   Summary: None   Last Updated: Never

Description Lars Marowsky-Bree 2005-04-20 13:22:30 UTC
Description of problem:

dm-multipath oopses when the last path fails. The I/O is requeued, but
dispatch_queued_ios() doesn't seem able to cope if map_io() fails. I don't
understand why; it seems to oops somewhere in the endio path:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8e048ed
*pde = 00102001
Oops: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<f8e048ed>]    Tainted: G  U
EFLAGS: 00010206   (2.6.5-7.165-bigsmp SLES9_SP2_BRANCH-200504201212570000) 
EIP is at multipath_end_io+0x5d/0x260 [dm_multipath]
eax: 00000000   ebx: cdcde280   ecx: fffffffb   edx: 00000000
esi: e8b80bc0   edi: ffffffa1   ebp: f0a035e4   esp: f4029eac
ds: 007b   es: 007b   ss: 0068
Process kmpathd/0 (pid: 23278, threadinfo=f4028000 task=f3524000)
Stack: e8b80c40 00001000 0004aa90 00000000 00000001 cdcde290 00000000 fffffffb 
       e5f3a960 f0f7e960 e8b80bc0 f889b580 e5f3a968 f8d79080 f8e04890 00000000 
       e8b80bc0 00001000 f889b510 fffffffb c017be38 00000001 00000296 00000296 
Call Trace: 
 [<f889b580>] clone_endio+0x70/0x120 [dm_mod]
 [<f8e04890>] multipath_end_io+0x0/0x260 [dm_multipath]
 [<f889b510>] clone_endio+0x0/0x120 [dm_mod]
 [<c017be38>] bio_endio+0x68/0xa0
 [<f8e04e0c>] process_queued_ios+0x12c/0x140 [dm_multipath]
 [<c013c066>] worker_thread+0x186/0x230
 [<f8e04ce0>] process_queued_ios+0x0/0x140 [dm_multipath]
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c01238f0>] default_wake_function+0x0/0x10
 [<c013bee0>] worker_thread+0x0/0x230
 [<c013fdd9>] kthread+0xf9/0x12d
 [<c013fce0>] kthread+0x0/0x12d
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 58 5b e0 f8 83 c0 14 89 44 24 04 e8 1d 70  
 Dumping to block device (104,5) on CPU 0 ...

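The first bytes on the Code: line can be decoded by hand to see why the faulting address is 0x10. The sketch below assumes, as on i386 kernels of this era, that the Code: line starts at the faulting EIP, and it handles only the one instruction form needed here; decode_mov_disp8 and to_signed32 are hypothetical helpers, not kernel or library APIs. It also reads ecx = fffffffb as a signed value, which is -5, i.e. -EIO. That this register holds the I/O error being passed into multipath_end_io() is an assumption, although fffffffb also appears on the stack.

```python
import errno

# i386 32-bit general-purpose registers, in ModRM encoding order.
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_disp8(code: bytes) -> tuple[str, str, int]:
    """Decode only 'mov r32, [base + disp8]' (opcode 0x8B, ModRM mod=01).

    Returns (destination register, base register, signed displacement).
    Anything else raises ValueError; this is not a general disassembler.
    """
    if len(code) < 3 or code[0] != 0x8B:
        raise ValueError("not a MOV r32, r/m32 (opcode 0x8B) instruction")
    modrm = code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 0b111, modrm & 0b111
    if mod != 0b01 or rm == 0b100:
        # mod=01 selects [base + disp8]; rm=100 would need a SIB byte.
        raise ValueError("only the [base + disp8] form without SIB is handled")
    disp8 = code[2] - 0x100 if code[2] >= 0x80 else code[2]
    return REGS32[reg], REGS32[rm], disp8


def to_signed32(value: int) -> int:
    """Reinterpret a 32-bit register value as a two's-complement integer."""
    return value - (1 << 32) if value & 0x8000_0000 else value


# Values copied from the first oops in the description above.
CODE = bytes.fromhex("8b 40 10 c7 04 24 58 5b e0 f8 83 c0 14 89 44 24 04 e8 1d 70")
REGS_AT_OOPS = {"eax": 0x00000000, "ebx": 0xCDCDE280, "ecx": 0xFFFFFFFB, "edx": 0x00000000}

dst, base, disp = decode_mov_disp8(CODE)
# Effective address of the load, wrapped to 32 bits like the CPU does.
fault_address = (REGS_AT_OOPS[base] + disp) & 0xFFFF_FFFF
error_code = to_signed32(REGS_AT_OOPS["ecx"])

print(f"mov {dst}, [{base} + {disp:#x}] -> faulting address {fault_address:#010x}")
print(f"ecx = {error_code} (-EIO is {-errno.EIO})")
```

With eax = 0, the load reads address 0x10, which matches "virtual address 00000010": most likely a field at offset 0x10 of a structure reached through a NULL pointer. The trace in comment 2 starts with the same three bytes (8b 40 10) and also shows eax = 00000000, so it decodes to the same fault.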

Version-Release number of selected component (if applicable):

Kernel 2.6.5 with the latest DM patches applied.

This happens all the time and is 100% reproducible.

Comment 1 Lars Marowsky-Bree 2005-04-20 17:46:51 UTC
Just to add a data point: this happens independently of the workqueue patch.

Comment 2 Lars Marowsky-Bree 2005-04-20 20:55:30 UTC
A slightly better trace, from a kernel compiled with frame pointers etc.:

Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
f8dedd48
*pde = 330bc001
Oops: 0000 [#1]
SMP 
CPU:    0
EIP:    0060:[<f8dedd48>]    Tainted: G  U
EFLAGS: 00010206   (2.6.5-7.165-biglmb ) 
EIP is at multipath_end_io+0x58/0x380 [dm_multipath]
eax: 00000000   ebx: f0621a38   ecx: fffffffb   edx: 00000000
esi: f0629cec   edi: f3b11084   ebp: f39d5ea0   esp: f39d5e64
ds: 007b   es: 007b   ss: 0068
Process kmpathd/0 (pid: 22795, threadinfo=f39d4000 task=f3cb4c60)
Stack: 27fc854a c045e800 00000088 00000000 f3cb4c60 00000001 f3b11098 00000000 
       f0629cec 00000046 f39d5ef0 c029269d fffffffb f0627914 f0626914 f39d5ec8 
       f88f754d f062791c f8dac080 f8dedcf0 00000000 f0621a38 f0621a38 00001000 
Call Trace: 
 [<c029269d>] generic_make_request+0x10d/0x1f0
 [<f88f754d>] clone_endio+0x6d/0x110 [dm_mod]
 [<f8dedcf0>] multipath_end_io+0x0/0x380 [dm_multipath]
 [<f88f74e0>] clone_endio+0x0/0x110 [dm_mod]
 [<c018d25b>] bio_endio+0x5b/0x90
 [<f8dee3ec>] process_queued_ios+0x19c/0x220 [dm_multipath]
 [<c0142ba0>] worker_thread+0x1a0/0x2e0
 [<f8dee250>] process_queued_ios+0x0/0x220 [dm_multipath]
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01260c0>] default_wake_function+0x0/0x10
 [<c01476dc>] kthread+0xec/0x11c
 [<c0142a00>] worker_thread+0x0/0x2e0
 [<c01475f0>] kthread+0x0/0x11c
 [<c0107005>] kernel_thread_helper+0x5/0x10

Code: 8b 40 10 c7 04 24 7c f1 de f8 83 c0 14 89 44 24 04 e8 f2 16  

Comment 3 Lars Marowsky-Bree 2005-04-21 10:25:44 UTC
Cough, cough. It looks like I introduced this bug myself in my extended-logging
patch. I'll attach a cleaned-up version of that patch to the respective bug soon.

