Bug 161160 - Reproducible panic in mdadm multipathing
Summary: Reproducible panic in mdadm multipathing
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Doug Ledford
QA Contact:
Depends On:
Blocks: 168424
Reported: 2005-06-20 21:20 UTC by Wendy Cheng
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2006-03-15 16:07:10 UTC
Target Upstream Version:

Attachments
patch submitted by IBM (deleted)
2005-07-21 20:09 UTC, Wendy Cheng

External tracker:
System: Red Hat Product Errata
ID: RHSA-2006:0144
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7
Last Updated: 2006-03-15 05:00:00 UTC

Description Wendy Cheng 2005-06-20 21:20:08 UTC
Description of problem:

Two reproducible kernel oopses have been reported with mdadm multipathing, one on
i686 and one on IPF machines. With the 2.4.21-32.0.1.ELsmp kernel, the panic trace is:

md0: former device sdi is unavailable, removing from array!
Unable to handle kernel NULL pointer dereference at virtual address 00000040
printing eip:
*pde = 35779001
*pte = 3c01c067
Oops: 0000
multipath netconsole usbserial lp parport autofs4 audit pool e1000 floppy sg
microcode loop lvm-mod keybdev mousedev hid input usb-uhci usbcore ext3 jbd qla
CPU:    1
EIP:    0060:[<f8b8859a>]    Not tainted
EFLAGS: 00010246

EIP is at multipath_run [multipath] 0x1ea (2.4.21-32.0.1.ELsmp/i686)
eax: d1210000   ebx: 00000000   ecx: 00000000   edx: f7caa294
esi: 00000000   edi: f7caa294   ebp: f7caa294   esp: f57cbd94
ds: 0068   es: 0068   ss: 0068
Process mdadm (pid: 4381, stackpage=f57cb000)
Stack: d1210000 00000000 000002c4 cf940000 c043fc80 c0440054 f7caa294 c043fc80
      f57cbde8 00000086 00000000 00000000 cf940000 f57ca000 f5c43000 00000001
      0000000a d1210000 00000000 c048135f 00007ca3 c0129553 00000282 00007ca3
Call Trace:   [<c0129553>] call_console_drivers [kernel] 0x63 (0xf57cbde8)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe20)
[<c0217594>] device_size_calculation [kernel] 0x154 (0xf57cbe40)
[<c021786d>] do_md_run [kernel] 0x1dd (0xf57cbe6c)
[<c0129883>] printk [kernel] 0x153 (0xf57cbe88)
[<c0215a45>] bind_rdev_to_array [kernel] 0xa5 (0xf57cbea8)
[<c02186ed>] add_new_disk [kernel] 0x24d (0xf57cbec8)
[<c021928c>] md_ioctl [kernel] 0x38c (0xf57cbeec)
[<c0126154>] context_switch [kernel] 0xa4 (0xf57cbf60)
[<c01b2a3f>] tty_write [kernel] 0x14f (0xf57cbf68)
[<c016dbfe>] blkdev_ioctl [kernel] 0x3e (0xf57cbf80)
[<c0178756>] sys_ioctl [kernel] 0xf6 (0xf57cbf94)

Code: 8b 49 40 85 c9 0f 85 5f 02 00 00 8b 44 24 38 bf 01 00 00 00
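
The faulting address in the oops can be cross-checked against the first instruction on the Code: line. The following is a minimal sketch, assuming the Code: bytes begin at the faulting EIP (as is usual for 2.4 oops output) and handling only the simple "mov r32, [r32 + disp8]" encoding. The helper name decode_mov_load_disp8 is made up for this illustration; the register values are copied from the dump above.

```python
# Sketch: hand-decode the first instruction of the oops "Code:" line.
# Assumption: the Code: bytes start at the faulting EIP.

# 32-bit register numbering used by the x86 ModRM byte.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Register values copied from the oops above.
regs = {
    "eax": 0xD1210000, "ebx": 0x00000000, "ecx": 0x00000000, "edx": 0xF7CAA294,
    "esi": 0x00000000, "edi": 0xF7CAA294, "ebp": 0xF7CAA294, "esp": 0xF57CBD94,
}

code = bytes.fromhex("8b 49 40 85 c9 0f 85 5f 02 00 00 8b 44 24 38 bf 01 00 00 00")


def decode_mov_load_disp8(insn):
    """Decode 'mov r32, [r32 + disp8]' (opcode 0x8B, ModRM mod=01, no SIB)."""
    opcode, modrm, disp = insn[0], insn[1], insn[2]
    if opcode != 0x8B:
        raise ValueError("not a MOV r32, r/m32 instruction")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 1 or rm == 4:
        raise ValueError("only the plain base+disp8 form is handled here")
    if disp >= 0x80:  # disp8 is sign-extended
        disp -= 0x100
    return REG32[reg], REG32[rm], disp


dest, base, disp = decode_mov_load_disp8(code[:3])
fault_addr = (regs[base] + disp) & 0xFFFFFFFF
print(f"mov {disp:#x}(%{base}),%{dest}  reads address {fault_addr:08x}")
```

Under these assumptions the instruction loads from offset 0x40 of the pointer held in ecx, and ecx is 0 in the register dump, which agrees with the reported NULL pointer dereference at virtual address 00000040.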

Version-Release number of selected component (if applicable):
All versions of RHEL 3 kernels up to the current RHN distribution

How reproducible:
Every time.

Steps to Reproduce:
1. Connect the Linux box to SAN storage with multipath.
2. Create a LUN on the SAN storage, and boot from it (SAN boot).
3. Create two more LUNs on the SAN storage, then reboot.

/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  12GB
/dev/sdc:  1GB
/dev/sdd:  multipath device for /dev/sda
/dev/sde:  multipath device for /dev/sdb
/dev/sdf:  multipath device for /dev/sdc

4. Create a partition on /dev/sdc (multipath /dev/sdf) with parted, then assign
both paths (/dev/sdc1 and /dev/sdf1) to /dev/md0.
5. From a shell, run: mdadm -C -lmp -n2 /dev/md0 /dev/sdc1 /dev/sdf1
6. Remove /dev/sdb and /dev/sde on the SAN storage, then reboot.

now the device names have changed:
previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  1GB (previous /dev/sdc)
/dev/sdd:  multipath device for /dev/sda
/dev/sde:  multipath device for /dev/sdb (previous /dev/sdf)

7. After editing /etc/mdadm.conf, run "mdadm -As /dev/md0".

Actual result:
kernel oops.

Expected result:
no oops.

Additional Info:

--- /etc/mdadm.conf ---
DEVICE /dev/sd[abcdef][0-9]
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdd1
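
As a quick sanity check on this configuration, the members named on the ARRAY line can be compared against the DEVICE pattern. This is only a sketch: it uses Python's fnmatch to approximate the glob-style matching of the DEVICE line, it is not how mdadm itself is implemented, and the helper name covered is made up for this illustration.

```python
from fnmatch import fnmatchcase

# Values copied from the /etc/mdadm.conf shown above.
device_patterns = ["/dev/sd[abcdef][0-9]"]
array_members = "/dev/sdb1,/dev/sdd1".split(",")


def covered(member, patterns):
    """Return True if the member path matches at least one DEVICE pattern."""
    return any(fnmatchcase(member, pattern) for pattern in patterns)


# Members listed on the ARRAY line that no DEVICE pattern would match.
uncovered = [m for m in array_members if not covered(m, device_patterns)]
print("ARRAY members outside the DEVICE patterns:", uncovered)
```

Both /dev/sdb1 and /dev/sdd1 fall inside the DEVICE pattern, so under this approximation the configuration is self-consistent and the pattern is not what keeps the array from assembling.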

Comment 1 Wendy Cheng 2005-06-20 21:53:01 UTC
Sorry, there is a typo in the "device names have changed" lines. They should read:

now the device names have changed:
previous /dev/sdc becomes /dev/sdb, and previous /dev/sdf becomes /dev/sdd.
/dev/sda:  50GB (including /, /boot, swap partition)
/dev/sdb:  1GB (previous /dev/sdc)
/dev/sdc:  multipath device for /dev/sda
/dev/sdd:  multipath device for /dev/sdb (previous /dev/sdf)
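
The renaming can be modelled in a few lines. This is a sketch that assumes sd names are handed out sequentially in discovery order and that the order of the remaining paths does not change after the LUN is removed. The labels lun0, lun1, lun2 and the path numbers are hypothetical, introduced only for this illustration.

```python
import string


def sd_names(count):
    """Return /dev/sda, /dev/sdb, ... for the first `count` discovered disks."""
    return [f"/dev/sd{string.ascii_lowercase[i]}" for i in range(count)]


# (LUN, path number) in discovery order before step 6:
# the first path to each LUN, then the second path to each LUN.
before = [("lun0", 1), ("lun1", 1), ("lun2", 1),
          ("lun0", 2), ("lun1", 2), ("lun2", 2)]

# Step 6 removes the 12GB LUN, i.e. both of its paths (old sdb and sde).
removed = {"lun1"}
after = [path for path in before if path[0] not in removed]

old_name = dict(zip(before, sd_names(len(before))))
new_name = dict(zip(after, sd_names(len(after))))

# Map each surviving device's old name to its new name.
renames = {old_name[path]: new_name[path] for path in after}
for old, new in renames.items():
    print(f"{old} -> {new}")
```

With these assumptions the model gives the same mapping as the corrected list: the old /dev/sdc becomes /dev/sdb, the old /dev/sdd becomes /dev/sdc, and the old /dev/sdf becomes /dev/sdd.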

Comment 11 Rene Klootwijk 2005-09-26 14:25:53 UTC
The same problem occurs when a multipath device is created on one system and
then activated on another system that has assigned different device names to
these LUNs. We need several multipath devices activated on multiple systems
for an Oracle10g RAC environment.

Comment 12 Doug Ledford 2005-09-26 21:28:00 UTC
This patch has passed my internal testing and the patch has been submitted
internally for review and possible inclusion in the next RHEL3 update release. 
I've also built a test kernel that has this patch included.  RPMs can be found
at and the kernel version that
includes this patch is 2.4.21-37.1.EL_st_tape_test3.

Comment 13 Rene Klootwijk 2005-09-27 07:16:08 UTC
Can you compile a hugemem version of the kernel?

Comment 14 Doug Ledford 2005-09-27 14:24:01 UTC
One is already present in the i686 directory.

Comment 17 Ernie Petrides 2005-10-08 02:12:57 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.5.EL).

Comment 27 Red Hat Bugzilla 2006-03-15 16:07:11 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
