
Bug 456154

Summary: mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost
Product: Red Hat Enterprise Linux 5
Reporter: Charlotte Richardson <charlotte.richardson>
Component: kexec-tools
Assignee: Neil Horman <nhorman>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: low
Version: 5.2
CC: caiqian, cward, mgahagan
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 600604 (view as bug list) Environment:
Last Closed: 2009-01-20 21:00:32 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Attachments:
Description Flags
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn none

Description Charlotte Richardson 2008-07-21 19:33:47 UTC
Description of problem:
If /etc/mdadm.conf defines disk mirrors whose names end in more than one
digit, the mknod commands that /sbin/mkdumprd writes into the init script of
the kdump initrd are incorrect. As a result, /var/crash was not accessible and
the crash dump was lost. The mdadm.conf file looked like:

ARRAY /dev/md0 ...
ARRAY /dev/md2 ...
ARRAY /dev/md1 ...
ARRAY /dev/md10 ...
ARRAY /dev/md11 ...
ARRAY /dev/md12 ...
ARRAY /dev/md13 ...
... up to ARRAY /dev/md37 ...
ARRAY /dev/md3 ...

where /var/crash is /dev/md3.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5

How reproducible:
100%


Steps to Reproduce:
1. Create disk partitions as above in that order.
2. echo 'c' > /proc/sysrq-trigger
3. Observe what happens...
  
Actual results:
Either the dump is written to only one device of the /var/crash mirror
(corrupting it) or it is lost completely. The customer hit the first scenario;
I saw both while testing this.


Expected results:
The dump should be written intact to /var/crash.


Additional info:
The problem is in the init script for the kdump kernel that is created by
/sbin/mkdumprd. The sed expression that extracts the minor numbers for the
mknod commands issued before the mdadm -A -s command incorrectly pulls the first
trailing digit into piece 1 instead of piece 2 if the device name has more than
one trailing digit. In this particular case, /dev/md13 was created with minor
number 3 (as were /dev/md23 and /dev/md33, as well as the real /dev/md3, which
was /var/crash). The attached patch fixes the sed expression so that it works
for these default mdadm names by consuming only non-digits into piece 1 and all
the trailing digits into piece 2. This solves the problem in the case of default
device names.
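
To illustrate the failure mode described above, here is a minimal sketch. The
two sed expressions are hypothetical reconstructions written for this example;
they are not copied from /sbin/mkdumprd or from the attached patch, and the
function names minor_greedy and minor_fixed are made up.

```shell
#!/bin/sh
# Hypothetical reconstruction, not the actual mkdumprd code.

# Greedy form: ".*" swallows everything up to the LAST digit, so only one
# trailing digit is left for piece 2.
minor_greedy() {
    echo "$1" | sed -e 's/\(.*\)\([0-9][0-9]*\)/\2/'
}

# Fixed form: piece 1 may contain non-digits only, so piece 2 receives
# all of the trailing digits.
minor_fixed() {
    echo "$1" | sed -e 's/^\([^0-9]*\)\([0-9][0-9]*\)$/\2/'
}

for dev in /dev/md3 /dev/md13 /dev/md23 /dev/md33; do
    echo "$dev greedy=$(minor_greedy "$dev") fixed=$(minor_fixed "$dev")"
done
```

With the greedy form, /dev/md13, /dev/md23 and /dev/md33 all come out as 3,
colliding with the real /dev/md3; the fixed form yields 13, 23 and 33.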

The whole mechanism really ought to be rethought, however, since mdadm does not
restrict you to the default names, and there is at any rate no real need to
start up any devices other than the ones the dump is being written out to.
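
One possible direction, sketched here only as an assumption about how this
could look: give the kdump initrd a trimmed copy of mdadm.conf that keeps every
non-ARRAY line but only the ARRAY line for the dump target. The function name
trim_mdadm_conf and the output path used below are hypothetical.

```shell
#!/bin/sh
# Sketch only: print a trimmed mdadm.conf on stdout.
# $1: array holding the dump target, e.g. /dev/md3
# $2: path to an mdadm.conf-style file
trim_mdadm_conf() {
    # Keep non-ARRAY lines (DEVICE, comments, ...) and the single ARRAY
    # line whose device name equals the target.
    awk -v target="$1" '$1 != "ARRAY" || $2 == target' "$2"
}
```

For example, trim_mdadm_conf /dev/md3 /etc/mdadm.conf > /tmp/mdadm-kdump.conf,
after which mdadm -A -s -c /tmp/mdadm-kdump.conf would assemble only that array.
This does not by itself address non-default names; the device node would still
need a correct minor number.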

Because of where we are in our testing cycle here at Stratus, instead of
replacing kexec-tools-1.102pre-21.el5 with a fixed version, we are planning on
working around this problem by shipping a kdump_pre script that deletes the
erroneous /dev/mdnn devices and stops mdadm, then recreates the devices with the
correct minor numbers and restarts mdadm. (I've tested both fixes.) This also
only works for device names of the default format, though.
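
The workaround described above could be sketched roughly as follows. This is
not the script Stratus shipped; it is a dry-run illustration that only prints
the commands, in the order the description gives them, and that order has not
been verified inside a kdump initrd. The function name emit_md_fixups is
hypothetical. Major number 9 is the block major of the Linux md driver.

```shell
#!/bin/sh
# Sketch only: prints the fix-up commands instead of running them.
MD_MAJOR=9   # block major number of the md driver

# $1: path to an mdadm.conf-style file
emit_md_fixups() {
    names=$(awk '$1 == "ARRAY" { print $2 }' "$1")

    # Remove the (possibly wrong) device nodes, then stop all arrays.
    for dev in $names; do
        echo "rm -f $dev"
    done
    echo "mdadm -S -s"

    # Recreate each node, taking ALL trailing digits as the minor number.
    for dev in $names; do
        minor=$(echo "$dev" | sed -e 's|^/dev/md\([0-9][0-9]*\)$|\1|')
        case $minor in
            ''|*[!0-9]*)
                # Name is not of the default /dev/mdNN form; skip it.
                echo "# skipped $dev: not a default /dev/mdNN name"
                continue
                ;;
        esac
        echo "mknod $dev b $MD_MAJOR $minor"
    done

    # Reassemble the arrays from the configuration file.
    echo "mdadm -A -s"
}
```

Running emit_md_fixups /etc/mdadm.conf prints the command list for review. As
noted above, names outside the default format are only reported, not handled.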

Comment 1 Charlotte Richardson 2008-07-21 19:33:47 UTC
Created attachment 312293
fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn

Comment 2 Neil Horman 2008-07-21 20:16:21 UTC
yep, looks good, I'll apply it shortly, thanks!

Comment 3 RHEL Product and Program Management 2008-07-21 20:40:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Neil Horman 2008-07-22 15:56:29 UTC
fixed in -28.el5.  Thanks!

Comment 8 errata-xmlrpc 2009-01-20 21:00:32 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0105.html