|Summary:||mdadm usage bug in /sbin/mkdumprd may cause dumps to be lost|
|Product:||Red Hat Enterprise Linux 5||Reporter:||Charlotte Richardson <charlotte.richardson>|
|Component:||kexec-tools||Assignee:||Neil Horman <nhorman>|
|Status:||CLOSED ERRATA||QA Contact:||Red Hat Kernel QE team <kernel-qe>|
|Version:||5.2||CC:||caiqian, cward, mgahagan|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|:||600604||Environment:|
|Last Closed:||2009-01-20 21:00:32 UTC||Type:||---|
Description Charlotte Richardson 2008-07-21 19:33:47 UTC
Description of problem:
If /etc/mdadm.conf defines disk mirrors whose names end in more than one digit, the mknod commands generated in the init script of the kdump initrd are incorrect. As a result, /var/crash was not accessible and the crash dump was lost. The mdadm.conf file looked like:

ARRAY /dev/md0 ...
ARRAY /dev/md2 ...
ARRAY /dev/md1 ...
ARRAY /dev/md10 ...
ARRAY /dev/md11 ...
ARRAY /dev/md12 ...
ARRAY /dev/md13 ...
... up to ARRAY /dev/md37 ...
ARRAY /dev/md3 ...

where /var/crash is /dev/md3.

Version-Release number of selected component (if applicable):
kexec-tools-1.102pre-21.el5

How reproducible:
100%

Steps to Reproduce:
1. Create disk partitions as above in that order.
2. echo 'c' > /proc/sysrq-trigger
3. Observe what happens...

Actual results:
Either the dump is written to only one device of the /var/crash mirror (corrupting it) or it is lost completely. The customer hit the first scenario; I got both while testing this.

Expected results:
The crash dump should be written to /var/crash.

Additional info:
The problem is in the init script for the kdump kernel that is created by /sbin/mkdumprd. The sed expression used to extract the minor numbers for the mknod commands that run before the mdadm -A -s command incorrectly eats the leading trailing digits into piece 1 instead of piece 2 when the device name ends in more than one digit, so only the last digit is left as the minor number. In this particular case, /dev/md13 was created with minor number 3 (as were /dev/md23 and /dev/md33, as well as the real /dev/md3, which was /var/crash).

The attached patch fixes the sed expression so that it works for these default mdadm names by consuming only non-digits into piece 1 and all the trailing digits into piece 2. This solves the problem for default device names. The whole mechanism really ought to be rethought, however, since mdadm does not restrict you to default names, and there is in any case no real need to start any devices other than the ones the dump is being written to.
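The greedy-match behaviour described above can be reproduced outside the initrd. The following is a minimal Python sketch, not the actual code from /sbin/mkdumprd: the two patterns are assumptions chosen to mirror the "piece 1 / piece 2" description, since the real sed expression is not quoted in this report.

```python
import re

# Hypothetical stand-ins for the sed expression; the real one is not quoted here.
GREEDY = re.compile(r"^(.*)([0-9]+)$")      # piece 1 greedily swallows leading digits
FIXED = re.compile(r"^([^0-9]*)([0-9]+)$")  # piece 1 takes non-digits only


def minor_number(pattern, device):
    """Return the minor number that 'piece 2' of the pattern yields for a device name."""
    match = pattern.match(device)
    if match is None:
        raise ValueError("not a default-style md device name: " + device)
    return int(match.group(2))


devices = ["/dev/md3", "/dev/md13", "/dev/md23", "/dev/md33"]
greedy_minors = {dev: minor_number(GREEDY, dev) for dev in devices}
fixed_minors = {dev: minor_number(FIXED, dev) for dev in devices}

for dev in devices:
    print(dev, "greedy:", greedy_minors[dev], "fixed:", fixed_minors[dev])
```

With the greedy pattern all four names collapse onto minor number 3, which matches the collision reported for /dev/md3. The second pattern still only handles default-style names, as noted above.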
Because of where we are in our testing cycle here at Stratus, instead of replacing kexec-tools-1.102pre-21.el5 with a fixed version, we are planning on working around this problem by shipping a kdump_pre script that deletes the erroneous /dev/mdnn devices and stops mdadm, then recreates the devices with the correct minor numbers and restarts mdadm. (I've tested both fixes.) This also only works for device names of the default format, though.
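For illustration only, the workaround steps might be sketched as below. This is not the Stratus kdump_pre script, which is not attached here: the function name and the command ordering are assumptions, and the sketch only builds the command strings rather than running them. It assumes default-style names and block major number 9, which is the major used by the Linux md driver.

```python
import re

MD_MAJOR = 9  # block major number of the Linux md driver


def workaround_commands(conf_text):
    """Build (but do not run) commands that recreate /dev/mdN nodes with correct minors.

    Hypothetical helper: only handles default-style names such as /dev/md13.
    """
    names = re.findall(r"^ARRAY\s+(/dev/md[0-9]+)\b", conf_text, flags=re.MULTILINE)
    commands = ["mdadm --stop --scan"]              # stop arrays assembled on bad nodes
    commands += ["rm -f " + name for name in names]  # delete the erroneous nodes
    for name in names:
        minor = int(re.search(r"[0-9]+$", name).group())
        commands.append("mknod %s b %d %d" % (name, MD_MAJOR, minor))
    commands.append("mdadm -A -s")                  # reassemble with the corrected nodes
    return commands


sample_conf = "ARRAY /dev/md3 ...\nARRAY /dev/md13 ...\n"
for command in workaround_commands(sample_conf):
    print(command)
```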
Comment 1 Charlotte Richardson 2008-07-21 19:33:47 UTC
Created attachment 312293: fix for /sbin/mkdumprd to correct minor node numbers of /dev/mdnn
Comment 2 Neil Horman 2008-07-21 20:16:21 UTC
yep, looks good, I'll apply it shortly, thanks!
Comment 3 RHEL Product and Program Management 2008-07-21 20:40:07 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Comment 4 Neil Horman 2008-07-22 15:56:29 UTC
fixed in -28.el5. Thanks!
Comment 8 errata-xmlrpc 2009-01-20 21:00:32 UTC
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-0105.html