
Bug 159590

Summary: corrupted data using software-raid (md)
Product: Fedora
Version: 3
Component: kernel
Hardware: i386
OS: Linux
Severity: high
Priority: medium
Status: CLOSED NOTABUG
Reporter: Arian Prins <hgaprins>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Brian Brock <bbrock>
CC: wtogami
Doc Type: Bug Fix
Last Closed: 2005-06-05 13:47:50 UTC

Description Arian Prins 2005-06-05 13:08:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl-NL; rv:1.7.5) Gecko/20041202 Firefox/1.0

Description of problem:
My system was an updated FC3 install. It had three 180 GB (P-ATA) drives that I combined using software RAID, level 5 (as /dev/md0), created at install time using the default partitioning tools. On top of that I used LVM, and on top of that I had a few ext3 partitions, including the root directory (booting from a separate hard disk that was not included in the RAID set).

After a few months of use, all the filesystems suddenly became completely corrupted. The system could no longer boot (it couldn't find init), and when I tried to mount the partitions using a rescue CD, a live CD, or a completely new install on a separate drive, I could not get /dev/md0 mounted.

I tried reinstalling everything, and now at install time, when the installer is formatting the partitions, it fails and reboots (after giving a message that something serious happened).

I have now reinstalled FC3 on the separate hard disk (not part of the three 180 GB drives) without creating the RAID array at all. When I create a RAID-5 set after the install (using mdadm), the newly created /dev/md0 corrupts after a few hours of use; after unmounting, it won't remount. To rule out the possibility of drive (or controller) failure, I fdisk-ed the individual drives and put an ext3 partition directly on each of them. I filled the three drives up with 1 GB files: no problem. Reading a few of them back (e.g. cat < 1gbfile > /dev/null) gives no problem either.
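The fill-and-read-back test described above can be sketched roughly as follows (the mount point and file sizes here are placeholders; the report used 1 GB files written to an ext3 partition on each whole drive, so COUNT would be 1024 for a real run):

```shell
# Fill a filesystem with random files, then read them back; any bad
# sector or corruption would surface as an I/O error on the read.
MNT="${MNT:-$(mktemp -d)}"     # e.g. the mount point of one test partition
COUNT="${COUNT:-4}"            # MB per file; scale up to 1024 for 1 GB files
for i in 1 2 3; do
    dd if=/dev/urandom of="$MNT/file$i" bs=1M count="$COUNT" 2>/dev/null
done
# Read everything back, discarding the data.
for f in "$MNT"/file*; do
    cat "$f" > /dev/null && echo "OK: $f"
done
```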

This means I have tried the following "chains":
direct partitions on the drives: no problems
combine the drives using raid-5: corruption
combine the drives using raid-5 and then using LVM on top of that: corruption.

This problem may be related to bug 152162, but I'm not sure.

Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667 (but upgrades probably too)

How reproducible:

Steps to Reproduce:
Scenario 1:
1. Start installation of FC3
2. Create a RAID-5 set using three 180 GB disks (on each disk: partition 1: 256 MB swap; partition 2: the remaining space, for software RAID).
3. Continue the installation process.
4. At formatting time, just before formatting finishes, the installer gives an error message indicating something serious went wrong, and reboots.

Scenario 2:
1. Install FC3 on a 40 GB hard drive; leave the 180 GB disks empty (no partitions).
2. Once the system is running, create a partition on each 180 GB drive (type 0xfd).
3. Use mdadm to create a RAID-5 set from the three partitions.
4. Mount /dev/md0 at a directory.
5. Start adding random data.
6. After a few GB, the data corrupts the filesystem (ls displays irregularities).
7. Unmount /dev/md0.
8. Reboot.
9. mount /dev/md0 gives an error message.
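For reference, steps 2–4 of this scenario correspond roughly to the commands below. The member device names (/dev/hdf1, /dev/hdg1, /dev/hdh1) are examples, not taken from the report, and the commands must be run as root on the actual hardware; the guard keeps this sketch from doing anything on a machine without those devices:

```shell
# Assemble three 0xfd partitions into a RAID-5 array, format it, and
# mount it (device names are hypothetical examples).
DEVICES="/dev/hdf1 /dev/hdg1 /dev/hdh1"
if [ -b /dev/hdf1 ] && [ "$(id -u)" -eq 0 ]; then
    mdadm --create /dev/md0 --level=5 --raid-devices=3 $DEVICES
    mkfs.ext3 /dev/md0
    mkdir -p /mnt/raid && mount /dev/md0 /mnt/raid
else
    echo "skipping: needs root and the three RAID member devices"
fi
```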

Actual Results:  see steps

Expected Results:  the installer should have finished formatting / no corruption

Additional info:

I get emails from smartd with subject:
SMART error (CurrentPendingSector) detected on host: bio.lan
Device: /dev/hdh, 11 Currently unreadable (pending) sectors

and in another mail:
Device: /dev/hdg, 2 Currently unreadable (pending) sectors
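The pending-sector counts smartd is mailing about can be read directly with smartctl from smartmontools (device names taken from the report; needs root, and the loop skips devices that don't exist on the current machine):

```shell
# Query the SMART attribute table and pull out the pending-sector line.
CHECKED=0
for dev in /dev/hdg /dev/hdh; do
    if [ -b "$dev" ]; then
        smartctl -A "$dev" | grep -i Current_Pending_Sector
    else
        echo "skipping $dev: no such block device on this machine"
    fi
    CHECKED=$((CHECKED + 1))
done
```

A nonzero Current_Pending_Sector raw value means the drive has sectors it could not read and is waiting to remap, which matches the corruption pattern described above.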

Comment 1 Arian Prins 2005-06-05 13:47:50 UTC
After more investigation it seems that the hardware was faulty after all
(filling the hard disks up with data didn't show any problems, but dumping
all the data back out to /dev/null did produce errors). Apologies.
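The raw read test that finally exposed the fault can be sketched like this: stream each suspect drive to /dev/null, so that any unreadable sector makes dd fail with an I/O error (device names from the report; needs root, and nothing is written to the drives):

```shell
# Sequentially read each whole drive; dd reports an error on any
# unreadable sector, which SMART had already flagged as pending.
READS=0
for dev in /dev/hdg /dev/hdh; do
    if [ -b "$dev" ]; then
        dd if="$dev" of=/dev/null bs=1M && echo "clean read: $dev"
    else
        echo "skipping $dev: not present here"
    fi
    READS=$((READS + 1))
done
```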