Bug 159590 - corrupted data using software-raid (md)
Summary: corrupted data using software-raid (md)
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 3
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-06-05 13:08 UTC by Arian Prins
Modified: 2007-11-30 22:11 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-06-05 13:47:50 UTC



Description Arian Prins 2005-06-05 13:08:32 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; nl-NL; rv:1.7.5) Gecko/20041202 Firefox/1.0

Description of problem:
My system was an updated installation of FC3. It had three 180 GB (PATA) drives that I combined using software RAID level 5 (as /dev/md0). I created the array at install time using the default partitioning tools. On top of that I used LVM, and on top of that I had a few ext3 partitions, including the root directory (booting from a separate hard disk that was not part of the RAID set).

After a few months of use, all the filesystems suddenly became completely corrupted. The system could no longer boot (it couldn't find init), and when I tried to mount the partitions using a rescue CD, a live CD, or a completely new install on a separate drive, I could not get /dev/md0 mounted.

I tried reinstalling everything, and now the installer fails while formatting the partitions and reboots (after giving a message that something serious happened).

I have now reinstalled FC3 on the separate hard disk (not part of the three 180 GB drives) without creating the RAID array at all. When I create a RAID-5 set after the install using mdadm, the newly created /dev/md0 partition corrupts after a few hours of use. After unmounting, it won't remount. To rule out the possibility of drive (or controller) failure, I fdisk-ed the individual drives and put an ext3 partition directly on each of them. I filled the three drives up with 1 GB files. No problem. Reading back a few of them (e.g. cat < 1gbfile > /dev/null) gives no problem either.
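A minimal sketch of that per-drive check, assuming the drives are /dev/hde, /dev/hdg and /dev/hdh (only hdg and hdh are confirmed by the smartd mails below):

  mkfs.ext3 /dev/hdg1                                        # one ext3 partition directly on the drive
  mount /dev/hdg1 /mnt/test
  dd if=/dev/urandom of=/mnt/test/1gbfile bs=1M count=1024   # write a 1 GB file
  cat < /mnt/test/1gbfile > /dev/null                        # read it back
  umount /mnt/test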

This means I have tried the following "chains":
direct partitions on the drives: no problems
combine the drives using RAID-5: corruption
combine the drives using RAID-5 and then LVM on top of that: corruption
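For reference, the two failing chains could be built roughly like this (a sketch; device names, volume names and sizes are assumptions, not taken from the report):

  # chain: RAID-5 only
  mdadm --create /dev/md0 --level=5 --raid-devices=3 /dev/hde1 /dev/hdg1 /dev/hdh1
  mkfs.ext3 /dev/md0

  # chain: RAID-5 with LVM on top
  pvcreate /dev/md0
  vgcreate vg0 /dev/md0
  lvcreate -L 100G -n data vg0
  mkfs.ext3 /dev/vg0/data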

This problem may be related to bug 152162, but I'm not sure.


Version-Release number of selected component (if applicable):
kernel-2.6.9-1.667 (but probably later updates too)

How reproducible:
Always

Steps to Reproduce:
Scenario 1:
1. Start installation of FC3
2. Create a RAID-5 set using three 180 GB disks (on each disk: partition 1, 256 MB swap; partition 2, the rest of the disk for software RAID).
3. Continue the installation process.
4. Just before formatting finishes, the installer gives an error message indicating that something serious went wrong, and it reboots.

Scenario 2:
1. Install FC3 on a 40 GB hard drive; leave the 180 GB disks empty (no partitions).
2. After the system is running, create a partition on each 180 GB drive (type 0xfd).
3. Use mdadm to create a RAID-5 set from the three partitions.
4. Mount /dev/md0 at a directory.
5. Start adding random data.
6. After a few GB, the filesystem corrupts (ls displays irregularities).
7. Unmount /dev/md0.
8. Reboot.
9. mount /dev/md0 gives an error message (see the command sketch below).
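The failure in step 9 could be examined with commands like these (a sketch; the member device names are assumptions):

  mdadm --assemble /dev/md0 /dev/hde1 /dev/hdg1 /dev/hdh1
  mdadm --detail /dev/md0      # array state, failed/missing members
  cat /proc/mdstat             # kernel's view of the array
  dmesg | tail -n 50           # look for low-level drive I/O errors
  fsck.ext3 -n /dev/md0        # read-only filesystem check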

Actual Results:  see steps

Expected Results:  The installer should have finished formatting; no corruption.

Additional info:

I get emails from smartd with subject:
SMART error (CurrentPendingSector) detected on host: bio.lan
.........
Device: /dev/hdh, 11 Currently unreadable (pending) sectors

and in another mail:
Device: /dev/hdg, 2 Currently unreadable (pending) sectors
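Those warnings can be followed up directly with smartmontools (a sketch):

  smartctl -a /dev/hdh           # full SMART report; check Current_Pending_Sector
  smartctl -t long /dev/hdh      # start an extended offline self-test
  smartctl -l selftest /dev/hdh  # read the self-test results afterwards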

Comment 1 Arian Prins 2005-06-05 13:47:50 UTC
After more investigation it seems that the hardware was faulty after all
(filling the hard disks up with data didn't give any problems, but I have now
dumped all data to /dev/null and did get errors). Apologies.
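A whole-disk read of the kind described, as a sketch (dd aborts with an I/O error when it hits an unreadable sector):

  dd if=/dev/hdg of=/dev/null bs=1M
  dd if=/dev/hdh of=/dev/null bs=1M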

