Bug 158424 - slab corruption during x86_64 install using Firewire/USB disks
Summary: slab corruption during x86_64 install using Firewire/USB disks
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: x86_64
OS: Linux
medium
high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: FC4Blocker
Reported: 2005-05-22 05:30 UTC by Alexandre Oliva
Modified: 2015-01-04 22:19 UTC
2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-30 21:37:18 UTC


Attachments (Terms of Use)
fix slab corruption error (deleted)
2005-05-29 23:25 UTC, Alexandre Oliva

Description Alexandre Oliva 2005-05-22 05:30:25 UTC
Installing FC4-re0520.0/x86_64 (kernel-2.6.11-1.1329_FC4) on a box with USB- and
Firewire-connected HDs (the RAID 1 physical volume containing the root logical
volume had one member on the Firewire disk), I got the following slab corruption
errors logged to /var/log/anaconda.syslog.  Installation seems to have completed
successfully, and I was able to roll back to kernel 2.6.11-1.1323_FC4, which is
working just fine.  No such problems were observed on other i686 boxes, but
those didn't have external disks plugged in.

Version-Release number of selected component (if applicable):
kernel-2.6.11-1.1329_FC4

How reproducible:
Always

<6>EXT3-fs: mounted filesystem with ordered data mode.
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<7>ISO 9660 Extensions: Microsoft Joliet Level 3
<4>Unable to load NLS charset utf8
<4>Unable to load NLS charset utf8
<7>ISO 9660 Extensions: RRIP_1991A
<4>warning: many lost ticks.
<4>Your time source seems to be instable or some driver is hogging interupts
<4>rip ide_end_request+0x1ce/0x1f0
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<7>ISO 9660 Extensions: Microsoft Joliet Level 3
<4>Unable to load NLS charset utf8
<4>Unable to load NLS charset utf8
<7>ISO 9660 Extensions: RRIP_1991A
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<7>ISO 9660 Extensions: Microsoft Joliet Level 3
<4>Unable to load NLS charset utf8
<4>Unable to load NLS charset utf8
<7>ISO 9660 Extensions: RRIP_1991A
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<7>ISO 9660 Extensions: Microsoft Joliet Level 3
<4>Unable to load NLS charset utf8
<4>Unable to load NLS charset utf8
<7>ISO 9660 Extensions: RRIP_1991A
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<3>slab size-1024: redzone mismatch in slabp ffff81001f20e140, objp ffff81001f20e9a8, bufctl 0xfffe
<3>Redzone: 0x170fc2a5/0xffff81001ecfc8e0.
<3>Last user: [<ffffffff880c47b4>](scsi_host_alloc+0xe4/0x470 [scsi_mod])
<3>000: a8 e5 20 1f 00 81 ff ff a8 e5 20 1f 00 81 ff ff
<3>010: d0 c6 cf 1e 00 81 ff ff d0 c6 cf 1e 00 81 ff ff
<6>parport: PnPBIOS parport detected.
<6>parport0: PC-style at 0x378 (0x778), irq 7 [PCSPP,TRISTATE]
<4>Trying to free free DMA1
<6>pnp: Device 00:0b disabled.

Comment 1 Alexandre Oliva 2005-05-22 12:53:21 UTC
Unsurprisingly, the errors also show up post-install, with kernel 1.1337_FC4 as
well.

Comment 2 Dave Jones 2005-05-23 19:57:40 UTC
Is it possible for you to repeat this with just Firewire, or just USB, so
we can narrow the scope a little?


Comment 3 Warren Togami 2005-05-23 20:00:23 UTC
Bug #158468 indicates that Firewire is breaking the x86_64 install too.


Comment 4 Alexandre Oliva 2005-05-24 06:18:33 UTC
I've done a new install today with Firewire only, and still got the same errors.
Unfortunately, USB doesn't seem to be very stable under very high I/O loads: I
get constant freezes while rsyncing ISOs for new test trees (primed from my local
mirror of rawhide) if I have one of the RAID 1 members on Firewire and the other
on USB.  If everything is on Firewire, it works fine, except for the slab
corruption, that is.  I haven't even tried putting both on USB, but I could if
you think that would help.

Comment 5 Alexandre Oliva 2005-05-27 09:35:25 UTC
Ugh, 1.1355_FC4 just won't boot if my disks are connected to Firewire; it prints
far too many oops/panic messages to fit an 80x60 screen before coming to a
complete halt :-(

I suspect this is not a new bug, just a side effect of turning slab debugging
off.  But this sucks, just before the release :-(  It means Firewire-only people
will need custom install disks again.

Comment 6 Alexandre Oliva 2005-05-27 09:36:28 UTC
And, to make matters worse, I'm leaving on a trip in the next few minutes, and
won't be back before Saturday evening.  Oh well ;-(

Comment 7 Alexandre Oliva 2005-05-29 23:25:38 UTC
Created attachment 114959 [details]
fix slab corruption error

This patch makes sure we allocate at least one unsigned long for hostdata.
Without it, we write past the memory block allocated by scsi_host_alloc,
triggering slab corruption detection.  It seems like this has been broken
forever, and I can't figure out why it didn't cause a problem before,
but I've verified that this fixes it on 2.6.11-1.1353_FC4.  I have yet to build
1.1363_FC4 with the fix to see whether the init crashes at sbp2 load time are
gone.  The info I'm getting from the stack trace doesn't make it obvious
whether they're related.

Comment 8 Alexandre Oliva 2005-05-30 02:36:17 UTC
It was the same bug, after all.  I'm now running 1.1363_FC4 with the patch
above, and it boots and works just fine.  Please oh please add it to FC4 final.
It's very narrow in scope, can't possibly break anything that isn't already
broken, and will fix a memory corruption error.

Comment 9 Warren Togami 2005-05-30 03:21:53 UTC
This will be davej's and Sopwith's decision.  Last week I heard davej say "final"
FC4 kernel a few times, so it may already be too late.

Comment 10 Warren Togami 2005-05-30 21:37:18 UTC
Accepted into dist-fc4

Comment 11 Alexandre Oliva 2005-05-31 07:23:30 UTC
Fix confirmed in FC4-re0530.1 (kernel-2.6.11-1.1366_FC4).

