Bug 588599 - Kernel BUG at fs/ext3/super.c:425
Summary: Kernel BUG at fs/ext3/super.c:425
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Josef Bacik
QA Contact: Igor Zhang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-04 03:10 UTC by Tao Ma
Modified: 2018-11-14 14:14 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-01-13 21:30:44 UTC
Target Upstream Version:


Attachments
a temp fix. (deleted), 2010-05-04 03:30 UTC, Tao Ma
potential fix (deleted), 2010-05-04 18:47 UTC, Josef Bacik


Links
System: Red Hat Product Errata
ID: RHSA-2011:0017
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.6 kernel security and bug fix update
Last Updated: 2011-01-13 10:37:42 UTC

Description Tao Ma 2010-05-04 03:10:09 UTC
Description of problem:
__journal_remove_journal_head: freeing b_frozen_data 
ext3_abort called. 
EXT3-fs error (device loop0): ext3_put_super: Couldn't clean up the journal 
sb orphan head is 49156 
sb_info orphan list: 
  inode loop0:49156 at ffff880017156a78: mode 100644, nlink 1, next 49153 
  inode loop0:49153 at ffff88001f59f488: mode 100644, nlink 1, next 0 
Assertion failure in ext3_put_super() at fs/ext3/super.c:426: 
"list_empty(&sbi->s_orphan)" 
----------- [cut here ] --------- [please bite here ] --------- 
Kernel BUG at fs/ext3/super.c:425 


Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux 5.5

How reproducible:
Intermittent. It depends on whether the direct I/O in step 3 fails with ENOSPC.

Steps to Reproduce:
1. Create a large sparse file on an ext3 volume. The file must be larger than the free space left on the volume.
2. Set up this file as a loopback device.
3. Try to do direct I/O.
4. If step 3 fails with ENOSPC, unmount the volume; the kernel BUG then shows up.
  
Actual results:


Expected results:


Additional info:

Comment 1 Tao Ma 2010-05-04 03:18:45 UTC
Actually, I have investigated the bug; it is caused by the function ext3_direct_IO. For a write, we first add the inode to the orphan list and then, after blockdev_direct_IO, remove it. But there is a corner case: what if we succeed in adding the inode but fail to remove it? That leaves the inode on the in-memory orphan list (sbi->s_orphan), so the list_empty assertion in ext3_put_super fails at umount and triggers the kernel BUG.
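
The corner case can be illustrated with a small user-space C model. This is only a sketch under stated assumptions: every model_* name is hypothetical and none of it is kernel code. The journal_ok flag stands in for whatever failure (for example ENOSPC) prevents the removal step from running.

```c
#include <stdbool.h>
#include <stddef.h>
#include <errno.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_inode {
    unsigned long ino;
    struct model_inode *next_orphan;
    bool on_orphan_list;
};

struct model_sb {
    struct model_inode *orphan_head;   /* stands in for sbi->s_orphan */
};

/* Push the inode onto the in-memory orphan list (no-op if already there). */
static void model_orphan_add(struct model_sb *sb, struct model_inode *inode)
{
    if (inode->on_orphan_list)
        return;
    inode->next_orphan = sb->orphan_head;
    sb->orphan_head = inode;
    inode->on_orphan_list = true;
}

/* Unlink the inode from the in-memory orphan list if it is present. */
static void model_orphan_del(struct model_sb *sb, struct model_inode *inode)
{
    struct model_inode **pp = &sb->orphan_head;

    while (*pp && *pp != inode)
        pp = &(*pp)->next_orphan;
    if (*pp) {
        *pp = inode->next_orphan;
        inode->next_orphan = NULL;
        inode->on_orphan_list = false;
    }
}

/*
 * Models the write path described above: add to the orphan list, do the
 * I/O, then remove. When journal_ok is false the function returns early,
 * so the inode is never removed from the list.
 */
static int model_direct_write_buggy(struct model_sb *sb,
                                    struct model_inode *inode,
                                    bool journal_ok)
{
    model_orphan_add(sb, inode);
    /* ... the direct I/O itself would happen here ... */
    if (!journal_ok)
        return -ENOSPC;            /* early return leaks the orphan entry */
    model_orphan_del(sb, inode);
    return 0;
}

/* Unmount asserts that the orphan list is empty; true means it would fire. */
static bool model_umount_would_assert(const struct model_sb *sb)
{
    return sb->orphan_head != NULL;
}
```

Calling model_direct_write_buggy() with journal_ok set to false leaves the inode on the list, so model_umount_would_assert() returns true. That corresponds to the list_empty(&sbi->s_orphan) assertion in the report.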

Comment 2 Tao Ma 2010-05-04 03:30:56 UTC
Created attachment 411173
a temp fix.

Here is my temporary fix; I used what ext4_ind_direct_IO does as a reference.
Alternatively, we could call ext3_truncate there and then change ext3_truncate as in
mainline commit ef43618a47179b41e7203a624f2c7445e7da488c. That way we would need to change two places and a little more code.
I am not familiar enough with ext3, so I chose the simplest way to work around it.

Comment 3 Josef Bacik 2010-05-04 14:45:11 UTC
Your patch seems to be correct, but it's not upstream.  Please post it upstream and then we can pull it into RHEL.  Thanks,

Josef

Comment 4 Josef Bacik 2010-05-04 14:47:07 UTC
Duh, of course I then went and looked at upstream to see what's done there, and we call ext3_truncate there.  I'll backport the upstream patches.

Comment 5 Josef Bacik 2010-05-04 18:47:26 UTC
Created attachment 411368
potential fix

I tried to reproduce your problem, but I couldn't.  This is what I did:

mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
mount /dev/sdb1 /mnt/test
dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
losetup /dev/loop0 /mnt/test/file
dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144

Here is a backport of a couple of upstream commits that should fix the problem.  It's a little odd that the thing is failing in journal_start, but if that's what's happening then this fix is definitely needed.  Can you verify this patch fixes the problem?  Thanks,

Josef

Comment 6 Tao Ma 2010-05-04 23:56:00 UTC
(In reply to comment #4)
> Duh, of course I then go look at upstream and see what's done there and we call
> ext3_truncate there.  I'll backport the upstream patches.    

So you have chosen the second approach I mentioned above: use ext3_truncate and backport commit ef43618a47179b41e7203a624f2c7445e7da488c. ;)
I have no objection to it.
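
A rough sketch of that second approach, as a hypothetical user-space C model rather than the backported patch itself. It assumes the truncate path removes the inode from the in-memory orphan list even when no transaction can be started. The sketch_* names and the append-only size accounting are simplifications introduced for illustration.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical user-space model; these are not kernel structures. */
struct sketch_sb {
    int orphans;            /* number of inodes on the in-memory orphan list */
};

struct sketch_inode {
    bool orphaned;          /* is this inode on the orphan list? */
    long size;              /* committed file size */
    long allocated;         /* bytes backed by allocated blocks */
};

static void sketch_orphan_add(struct sketch_sb *sb, struct sketch_inode *in)
{
    if (!in->orphaned) {
        in->orphaned = true;
        sb->orphans++;
    }
}

static void sketch_orphan_del(struct sketch_sb *sb, struct sketch_inode *in)
{
    if (in->orphaned) {
        in->orphaned = false;
        sb->orphans--;
    }
}

/*
 * Assumed behaviour of the truncate path: drop blocks past the committed
 * size and always remove the inode from the in-memory orphan list, even
 * if the journal is unavailable.
 */
static void sketch_truncate(struct sketch_sb *sb, struct sketch_inode *in)
{
    in->allocated = in->size;
    sketch_orphan_del(sb, in);
}

/* Write path with the error branch cleaned up via truncate. */
static int sketch_direct_write_fixed(struct sketch_sb *sb,
                                     struct sketch_inode *in,
                                     long bytes, bool journal_ok)
{
    sketch_orphan_add(sb, in);
    in->allocated += bytes;         /* blocks allocated by the direct I/O */
    if (!journal_ok) {
        sketch_truncate(sb, in);    /* undo allocation, drop orphan entry */
        return -ENOSPC;
    }
    in->size = in->allocated;       /* commit the new size */
    sketch_orphan_del(sb, in);
    return 0;
}
```

In this model the orphan count returns to zero on both the success and the failure path, so an unmount-time check that the list is empty would pass either way.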

Comment 7 Tao Ma 2010-05-05 00:02:08 UTC
(In reply to comment #5)
> Created an attachment (id=411368) [details]
> potential fix
> I tried to reproduce your problem, but I couldn't.  This is what I did
Yes, it is not very easy to trigger. Sorry for not pasting the test script at the very beginning.
> mkfs.ext3 -b 4096 /dev/sdb1 (500m partition)
> mount /dev/sdb1 /mnt/test
> dd if=/dev/zero of=/mnt/test/file bs=1M seek=1000 count=1
> losetup /dev/loop0 /mnt/test/file
> dd if=/dev/zero of=/dev/loop0 bs=4k oflag=direct count=262144
Here the loop volume needs to be mounted and then written to:
mkfs.ext3 /dev/loop0
mount /dev/loop0 /mnt/test1
cd /mnt/test1
dd if=/dev/zero of=testfile1 bs=1024k & 
dd if=/dev/zero of=testfile2 bs=1024k oflag=direct & 
dd if=/dev/zero of=testfile3 bs=24k  & 
dd if=/dev/zero of=testfile4 bs=1024k oflag=direct seek=10000 & 

Sometimes you will trigger it.
> Here is a backport of a couple of upstream bz's that should fix the problem. 
> It's a little odd that the thing is failing in the journal_start, but if thats
> what's happening then this fix is definitely needed.  Can you verify this patch
> fixes the problem?  Thanks,
OK, I will test it.
> Josef

Comment 8 Tao Ma 2010-05-05 14:25:16 UTC
Yes, we have tested the patch and it works. Thanks.

Comment 9 Josef Bacik 2010-05-05 15:24:00 UTC
Great, thanks for doing all of the actual work :).

Josef

Comment 10 Josef Bacik 2010-05-05 15:33:51 UTC
BTW, this is a backport of:

ef43618a47179b41e7203a624f2c7445e7da488c
7eb4969e04060dcf3fbd46af9c21b1059b853068

Comment 11 RHEL Product and Program Management 2010-08-06 05:50:10 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 13 Jarod Wilson 2010-08-11 00:12:26 UTC
in kernel-2.6.18-211.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Detailed testing feedback is always welcomed.

Comment 17 errata-xmlrpc 2011-01-13 21:30:44 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0017.html

