Bug 157386 - QLA21xx reports qla2xxx_eh_abort when file system is under load
Summary: QLA21xx reports qla2xxx_eh_abort when file system is under load
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Tom Coughlan
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-05-11 04:09 UTC by aaron
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-04-13 13:53:00 UTC



Description aaron 2005-05-11 04:09:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
When the file system is put under load, the system load average climbs steadily and does not recover.  Sometimes the only way to recover the system is to power cycle it.

RHEL 4
kernel-smp-2.6.9-5.0.5.EL and kernel-smp-2.6.9-5.0.3.EL

Dual Xeon system with 1 GB of RAM
QLA2100 card connected to a Fibre Channel chassis
Software RAID 5 across 12 members with 2 spares, mounted as /home

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.0.5.EL, kernel-smp-2.6.9-5.0.3.EL

How reproducible:
Always

Steps to Reproduce:
1. Run bonnie or postmark on the file system attached to the QLA2100 card.
2. Watch the system load climb out of control.  Load averages above 150 are not unusual if the system stays responsive.

Setup/hardware: 
RHEL 4, kernel-smp-2.6.9-5.0.5.EL or kernel-smp-2.6.9-5.0.3.EL

Dual Xeon system with 1 GB of RAM
QLA2100 card connected to a Fibre Channel chassis, 14 drives total.  Software RAID 5 across 12 members with 2 spares, ext3 file system, mounted as /home


Actual Results:  System load increases dramatically, system becomes almost useless, or locks up.

Expected Results:  bonnie or postmark runs to completion and reports its results.

Additional info:
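The load climb described in the steps above could be recorded alongside a benchmark run with a small sampler like the one below. This is only a sketch and was not part of the original report: the function names `parse_loadavg` and `sample_load`, the sample count, and the interval are all hypothetical choices, and it assumes a Linux system that exposes `/proc/loadavg`.

```python
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def sample_load(samples=5, interval=1.0, path="/proc/loadavg"):
    """Collect (timestamp, 1-minute load) pairs at a fixed interval.

    The defaults are illustrative assumptions, not values from the report.
    """
    readings = []
    for index in range(samples):
        with open(path) as handle:
            one_minute = parse_loadavg(handle.read())[0]
        readings.append((time.time(), one_minute))
        if index < samples - 1:
            time.sleep(interval)
    return readings


if __name__ == "__main__":
    # Run this in a second terminal while bonnie or postmark is running.
    for stamp, load in sample_load():
        print(f"{stamp:.0f} {load:.2f}")
```

Started in a second terminal before the benchmark, this would give a timestamped record of how quickly the 1-minute load average rises, which is easier to attach to a report than an observed peak.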

Comment 1 aaron 2005-05-11 04:10:51 UTC
This same setup runs under RH 8.0 without any problems.

Comment 2 aaron 2005-05-11 04:21:07 UTC
This bug also appears to have shown up on the LKML

http://lkml.org/lkml/2004/9/15/351

Comment 3 aaron 2005-05-13 04:32:14 UTC
Upon further inspection (tearing down the box), the HBA in question is actually
a QLA2000.  The install detects the card as a 2100 and uses the 2100 driver.
This has now been tested on 2 other installs, with 2 other cards: QLA2000s are
detected as QLA2100s.  Putting ANY stress on a software RAID 5 file system on
the QLA2000's Fibre Channel arrays causes the load on the machine to increase
dramatically.  Left unattended, the machine eventually locks hard.



 

Comment 4 Tom Coughlan 2005-07-28 20:10:25 UTC
We have updated RHEL 4 U2 to the latest driver from QLogic (kernel version
2.6.9-11.35, or higher). This will be available in beta test soon. Will you be
able to do a test to confirm that this problem is fixed during the U2 beta? Or
would you be able to try an updated driver that I supply (let me know your
kernel version and type)? Thanks.

Comment 5 Tom Coughlan 2005-09-19 20:16:25 UTC
Have you been able to do a test with RHEL 4 U2? 

Comment 6 aaron 2005-09-22 15:21:17 UTC
I am no longer working in an environment where I can test this.  I have
forwarded the bug to the people working in that environment.  I hope that
they will be able to test and give you an answer.

Comment 7 Tom Coughlan 2006-04-13 13:53:00 UTC
The QLogic driver has been updated several times since this report. Please
re-test with U3 and re-open if the problem persists. 

