
Bug 157386

Summary: QLA21xx reports qla2xxx_eh_abort when file system is under load
Product: Red Hat Enterprise Linux 4
Reporter: aaron <aaron>
Component: kernel
Assignee: Tom Coughlan <coughlan>
Status: CLOSED WORKSFORME
QA Contact: Brian Brock <bbrock>
Severity: high
Priority: medium
Version: 4.0
CC: davej
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2006-04-13 13:53:00 UTC

Description aaron 2005-05-11 04:09:33 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050323 Firefox/1.0.2 Fedora/1.0.2-1.3.1

Description of problem:
When the file system is put under load, the system spirals into an ever-increasing load. Sometimes the only way to recover is to power-cycle the machine.

kernel-smp-2.6.9-5.0.5.EL and kernel-smp-2.6.9-5.0.3.EL

Running a dual Xeon system with 1 GB of RAM
QLA2100 card connected to a Fibre Channel chassis
Software RAID 5 across 12 members with 2 spares, mounted as /home

Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.0.5.EL, kernel-smp-2.6.9-5.0.3.EL

How reproducible:

Steps to Reproduce:
1. Run bonnie or postmark on the file system attached through the QLA2100 card and watch the system careen out of control. If the system stays responsive long enough to check, load averages above 150 are not unusual.

RHEL 4, kernel-smp-2.6.9-5.0.5.EL or kernel-smp-2.6.9-5.0.3.EL

Running a dual Xeon system with 1 GB of RAM
QLA2100 card connected to a Fibre Channel chassis with 14 drives total: software RAID 5 across 12 members with 2 spares, ext3 file system, mounted as /home
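To put numbers on the runaway load while bonnie or postmark is running, the load average can be sampled from a second terminal. This is a minimal sketch, assuming Python 3 and a Linux /proc filesystem; the function names are hypothetical, and the 150 threshold is simply the load figure reported above, not a kernel limit.

```python
import time

def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    fields = text.split()
    return tuple(float(x) for x in fields[:3])

def is_runaway(load, threshold=150.0):
    """Hypothetical cut-off, taken from the load level reported in this bug."""
    return load > threshold

def monitor(samples=10, interval=5.0, path="/proc/loadavg"):
    """Print the load averages every `interval` seconds, flagging runaway load."""
    for _ in range(samples):
        with open(path) as f:
            one, five, fifteen = parse_loadavg(f.read())
        flag = "  <-- runaway" if is_runaway(one) else ""
        print(f"{time.strftime('%H:%M:%S')}  {one:.2f} {five:.2f} {fifteen:.2f}{flag}")
        time.sleep(interval)
```

Calling `monitor(samples=60)` alongside the benchmark would record about five minutes of samples, enough to show whether the 1-minute figure keeps climbing.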

Actual Results:  System load increases dramatically; the system becomes almost unusable or locks up.

Expected Results:  bonnie or postmark runs to completion and reports its results.

Additional info:

Comment 1 aaron 2005-05-11 04:10:51 UTC
This same setup runs under Red Hat Linux 8.0 without any problems.

Comment 2 aaron 2005-05-11 04:21:07 UTC
This bug also appears to have shown up on the LKML

Comment 3 aaron 2005-05-13 04:32:14 UTC
Upon further inspection (tearing down the box), the HBA in question is actually
a QLA2000. The installer detects the card as a 2100 and uses the 2100 driver.
This has now been tested on 2 other installs, with 2 other cards: QLA2000s are
detected as QLA2100s. Putting ANY stress on a software RAID 5 file system built
on the QLA2000's Fibre Channel arrays causes the machine's load to increase
dramatically. Left unattended, the machine eventually locks hard.


Comment 4 Tom Coughlan 2005-07-28 20:10:25 UTC
We have updated RHEL 4 U2 to the latest driver from QLogic (kernel version
2.6.9-11.35, or higher). This will be available in beta test soon. Will you be
able to do a test to confirm that this problem is fixed during the U2 beta? Or
would you be able to try an updated driver that I supply (let me know your
kernel version and type)? Thanks.
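Checking whether a machine already runs a kernel at least as new as 2.6.9-11.35 amounts to comparing release strings such as those printed by `uname -r`. The sketch below is a simplification, not RPM's actual comparison rules (rpmvercmp): it compares only the leading dot-separated numeric fields of the version and release and ignores suffixes such as EL or ELsmp, which is adequate for the RHEL 4 kernel strings in this report. The helper names are hypothetical.

```python
def numeric_parts(s):
    """Leading run of dot-separated integers, e.g. '5.0.5.ELsmp' -> (5, 0, 5)."""
    parts = []
    for piece in s.split("."):
        if not piece.isdigit():
            break  # stop at the first non-numeric field such as 'EL' or 'ELsmp'
        parts.append(int(piece))
    return tuple(parts)

def parse_kernel_release(release):
    """Split a 'version-release' string, e.g. '2.6.9-5.0.5.ELsmp'."""
    version, _, rel = release.partition("-")
    return numeric_parts(version), numeric_parts(rel)

def at_least(running, minimum="2.6.9-11.35"):
    """True if `running` sorts at or above `minimum` (simplified comparison)."""
    return parse_kernel_release(running) >= parse_kernel_release(minimum)
```

For example, `at_least("2.6.9-5.0.5.ELsmp")` is `False`, since release 5.0.5 sorts below 11.35.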

Comment 5 Tom Coughlan 2005-09-19 20:16:25 UTC
Have you been able to do a test with RHEL 4 U2? 

Comment 6 aaron 2005-09-22 15:21:17 UTC
I am no longer working in an environment where I can test this. I have
forwarded the bug to the people working in that environment. I hope that
they will be able to test and give you an answer.

Comment 7 Tom Coughlan 2006-04-13 13:53:00 UTC
The QLogic driver has been updated several times since this report. Please
re-test with U3 and re-open if the problem persists.