
Bug 235336

Summary: qla2300 HBA card hangs with a memory error
Product: Red Hat Enterprise Linux 4
Reporter: Qian Shen <qshen>
Component: kernel
Assignee: Red Hat Kernel Manager <kernel-mgr>
Status: CLOSED NOTABUG
QA Contact: Martin Jenner <mjenner>
Severity: medium
Priority: medium
Version: 4.4
CC: coughlan, mzhan
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2007-09-27 22:12:34 UTC

Description Qian Shen 2007-04-05 09:24:32 UTC
Description of problem:

The customer uses a qla2300 HBA card to connect to a storage box. When one of
the memory chips becomes faulty, the qla2300 card logs a warning to
/var/log/messages and stops working. After the faulty memory chip was replaced,
the card worked again.

The customer considers it a serious issue that the HBA card stops working, and
feels that reporting the warning only once is not enough. He would like the
HBA card to report a warning each time an I/O request arrives.

The error messages look like this:

> Uhhuh. NMI received. Dazed and confused, but trying to continue

> > cpqphp: power fault interrupt

> > cpqphp: power fault bit 1 set

> > You probably have a hardware problem with your RAM chips

> > qla2300 0000:06:02.0: qla2xxx_eh_abort scsi(1:0:0:26): cmd_timeout_in_sec=0x1e.

> > qla2300 0000:06:02.0: Parity error -- HCCR=ffff.

> > qla2300 0000:06:02.0: Mailbox command timeout occured. Issuing ISP abort.

> > qla2300 0000:06:02.0: Performing ISP error recovery - ha= f6f9c258.

> > cpqphp: power fault interrupt

> > cpqphp: power fault bit 0 set

> > qla2300 0000:06:01.0: qla2xxx_eh_abort scsi(0:0:0:26): cmd_timeout_in_sec=0x1e.

> > qla2300 0000:06:01.0: Parity error -- HCCR=ffff.

> > qla2300 0000:06:01.0: Mailbox command timeout occured. Issuing ISP abort.

> > qla2300 0000:06:01.0: Performing ISP error recovery - ha= f7e0c258.
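
As a triage aid, the hardware indicators in a log like the one above can be counted with a short script. This is only a minimal sketch, not part of the qla2xxx driver or of the customer's setup: the function name `summarize_hw_errors` is hypothetical, and the patterns are assumptions based solely on the lines quoted in this report.

```python
import re

# Patterns derived only from the log lines quoted in this report (assumption:
# other kernel or driver versions may word these messages differently).
NMI_RE = re.compile(r"NMI received")
PARITY_RE = re.compile(r"(qla\d+) (\S+): Parity error -- HCCR=([0-9a-fA-F]+)\.")


def summarize_hw_errors(lines):
    """Return (nmi_count, parity) for the given log lines.

    parity maps each PCI address to the list of HCCR values reported
    in "Parity error" messages for that device.
    """
    nmi_count = 0
    parity = {}
    for line in lines:
        if NMI_RE.search(line):
            nmi_count += 1
        match = PARITY_RE.search(line)
        if match:
            pci_address, hccr = match.group(2), match.group(3)
            parity.setdefault(pci_address, []).append(hccr)
    return nmi_count, parity


if __name__ == "__main__":
    import sys

    # Usage: python summarize.py < /var/log/messages
    nmi_count, parity = summarize_hw_errors(sys.stdin)
    print("NMI events:", nmi_count)
    for pci_address, values in parity.items():
        print(pci_address, "parity errors:", len(values), "HCCR:", values)
```

An NMI together with parity errors on both HBA ports points at the platform rather than at a single adapter, which matches the memory fault the customer found.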


Comment 1 Ernie Petrides 2007-09-27 22:12:34 UTC
This is a hardware problem.  The operating system cannot be expected
to continue to function properly when there are disk controller errors.