Bug 77861 - Kernel lockup in qlogicfc0 driver
Summary: Kernel lockup in qlogicfc0 driver
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Dave Jones
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 77803
Reported: 2002-11-14 15:10 UTC by Hrunting Johnson
Modified: 2015-01-04 22:02 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-12-17 13:26:22 UTC


Attachments
output from readprofile -v (deleted), 2002-11-22 17:25 UTC, Hrunting Johnson

Description Hrunting Johnson 2002-11-14 15:10:28 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.0.3705)

Description of problem:
Compaq 8500, 8 P3 CPU, 4GB RAM, QLA2200 FC
RH 7.2, all errata, kernel 2.4.18-17.7.x

Under heavy load (backups, running real-time monitoring system, and lookupd 
data) across the fibre-channel card, the system locks up.  Upon 
reboot, /var/log/messages contains 36 lines like:

kernel: qlogicfc0 : no handle slots, this should not happen
kernel: hostdata->queued is 4d, in_ptr: 38

The '4d' and 'in_ptr' values will vary.
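A minimal sketch of how the varying values could be pulled out of /var/log/messages for comparison across lockups. It assumes both numbers are printed in hexadecimal (the "4d" suggests this, but it is an assumption), and the function name is hypothetical:

```python
import re

# Assumption: "queued" and "in_ptr" are both hexadecimal values.
PATTERN = re.compile(
    r"hostdata->queued is ([0-9a-fA-F]+), in_ptr: ([0-9a-fA-F]+)"
)

def parse_handle_slot_messages(lines):
    """Return (queued, in_ptr) integer pairs from matching log lines."""
    pairs = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            queued = int(match.group(1), 16)
            in_ptr = int(match.group(2), 16)
            pairs.append((queued, in_ptr))
    return pairs

# The two message lines quoted above, used as sample input.
sample = [
    "kernel: qlogicfc0 : no handle slots, this should not happen",
    "kernel: hostdata->queued is 4d, in_ptr: 38",
]
pairs = parse_handle_slot_messages(sample)
```

Passing the lines of the real log file instead of `sample` would list every recorded pair.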

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. boot system
2. run under heavy load
3. wait

Actual Results:  system locks up

Expected Results:  system does not lock up

Additional info:

http://ldm.bkbits.net:8080/linux-2.5-cpu/cset@1.621.1.10?nav=index.html%7CChangeSet@-4w

This URL contains information about changes made in 2.5 that supposedly fix 
this problem.  Looks like a change was made in the way drivers need to handle 
locks (per device vs. global).

I consider 2.4.18-17.7.x to be an extremely buggy kernel.  This is the third 
bug related to this kernel I've filed since I upgraded an RH7.2 box to this 
kernel yesterday.  Was any QA done on this kernel at all?

Comment 1 Arjan van de Ven 2002-11-14 15:12:37 UTC
Please use the qla2200 driver instead; that one is actually supported.

Comment 2 Hrunting Johnson 2002-11-14 15:26:26 UTC
Will do.  Under 2.4.9-31, I was using the qla2x00 without incident, but that 
disappeared in the new release.  The qla2200 driver under that kernel never 
worked for us, so I didn't bother to try it again.

Comment 3 Hrunting Johnson 2002-11-14 23:01:38 UTC
Okay, switching to the supported qla2200 driver appears to fix the machine 
lockup (and another bug, 77803, though I have no idea why or how), but the 
kjournald thread for the ext3 partition on the RAID accessed through that card 
is now taking around 11% of the total CPU on the box, whereas before it took 
around 2%.  Why the increase?  Is the qla2200 driver that poor?

Under 2.4.9-31 and the qla2x00 driver, we didn't have that much journal 
activity, but we were also running under a different VM.  Under the qlogicfc0 
driver and the new VM, we had basically the same system usage as with the 
qla2x00 driver.

Comment 4 Stephen Tweedie 2002-11-15 10:15:48 UTC
The 77803 bug is likely due to dropped interrupts if a driver change fixes it.

As for the kjournald overhead, that could be a number of things, including
bounce buffer overhead.  We'd need to see a kernel profile to have any hope of
diagnosing it.  (Boot with the kernel parameters "profile=2"; man readprofile to
see how to extract info.)

Comment 5 Hrunting Johnson 2002-11-22 14:14:49 UTC
At the risk of being taken for an idiot, when I enable profiling (with 
profile=2), no matter what, I always get:

# readprofile -m /boot/System.map-2.4.18-18.7.xbigmem 
     4 _stext                                     0.0500
     4 total                                      0.0000

No matter what.  Do I need to do something else to enable accurate profiling on 
this machine?  The system is under heavy load.  The /proc/profile file is 
constantly being updated (according to its timestamp), but it's always the same 
size, and it always contains that same data (in -v, everything is set to 0).

This is with 2.4.18-18.7.xbigmem.
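A rough sketch for checking whether readprofile output contains any real samples. It assumes the three-column layout shown above (tick count, symbol, normalized load); the function names are hypothetical:

```python
def parse_readprofile(text):
    """Parse readprofile output into (ticks, symbol, load) tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or unexpected lines
        ticks, symbol, load = fields
        rows.append((int(ticks), symbol, float(load)))
    return rows

def hottest_symbols(rows, limit=10):
    """Return the busiest symbols, ignoring the 'total' summary row."""
    symbols = [row for row in rows if row[1] != "total"]
    symbols.sort(key=lambda row: row[0], reverse=True)
    return symbols[:limit]

# The output quoted above, used as sample input.
sample = """\
     4 _stext                                     0.0500
     4 total                                      0.0000
"""
rows = parse_readprofile(sample)
top = hottest_symbols(rows)
```

With the output above, the only entry is `_stext` with 4 ticks, which matches the observation that almost nothing is being sampled.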

Comment 6 Arjan van de Ven 2002-11-22 14:20:19 UTC
You need to ALSO specify nmi_watchdog=1 in addition to profile=.
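A small sketch for confirming that both parameters actually reached the running kernel. The sample command line is made up for illustration, and the function name is hypothetical; on a live system the string would come from /proc/cmdline:

```python
def missing_profiling_params(cmdline):
    """Return the profiling-related boot parameters absent from cmdline."""
    params = cmdline.split()
    missing = []
    # Any value is accepted for profile= (the thread uses profile=2).
    if not any(p.startswith("profile=") for p in params):
        missing.append("profile=")
    if "nmi_watchdog=1" not in params:
        missing.append("nmi_watchdog=1")
    return missing

# Hypothetical boot lines; the root= values are illustrative only.
before = missing_profiling_params("ro root=/dev/sda1 profile=2")
after = missing_profiling_params("ro root=/dev/sda1 profile=2 nmi_watchdog=1")
```

To check a running system, something like `missing_profiling_params(open("/proc/cmdline").read())` could be used.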

Comment 7 Hrunting Johnson 2002-11-22 17:25:38 UTC
Created attachment 86072
output from readprofile -v

Comment 8 Dave Jones 2003-12-17 02:34:44 UTC
Is this fixed in the 2.4.20-20 errata?


Comment 9 Hrunting Johnson 2003-12-17 13:11:11 UTC
Yes.

