Bug 76240 - cat /proc/scsi/gdth/0 causes kernel oops
Summary: cat /proc/scsi/gdth/0 causes kernel oops
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 2.1
Classification: Red Hat
Component: kernel
Version: 2.1
Hardware: i686
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Larry Woodman
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2002-10-18 15:59 UTC by Jure Pecar
Modified: 2007-11-30 22:06 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-06-24 11:39:33 UTC
Target Upstream Version:


Attachments
decoded oops with 2.4.9-e.8 enterprise kernel (deleted)
2002-10-18 16:00 UTC, Jure Pecar
a diff of linux/drivers/scsi between 2.4.18-17.7.x and 2.4.18-17.2 (deleted)
2002-12-12 13:02 UTC, Jure Pecar

Description Jure Pecar 2002-10-18 15:59:48 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1) Gecko/20020827

Description of problem:
Trying to read /proc/scsi/gdth/0 causes a kernel oops.

Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
The simplest way is to cat /proc/scsi/gdth/0.
Alternatively, just halt the machine; the oops appears at the end of shutdown.
	

Actual Results:  decoded oops attached.

Expected Results:  I believe the kernel should print some information about the
card and the status of the array.

Additional info:

Configuration:

Intel SHG2 board
dual Xeon 2.4 GHz
6 GB memory
7-disk RAID 5 array + 1 hot spare, configured in the RAID controller BIOS

raid controller:

02:08.0 RAID bus controller: Intel Corporation RAID Controller
	Subsystem: Intel Corporation: Unknown device 01ae
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr+ Stepping- SERR+ FastB2B+
	Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
	Latency: 64, cache line size 08
	Interrupt: pin A routed to IRQ 24
	Region 0: Memory at fc000000 (32-bit, prefetchable) [size=32M]
	Expansion ROM at <unassigned> [disabled] [size=32K]
	Capabilities: [80] Power Management version 2
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 PME-Enable- DSel=0 DScale=0 PME-

Comment 1 Jure Pecar 2002-10-18 16:00:49 UTC
Created attachment 80947 [details]
decoded oops with 2.4.9-e.8 enterprise kernel

Comment 2 Jure Pecar 2002-12-04 19:10:16 UTC
I was trying to bisect the patches applied on top of the standard
2.4.9-ac10 (which works properly), but found that this is nearly impossible.
The patches are one big pile of mess; only aio is done approximately the way
I'd expect it to be. I only found out that up to patch #1000 things still
work; after #1000 the kernel became a PITA to compile.

Can you reorganize this patch set so that each patch (or group of patches)
is a self-sufficient unit that still allows the kernel to compile?

Comment 3 Jure Pecar 2002-12-12 13:00:45 UTC
I did some more work searching Bugzilla and diffing various kernel packages.
The closest thing I came to is the diff between 2.4.18-17.7.x and 2.4.18-17.2,
which arjanv said in bug #77398 fixes the problem (and indeed it does).
I'm attaching a diff of drivers/scsi between these two kernels. There are just
a couple of changes, and none of them changes the behaviour of the 2.4.9-e.10
kernel in any way. gdth still oopses in the same way.
Is there some other place in the source to look at?

Comment 4 Jure Pecar 2002-12-12 13:02:06 UTC
Created attachment 88564 [details]
a diff of linux/drivers/scsi between 2.4.18-17.7.x and 2.4.18-17.2

Comment 5 Jure Pecar 2002-12-16 08:35:34 UTC
Finally ... I needed some printks to figure out what exactly is going on. If
I modify the patch applied to 2.4.18-17.2 to look like this:

--- kernel-2.4.18-14/linux/drivers/scsi/scsi.c	Tue Dec 10 14:04:55 2002
+++ kernel-2.4.18-17.2/linux/drivers/scsi/scsi.c	Fri Dec  6 10:47:02 2002
@@ -1470,8 +1470,9 @@
 	int j;
 	Scsi_Cmnd *SCpnt;
 	request_queue_t *q = &SDpnt->request_queue;
-
-	spin_lock_irqsave(q->queue_lock, flags);
+	
+	if (q->queue_lock != NULL)
+		spin_lock_irqsave(q->queue_lock, flags);
 
 	if (SDpnt->queue_depth == 0)
 	{
@@ -1520,7 +1521,8 @@
 	} else {
 		SDpnt->has_cmdblocks = 1;
 	}
-	spin_unlock_irqrestore(q->queue_lock, flags);
+	if (q->queue_lock != NULL)
+		spin_unlock_irqrestore(q->queue_lock, flags);
 }


then it actually works.

But it still looks like an ugly workaround that hides the real cause of the
problem.

Does anyone care to comment?

Comment 6 Jure Pecar 2002-12-16 13:25:40 UTC
Add this chunk too:

@@ -2705,13 +2707,12 @@
                 panic("Attempt to delete wrong device\n");
         }
 
-        blk_cleanup_queue(&SDpnt->request_queue);
-
         /*
          * We only have a single SCpnt attached to this device.  Free
          * it now.
          */
 	scsi_release_commandblocks(SDpnt);
+        blk_cleanup_queue(&SDpnt->request_queue);
         kfree(SDpnt);
 }


then it really works :)
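
A minimal userspace sketch of why the ordering in the chunk above may matter. It rests on an assumption that this thread does not confirm: that blk_cleanup_queue() leaves the queue's queue_lock pointer NULL, so a scsi_release_commandblocks() call made afterwards would dereference NULL when taking the lock. Every toy_* name is a hypothetical stand-in, not a kernel API, and the NULL check in the release function only marks the spot where the unpatched kernel code would oops.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the kernel structures in the patch. */
struct toy_lock { int held; };

struct toy_queue {
    struct toy_lock *queue_lock;    /* plays the role of q->queue_lock */
};

struct toy_device {
    struct toy_queue request_queue;
    int has_cmdblocks;
};

/* ASSUMPTION: cleanup wipes the queue, so queue_lock ends up NULL. */
void toy_cleanup_queue(struct toy_queue *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/*
 * Returns 0 on success, -1 if the lock pointer is already gone.
 * The -1 path is where an unguarded spin_lock_irqsave(q->queue_lock, ...)
 * would dereference NULL.
 */
int toy_release_commandblocks(struct toy_device *d)
{
    struct toy_queue *q;

    assert(d != NULL);
    q = &d->request_queue;
    if (q->queue_lock == NULL)
        return -1;
    q->queue_lock->held = 1;        /* "lock" */
    d->has_cmdblocks = 0;           /* free the command blocks */
    q->queue_lock->held = 0;        /* "unlock" */
    return 0;
}

/* Order before the chunk: clean up the queue, then release. */
int toy_free_dev_old_order(struct toy_device *d)
{
    toy_cleanup_queue(&d->request_queue);
    return toy_release_commandblocks(d);
}

/* Order after the chunk: release first, then clean up the queue. */
int toy_free_dev_new_order(struct toy_device *d)
{
    int rc = toy_release_commandblocks(d);

    toy_cleanup_queue(&d->request_queue);
    return rc;
}
```

Under that assumption, toy_free_dev_old_order() reports the missing lock while toy_free_dev_new_order() succeeds, which would make the NULL guards from comment 5 a way of masking the symptom and the reordering the more likely real fix.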



Comment 7 Larry Woodman 2003-06-23 17:52:51 UTC
This was fixed quite a while ago.  Did you try this with the latest
AS 2.1 kernel errata (e.24)?

Larry Woodman


Comment 8 Jure Pecar 2003-06-24 11:39:33 UTC
It was fixed in e.10 or e.12, yes. Might as well close this bug.

