
Bug 87659

Summary: gcc-2.96-113 produces broken kernels (creating DO_IRQ kernel stack trace deadlocks)
Product: [Retired] Red Hat Linux
Reporter: jason andrade <jason>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: David Lawrence <dkl>
Severity: high
Docs Contact:
Priority: medium
Version: 7.3
CC: howen, john
Target Milestone: ---
Target Release: ---
Hardware: i686
OS: Linux
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2004-10-03 19:19:26 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:    
Bug Blocks: 92002, 108092    
Description | Flags
Perl script to deadlock kernels built with gcc-2.96-113 | none

Description jason andrade 2003-03-31 23:00:30 UTC
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 5.14; Mac_PowerPC)

Description of problem:
I have installed the latest gcc and glibc on all our systems.  On a number 
of heavily network loaded systems i am seeing stack traces (do_IRQ) and 
the machine would crash randomly or worse, would just hang.

after thinking this was a problem with network drivers and/or with entropy 
handling it appears to have been narrowed down to a bug in the compiler 
or glibc or something there when you build a custom kernel.

i reverted to a binary kernel (2.4.18-27.7.xsmp) and the problem went away.

of course this is not optimal since we have always been able to 
successfully compile custom kernels in the past and this also means we 
cannot add any patches or other changes as a custom kernel now fails..

i am about 90% sure it is related to the compiler but who knows.. 

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. compile custom kernel and install
2. apply heavy network load
3. thud/hang

Actual Results:  stack traces in syslog and machine hangs

Expected Results:  machine shouldn't have been crashing

Additional info:

i have not had a chance to test this with gcc-112 but since that
was the previous compiler version i used to compile the 2.4.18-18.7.x
kernels which didn't exhibit this bug i am guessing that something in
gcc-113 (and its associated software) is broken for kernel compiles.

note this seems to also affect redhat 7.2 (at that compiler version..)

Comment 1 Howard Owen 2003-10-17 17:04:33 UTC
We see this bug too. For us, it shows up using NFS alone, or in combination with
mvfs.o. The kernel stack traces occur when the system takes an IRQ while the
current process has traversed five successive symlinks. The kernel refuses to
service the IRQ because there is less than 1KiB of kernel stack left.
Recompiling with gcc-2.96-112 fixes the problem.

The latest errata kernel for 7.x is vulnerable to this issue because it was
built with gcc-2.96-113
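
For readers unfamiliar with the check described above, here is a minimal user-space sketch of the arithmetic behind "less than 1KiB of kernel stack left". It is not the kernel's code. The 8 KiB stack size, the `reserved` parameter (standing in for the per-task data at the bottom of the stack), and both function names are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed for illustration: each task's kernel stack is an 8 KiB,
 * 8 KiB-aligned region that grows downward. */
#define THREAD_SIZE 8192u
/* Threshold taken from the comment above: complain below 1 KiB. */
#define STACK_WARN  1024u

/* Bytes still free below the stack pointer `sp`. Masking with
 * THREAD_SIZE - 1 gives the offset of `sp` from the bottom of its
 * stack region; `reserved` bytes at the bottom are not usable. */
long stack_bytes_left(uintptr_t sp, size_t reserved)
{
    return (long)(sp & (THREAD_SIZE - 1)) - (long)reserved;
}

/* Nonzero when fewer than STACK_WARN bytes remain. */
int stack_nearly_full(uintptr_t sp, size_t reserved)
{
    return stack_bytes_left(sp, reserved) < (long)STACK_WARN;
}
```

With these assumptions, a stack pointer 0x500 bytes above the bottom of its region and 0x100 reserved bytes leaves exactly 1024 bytes, which does not trip the check; one byte lower does.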

Comment 2 Howard Owen 2003-10-17 17:15:33 UTC
Created attachment 95269 [details]
Perl script to deadlock kernels built with gcc-2.96-113

Run this script with '--jobs=40 --net' with cwd in an NFS mount that has
rsize=wsize=16KiB or greater.  Start 3-8 large scp transfers out from the box.
Watch the console. If you have a serial console, the system will deadlock. If
not, it may stay up, but you will still see the stack traces.

Comment 3 Howard Owen 2003-10-17 22:29:19 UTC
Jason, or someone, could you please change the summary of this bug to read
something like "gcc-2.96-113 produces broken kernels (DO_IRQ kernel stack trace
deadlocks)"? I might have found this bug three weeks ago when I started
investigating if it had had a summary like that. 8)

Comment 4 John R 2003-11-10 08:17:24 UTC
This bug lists gcc-2.96-112 as safe, but one of my 7.2 systems running
the 2.4.18-27 kernel rebuilt with gcc-2.96-112 just died with the
do_IRQ: stack overflow errors.

Comment 5 Howard Owen 2003-11-10 18:43:37 UTC
As a sanity check, what does 'strings /boot/vmlinux-2.4.18-27.7x |
grep gcc' show?

Also, do_IRQ refusing to service an interrupt because there's not
enough space on the stack can happen for other reasons. Can you make
the fault happen with the attached Perl script?

Comment 6 Howard Owen 2003-12-23 20:57:16 UTC
Red Hat today released kernel-2.4.20-27.7, which fixes bug #108092,
one of the dependent bugs of this bug. 'strings
/boot/vmlinux-2.4.20-27.7 | grep gcc' shows that the new kernel was
built with gcc 2.96-126, which isn't released AFAIK. Here's hoping Red
Hat releases this gcc version before end-of-life of 7.3. (Or,
alternatively, *after* that date. 8)

Comment 7 Howard Owen 2003-12-31 22:59:17 UTC
On a hint supplied by Erling Jacobsen over on bug #108092, I applied
gcc-strict-alias-optimization2.patch from the RHEL 2.1 gcc-2.96-124
source RPM to the 7.3 gcc-2.96-113 sources. The patch applied cleanly,
and a kernel built with the resulting gcc failed to crash when
subjected to a variant of my kernel stack crash script.

It appears this patch addresses the problem this bug refers to.
Knowing this is not as helpful as it could be however, because the
effect of applying this patch in isolation isn't known.
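
As background on what "strict alias" refers to, here is a small sketch of the kind of access that the C aliasing rules forbid, next to a well-defined alternative. It is a generic illustration, not code from the kernel or from the patch named above, and the function name `load32_copied` is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The violating pattern looks like this (shown only as a comment):
 *
 *     uint32_t word = *(const uint32_t *)halves;
 *
 * It reads uint16_t objects through a uint32_t lvalue. A compiler
 * that optimizes on the aliasing rules may assume the two types
 * never refer to the same memory and reorder or drop accesses. */

/* Well-defined alternative: copy the bytes into an object of the
 * destination type. memcpy may access any object's representation. */
uint32_t load32_copied(const uint16_t *halves)
{
    uint32_t word;
    memcpy(&word, halves, sizeof word);
    return word;
}
```

Calling `load32_copied` on an array of two `uint16_t` values returns their combined 32-bit representation in the host's byte order.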

Comment 8 Richard Henderson 2004-10-03 19:19:26 UTC
Yes, the networking code is known to have strict-aliasing violations.
The kernel makefiles have been updated since then to always supply