Bug 162548 - interrupt handlers run on thread's kernel stack
Summary: interrupt handlers run on thread's kernel stack
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Brian Brock
Depends On:
Blocks: 156322
Reported: 2005-07-06 06:23 UTC by craig harmer
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version: RHSA-2005-514
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2005-10-05 13:39:35 UTC
Target Upstream Version:


System: Red Hat Product Errata
ID: RHSA-2005:514
Priority: qe-ready
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 2
Last Updated: 2005-10-05 04:00:00 UTC

Description craig harmer 2005-07-06 06:23:48 UTC
Description of problem:

In several discussions, Red Hat engineers told us (Veritas) that Red Hat EL 4.0
would be based on the 2.6 kernel and would move to a 4 Kbyte stack size, but
would process hardware interrupts on a separate stack.
But it turns out that's not true!  Interrupts are still being processed on
the thread's kernel stack.
This is a huge problem for Veritas.
Here's an example of an interrupt handler running on the thread stack, caught by
our deep stack tracking kernel:
Comm: find (0xea4)
[kernel]     sys_getdents64                 (+0x64  =  0x00064)
[kernel]     vfs_readdir                    (+0x24  =  0x00088)
[nfs]        nfs_readdir                    (+0x1a4 =  0x0022c)
[nfs]        readdir_search_pagecache       (+0x18  =  0x00244)
[nfs]        find_dirent_page               (+0x14  =  0x00258)
[kernel]     read_cache_page                (+0x24  =  0x0027c)
[kernel]     __read_cache_page              (+0x24  =  0x002a0)
[nfs]        nfs_readdir_filler             (+0x28  =  0x002c8)
[nfs]        nfs3_proc_readdir              (+0x104 =  0x003cc)
[nfs]        nfs3_rpc_wrapper               (+0x28  =  0x003f4)
[sunrpc]     rpc_call_sync                  (+0x28  =  0x0041c)
[sunrpc]     rpc_execute                    (+0x14  =  0x00430)
[sunrpc]     __rpc_execute                  (+0x64  =  0x00494)
[sunrpc]     call_transmit                  (+0x10  =  0x004a4)
[sunrpc]     xprt_transmit                  (+0x24  =  0x004c8)
[sunrpc]     xprt_sendmsg                   (+0x28  =  0x004f0)
[sunrpc]     xdr_sendpages                  (+0x94  =  0x00584)
[kernel]     kernel_sendmsg                 (+0x24  =  0x005a8)
[kernel]     sock_sendmsg                   (+0xec  =  0x00694)
[kernel]     __sock_sendmsg                 (+0x24  =  0x006b8)
[kernel]     inet_sendmsg                   (+0x20  =  0x006d8)
[kernel]     tcp_sendmsg                    (+0x58  =  0x00730)
[kernel]     tcp_push                       (+0x28  =  0x00758)
[kernel]     __tcp_push_pending_frames      (+0x30  =  0x00788)
[kernel]     tcp_write_xmit                 (+0x24  =  0x007ac)
[kernel]     tcp_transmit_skb               (+0x2c  =  0x007d8)
[kernel]     ip_queue_xmit                  (+0xc4  =  0x0089c)
[kernel]     dst_output                     (+0x10  =  0x008ac)
[kernel]     ip_output                      (+0x14  =  0x008c0)
[kernel]     ip_finish_output               (+0x14  =  0x008d4)
[kernel]     nf_hook_slow                   (+0x38  =  0x0090c)
[kernel]     nf_iterate                     (+0x34  =  0x00940)
[kernel]     selinux_ipv4_postroute_last    (+0x20  =  0x00960)
[kernel]     selinux_ip_postroute_last      (+0x94  =  0x009f4)
[kernel]     avc_has_perm                   (+0x48  =  0x00a3c)
[kernel]     avc_has_perm_noaudit           (+0x5c  =  0x00a98)
[kernel]     avc_lookup                     (+0x24  =  0x00abc)
[kernel]     avc_search_node                (+0x28  =  0x00ae4)
[kernel]     avc_hash                       (+0x1c  =  0x00b00)
====> CDROM interrupt occurs here with ~1,200 bytes remaining <===
[kernel]     do_IRQ                         (+0x74  =  0x00b74)
[kernel]     handle_IRQ_event               (+0x20  =  0x00b94)
[kernel]     ide_intr                       (+0x28  =  0x00bbc)
[kernel]     cdrom_read_intr                (+0x20  =  0x00bdc)
[kernel]     ide_end_request                (+0x24  =  0x00c00)
[kernel]     __ide_end_request              (+0x28  =  0x00c28)
[kernel]     end_that_request_first         (+0x18  =  0x00c40)
[kernel]     __end_that_request_first       (+0x2c  =  0x00c6c)
[kernel]     bio_endio                      (+0x20  =  0x00c8c)
[kernel]     bounce_end_io_read             (+0x1c  =  0x00ca8)
[kernel]     __bounce_end_io_read           (+0x18  =  0x00cc0)
[kernel]     bounce_end_io                  (+0x24  = *0x00ce4)
[kernel]     bio_endio                      (+0x20  = *0x00d04)
[kernel]     end_bio_bh_io_sync             (+0x20  = *0x00d24)
[kernel]     end_buffer_async_read          (+0x24  = *0x00d48)
[kernel]     unlock_page                    (+0xc   = *0x00d54)
[kernel]     wake_up_page                   (+0x14  = *0x00d68)
[kernel]     __wake_up                      (+0x1c  = *0x00d84)
[kernel]     __wake_up_common               (+0x28  = *0x00dac)
[kernel]     page_wake_function             (+0x1c  = *0x00dc8)
[kernel]     autoremove_wake_function       (+0x20  = *0x00de8)
[kernel]     default_wake_function          (+0x1c  = *0x00e04)
[kernel]     try_to_wake_up                 (+0x48  = *0x00e4c)
[kernel]     wake_idle                      (+0x20  = *0x00e6c)
[kernel]     find_next_bit                  (+0x38  = *0x00ea4)
(CDROM interrupt consumes 0xea4 - 0xb00 + 0x74 = 1,048 bytes of stack.)
(Note that frame sizes and stack depths shown here are roughly 10% larger than on
a production Red Hat kernel, since our deep stack tracking kernel is compiled with
frame pointers and with "-mregparm=0" to make our debugging easier.)
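
The figures in the trace can be cross-checked with a little arithmetic. The sketch below copies the frame sizes from do_IRQ through find_next_bit out of the trace, rebuilds the running depth column, and also evaluates the expression quoted above. The variable and function names are illustrative only and are not part of any Veritas or Red Hat tool.

```python
# Frame sizes for the interrupt portion of the trace, copied from the report.
IRQ_FRAMES = [
    ("do_IRQ", 0x74), ("handle_IRQ_event", 0x20), ("ide_intr", 0x28),
    ("cdrom_read_intr", 0x20), ("ide_end_request", 0x24),
    ("__ide_end_request", 0x28), ("end_that_request_first", 0x18),
    ("__end_that_request_first", 0x2c), ("bio_endio", 0x20),
    ("bounce_end_io_read", 0x1c), ("__bounce_end_io_read", 0x18),
    ("bounce_end_io", 0x24), ("bio_endio", 0x20),
    ("end_bio_bh_io_sync", 0x20), ("end_buffer_async_read", 0x24),
    ("unlock_page", 0x0c), ("wake_up_page", 0x14), ("__wake_up", 0x1c),
    ("__wake_up_common", 0x28), ("page_wake_function", 0x1c),
    ("autoremove_wake_function", 0x20), ("default_wake_function", 0x1c),
    ("try_to_wake_up", 0x48), ("wake_idle", 0x20), ("find_next_bit", 0x38),
]

DEPTH_BEFORE_IRQ = 0x00b00  # cumulative depth at avc_hash, just before the interrupt


def running_depths(start, frames):
    """Rebuild the cumulative depth column from the per-frame sizes."""
    depth = start
    result = []
    for name, size in frames:
        depth += size
        result.append((name, depth))
    return result


depths = running_depths(DEPTH_BEFORE_IRQ, IRQ_FRAMES)
frames_total = sum(size for _, size in IRQ_FRAMES)
reported = 0xea4 - 0xb00 + 0x74  # the expression quoted in the report

print(f"deepest point: {depths[-1][0]} at {depths[-1][1]:#07x}")
print(f"sum of interrupt frames: {frames_total} bytes")
print(f"expression from the report: {reported} bytes")
```

The rebuilt depths end at 0x00ea4, matching the trace. The frame sizes alone sum to 0x3a4 (932) bytes, and the quoted expression, which adds a further 0x74, evaluates to 1,048 bytes.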
Also note that interrupts on Linux can nest.  The ~1,000 bytes consumed by the
CDROM interrupt could easily have had another ~500 bytes added to it by the
ethernet driver and another ~500 bytes added to it by the QLogic FC driver.  So,
under the right confluence of events this could have been a stack overflow
involving only kernel code shipped by Red Hat.
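
To make the nesting argument concrete, here is a rough budget calculation. It assumes a 4096-byte stack, takes the deepest observed depth (0xea4) from the trace, and uses the approximate ~500-byte figures given above for the two additional drivers. It ignores any other data that shares the stack area, and the depths come from the tracking kernel, which the report says runs about 10% larger than production, so treat the result as an illustration rather than a measurement.

```python
STACK_SIZE = 4096          # 4 Kbyte kernel stack
DEPTH_AT_DEEPEST = 0xea4   # deepest point in the trace (find_next_bit)
ETHERNET_EXTRA = 500       # rough estimate from the report
QLOGIC_EXTRA = 500         # rough estimate from the report


def headroom(depth, stack_size=STACK_SIZE):
    """Bytes left on the stack; a negative value means an overflow."""
    return stack_size - depth


observed = headroom(DEPTH_AT_DEEPEST)
nested = headroom(DEPTH_AT_DEEPEST + ETHERNET_EXTRA + QLOGIC_EXTRA)

print(f"headroom with the CDROM interrupt only: {observed} bytes")
print(f"headroom with two nested interrupts:    {nested} bytes")
```

With these inputs the observed case leaves 348 bytes, while the hypothetical nested case comes out at -652 bytes, i.e. past the end of the stack.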
You're probably wondering why we're only reporting this problem now ...
The problem is that Veritas does most of its testing using custom kernels built
with kdb, frame pointers, "-mregparm=0", and an 8 Kbyte stack. Because we
have larger stack frames due to passing arguments on the stack, extra debugging
code, and kdb, we need additional stack space (it really sucks when dropping into
kdb causes a stack overrun; in addition, we used to have problems with deep
stacks in our production code, although we believe they've all been resolved).
When we built our custom kernels, we used "#define CONFIG_4KSTACKS" because it
enables the interrupt stack switching code and because we *assumed* that's what
Red Hat was doing to get 4 Kbyte kernel stacks.
That was a mistake.  It turns out Red Hat builds its kernels with a custom
patch that enables 4 Kbyte stacks but disables interrupt stack switching.  That
patch is:
It strips out every "#ifdef CONFIG_4KSTACKS" in the kernel *except* for the
#ifdef around the interrupt stack switching code in do_IRQ() (in
arch/i386/kernel/irq.c), which explains why we're in this situation.
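
One way to check a claim like this against a patched source tree is to list the files that still contain the guard. The following is only a sketch: the helper names are made up for illustration, the tree path is whatever you pass in, and it matches only the plain "#ifdef CONFIG_4KSTACKS" form, not "#if defined(...)" variants.

```python
import os
import re

# Matches "#ifdef CONFIG_4KSTACKS" at the start of a line, allowing whitespace.
GUARD = re.compile(r"^\s*#\s*ifdef\s+CONFIG_4KSTACKS\b", re.MULTILINE)


def count_guards(source_text):
    """Count '#ifdef CONFIG_4KSTACKS' lines in one file's text."""
    return len(GUARD.findall(source_text))


def scan_tree(root):
    """Return {path: count} for C, header, and assembly files that still use the guard."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith((".c", ".h", ".S")):
                continue
            path = os.path.join(dirpath, name)
            with open(path, errors="replace") as handle:
                found = count_guards(handle.read())
            if found:
                hits[path] = found
    return hits
```

Running scan_tree() on an unpacked, patched kernel tree and seeing only arch/i386/kernel/irq.c in the result would be consistent with the description above.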
I'd really like to know why that patch was added.
Veritas has done some limited testing on Red Hat production kernels (most
recently the RHEL 4 Update 1 RC 1 drop) and hasn't seen any actual stack
overflows, or even any stack overflow warning messages.  But our stack depth
tracking kernels were being built using CONFIG_4KSTACKS, so we weren't exploring
this issue with most of our testing.
At this point it's difficult to know what the actual risk is, but currently we
don't think we can release our products for the i386 (or i686) with this
increased risk of stack overflow (since we do know overflow *might* occur if the
conditions were right).
So we're urgently looking for Red Hat to make kernels available that actually
perform hardware interrupt handling on a different stack.

Version-Release number of selected component (if applicable):

How reproducible: every time

Steps to Reproduce:
1. build a kernel with stack depth tracking
2. run an i/o intensive test like SpecSFS
Actual results:
interrupts are handled on thread's kernel stack, not interrupt stack

Expected results:
interrupts handled on dedicated interrupt stack

Additional info:

Comment 1 Jason Baron 2005-07-06 16:06:01 UTC
Hi Craig, thanks for the bug report. This was simply an oversight, and we have
corrected this issue by re-enabling 4k irq stacks for U2. The bug noting
the issue is 162257. Thus, I'm closing this one as a duplicate of that. Thanks.


*** This bug has been marked as a duplicate of 162257 ***

Comment 2 Red Hat Bugzilla 2005-10-05 13:39:35 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
