Bug 455621 - 100% CPU Use in Hardware Interrupts with noapic option after updating to new kernel
Summary: 100% CPU Use in Hardware Interrupts with noapic option after updating to new ...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 9
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-07-16 17:24 UTC by Alex Chernyakhovsky
Modified: 2008-09-10 01:20 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-09-10 01:20:10 UTC



Description Alex Chernyakhovsky 2008-07-16 17:24:49 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9) Gecko/2008061712 Fedora/3.0-1.fc9 Firefox/3.0

Description of problem:
I am running an HP tx2000z series laptop (a tx2117cl, to be exact) with a dual-core AMD Turion 64 X2 TL-62 (2.1 GHz cores).

Fedora 9 with kernel-2.6.25.9-76.fc9 ran perfectly, including suspend-to-RAM and hibernate, as long as I booted with the noapic option. According to top, this had no negative side effects, and the machine worked perfectly aside from a few minor glitches not relevant to this bug.

However, since I updated to kernel-2.6.25.10-86.fc9.x86_64, the second CPU core is always 100% busy servicing hardware interrupts (the "hi" figure in top).

Since this did not happen with the earlier kernel, I can only presume that it is a bug. I cannot boot consistently without noapic; doing so results in a Machine Check Exception. However, since this machine works perfectly with noapic and in Windows Vista, I strongly doubt that there is a hardware issue.

Version-Release number of selected component (if applicable):
kernel-2.6.25.10-86.fc9.x86_64

How reproducible:
Always


Steps to Reproduce:
1. Install latest kernel update (kernel-2.6.25.10-86.fc9.x86_64) on a tx2000z series tablet
2. Attempt to boot with noapic
3. Observe 100% CPU use on one core servicing hardware interrupts

Actual Results:
100% CPU use in hardware interrupts.

Expected Results:
Both cores about 99% idle, with CPU time going to processes rather than to hardware interrupt handling.

Additional info:
This is a tablet PC manufactured by HP. As with all other HP laptops, it requires noapic to boot properly. This is the first time I have seen this kind of CPU utilization on this tablet when booting with noapic.
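
For anyone who wants to record the "hi" figure per core without watching top, here is a minimal Python sketch. It assumes the usual /proc/stat field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are hypothetical and not part of any tool mentioned in this report.

```python
def parse_cpu_times(stat_text):
    """Return {cpu_name: [jiffy counters]} for the per-CPU lines of /proc/stat."""
    times = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # Per-CPU lines start with "cpu0", "cpu1", ...; skip the aggregate "cpu" line.
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            times[fields[0]] = [int(v) for v in fields[1:]]
    return times


def hardirq_percent(before_text, after_text):
    """Percentage of time each CPU spent in hardware interrupts between two samples."""
    before = parse_cpu_times(before_text)
    after = parse_cpu_times(after_text)
    result = {}
    for cpu, new in after.items():
        old = before.get(cpu)
        if old is None:
            continue
        # Only the first eight fields are summed; later guest fields are
        # already included in "user" and would be double counted.
        deltas = [n - o for n, o in zip(new[:8], old[:8])]
        total = sum(deltas)
        irq = deltas[5] if len(deltas) > 5 else 0  # sixth field is "irq"
        result[cpu] = 100.0 * irq / total if total else 0.0
    return result
```

To use it, read /proc/stat twice a second or so apart and pass both texts to hardirq_percent(); a core stuck as described above should show a value near 100.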

Comment 1 Chuck Ebbert 2008-07-20 19:52:28 UTC
I have a tx1000 that also needs noapic to boot. It occasionally gets into that
state where a CPU gets stuck processing interrupts but usually it stops after a
while.

F9 kernels starting with 2.6.25.11-95 have the sysrq-l key added to show a
backtrace on all processors. It can be activated by the command

  echo 'l' >/proc/sysrq-trigger

Additional debugging will hopefully be added to the next release so we can find
out why noapic is needed too.


Comment 2 Alex Chernyakhovsky 2008-07-20 20:08:20 UTC
I have checked my tx2000z laptop's F9 installation, and I only have
kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is
2.6.25.11-95 still in rawhide?

Additionally, I see the following message shortly after boot-up or reloading the
USB modules:

Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the "irqpoll" option)
Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P         2.6.25.9-76.fc9.x86_64 #1
Jul 20 15:57:38 localhost kernel: 
Jul 20 15:57:38 localhost kernel: Call Trace:
Jul 20 15:57:38 localhost kernel:  <IRQ>  [<ffffffff8107180f>] __report_bad_irq+0x38/0x7c
Jul 20 15:57:38 localhost kernel:  [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
Jul 20 15:57:38 localhost kernel:  [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
Jul 20 15:57:38 localhost kernel:  <EOI>  [<ffffffff8100b029>] ? default_idle+0x39/0x5f
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100b024>] ? default_idle+0x34/0x5f
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
Jul 20 15:57:38 localhost kernel:  [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
Jul 20 15:57:38 localhost kernel:  [<ffffffff81289e7b>] ? start_secondary+0x3fc/0x40b
Jul 20 15:57:38 localhost kernel: 
Jul 20 15:57:38 localhost kernel: handlers:
Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
Jul 20 15:57:38 localhost kernel: Disabling IRQ #7

Perhaps it is related? Following the recommendation to boot with irqpoll (in
addition to my normal noapic) results in a deadlocked machine. 

However, USB works, as long as no new devices are plugged in.
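
To pull these reports out of a long /var/log/messages, something like the following Python sketch could be used. It is only an illustration built around the message format shown above; the function name and the regular expressions are assumptions, not an existing tool.

```python
import re

# Patterns modelled on the log excerpt above.
NOBODY_CARED = re.compile(r"irq (\d+): nobody cared")
HANDLER = re.compile(r"\[<[0-9a-f]+>\] \((\w+)\+0x")
DISABLED = re.compile(r"Disabling IRQ #(\d+)")


def summarize_bad_irqs(log_text):
    """Return one dict per "nobody cared" report: IRQ number, handlers, disabled flag."""
    reports = []
    current = None
    in_handlers = False
    for line in log_text.splitlines():
        match = NOBODY_CARED.search(line)
        if match:
            current = {"irq": int(match.group(1)), "handlers": [], "disabled": False}
            reports.append(current)
            in_handlers = False
            continue
        if current is None:
            continue
        if line.rstrip().endswith("handlers:"):
            in_handlers = True
            continue
        if in_handlers:
            handler = HANDLER.search(line)
            if handler:
                # Registered handler for the IRQ, e.g. usb_hcd_irq.
                current["handlers"].append(handler.group(1))
                continue
        disabled = DISABLED.search(line)
        if disabled and int(disabled.group(1)) == current["irq"]:
            current["disabled"] = True
            in_handlers = False
    return reports
```

Fed the excerpt above, this should report IRQ 7 with usb_hcd_irq as its only registered handler and mark it as disabled, which is what ties the message to the USB host controller.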

Comment 3 Chuck Ebbert 2008-07-21 02:08:40 UTC
(In reply to comment #2)
> I have checked my tx2000z laptop's F9 installation, and I only have
> kernel-2.6.25.9-76.fc9.x86_64 and kernel-2.6.25.10-86.fc9.x86_64 installed. Is
> 2.6.25.11-95 still in rawhide?
>
It has not been built yet.

> Additionally, I see the following message shortly after boot-up or reloading the
> USB modules:
> 
> Jul 20 15:57:38 localhost kernel: irq 7: nobody cared (try booting with the "irqpoll" option)
> Jul 20 15:57:38 localhost kernel: Pid: 0, comm: swapper Tainted: P         2.6.25.9-76.fc9.x86_64 #1
> Jul 20 15:57:38 localhost kernel: 
> Jul 20 15:57:38 localhost kernel: Call Trace:
> Jul 20 15:57:38 localhost kernel:  <IRQ>  [<ffffffff8107180f>] __report_bad_irq+0x38/0x7c
> Jul 20 15:57:38 localhost kernel:  [<ffffffff81071a35>] note_interrupt+0x1e2/0x249
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8107221b>] handle_level_irq+0xb1/0xe7
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100e48f>] do_IRQ+0xf7/0x167
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100c3f1>] ret_from_intr+0x0/0xa
> Jul 20 15:57:38 localhost kernel:  <EOI>  [<ffffffff8100b029>] ? default_idle+0x39/0x5f
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100b024>] ? default_idle+0x34/0x5f
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100aff0>] ? default_idle+0x0/0x5f
> Jul 20 15:57:38 localhost kernel:  [<ffffffff8100afa8>] ? cpu_idle+0x78/0xc0
> Jul 20 15:57:38 localhost kernel:  [<ffffffff81289e7b>] ? start_secondary+0x3fc/0x40b
> Jul 20 15:57:38 localhost kernel: 
> Jul 20 15:57:38 localhost kernel: handlers:
> Jul 20 15:57:38 localhost kernel: [<ffffffff811be414>] (usb_hcd_irq+0x0/0x63)
> Jul 20 15:57:38 localhost kernel: Disabling IRQ #7
> 
> Perhaps it is related? Following the recommendation to boot with irqpoll (in
> addition to my normal noapic) results in a deadlocked machine. 
> 
> However, USB works, as long as no new devices are plugged in.

Use 'noirqdebug' and the bogus interrupts will be ignored instead of causing errors.


Comment 4 Chuck Ebbert 2008-07-22 03:34:34 UTC
2.6.25.11-97 has been submitted to the updates-testing repository.


Comment 5 Alex Chernyakhovsky 2008-07-30 17:34:04 UTC
I have installed 2.6.25.11-97 and run the specified command. I see the following
in /var/log/messages:

Jul 30 13:33:10 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:33:10 localhost kernel: CPU1:
Jul 30 13:33:10 localhost kernel:  ffff8100bb693f18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:33:10 localhost kernel:  0000000000000001 00007f464090b760 ffff8100bb693f58 ffffffff8100d817
Jul 30 13:33:10 localhost kernel:  ffff8100bb693f78 ffffffff81195b86 ffff8100bb693f78 0000000000000000
Jul 30 13:33:10 localhost kernel: Call Trace:
Jul 30 13:33:10 localhost kernel:  <IRQ>  [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:33:10 localhost kernel:  [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:33:10 localhost kernel:  [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:33:10 localhost kernel:  [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:33:10 localhost kernel:  [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:33:10 localhost kernel:  <EOI> 

This is a boot with noapic.

Comment 6 Alex Chernyakhovsky 2008-07-30 17:40:52 UTC
Backtrace without noapic (lucky boot? unsure at this time)
Jul 30 13:39:57 localhost kernel: SysRq : Show backtrace of all active CPUs
Jul 30 13:39:57 localhost kernel: CPU0:
Jul 30 13:39:57 localhost kernel:  ffffffff814bbf18 0000000000000046 ffffffff81195b3e 0000000000000000
Jul 30 13:39:57 localhost kernel:  0000000000000020 0000000000000001 ffffffff814bbf58 ffffffff8100d817
Jul 30 13:39:57 localhost kernel:  ffffffff814bbf78 ffffffff81195b86 ffffffff814bbf88 0000000000000000
Jul 30 13:39:57 localhost kernel: Call Trace:
Jul 30 13:39:57 localhost kernel:  <IRQ>  [<ffffffff81195b3e>] ? showacpu+0x0/0x5b
Jul 30 13:39:57 localhost kernel:  [<ffffffff8100d817>] ? show_stack+0x10/0x12
Jul 30 13:39:57 localhost kernel:  [<ffffffff81195b86>] ? showacpu+0x48/0x5b
Jul 30 13:39:57 localhost kernel:  [<ffffffff8101b0ea>] ? smp_call_function_interrupt+0x48/0x71
Jul 30 13:39:57 localhost kernel:  [<ffffffff8100ca36>] ? call_function_interrupt+0x66/0x70
Jul 30 13:39:57 localhost kernel:  <EOI>  [<ffffffff810cc63c>] ? inotify_inode_queue_event+0x34/0xd9
Jul 30 13:39:57 localhost kernel:  [<ffffffff8110149b>] ? security_file_permission+0x11/0x13
Jul 30 13:39:57 localhost kernel:  [<ffffffff810a43d6>] ? do_readv_writev+0x17e/0x193
Jul 30 13:39:57 localhost kernel:  [<ffffffff8106c6c3>] ? audit_syscall_entry+0x126/0x15a
Jul 30 13:39:57 localhost kernel:  [<ffffffff8106c394>] ? audit_syscall_exit+0x331/0x353
Jul 30 13:39:57 localhost kernel:  [<ffffffff810a4429>] ? vfs_writev+0x3e/0x49
Jul 30 13:39:57 localhost kernel:  [<ffffffff810a447b>] ? sys_writev+0x47/0x94
Jul 30 13:39:57 localhost kernel:  [<ffffffff8100c052>] ? tracesys+0xd5/0xda
Jul 30 13:39:57 localhost kernel: 


Comment 7 Chuck Ebbert 2008-08-01 20:23:57 UTC
The computer should work fine without noapic with kernels 2.6.25 and later. It
was only older kernels that needed that option. Check the boot messages and look
for a line containing " using 0xed I/O delay port". If that is present IOAPIC
mode should just work.
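
The check described above can be scripted. A minimal Python sketch, assuming the boot messages have been saved to a text file or captured from dmesg; the function name is hypothetical, and only the substring quoted above is matched.

```python
def uses_0xed_io_delay(boot_log_text):
    """Return True if any boot message line mentions the 0xed I/O delay port."""
    marker = "using 0xed I/O delay port"
    return any(marker in line for line in boot_log_text.splitlines())
```

If this returns False for a full boot log, the kernel did not select the 0xed delay port on its own.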


Comment 8 Alex Chernyakhovsky 2008-08-01 20:43:04 UTC
Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any
kernel. I just tried, and received a Machine Check Exception (the whole reason I
had to use noapic in the first place):

HARDWARE ERROR
CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
TSC 11e4deb697

The kernel claims that this is a hardware issue, but since Windows Vista can
survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.

Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake
up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works
properly (with the exception of an irq 7: nobody cared).

Comment 9 Alex Chernyakhovsky 2008-08-02 01:47:38 UTC
After some playing with options, including re-installing kmod-nvidia from livna,
I found that 2.6.25.11-97 performs the same way as the earlier kernels, with the
exception that the second core is stuck at 100% hardware interrupts until the
"irq 7: nobody cared" message occurs. (This is with the noapic option.)

Comment 10 Chuck Ebbert 2008-08-12 02:17:55 UTC
(In reply to comment #8)
> Unfortunately, I cannot get the tx2000z to reliably boot without noapic in any
> kernel. I just tried, and received a Machine Check Exception (the whole reason I
> had to use noapic in the first place):
> 
> HARDWARE ERROR
> CPU 0: Machine Check Exception 4 Bank 4: b2000000000f0f
> TSC 11e4deb697
> 
> The kernel claims that this is a hardware issue, but since Windows Vista can
> survive, as can a noapic'd kernel, I strongly doubt the kernel's claim.
> 
> Furthermore, when I _do_ get the tx2000z to boot without noapic, it can't wake
> up from suspend-to-ram. When I run 2.6.25.9-76 with noapic, everything works
> properly (with the exception of an irq 7: nobody cared).

Does it print the line containing " using 0xed I/O delay port" ?

If not, try booting with the option "io_delay=0xed"

Comment 11 Alex Chernyakhovsky 2008-08-12 17:53:38 UTC
I have checked the boot logs. There is no mention of 0xed at all. I continue to get an MCE if I boot without any options.

However, adding io_delay=0xed seems to fix that. The 0xed line is still not printed, but there is no MCE, and I am able to suspend to RAM. The machine has survived a few boots and seems to be functioning correctly.
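
Since nothing is printed, one way to confirm which options the running kernel was actually booted with is to read /proc/cmdline. A small Python sketch (the function name is hypothetical, and quoted values containing spaces are not handled):

```python
def parse_cmdline(cmdline_text):
    """Split a kernel command line into {option: value}; bare flags map to None."""
    options = {}
    for token in cmdline_text.split():
        key, sep, value = token.partition("=")
        options[key] = value if sep else None
    return options
```

For example, parse_cmdline(open("/proc/cmdline").read()) should contain the key "io_delay" with value "0xed" on a boot like the one described here, and the key "noapic" only when that option was passed.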

Comment 12 Alex Chernyakhovsky 2008-08-12 17:58:53 UTC
Also, I found that the 100% CPU utilization (under noapic) seems to be caused by the USB bus. Once the "irq 7: nobody cared" message occurs, the (spurious?) USB interrupts stop and CPU utilization returns to idle.

Furthermore, io_delay=0xed has cleared up the "irq 7: nobody cared" message and reduced the total wakeups shown by powertop. It is now between 500 and 600 wakeups per 10 seconds, depending on use, while earlier it was at least 1500 wakeups per 10 seconds. This is a massive improvement.
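
To see which interrupt line is responsible for a burst like this, two snapshots of /proc/interrupts can be compared. The following Python sketch assumes the usual layout (a header row of CPU names, then one row per IRQ with one count per CPU); the function names are hypothetical.

```python
def parse_interrupts(text):
    """Return {irq_label: total count across CPUs} from /proc/interrupts text."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        total = 0
        for token in rest.split()[:ncpu]:
            if not token.isdigit():
                break  # reached the controller/device description
            total += int(token)
        counts[label.strip()] = total
    return counts


def busiest_irqs(before_text, after_text, top=3):
    """Rank IRQ lines by how many interrupts fired between the two snapshots."""
    before = parse_interrupts(before_text)
    after = parse_interrupts(after_text)
    deltas = {irq: n - before.get(irq, 0) for irq, n in after.items()}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]
```

Taking the snapshots about ten seconds apart gives numbers that are roughly comparable to powertop's per-10-second wakeup figure, although powertop also counts timer wakeups that do not appear as separate IRQ lines.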

