|Summary:||hpet_alloc Kernel Panic|
|Product:||[Fedora] Fedora||Reporter:||John Williams <web582>|
|Component:||kernel||Assignee:||Kernel Maintainer List <kernel-maint>|
|Status:||CLOSED CURRENTRELEASE||QA Contact:||Fedora Extras Quality Assurance <extras-qa>|
|Fixed In Version:||18.104.22.168-115.fc8||Doc Type:||Bug Fix|
|Last Closed:||2008-02-07 20:56:29 UTC||Type:||---|
Description John Williams 2007-11-21 22:54:12 UTC
Description of problem:
Hi, I've done a fresh install of Fedora 8 on three computers. Only one had a problem: a PCChips M811 motherboard with an Athlon XP 1800 MHz processor. The motherboard doesn't power off the machine at shutdown, so I have been running it with ACPI disabled in the BIOS and booting with acpi=off. There were no problems with Fedora 6 and 7.

With Fedora 8 I often get:
'Kernel Panic Not syncing - attempted to kill init'
This happens straight after the line:
'Uncompressing Linux...OK, booting the Kernel'
Occasionally it will boot normally. If the kernel panic occurs, pressing the RESET button usually lets the system boot normally. Attached is a photo of the panic screen.

If I change the boot parameter to pci=noacpi instead of acpi=off, the panics are fewer. I get:
'Uncompressing Linux...OK, booting the Kernel'
'ACPI Unable to load the system description tables.'
and then the system boots normally. Occasionally I still get the kernel panic straight after the line:
'Uncompressing Linux...OK, booting the Kernel'

Version-Release number of selected component (if applicable):
22.214.171.124-49.fc8

How reproducible:
Occasionally, at random

Steps to Reproduce:
1. Boot or restart the computer

Actual results:
Kernel panic (see attachment)

Expected results:
Normal boot

Additional info:
This has been happening with the first F8 kernel and the updated kernels.
Comment 1 John Williams 2007-11-21 22:54:12 UTC
Created attachment 266371 [details] Screen showing panic message.
Comment 2 John Williams 2007-11-22 09:09:52 UTC
Correction: Apologies, the processor is an AMD Duron 1800 MHz. John.
Comment 3 John Williams 2007-11-22 14:45:51 UTC
I don't know whether this is relevant but I've just noticed that Fedora 8 doesn't detect the floppy drive on this computer although the BIOS does. There is no list of devices beginning fd0 in /dev and there is no floppy icon in the 'Computer' folder on the desktop. John.
Comment 4 Chuck Ebbert 2007-11-27 00:28:17 UTC
Is there any way you can capture the missing line or so at the top of the oops? A serial console might be necessary...
Comment 5 Chuck Ebbert 2007-11-27 00:35:05 UTC
   0:   89 54 24 1c             mov    %edx,0x1c(%esp)
   4:   8b 7c 24 30             mov    0x30(%esp),%edi
   8:   43                      inc    %ebx
   9:   8b 44 24 34             mov    0x34(%esp),%eax
   d:   8b 97 f0 00 00 00       mov    0xf0(%edi),%edx
  13:   8b 7c 24 28             mov    0x28(%esp),%edi
  17:   01 d0                   add    %edx,%eax
  19:   03 47 18                add    0x18(%edi),%eax
  1c:   89 46 08                mov    %eax,0x8(%esi)
  1f:   2b 54 24 1c             sub    0x1c(%esp),%edx
  23:   39 ea                   cmp    %ebp,%edx
  25:   72 dd                   jb     0x4
  27:   89 c8                   mov    %ecx,%eax
  29:   50                      push   %eax
  2a:   9d                      popf
=>2b:   8d 04 05 00 00 00 00    lea    0x0(,%eax,1),%eax

Code was replaced by the paravirt code-patching functions.
Comment 6 Chuck Ebbert 2007-11-27 01:03:15 UTC
Warren's oopsing instruction was:
  8d b4 26 00 00 00 00    lea    0x0(%esi),%esi
which is the generic 7-byte NOP instead of the K7-optimized one.
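The two 7-byte sequences seen so far can be told apart with a small lookup. This is only an illustrative sketch, not kernel code: the "generic" bytes are the ones quoted in this comment, the "k7" bytes are the ones at the oops marker in the comment 5 disassembly, and the names `NOP7_PATTERNS` and `classify_nop7` are hypothetical.

```python
# Byte patterns for the two 7-byte NOPs discussed in this bug.
# Assumption: the dictionary keys are labels chosen for illustration only.
NOP7_PATTERNS = {
    # lea 0x0(%esi),%esi  (generic 7-byte NOP, Warren's oops)
    "generic": bytes([0x8D, 0xB4, 0x26, 0x00, 0x00, 0x00, 0x00]),
    # lea 0x0(,%eax,1),%eax  (K7-optimized 7-byte NOP, comment 5 disassembly)
    "k7": bytes([0x8D, 0x04, 0x05, 0x00, 0x00, 0x00, 0x00]),
}


def classify_nop7(hex_bytes: str) -> str:
    """Return which known 7-byte NOP a hex dump matches, or 'unknown'."""
    code = bytes.fromhex(hex_bytes)  # whitespace between byte pairs is allowed
    for name, pattern in NOP7_PATTERNS.items():
        if code == pattern:
            return name
    return "unknown"
```

Feeding it the bytes from an oops "Code:" line shows which patching variant was applied on that machine.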
Comment 7 Warren Togami 2007-11-27 04:31:49 UTC
http://people.redhat.com/wtogami/temp/hpet_vga1.jpg I managed to make it panic with vga=1. I hope that this is valuable.
Comment 8 Warren Togami 2007-11-27 05:36:11 UTC
http://www.disklessworkstations.com/cgi-bin/web/200029.html VIA CLE266 chipset "T170" thin client is affected by this as well.
Comment 9 John Williams 2007-11-27 13:42:11 UTC
Created attachment 269721 [details] Panic message using vga=1 Same panic message with extra lines at the top
Comment 10 Chuck Ebbert 2007-11-27 22:28:19 UTC
Created attachment 270441 [details] Warren's oops message using vga=1
Comment 11 Warren Togami 2007-11-28 18:29:54 UTC
Created attachment 271591 [details] hpet-panic-seven-0x90.jpg

--- ./include/asm-i386/processor.h.orig 2007-11-28 12:10:17.000000000 -0500
+++ ./include/asm-i386/processor.h 2007-11-28 12:10:50.000000000 -0500
@@ -656,7 +656,7 @@
 #define GENERIC_NOP4 ".byte 0x8d,0x74,0x26,0x00\n"
 #define GENERIC_NOP5 GENERIC_NOP1 GENERIC_NOP4
 #define GENERIC_NOP6 ".byte 0x8d,0xb6,0x00,0x00,0x00,0x00\n"
-#define GENERIC_NOP7 ".byte 0x8d,0xb4,0x26,0x00,0x00,0x00,0x00\n"
+#define GENERIC_NOP7 ".byte 0x90,0x90,0x90,0x90,0x90,0x90,0x90\n"
 #define GENERIC_NOP8 GENERIC_NOP1 GENERIC_NOP7
 /* Opteron nops */

Tried replacing the seven-byte NOP with seven single-byte NOPs. It still panics, this time on the second 0x90 NOP. "divide error" on a NOP?
Comment 12 Chuck Ebbert 2007-11-28 20:23:46 UTC
*** Bug 383551 has been marked as a duplicate of this bug. ***
Comment 13 Warren Togami 2007-11-29 22:56:14 UTC
Created attachment 273401 [details] hpet_paravirt-noreplace.jpg

Tried "paravirt-noreplace" as a boot parameter so it won't replace the paravirt calls with NOPs. decodecode on the code shows the divide error happening on a pop... which also isn't supposed to be possible.
Comment 14 Warren Togami 2007-11-29 22:57:28 UTC
Created attachment 273411 [details] hpet_no_paravirt.jpg Rebuilt this kernel with CONFIG_PARAVIRT disabled. It *still* crashes. I think we've established that this self-modifying code and paravirt stuff has nothing to do with this problem. But where does that leave us? =)
Comment 15 Warren Togami 2007-12-05 05:41:34 UTC
Created attachment 277621 [details] T170 thin client

Contents:
lspci -vv
lspci -vvn
dmesg
/proc/cpuinfo

This should be enough to blacklist hpet for this buggy motherboard?
Comment 16 Chuck Ebbert 2007-12-05 23:15:28 UTC
The system gets a divide error immediately after enabling interrupts. No divide instruction is anywhere around the oopsing one. It's almost like the hardware interrupt 0 is causing CPU interrupt 0 instead of 32. Vendor says we should blacklist the HPET on this chipset but there doesn't seem to be any infrastructure for that upstream.
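To make the "interrupt 0 instead of 32" remark concrete, here is a minimal sketch of the usual x86 vector layout. It assumes the conventional arrangement in which vectors 0-31 are reserved for CPU exceptions and legacy IRQs are remapped to start at vector 0x20; the function names are hypothetical and the exception table is deliberately partial.

```python
FIRST_EXTERNAL_VECTOR = 0x20  # 32: first vector used for hardware IRQs

# A few of the CPU-reserved exception vectors (0-31) on x86.
CPU_EXCEPTIONS = {
    0: "divide error",
    6: "invalid opcode",
    13: "general protection",
    14: "page fault",
}


def irq_to_vector(irq: int) -> int:
    """Vector a legacy IRQ is expected to arrive on after remapping."""
    return FIRST_EXTERNAL_VECTOR + irq


def describe_vector(vector: int) -> str:
    """Say what the CPU would treat an interrupt on this vector as."""
    if vector < FIRST_EXTERNAL_VECTOR:
        return CPU_EXCEPTIONS.get(vector, "reserved CPU exception")
    return f"hardware IRQ {vector - FIRST_EXTERNAL_VECTOR}"
```

Under these assumptions the timer interrupt (IRQ 0) should arrive on vector 32. If it were delivered on vector 0 instead, the CPU would report a divide error regardless of which instruction was executing, which matches the symptom described above.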
Comment 17 Warren Togami 2007-12-05 23:34:07 UTC
Created attachment 278901 [details] dmesg from Ubuntu's 2.6.22-based kernel

hpet kernel option settings for Ubuntu's kernel:
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y

No mention of hpet anywhere in the boot; however, it does initialize clocksources tsc and acpi_pm successfully. Perhaps it never crashed prior to 2.6.23 because it somehow didn't attempt to initialize hpet at all?
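For comparing HPET settings across kernel configs, a small parser along these lines may help. It is a sketch only: `parse_kconfig` is a hypothetical helper, and it assumes the standard `CONFIG_X=value` / `# CONFIG_X is not set` line format shown above.

```python
import re


def parse_kconfig(text: str) -> dict:
    """Map CONFIG_* option names to their value ('n' for 'is not set')."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            options[unset.group(1)] = "n"
            continue
        assigned = re.fullmatch(r"(CONFIG_\w+)=(.*)", line)
        if assigned:
            options[assigned.group(1)] = assigned.group(2)
    return options


# The HPET options quoted in this comment for Ubuntu's 2.6.22-based kernel.
UBUNTU_HPET = """\
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
"""

hpet_options = parse_kconfig(UBUNTU_HPET)
```

Running the same function over the Fedora config and comparing the two dictionaries would show whether the difference in behaviour comes from configuration or from the kernel version.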
Comment 18 Warren Togami 2007-12-06 21:55:28 UTC
Created attachment 280221 [details] Full config for the above Ubuntu kernel
Comment 19 Thomas Gleixner 2007-12-10 06:32:12 UTC
(In reply to comment #17)
> No mention of hpet anywhere in the boot, however it does initialize
> clocksources tsc and acpi_pm successfully. Perhaps it never crashed prior to
> 2.6.23 because it somehow didn't attempt to initialize hpet at all?

Right. Does the problem still exist with 2.6.24-rc4-latest-git?
Comment 20 Thomas Gleixner 2007-12-10 06:35:26 UTC
Also, can you reliably boot with "hpet=disable" on the kernel command line?
Comment 21 John Williams 2007-12-10 10:23:34 UTC
In the case of the M811 motherboard, acpi=off hpet=disable seems to be working fine. There have been no panics yet. There is a line in dmesg:
Force enabled HPET at base address 0xfed00000
Comment 22 Thomas Gleixner 2007-12-10 11:03:15 UTC
(In reply to comment #21)
> In the case of the M811 Motherboard, acpi=off hpet=disable seems to be working
> fine. There have been no panics yet. There is a line in dmesg:-
> Force enabled HPET at base address 0xfed00000

Ah, that gives a clue. The force enable of HPET on this chipset is causing the trouble. What happens if you omit "acpi=off" and only put "hpet=disable" on the command line?
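Other affected users can check for the same clue by scanning their boot log for the force-enable message. The following is a rough sketch assuming the message text is exactly as quoted above; `find_forced_hpet` is a hypothetical helper name.

```python
import re

# Message format as quoted from dmesg in comment 21.
FORCE_HPET_RE = re.compile(r"Force enabled HPET at base address (0x[0-9a-fA-F]+)")


def find_forced_hpet(dmesg_text: str):
    """Return the HPET base address if it was force-enabled, else None."""
    match = FORCE_HPET_RE.search(dmesg_text)
    return int(match.group(1), 16) if match else None
```

It can be fed the saved output of `dmesg`; a non-None result suggests the chipset quirk force-enabled the HPET on that boot.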
Comment 23 Thomas Gleixner 2007-12-10 11:38:24 UTC
Chuck, Warren,

we have a patch in mainline that prevents the HPET from being force-enabled automatically on undocumented chipsets:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b17530bda22e7ffbf08f7a8a50743256b1672f6a

The chipset support for the 8237/39 has a check for the hpet=force command line option:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b196884e2f5d45fb505b46011e41ca95e0859e34

This is probably missing in the F8 2.6.23-based kernels; I probably never ported it back to the -hrt patches. We added this because we did not trust the undocumented features. Sorry that I missed backporting it. I can do this tomorrow when I'm back from India; I have no access to my main devel box right now.

Thanks, tglx
Comment 24 John Williams 2007-12-10 12:07:26 UTC
It works equally well with just "hpet=disable". (I normally have acpi=off because ACPI is disabled in the BIOS.)
Comment 25 Thomas Gleixner 2007-12-10 12:56:54 UTC
Chuck, I uploaded an untested 2.6.23-hrt4 to http://www.kernel.org/pub/linux/kernel/people/tglx/hrtimers/2.6.23/ Has not propagated yet to the public servers, but should show up soon. It has the backport of the above checks and the hrtimer-prevent-overflow one. Thanks, tglx
Comment 26 Chuck Ebbert 2007-12-10 19:05:25 UTC
-hrt4 is in 126.96.36.199-87
Comment 27 Fedora Update System 2008-01-15 22:55:54 UTC
kernel-188.8.131.52-105.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update kernel'
Comment 28 Fedora Update System 2008-01-24 22:01:20 UTC
kernel-184.108.40.206-115.fc8 has been pushed to the Fedora 8 testing repository. If problems still persist, please make note of it in this bug report.
If you want to test the update, you can install it with
su -c 'yum --enablerepo=updates-testing update kernel'
Comment 29 Fedora Update System 2008-02-07 20:56:19 UTC
kernel-220.127.116.11-115.fc8 has been pushed to the Fedora 8 stable repository. If problems still persist, please make note of it in this bug report.