Bug 79884 - Oops with 2.4.18-18.7.x
Summary: Oops with 2.4.18-18.7.x
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.3
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
Depends On:
Reported: 2002-12-17 20:05 UTC by Michal Jaegermann
Modified: 2007-04-18 16:49 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2003-01-11 18:27:49 UTC


Description Michal Jaegermann 2002-12-17 20:05:24 UTC
Description of problem:

On one machine, around 6:40 local time, when the computer in question
was not really doing anything, the kernel oopsed and the machine went
down. An attempted automatic reboot (nobody was around) ended with

Uncompressing Linux....

crc error

-- System halted

Only later, after the machine had been powered down manually, was it
possible to power it up and restart.

Here is the decoded oops.

Unable to handle kernel NULL pointer dereference at virtual address 00000005
 printing eip:
*pde = 00000000
Oops: 0002
CPU:    0
EIP:    0010:[<c0116a3a>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00210092
eax: 00000001   ebx: 00200292   ecx: 00000002   edx: dd97c03c
esi: cd97c000   edi: cd97c008   ebp: 00000000   esp: d390ff1c
ds: 0018   es: 0018   ss: 0018
Process gnome-smproxy (pid: 23583, stackpage=d390f000)
Stack: dd97c038 c0146a6e 00000000 d6a68340 00000001 c0146e26 d390ff54 d390ff54 
       00000020 d390e000 7fffffff 00000006 00000000 00000006 00000000 cd97c000 
       00000001 bffff7f4 deb1dd58 00000006 c01471a9 00000006 d390ff90 d390ff8c 
Call Trace: [<c0146a6e>] poll_freewait [kernel] 0x2e (0xd390ff20))
[<c0146e26>] do_select [kernel] 0x226 (0xd390ff30))
[<c01471a9>] sys_select [kernel] 0x339 (0xd390ff6c))
[<c010893b>] system_call [kernel] 0x33 (0xd390ffc0))
Code: 89 48 04 89 01 53 9d 5b c3 8d b6 00 00 00 00 8d bc 27 00 00 

>>EIP; c0116a3a <remove_wait_queue+a/20>   <=====
Trace; c0146a6e <poll_freewait+2e/50>
Trace; c0146e26 <do_select+226/240>
Trace; c01471a9 <sys_select+339/480>
Trace; c010893b <system_call+33/38>
Code;  c0116a3a <remove_wait_queue+a/20>
00000000 <_EIP>:
Code;  c0116a3a <remove_wait_queue+a/20>   <=====
   0:   89 48 04                  mov    %ecx,0x4(%eax)   <=====
Code;  c0116a3d <remove_wait_queue+d/20>
   3:   89 01                     mov    %eax,(%ecx)
Code;  c0116a3f <remove_wait_queue+f/20>
   5:   53                        push   %ebx
Code;  c0116a40 <remove_wait_queue+10/20>
   6:   9d                        popf   
Code;  c0116a41 <remove_wait_queue+11/20>
   7:   5b                        pop    %ebx
Code;  c0116a42 <remove_wait_queue+12/20>
   8:   c3                        ret    
Code;  c0116a43 <remove_wait_queue+13/20>
   9:   8d b6 00 00 00 00         lea    0x0(%esi),%esi
Code;  c0116a49 <remove_wait_queue+19/20>
   f:   8d bc 27 00 00 00 00      lea    0x0(%edi,1),%edi
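
As a cross-check of the decode above, here is a minimal sketch that
recomputes the faulting address from the reported registers. The
marked instruction, mov %ecx,0x4(%eax), stores to eax + 4; with
eax = 00000001 that gives 00000005, the address in the oops message.
The second part compares the page bases of edx and esi, which differ
in a single bit. The helper names and the 4 KiB page mask are
assumptions made for illustration, and the bit comparison is only an
observation about the numbers, not a diagnosis established by the
report.

```python
# Values copied from the oops report; helper names are hypothetical.
EAX = 0x00000001
EDX = 0xDD97C03C
ESI = 0xCD97C000
REPORTED_FAULT_ADDRESS = 0x00000005

# Faulting instruction: mov %ecx,0x4(%eax), a store to eax + 4.
DISPLACEMENT = 0x4

# Assumption: 4 KiB pages, as on i386.
PAGE_MASK = 0xFFFFF000


def effective_address(base: int, displacement: int) -> int:
    """Effective address of a base+displacement operand, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


def differing_bits(a: int, b: int) -> list[int]:
    """Bit positions (0 = least significant) where two 32-bit values differ."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


computed = effective_address(EAX, DISPLACEMENT)
print(f"computed store address: {computed:#010x}")
print("matches reported address:", computed == REPORTED_FAULT_ADDRESS)

page_diff = differing_bits(EDX & PAGE_MASK, ESI & PAGE_MASK)
print("bits differing between edx and esi page bases:", page_diff)
```

A value of 1 in eax is not a plausible kernel pointer, so the list
pointers being unlinked in remove_wait_queue were already corrupted
by the time this instruction ran.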

Version-Release number of selected component (if applicable):

Comment 1 Ben LaHaise 2002-12-17 20:37:02 UTC
Is the hardware for this machine known good?  Does it pass an overnight run of
memtest86?  The fact that a boot failed with an invalid crc strongly hints at
faulty hardware, or possibly the CPU overheating.

Comment 2 Michal Jaegermann 2002-12-17 22:49:47 UTC
> Is the hardware for this machine known good?

Well, it has been in continuous use for the last two years and this is
the first incident of that sort (some three weeks after 2.4.18-18.7.x
was installed). In other words, so far the hardware looked good. :-)
For now it is running again after a powerdown and reboot.

memtest86 has not been run so far, as the machine is quite far from
my desk. :-)  That is still an open option but not that easy to
arrange.

Comment 3 Michal Jaegermann 2002-12-21 17:44:29 UTC
As of today, it looks like this oops was really caused by a broken CPU fan.
I will monitor the situation further.

Comment 4 Michal Jaegermann 2003-01-11 18:27:49 UTC
It definitely was a broken fan.
