Bug 79924 - Kernel BUG at page_alloc.c:220!
Summary: Kernel BUG at page_alloc.c:220!
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 7.2
Hardware: i686
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Duplicates: 80023
Depends On:
Blocks:
 
Reported: 2002-12-18 02:39 UTC by Paul Zimdars
Modified: 2005-10-31 22:00 UTC
CC List: 1 user

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2004-09-30 15:40:18 UTC


Attachments

Description Paul Zimdars 2002-12-18 02:39:29 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020823 Netscape/7.0

Description of problem:
We have a 64-node cluster. We run a scientific job that depends heavily on
memory and CPU.

Here is the uname output from a node:

Linux mach-0-0 2.4.18-17.7.xsmp #6 Tue Dec 17 16:41:44 PST 2002 i686 unknown

The error below can be triggered by any process (bash, sh, kswapd, etc.).
I also turned off SMP and ran the test without a single crash. When I
turned SMP back on, the nodes started to die. We lose between 5 and 10 of the
64 nodes on each run, usually within the first 10-15 minutes.


Nov 22 18:51:59 mach-0-35 kernel: kernel BUG at page_alloc.c:220!
Nov 22 18:51:59 mach-0-35 kernel: invalid operand: 0000
Nov 22 18:51:59 mach-0-35 kernel: CPU:    0
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 22 18:51:59 mach-0-35 kernel: EFLAGS: 00010202
Nov 22 18:51:59 mach-0-35 kernel: eax: 00000040   ebx: c23bc8f0   ecx: 00038000
  edx: 0006942f
Nov 22 18:51:59 mach-0-35 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: efe31dcc
Nov 22 18:51:59 mach-0-35 kernel: ds: 0018   es: 0018   ss: 0018
Nov 22 18:51:59 mach-0-35 kernel: Process mlsl2 (pid: 1928, stackpage=efe31000)
Nov 22 18:51:59 mach-0-35 kernel: Stack: 00038000 0003142f 00000296 00000000
c028b128 c028b200 000001ff 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000025 c0132f01 c028b128 c028b1fc
000001d2 00000018 00104025 00000000
Nov 22 18:51:59 mach-0-35 kernel:        00000001 00000025 c0127ded 69430025
00000000 f69451c0 f61bec60 efef2118
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [__alloc_pages+81/384]
[do_anonymous_page+93/368] [do_no_page+71/576] [it_real_fn+16/80]
[handle_mm_fault+154/288]
Nov 22 18:51:59 mach-0-35 kernel: Call Trace:    [<c0132f01>] [<c0127ded>]
[<c0127f47>] [<c011c5e0>] [<c01281da>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c011d57b>] [<c011d431>] [<c012900a>]
[<c010a64d>] [<c011472a>] [<c012939b>]
Nov 22 18:51:59 mach-0-35 kernel:   [<c01293ab>] [<c010ea9e>] [<c0114570>]
[<c0108bfc>]
Nov 22 18:51:59 mach-0-35 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b
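The `Code:` line above begins with `0f 0b`, the x86 `ud2` instruction that BUG() executes to raise the "invalid operand" trap. This sketch assumes the kernel build uses a verbose BUG() layout, in which `ud2` is followed by a 16-bit line number and a 32-bit pointer to the source-file name. That layout is an assumption about this particular build. Under it, the next six bytes decode directly: `dc 00` is line 0x00dc = 220, which matches `page_alloc.c:220`. The trailing bytes (`8b 43 18`, `a9 80 00 00 00`, `74 08`, `0f 0b`) disassemble as `mov 0x18(%ebx),%eax`, `test $0x80,%eax`, `je`, `ud2`, which looks like the next conditional BUG() check. The helper name `decode_bug_code` is hypothetical.

```python
import struct


def decode_bug_code(code_hex):
    """Decode a verbose BUG() site from the hex bytes of an oops "Code:" line.

    Assumed layout (not confirmed for every 2.4 build): ud2 (0f 0b), then a
    little-endian 16-bit line number, then a little-endian 32-bit pointer to
    the source-file name string. Returns (line, file_ptr), or None when the
    bytes are too short or do not start with ud2.
    """
    data = bytes.fromhex(code_hex)  # spaces between byte pairs are ignored
    if len(data) < 8 or data[:2] != b"\x0f\x0b":
        return None
    line, file_ptr = struct.unpack_from("<HI", data, 2)
    return line, file_ptr


# Code bytes from the mach-0-35 oops above.
code = "0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00 00 00 74 08 0f 0b"
line, file_ptr = decode_bug_code(code)
print(line, hex(file_ptr))  # 220 0xc0254b81
```

This decoding does not apply to the 2.4.7-10enterprise oops posted later in this bug. Its ksymoops disassembly shows `pop %ecx` directly after `ud2a`, so no line number is embedded there.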


Version-Release number of selected component (if applicable):


How reproducible:
Always

Steps to Reproduce:
1. We have a program that processes satellite data using PVM.
2. Tried it with and without PVM; same results.

Actual Results:  5-10 nodes would die.

Expected Results:  No crash.

Additional info:

64-node cluster configuration. The drives are IDE; we used Red Hat 7.2, ext3,
2 GB of RAM, and 4 GB of swap.

Comment 1 Arjan van de Ven 2002-12-18 10:14:41 UTC
First of all, this trace looks like it comes from a modified kernel.
Can you attach dmesg, lsmod and lspci output from such a system before it oopses?

Comment 2 Paul Zimdars 2002-12-18 12:12:40 UTC
Hi,

Well, I have used my own modified 2.4.19 kernel, the 2.4.18 xsmp kernel, and a
modified 2.4.18 Red Hat source tree (I removed almost everything and SMP for a
test), but still had the same crashes. The only time none of the nodes crashed
was when I disabled SMP. I can provide more errors; another one, from a
different node, is at the end.

Here is lsmod:
[root@mach-0-35 root]# lsmod
Module                  Size  Used by    Not tainted
[root@mach-0-35 root]#

[root@mach-0-35 root]# lspci
00:00.0 Host bridge: ServerWorks: Unknown device 0012 (rev 13)
00:00.1 Host bridge: ServerWorks: Unknown device 0012
00:00.2 Host bridge: ServerWorks: Unknown device 0000
00:02.0 VGA compatible controller: ATI Technologies Inc Rage XL (rev 27)
00:04.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 0d)
00:0f.0 ISA bridge: ServerWorks CSB5 South Bridge (rev 93)
00:0f.1 IDE interface: ServerWorks CSB5 IDE Controller (rev 93)
00:0f.3 Host bridge: ServerWorks: Unknown device 0225
00:10.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:10.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.0 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
00:11.2 Host bridge: ServerWorks: Unknown device 0101 (rev 03)
01:03.0 Ethernet controller: BROADCOM Corporation NetXtreme BCM5701 Gigabit
Ethernet (rev 15)

Linux version 2.4.18-17.7.x (root@mach-0-0) (gcc version 2.96 20000731 (Red Hat
Linux 7.1 2.96-98)) #6 Tue Dec 17 16:41:44 PST 2002
BIOS-provided physical RAM map:
 BIOS-e820: 0000000000000000 - 000000000009f400 (usable)
 BIOS-e820: 000000000009f400 - 000000000009f800 (reserved)
 BIOS-e820: 00000000000e0000 - 0000000000100000 (reserved)
 BIOS-e820: 0000000000100000 - 0000000080000000 (usable)
 BIOS-e820: 00000000fee00000 - 00000000fee01000 (reserved)
 BIOS-e820: 00000000fff80000 - 0000000100000000 (reserved)
1152MB HIGHMEM available.
896MB LOWMEM available.
On node 0 totalpages: 524288
zone(0): 4096 pages.
zone(1): 225280 pages.
zone(2): 294912 pages.
Kernel command line: auto BOOT_IMAGE=bzImage ro root=303 BOOT_FILE=/boot/bzImage
console=ttyS0
Initializing CPU#0
Detected 2199.941 MHz processor.
Console: colour VGA+ 80x25
Calibrating delay loop... 4364.15 BogoMIPS
Memory: 2061592k/2097152k available (1205k kernel code, 30948k reserved, 337k
data, 236k init, 1179648k highmem)
Dentry cache hash table entries: 262144 (order: 9, 2097152 bytes)
Inode cache hash table entries: 131072 (order: 8, 1048576 bytes)
Mount cache hash table entries: 32768 (order: 6, 262144 bytes)
ramfs: mounted with options: <defaults>
ramfs: max_pages=258227 max_file_pages=0 max_inodes=0 max_dentries=258227
Buffer cache hash table entries: 131072 (order: 7, 524288 bytes)
Page-cache hash table entries: 524288 (order: 9, 2097152 bytes)
CPU: Before vendor init, caps: 3febfbff 00000000 00000000, vendor = 0
CPU: L1 I cache: 0K, L1 D cache: 8K
CPU: L2 cache: 512K
CPU: After vendor init, caps: 3febfbff 00000000 00000000 00000000
Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#0.
CPU:     After generic, caps: 3febfbff 00000000 00000000 00000000
CPU:             Common caps: 3febfbff 00000000 00000000 00000000
CPU: Intel(R) XEON(TM) CPU 2.20GHz stepping 04
Enabling fast FPU save and restore... done.
Enabling unmasked SIMD FPU exception support... done.
Checking 'hlt' instruction... OK.
POSIX conformance testing by UNIFIX
mtrr: v1.40 (20010327) Richard Gooch (rgooch@atnf.csiro.au)
mtrr: detected mtrr type: Intel
PCI: PCI BIOS revision 2.10 entry at 0xfdba1, last bus=4
PCI: Using configuration type 1
PCI: Probing PCI hardware
PCI: Discovered primary peer bus 01 [IRQ]
PCI: Discovered primary peer bus 02 [IRQ]
PCI: Discovered primary peer bus 03 [IRQ]
PCI: Discovered primary peer bus 04 [IRQ]
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
cpufreq: Intel(R) SpeedStep(TM) support $Revision: 1.34 $
cpufreq: Intel(R) SpeedStep(TM) for this chipset not (yet) available.
cpufreq: CPU#0 P4/Xeon(TM) CPU On-Demand Clock Modulation available
CPU clock: 2199.941 MHz (219.994-2199.941 MHz)
Starting kswapd
allocated 64 pages and 64 bhs reserved for the highmem bounces
Journalled Block Device driver loaded
Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
pty: 2048 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with MANY_PORTS SHARE_IRQ SERIAL_PCI
enabled
ttyS0 at 0x03f8 (irq = 4) is a 16550A
ttyS1 at 0x02f8 (irq = 3) is a 16550A
Real Time Clock Driver v1.10e
oprofile: can't get RTC I/O Ports
block: 1024 slots per queue, batch=256
Uniform Multi-Platform E-IDE driver Revision: 6.31
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
SvrWks CSB5: IDE controller on PCI bus 00 dev 79
SvrWks CSB5: chipset revision 147
SvrWks CSB5: not 100% native mode: will probe irqs later
    ide0: BM-DMA at 0xffa0-0xffa7, BIOS settings: hda:DMA, hdb:pio
 ide1: BM-DMA at 0xffa8-0xffaf, BIOS settings: hdc:pio, hdd:pio
hda: ST340016A, ATA DISK drive
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
hda: 78165360 sectors (40021 MB) w/2048KiB Cache, CHS=77545/16/63, UDMA(100)
Partition check:
 hda: hda1 hda2 hda3 hda4 < hda5 hda6 hda7 >
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
eepro100.c:v1.09j-t 9/29/99 Donald Becker http://www.scyld.com/network/eepro100.html
eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
<saw@saw.sw.com.sg> and others
eth0: OEM i82557/i82558 10/100 Ethernet, 00:30:48:51:7E:7E, IRQ 10.
  Board assembly 000000-000, Physical connectors present: RJ45
  Primary interface chip i82555 PHY #1.
  General self-test: passed.
  Serial sub-system self-test: passed.
  Internal registers self-test: passed.
  ROM checksum self-test: passed (0xb874c1d3).
tg3.c:v1.1 (Aug 30, 2002)
eth1: Tigon3 [partno(BCM95700A6) rev 0105 PHY(5701)] (PCIX:100MHz:64-bit)
10/100/1000BaseT Ethernet 00:30:48:51:7c:8d
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 16384 buckets, 128Kbytes
TCP: Hash tables configured (established 262144 bind 65536)
Linux IP multicast router 0.06 plus PIM-SM
ip_conntrack (8192 buckets, 65536 max)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
kjournald starting.  Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Freeing unused kernel memory: 236k freed
Adding Swap: 2048216k swap-space (priority -1)
Adding Swap: 2048248k swap-space (priority -2)
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,3), internal journal
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,1), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
kjournald starting.  Commit interval 5 seconds
EXT3 FS 2.4-0.9.18, 14 May 2002 on ide0(3,5), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
tg3: eth1: Link is up at 1000 Mbps, full duplex.
tg3: eth1: Flow control is off for TX and off for RX.
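The `zone(N)` page counts in this dmesg account for the LOWMEM and HIGHMEM totals reported earlier in the same boot log. The sketch below converts the counts to MiB and sums them. It assumes the 4 KiB i386 page size and the 2.4 zone order (0 = DMA, 1 = Normal, 2 = HighMem).

```python
PAGE_SIZE = 4096  # bytes per page on i386 (assumed; standard for this architecture)

# Page counts copied from the "zone(N): ... pages." lines in the dmesg above.
# Assumed 2.4 i386 zone order: 0 = DMA, 1 = Normal, 2 = HighMem.
zone_pages = {"DMA": 4096, "Normal": 225280, "HighMem": 294912}


def pages_to_mib(pages):
    """Convert a page count to MiB."""
    return pages * PAGE_SIZE // (1024 * 1024)


zone_mib = {name: pages_to_mib(n) for name, n in zone_pages.items()}
lowmem_mib = zone_mib["DMA"] + zone_mib["Normal"]  # memory the kernel maps directly
highmem_mib = zone_mib["HighMem"]
total_kib = sum(zone_pages.values()) * PAGE_SIZE // 1024

print(zone_mib)                 # {'DMA': 16, 'Normal': 880, 'HighMem': 1152}
print(lowmem_mib, highmem_mib)  # 896 1152
print(total_kib)                # 2097152
```

The results match `896MB LOWMEM available` and `1152MB HIGHMEM available`. They also match the `2097152k` total and the `1179648k highmem` figure in the `Memory:` line.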

Nov 20 18:57:06 mach-0-51 kernel: kernel BUG at page_alloc.c:220!
Nov 20 18:57:06 mach-0-51 kernel: invalid operand: 0000
Nov 20 18:57:06 mach-0-51 kernel: CPU:    0
Nov 20 18:57:06 mach-0-51 kernel: EIP:    0010:[<c0130cdd>]    Not tainted
Nov 20 18:57:06 mach-0-51 kernel: EFLAGS: 00010202
Nov 20 18:57:06 mach-0-51 kernel: eax: 00000040   ebx: c1b39ea0   ecx: 00038000
  edx: 0003bdf8
Nov 20 18:57:06 mach-0-51 kernel: esi: c02a9a88   edi: 00048000   ebp: c1000020
  esp: f55f3e14
Nov 20 18:57:06 mach-0-51 kernel: ds: 0018   es: 0018   ss: 0018
Nov 20 18:57:06 mach-0-51 kernel: Process mlsl2 (pid: 1415, stackpage=f55f3000)
Nov 20 18:57:06 mach-0-51 kernel: Stack: 00038000 00003df8 00000286 00000000
c02a9a88 c02a9b60 000001ff 00000000
Nov 20 18:57:06 mach-0-51 kernel:        00181002 c0130f71 c02a9a88 c02a9b5c
000001d2 00002945 00000000 00000000
Nov 20 18:57:06 mach-0-51 kernel:        0c1ab98c 00181002 c0131674 00000002
00000000 00000008 f60789ac c01260dd
Nov 20 18:57:06 mach-0-51 kernel: Call Trace:    [<c0130f71>] [<c0131674>]
[<c01260dd>] [<c0126121>] [<c0126592>]
Nov 20 18:57:06 mach-0-51 kernel:   [<c01086ad>] [<c0113502>] [<c011f3c6>]
[<c011f619>] [<c011c10b>] [<c011bfc1>]
Nov 20 18:57:06 mach-0-51 kernel:   [<c011bd4b>] [<c0113350>] [<c0108bfc>]
Nov 20 18:57:06 mach-0-51 kernel: Code: 0f 0b dc 00 56 aa 26 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b

Below is the oops from another node:

Nov 19 18:02:48 mach-0-55 kernel: kernel BUG at page_alloc.c:220!
Nov 19 18:02:48 mach-0-55 kernel: invalid operand: 0000
Nov 19 18:02:48 mach-0-55 kernel: CPU:    0
Nov 19 18:02:48 mach-0-55 kernel: EIP:    0010:[<c0130cdd>]    Not tainted
Nov 19 18:02:48 mach-0-55 kernel: EFLAGS: 00010202
Nov 19 18:02:48 mach-0-55 kernel: eax: 00000040   ebx: c122dea0   ecx: 00001000
  edx: 0000b9f8
Nov 19 18:02:48 mach-0-55 kernel: esi: c02a99d4   edi: 00037000   ebp: c1000020
  esp: f60dfe24
Nov 19 18:02:48 mach-0-55 kernel: ds: 0018   es: 0018   ss: 0018
Nov 19 18:02:48 mach-0-55 kernel: Process mlsl2 (pid: 1573, stackpage=f60df000)
Nov 19 18:02:48 mach-0-55 kernel: Stack: 00001000 0000a9f8 00000286 00000000
c02a99d4 c02a9b64 000003fd 00000000
Nov 19 18:02:48 mach-0-55 kernel:        f6ae4180 c0130f71 c02a9a88 c02a9b5c
000001d2 00000018 00000001 f67e4a80
Nov 19 18:02:48 mach-0-55 kernel:        00104025 f6ae4180 c012623b f67e4a80
63052000 f67e4a80 f6ae4180 c0126344
Nov 19 18:02:48 mach-0-55 kernel: Call Trace:    [<c0130f71>] [<c012623b>]
[<c0126344>] [<c0126582>] [<c0126fe9>]
Nov 19 18:02:48 mach-0-55 kernel:   [<c0113502>] [<c010ea2e>] [<c011bd4b>]
[<c0113350>] [<c0108bfc>]
Nov 19 18:02:48 mach-0-55 kernel:
Nov 19 18:02:48 mach-0-55 kernel: Code: 0f 0b dc 00 56 aa 26 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b

Comment 3 Arjan van de Ven 2002-12-18 12:17:47 UTC
Well, in your modified kernel you have disabled the config option that we enable
to get usable backtraces, which makes it hard to investigate, you know.

Comment 4 Paul Zimdars 2002-12-18 18:44:25 UTC
Hi,

Here is some more information; do any of these help? If not, I will put back the
Red Hat XSMP kernel and get more info, or I can enable all the kernel debugging
options.

Dec  5 01:06:01 mach-0-7 kernel: kernel BUG at page_alloc.c:220!
Dec  5 01:06:01 mach-0-7 kernel: invalid operand: 0000
Dec  5 01:06:01 mach-0-7 kernel: CPU:    0
Dec  5 01:06:01 mach-0-7 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Dec  5 01:06:01 mach-0-7 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Dec  5 01:06:01 mach-0-7 kernel: EFLAGS: 00010202
Dec  5 01:06:01 mach-0-7 kernel: eax: 00000040   ebx: c2020d70   ecx: 00038000 
 edx: 00056047
Dec  5 01:06:01 mach-0-7 kernel: esi: c028b128   edi: 00048000   ebp: c1000020 
 esp: ef16fdcc
Dec  5 01:06:01 mach-0-7 kernel: ds: 0018   es: 0018   ss: 0018
Dec  5 01:06:01 mach-0-7 kernel: Process mlsl2 (pid: 2775, stackpage=ef16f000)
Dec  5 01:06:01 mach-0-7 kernel: Stack: 00038000 0001e047 00000296 00000000
c028b128 c028b200 000001ff 00000000 
Dec  5 01:06:01 mach-0-7 kernel:        00000025 c0132f01 c028b128 c028b1fc
000001d2 00000044 00104025 00000000 
Dec  5 01:06:01 mach-0-7 kernel:        00000001 00000025 c0127ded 66e48025
00000000 f6997400 f6a48920 ef25c0d8 
Dec  5 01:06:01 mach-0-7 kernel: Call Trace:    [__alloc_pages+81/384]
[do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288]
[ip_frag_create+16/192]
Dec  5 01:06:01 mach-0-7 kernel: Call Trace:    [<c0132f01>] [<c0127ded>]
[<c0127f47>] [<c01281da>] [<c0206860>]
Dec  5 01:06:01 mach-0-7 kernel:   [ip_frag_queue+516/864]
[ip_frag_queue+128/864] [eth_type_trans+115/192] [do_page_fault+442/1401]
[ip_frag_queue+128/864] [set_rx_mode+369/1504]
Dec  5 01:06:01 mach-0-7 kernel:   [<c0206b14>] [<c0206990>] [<c01fec83>]
[<c011472a>] [<c0206990>] [<c01c4b41>]
Dec  5 01:06:01 mach-0-7 kernel:   [update_wall_time+38/80] [timer_bh+73/976]
[update_process_times+48/160] [do_page_fault+0/1401] [error_code+52/60]
Dec  5 01:06:01 mach-0-7 kernel:   [<c0120836>] [<c0120a89>] [<c0120980>]
[<c0114570>] [<c0108bfc>]
Dec  5 01:06:01 mach-0-7 kernel: 
Dec  5 01:06:01 mach-0-7 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80 00
00 00 74 08 0f 0b


Nov 26 17:53:00 mach-0-39 kernel: kernel BUG at page_alloc.c:220!
Nov 26 17:53:00 mach-0-39 kernel: invalid operand: 0000
Nov 26 17:53:00 mach-0-39 kernel: CPU:    0
Nov 26 17:53:00 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 26 17:53:00 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 26 17:53:00 mach-0-39 kernel: EFLAGS: 00010202
Nov 26 17:53:00 mach-0-39 kernel: eax: 00000040   ebx: c25e5cd0   ecx: 00038000
  edx: 00074c99
Nov 26 17:53:00 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: e0507dbc
Nov 26 17:53:00 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 26 17:53:00 mach-0-39 kernel: Process mlsl2 (pid: 1441, stackpage=e0507000)
Nov 26 17:53:00 mach-0-39 kernel: Stack: 00038000 0003cc99 00000292 00000000
c028b128 c028b200 000001ff 00000000 
Nov 26 17:53:00 mach-0-39 kernel:        0005ca02 c0132f01 c028b128 c028b1fc
000001d2 0005c902 00000000 00000000 
Nov 26 17:53:00 mach-0-39 kernel:        0c0854f6 0005ca02 c0133604 00000002
00000002 00000008 00000000 c0127bfd 
Nov 26 17:53:00 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384]
[read_swap_cache_async+116/158] [swapin_readahead+77/80] [do_swap_page+70/400]
[handle_mm_fault+180/288]
Nov 26 17:53:00 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0133604>]
[<c0127bfd>] [<c0127c46>] [<c01281f4>]
Nov 26 17:53:00 mach-0-39 kernel:   [sys_getsockname+44/128]
[sys_sendto+166/240] [ip_route_output_slow+740/1648]
[ip_route_output_slow+304/1648] [neigh_proxy_process+243/288]
[do_page_fault+442/1401]
Nov 26 17:53:00 mach-0-39 kernel:   [<c01f483c>] [<c01f49b6>] [<c0206b44>]
[<c0206990>] [<c01fec83>] [<c011472a>]
Nov 26 17:53:00 mach-0-39 kernel:   [process_timeout+0/96]
[update_wall_time+38/80] [timer_bh+73/976] [update_process_times+48/160]
[smp_apic_timer_interrupt+239/288] [do_page_fault+0/1401]
Nov 26 17:53:00 mach-0-39 kernel:   [<c01151f0>] [<c0120836>] [<c0120a89>]
[<c0120980>] [<c0112b4f>] [<c0114570>]
Nov 26 17:53:00 mach-0-39 kernel:   [error_code+52/60]
Nov 26 17:53:00 mach-0-39 kernel:   [<c0108bfc>]
Nov 26 17:53:00 mach-0-39 kernel: 
Nov 26 17:53:00 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b


Nov 27 12:53:48 mach-0-39 kernel: CPU:    0
Nov 27 12:53:48 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:48 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:48 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:48 mach-0-39 kernel: eax: 00000040   ebx: c1fc8a80   ecx: 00038000
  edx: 000542e2
Nov 27 12:53:48 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: ef9e9dcc
Nov 27 12:53:48 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:48 mach-0-39 kernel: Process mlsl2 (pid: 2040, stackpage=ef9e9000)
Nov 27 12:53:48 mach-0-39 kernel: Stack: 00038000 0001c2e2 00000296 00000000
c028b128 c028b200 000001ff 00000000 
Nov 27 12:53:48 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc
000001d2 00000018 00104025 00000000 
Nov 27 12:53:48 mach-0-39 kernel:        00000001 00000025 c0127ded 442b7025
00000000 f65a5f20 f6421b60 c4fc7848 
Nov 27 12:53:48 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384]
[do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288]
[ip_route_output_slow+0/1648]
Nov 27 12:53:48 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>]
[<c0127f47>] [<c01281da>] [<c0206860>]
Nov 27 12:53:48 mach-0-39 kernel:   [ip_route_output_slow+692/1648]
[ip_route_output_slow+304/1648] [neigh_proxy_process+243/288]
[do_page_fault+442/1401] [update_process_times+48/160] [sys_brk+202/240]
Nov 27 12:53:48 mach-0-39 kernel:   [<c0206b14>] [<c0206990>] [<c01fec83>]
[<c011472a>] [<c0120980>] [<c012874a>]
Nov 27 12:53:48 mach-0-39 kernel:   [do_page_fault+0/1401] [error_code+52/60]
Nov 27 12:53:48 mach-0-39 kernel:   [<c0114570>] [<c0108bfc>]
Nov 27 12:53:48 mach-0-39 kernel: 
Nov 27 12:53:48 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b 
Nov 27 12:53:49 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 12:53:49 mach-0-39 kernel: invalid operand: 0000
Nov 27 12:53:49 mach-0-39 kernel: CPU:    0
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:49 mach-0-39 kernel: eax: 00000040   ebx: c1ccbfc0   ecx: 00038000
  edx: 000443fe
Nov 27 12:53:49 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: ef773dd0
Nov 27 12:53:49 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:49 mach-0-39 kernel: Process pvmd3 (pid: 1993, stackpage=ef773000)
Nov 27 12:53:49 mach-0-39 kernel: Stack: 00038000 0000c3fe 00000282 00000000
c028b128 c028b200 000001ff 00000000 
Nov 27 12:53:49 mach-0-39 kernel:        0016f502 c0132f01 c028b128 c028b1fc 
000001d2 0016f502 00000000 00000000 
Nov 27 12:53:49 mach-0-39 kernel:        0c197ff6 0016f502 c0133604 0016f502
00000000 080876c4 00000000 c0127c4c 
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384]
[read_swap_cache_async+116/158] [do_swap_page+76/400] [handle_mm_fault+180/288]
[do_page_fault+442/1401]
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0133604>]
[<c0127c4c>] [<c01281f4>] [<c011472a>]
Nov 27 12:53:49 mach-0-39 kernel:   [copy_page_range+397/624]
[build_mmap_rb+84/96] [do_fork+1746/2048] [sys_close+4/112]
[do_page_fault+0/1401] [error_code+52/60]
Nov 27 12:53:49 mach-0-39 kernel:   [<c01267bd>] [<c0129994>] [<c0117e02>]
[<c0139784>] [<c0114570>] [<c0108bfc>]
Nov 27 12:53:49 mach-0-39 kernel: 
Nov 27 12:53:49 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b 
Nov 27 12:53:49 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 12:53:49 mach-0-39 kernel: invalid operand: 0000
Nov 27 12:53:49 mach-0-39 kernel: CPU:    0
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 12:53:49 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 12:53:49 mach-0-39 kernel: eax: 00000040   ebx: c20c4c60   ecx: 00038000
  edx: 000596ec
Nov 27 12:53:49 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: e246ddcc
Nov 27 12:53:49 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 12:53:49 mach-0-39 kernel: Process mlsl2 (pid: 2047, stackpage=e246d000)
Nov 27 12:53:49 mach-0-39 kernel: Stack: 00038000 000216ec 00000296 00000000
c028b128 c028b200 000001ff 00000000 
Nov 27 12:53:49 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc
000001d2 0000055e 00104025 00000000 
Nov 27 12:53:49 mach-0-39 kernel:        00000001 00000025 c0127ded 00000000
f65a5f20 f65a5f20 f6421aa0 ef7e8008 
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384]
[do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288]
[svcauth_null+192/240]
Nov 27 12:53:49 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>]
[<c0127f47>] [<c01281da>] [<c0246120>]
Nov 27 12:53:49 mach-0-39 kernel:   [__vma_link+116/192]
[do_page_fault+442/1401] [do_mmap_pgoff+1220/1392] [blk_ioctl+407/1184]
[old_mmap+238/304] [do_page_fault+0/1401]
Nov 27 12:53:49 mach-0-39 kernel:   [<c0128854>] [<c011472a>] [<c0128e74>]
[<c01bb4d7>] [<c010ea9e>] [<c0114570>]
Nov 27 12:53:49 mach-0-39 kernel:   [error_code+52/60]
Nov 27 12:53:49 mach-0-39 kernel:   [<c0108bfc>]
Nov 27 12:53:49 mach-0-39 kernel: 
Nov 27 12:53:49 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b 

Nov 27 13:00:00 mach-0-39 kernel:  kernel BUG at page_alloc.c:220!
Nov 27 13:00:00 mach-0-39 kernel: invalid operand: 0000
Nov 27 13:00:00 mach-0-39 kernel: CPU:    0
Nov 27 13:00:00 mach-0-39 kernel: EIP:    0010:[rmqueue+525/592]    Not tainted
Nov 27 13:00:00 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 13:00:00 mach-0-39 kernel: EFLAGS: 00010202
Nov 27 13:00:00 mach-0-39 kernel: eax: 00000040   ebx: c2210cd0   ecx: 00038000
  edx: 00060599
Nov 27 13:00:00 mach-0-39 kernel: esi: c028b128   edi: 00048000   ebp: c1000020
  esp: e246ddcc
Nov 27 13:00:00 mach-0-39 kernel: ds: 0018   es: 0018   ss: 0018
Nov 27 13:00:00 mach-0-39 kernel: Process sh (pid: 2049, stackpage=e246d000)
Nov 27 13:00:00 mach-0-39 kernel: Stack: 00038000 00028599 00000296 00000000
c028b128 c028b200 000001ff 00000000 
Nov 27 13:00:00 mach-0-39 kernel:        00000025 c0132f01 c028b128 c028b1fc
000001d2 00000132 00104025 00000000 
Nov 27 13:00:00 mach-0-39 kernel:        00000001 00000025 c0127ded 00000000
f4ebe3c0 f4ebe3c0 f63ee920 f53f1af8 
Nov 27 13:00:00 mach-0-39 kernel: Call Trace:    [__alloc_pages+81/384]
[do_anonymous_page+93/368] [do_no_page+71/576] [handle_mm_fault+154/288]
[sys_munmap+2/80]
Nov 27 13:00:00 mach-0-39 kernel: Call Trace:    [<c0132f01>] [<c0127ded>]
[<c0127f47>] [<c01281da>] [<c01296c2>]
Nov 27 13:00:00 mach-0-39 kernel:   [__vma_link+116/192]
[do_page_fault+442/1401] [do_mmap_pgoff+1220/1392] [zap_page_range+945/1056]
[unmap_fixup+115/352] [sys_munmap+2/80]
Nov 27 13:00:00 mach-0-39 kernel:   [<c0128854>] [<c011472a>] [<c0128e74>]
[<c0126c51>] [<c01292c3>] [<c01296c2>]
Nov 27 13:00:00 mach-0-39 kernel:   [sys_close+4/112] [sys_munmap+67/80]
[do_page_fault+0/1401] [error_code+52/60]
Nov 27 13:00:00 mach-0-39 kernel:   [<c0139784>] [<c0129703>] [<c0114570>]
[<c0108bfc>]
Nov 27 13:00:00 mach-0-39 kernel: 
Nov 27 13:00:00 mach-0-39 kernel: Code: 0f 0b dc 00 81 4b 25 c0 8b 43 18 a9 80
00 00 00 74 08 0f 0b 
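Across these reports, the oopses from the same kernel build fault at the same EIP (`c0132c6d`), while the process that triggers them varies (`mlsl2`, `pvmd3`, `sh`). One way to check that across many nodes' syslogs is to extract the host, hex EIP, and process name from each oops. The sketch below does this with regular expressions written for the syslog format pasted here. The function name `summarize_oopses` and the patterns are assumptions, not part of any existing tool.

```python
import re
from collections import Counter

# Hex-form EIP line, e.g. "... mach-0-7 kernel: EIP:    0010:[<c0132c6d>] ..."
EIP_RE = re.compile(r"(\S+) kernel: EIP:\s+0010:\[<([0-9a-f]{8})>\]")
# Process line, e.g. "... mach-0-7 kernel: Process mlsl2 (pid: 2775, ..."
PROC_RE = re.compile(r"(\S+) kernel: Process (\S+) \(pid: (\d+)")


def summarize_oopses(log_text):
    """Return (Counter of faulting EIPs, [(host, process, pid), ...]) from log_text."""
    eips = Counter(m.group(2) for m in EIP_RE.finditer(log_text))
    procs = [(m.group(1), m.group(2), int(m.group(3)))
             for m in PROC_RE.finditer(log_text)]
    return eips, procs


# Two excerpts copied from the reports in this bug.
sample = """\
Dec  5 01:06:01 mach-0-7 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Dec  5 01:06:01 mach-0-7 kernel: Process mlsl2 (pid: 2775, stackpage=ef16f000)
Nov 27 13:00:00 mach-0-39 kernel: EIP:    0010:[<c0132c6d>]    Not tainted
Nov 27 13:00:00 mach-0-39 kernel: Process sh (pid: 2049, stackpage=e246d000)
"""
eips, procs = summarize_oopses(sample)
print(eips)   # Counter({'c0132c6d': 2})
print(procs)  # [('mach-0-7', 'mlsl2', 2775), ('mach-0-39', 'sh', 2049)]
```

Only the hex `[<...>]` form of the EIP line is counted. Each oops is therefore counted once, even though these kernels also print a symbolic `[rmqueue+525/592]` line.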



Comment 5 Paul Zimdars 2002-12-19 00:17:07 UTC
Hi,

I was just wondering whether the output above helped at all. I went into the kernel
configuration and turned on all the options in the kernel debugging section. I
enabled SMP and have now lost 4 separate nodes. I will bring them back up and see
what information they can provide.

Comment 6 Mike McLean 2003-01-02 17:28:00 UTC
This bug has been inappropriately marked MODIFIED. Please review the bug life
cycle information at 
http://bugzilla.redhat.com/bugzilla/bug_status.cgi


Comment 7 Paul Zimdars 2003-01-08 18:55:44 UTC
Hi,

I was wondering if anyone has been able to respond???

Thanks,

Pauld


Comment 8 Paul Zimdars 2003-01-17 08:34:57 UTC
Hi,

After looking through all the logs, I noticed this message, which is common to
every machine:

..MP-BIOS bug: 8254 timer not connected to IO-APIC

[root@mach-0-30 log]# cat /proc/interrupts
           CPU0       CPU1
  0:     288851          0    IO-APIC-edge  timer
  1:          2          0    IO-APIC-edge  keyboard
  2:          0          0          XT-PIC  cascade
  4:        207          0    IO-APIC-edge  serial
  8:          1          0    IO-APIC-edge  rtc
 14:       8073          0    IO-APIC-edge  ide0
 30:      53277          0   IO-APIC-level  eth0
 31:     161010          0   IO-APIC-level  eth1
NMI:          0          0
LOC:     288532     288541
ERR:          0
MIS:          0
[root@mach-0-30 log]#

[root@mach-0-30 log]# uname -a
Linux mach-0-30 2.4.19 #7 SMP Thu Dec 12 13:49:51 PST 2002 i686 unknown
[root@mach-0-30 log]#
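In the `/proc/interrupts` snapshot above, every numbered IRQ has been serviced by CPU0 only, while the `LOC` (local APIC timer) counts increase on both CPUs. The sketch below does not say whether this distribution is related to the MP-BIOS message. It only pulls the per-CPU counts out of this text format so the pattern can be compared across nodes. The function name `per_cpu_counts` is hypothetical.

```python
def per_cpu_counts(text):
    """Parse /proc/interrupts text into {label: [count_cpu0, count_cpu1, ...]}.

    The header line names the CPUs. Each later row is "LABEL: counts... [type name]";
    rows such as ERR or MIS carry a single count and keep only what is present.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ncpus = len(lines[0].split())
    result = {}
    for ln in lines[1:]:
        label, _, rest = ln.partition(":")
        counts = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller type / device name columns
            counts.append(int(field))
        result[label.strip()] = counts
    return result


# Rows copied from the mach-0-30 output above.
sample = """\
           CPU0       CPU1
  0:     288851          0    IO-APIC-edge  timer
 14:       8073          0    IO-APIC-edge  ide0
 31:     161010          0   IO-APIC-level  eth1
LOC:     288532     288541
ERR:          0
"""
counts = per_cpu_counts(sample)
# Numbered IRQs that only CPU0 has serviced.
cpu0_only = [irq for irq, c in counts.items()
             if irq.isdigit() and len(c) > 1 and all(n == 0 for n in c[1:])]
print(cpu0_only)  # ['0', '14', '31']
```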



Comment 9 Need Real Name 2003-04-22 00:04:35 UTC
This happened to me over the weekend.  Is there anything else I can provide to help?

[root@kmc2 log]# ksymoops -k ./ksyms.3 < ./oops.txt 
ksymoops 2.4.1 on i686 2.4.7-10enterprise.  Options used
     -V (default)
     -k ./ksyms.3 (specified)
     -l /proc/modules (default)
     -o /lib/modules/2.4.7-10enterprise/ (default)
     -m /boot/System.map-2.4.7-10enterprise (default)

Error (expand_objects): cannot stat(/lib/aic7xxx.o) for aic7xxx
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/sd_mod.o) for sd_mod
ksymoops: No such file or directory
Error (expand_objects): cannot stat(/lib/scsi_mod.o) for scsi_mod
ksymoops: No such file or directory
Warning (compare_ksyms_lsmod): module 3c59x is in lsmod but not in ksyms,
probably no symbols exported
Warning (compare_ksyms_lsmod): module appletalk is in lsmod but not in ksyms,
probably no symbols exported
Warning (compare_ksyms_lsmod): module eepro100 is in lsmod but not in ksyms,
probably no symbols exported
Warning (compare_ksyms_lsmod): module ipx is in lsmod but not in ksyms, probably
no symbols exported
Warning (compare_maps): mismatch on symbol partition_name  , ksyms_base says
c01c09e0, System.map says c0160900.  Ignoring ksyms_base entry
Warning (compare_maps): mismatch on symbol sd  , sd_mod says f881cce4,
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o says f881cba0. 
Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/sd_mod.o entry
Warning (compare_maps): mismatch on symbol proc_scsi  , scsi_mod says f8818088,
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816910. 
Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_devicelist  , scsi_mod says
f88180b4, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says
f881693c.  Ignoring
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hostlist  , scsi_mod says
f88180b0, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says
f8816938.  Ignoring
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_hosts  , scsi_mod says f88180b8,
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says f8816940. 
Ignoring /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
Warning (compare_maps): mismatch on symbol scsi_logging_level  , scsi_mod says
f8818084, /lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o says
f881690c.  Ignoring
/lib/modules/2.4.7-10enterprise/kernel/drivers/scsi/scsi_mod.o entry
 kernel BUG at page_alloc.c:220!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c013620a>]
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010086
eax: 00000020   ebx: c0263540   ecx: c02616dc   edx: 12f58a7a
esi: c0263540   edi: 00000000   ebp: 00000000   esp: e6101da8
ds: 0018   es: 0018   ss: 0018
Process sh (pid: 5906, stackpage=e6101000)
Stack: c02490a3 000000dc 00000000 00000283 c0263964 00000000 c0263540 c0263540
       c0263a24 00000000 000000d2 c01365c4 00000001 000000d2 dcbf5220 00000000
       c99bd464 c01367df 000000d2 00000000 c0263a20 d8f030c0 dcbf5220 00104000
Call Trace: [<c02490a3>] [<c01365c4>] [<c01367df>] [<c0129d75>] [<c012a973>]
   [<c01172c0>] [<c0117466>] [<c0125443>] [<c01172c0>] [<c0107268>]
Code: 0f 0b 59 8b 56 08 5b 89 d3 8b 53 04 8b 03 89 50 04 89 02 ff

>>EIP; c013620a <rmqueue+7a/300>   <=====
Trace; c02490a3 <call_spurious_interrupt+1eaca/24d47>
Trace; c01365c4 <_wrapped_alloc_pages+74/280>
Trace; c01367df <__alloc_pages+f/a0>
Trace; c0129d75 <do_wp_page+1b5/410>
Trace; c012a973 <handle_mm_fault+103/150>
Trace; c01172c0 <do_page_fault+0/540>
Trace; c0117466 <do_page_fault+1a6/540>
Trace; c0125443 <sys_rt_sigaction+93/f0>
Trace; c01172c0 <do_page_fault+0/540>
Trace; c0107268 <error_code+38/40>
Code;  c013620a <rmqueue+7a/300>
00000000 <_EIP>:
Code;  c013620a <rmqueue+7a/300>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c013620c <rmqueue+7c/300>
   2:   59                        pop    %ecx
Code;  c013620d <rmqueue+7d/300>
   3:   8b 56 08                  mov    0x8(%esi),%edx
Code;  c0136210 <rmqueue+80/300>
   6:   5b                        pop    %ebx
Code;  c0136211 <rmqueue+81/300>
   7:   89 d3                     mov    %edx,%ebx
Code;  c0136213 <rmqueue+83/300>
   9:   8b 53 04                  mov    0x4(%ebx),%edx
Code;  c0136216 <rmqueue+86/300>
   c:   8b 03                     mov    (%ebx),%eax
Code;  c0136218 <rmqueue+88/300>
   e:   89 50 04                  mov    %edx,0x4(%eax)
Code;  c013621b <rmqueue+8b/300>
  11:   89 02                     mov    %eax,(%edx)
Code;  c013621d <rmqueue+8d/300>
  13:   ff 00                     incl   (%eax)

 kernel BUG at page_alloc.c:220!
invalid operand: 0000
CPU:    0
EIP:    0010:[<c013620a>]
EFLAGS: 00010086
Warning (Oops_read): Code line not seen, dumping what data is available

>>EIP; c013620a <rmqueue+7a/300>   <=====


12 warnings and 3 errors issued.  Results may not be reliable.
[root@kmc2 log]#
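ksymoops turns each raw address into `symbol+offset/size` by looking it up in a symbol table such as System.map or a ksyms snapshot. The sketch below shows that kind of lookup. It assumes that `size` is the distance to the next symbol in the sorted table; ksymoops's own handling may differ in detail. The start address of `rmqueue` (0xc0136190) follows from `c013620a <rmqueue+7a/300>` in the output above. The following entry, `next_symbol`, is a hypothetical placeholder at rmqueue + 0x300.

```python
import bisect


def resolve(addr, symbols):
    """Map addr to (name, offset, size) using a sorted [(start, name), ...] table.

    size is taken as the gap to the next symbol's start address.
    Returns None if addr lies before the first symbol or in the last one,
    whose size cannot be derived from the table.
    """
    starts = [start for start, _ in symbols]
    i = bisect.bisect_right(starts, addr) - 1
    if i < 0 or i + 1 >= len(symbols):
        return None
    start, name = symbols[i]
    return name, addr - start, starts[i + 1] - start


# rmqueue's start is derived from "c013620a <rmqueue+7a/300>";
# "next_symbol" is a hypothetical placeholder at rmqueue + 0x300.
table = [(0xc0136190, "rmqueue"), (0xc0136490, "next_symbol")]
name, off, size = resolve(0xc013620a, table)
print(f"{name}+{off:x}/{size:x}")  # rmqueue+7a/300
```

The lookup is only as good as the table it is given. This is why the warnings above about System.map and ksyms mismatches, and the "Results may not be reliable" note, matter when reading the trace.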

Comment 10 Dave Jones 2003-12-17 02:28:41 UTC
*** Bug 80023 has been marked as a duplicate of this bug. ***

Comment 11 Bugzilla owner 2004-09-30 15:40:18 UTC
Thanks for the bug report. However, Red Hat no longer maintains this version of
the product. Please upgrade to the latest version and open a new bug if the problem
persists.

The Fedora Legacy project (http://fedoralegacy.org/) maintains some older releases, 
and if you believe this bug is interesting to them, please report the problem in
the bug tracker at: http://bugzilla.fedora.us/


