Bug 167672 - GART error during bootup
Summary: GART error during bootup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Jim Paradis
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: RHEL3U8CanFix
 
Reported: 2005-09-06 20:31 UTC by Linda Wang
Modified: 2007-11-30 22:07 UTC
CC List: 9 users

Fixed In Version: RHSA-2006-0437
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-07-20 13:29:09 UTC
Target Upstream Version:


Attachments


Links
System: Red Hat Product Errata
ID: RHSA-2006:0437
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Updated kernel packages for Red Hat Enterprise Linux 3 Update 8
Last Updated: 2006-07-20 13:11:00 UTC

Description Linda Wang 2005-09-06 20:31:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050512 Red Hat/1.0.4-1.4.1 Firefox/1.0.4

Description of problem:
Reopening this, because the following is the behaviour we now see on the system.

With 2.4.21-35.EL, the system does not panic with numa=off. However, when
booting with numa=off, it shows the following errors just after it has
finished mounting the filesystems:

 -----------------------------------------------
  CPU 0: Silent Northbridge MCE
  Northbridge status a6000001:0005001b
      Error gart error
      GART TLB error generic level generic
      err cpu1
      processor context corrupt
      error uncorrected
      previous error lost
      NB error address 0000000037ff0000
 -----------------------------------------------

The error address 0000000037ff0000 is the same as the one reported by
2.4.21-34.EL when it panics on reboot with numa=off alone.

After rebooting, the same messages keep appearing at random intervals
for the remaining CPUs (CPU 1, CPU 2 and CPU 3).

However, the system shows no other noticeable abnormal behaviour.

Version-Release number of selected component (if applicable):
kernel-2.4.21-35.EL

How reproducible:
Always

Steps to Reproduce:
1. Install the 2.4.21-35.EL kernel.
2. Boot the system.
3. View the console log.
  

Actual Results:  The following messages are shown:

 -----------------------------------------------
  CPU 0: Silent Northbridge MCE
  Northbridge status a6000001:0005001b
      Error gart error
      GART TLB error generic level generic
      err cpu1
      processor context corrupt
      error uncorrected
      previous error lost
      NB error address 0000000037ff0000
 -----------------------------------------------

Expected Results:  None of these messages should appear.
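The check in step 3 (that none of these messages appear) can be scripted. The following is a minimal sketch in Python, assuming the log lines look exactly like the output quoted in this report; the function name `find_nb_mces` and its result format are hypothetical and not part of any Red Hat tool.

```python
import re

# Patterns for the lines quoted in this report (assumed to appear verbatim
# in dmesg output or a serial console capture).
MCE_HEADER = re.compile(r"CPU (\d+): Silent Northbridge MCE")
NB_STATUS = re.compile(r"Northbridge status ([0-9a-fA-F]+):([0-9a-fA-F]+)")
NB_ADDRESS = re.compile(r"NB error address ([0-9a-fA-F]+)")


def find_nb_mces(log_text):
    """Group each 'Silent Northbridge MCE' report with the lines that follow it."""
    reports = []
    current = None
    for line in log_text.splitlines():
        header = MCE_HEADER.search(line)
        if header:
            # Start a new report for this CPU.
            current = {"cpu": int(header.group(1)), "gart": False}
            reports.append(current)
            continue
        if current is None:
            continue  # ordinary boot output before any MCE report
        status = NB_STATUS.search(line)
        address = NB_ADDRESS.search(line)
        if status:
            # Keep both 32-bit halves of the status word as integers.
            current["status"] = (int(status.group(1), 16), int(status.group(2), 16))
        elif address:
            current["address"] = int(address.group(1), 16)
        elif "gart" in line.lower():
            # Matches "Error gart error" and "GART TLB error ..."
            current["gart"] = True
    return reports
```

For example, `find_nb_mces(open("dmesg.txt").read())` should return an empty list on a boot that meets the expected result. As a simplification, every line after a header is attributed to the most recent report, so unrelated output mentioning "gart" could be misclassified.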

Additional info:

Comment 9 Lonni J Friedman 2005-12-07 16:31:01 UTC
This looks like a duplicate of bug 163210.

Comment 11 Ernie Petrides 2006-01-27 22:43:10 UTC
Removing ITs 73360 and 86498, which are not about GART errors during boot.

Comment 12 Ernie Petrides 2006-01-27 22:49:02 UTC
How much physical memory was on the system exhibiting this problem?

Specifically, I'm wondering if the 37ff0000 address is the base of
the last page of physical memory.
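The arithmetic behind this question can be sketched in Python. It assumes 4 KiB pages and treats the top of RAM as a single end address; the actual memory size of the affected machine is not given in this report, so the sizes below are illustrative only, and `last_page_base` is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages
MIB = 1024 * 1024

NB_ERROR_ADDRESS = 0x37FF0000  # "NB error address" from the console output


def last_page_base(top_of_ram, page_size=PAGE_SIZE):
    """Base of the last full page below top_of_ram (one past the highest RAM byte)."""
    return (top_of_ram // page_size - 1) * page_size


# 0x37ff0000 lies exactly 64 KiB below 0x38000000 (896 MiB).
offset_below_896_mib = 896 * MIB - NB_ERROR_ADDRESS  # 0x10000 bytes
address_in_mib = NB_ERROR_ADDRESS / MIB              # 895.9375

# It would be the base of the last 4 KiB page only if RAM ended at 0x37ff1000;
# if RAM ended at 896 MiB, the last page would start at 0x37fff000 instead.
assert last_page_base(0x37FF1000) == NB_ERROR_ADDRESS
assert last_page_base(896 * MIB) == 0x37FFF000
```

Whether the address really is the last page can only be settled with the affected system's actual memory size, such as from the boot log requested in comment 13.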

Comment 13 Jim Paradis 2006-01-27 23:04:36 UTC
Can you attach a boot log (either serial capture or dmesg) from a system that
shows this behavior?  There may be some clues in there.


Comment 15 Ernie Petrides 2006-02-02 03:17:46 UTC
Marizol Martinez, could we please get some help trying to reproduce this
problem on RHEL3 U7?  Could you also post the data requested in comment #13?

Thanks in advance.  -ernie


Comment 22 Ernie Petrides 2006-04-20 01:23:34 UTC
A fix for this problem has just been committed to the RHEL3 U8
patch pool this evening (in kernel version 2.4.21-40.7.EL).


Comment 25 Dan Carpenter 2006-05-09 02:49:48 UTC
Is there a test kernel available for this?  I have hardware that can reproduce
the GART errors with RHEL3u7.



Comment 26 Ernie Petrides 2006-05-09 20:16:52 UTC
Yes, but it is in internal beta at the moment.  Watch for it in the
RHN beta channels in a couple of weeks or so.  The latest kernel
version (1st U8 beta respin) is 2.4.21-42.EL (built on Friday).

Comment 27 Joshua Giles 2006-06-01 04:19:39 UTC
A kernel has been released that contains a patch for this problem. Please
verify whether your problem is fixed with the latest available kernel from the
RHEL3 public beta channel at rhn.redhat.com and report your test results.

Comment 29 Red Hat Bugzilla 2006-07-20 13:29:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2006-0437.html


