Bug 593003 - intel VT-d bios option causes kernel chaos
Summary: intel VT-d bios option causes kernel chaos
Keywords:
Status: CLOSED DUPLICATE of bug 548198
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Kernel Maintainer List
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-05-17 15:16 UTC by Joshua Roys
Modified: 2010-06-15 20:24 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-06-15 20:24:16 UTC


Attachments
/var/log/messages (deleted) - 2010-05-17 15:16 UTC, Joshua Roys
lspci (deleted) - 2010-05-17 15:17 UTC, Joshua Roys
dmidecode (deleted) - 2010-05-17 15:17 UTC, Joshua Roys
kernel patch for F-12 latest (2.6.32.14-127) (deleted) - 2010-06-03 21:22 UTC, Don Zickus
kernel patch, cruft removed (deleted) - 2010-06-03 21:24 UTC, Don Zickus

Description Joshua Roys 2010-05-17 15:16:57 UTC
Created attachment 414580 [details]
/var/log/messages

Description of problem:
If the Intel VT-d BIOS option is set, the system will fail to boot properly.  Numerous strange error messages are displayed:
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000
DMAR:[fault reason 06] PTE Read access is not set
and:
Uhhuh. NMI received for unknown reason b0 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
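
The DMAR fault lines above name the requesting PCI device in square brackets (04:00.0 here), which can be matched against lspci output. A minimal sketch of pulling those addresses out of a log, assuming the message format shown above; the function name dmar_fault_devices is hypothetical:

```shell
# Hypothetical helper: read kernel log text on stdin and print each unique
# PCI address that appears in a "DMAR:[DMA ...] Request device [...]" line.
# Assumes the message format quoted in this report.
dmar_fault_devices() {
    sed -n 's/.*DMAR:\[DMA [A-Za-z]*] Request device \[\([0-9a-fA-F:.]*\)].*/\1/p' |
        sort -u
}

# Example use (log path as in the attachment above):
#   dmar_fault_devices < /var/log/messages |
#       while read -r dev; do lspci -s "$dev"; done
```

Each address printed can be passed to lspci -s to see which device issued the faulting DMA request.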


Version-Release number of selected component (if applicable):
Fedora 13 RC3
2.6.33.3-85.fc13.i686


How reproducible:
boot on this platform with VT-d enabled

  
Actual results:
system fails to boot up and hangs around udev


Expected results:
system boots

Comment 1 Joshua Roys 2010-05-17 15:17:30 UTC
Created attachment 414581 [details]
lspci

Comment 2 Joshua Roys 2010-05-17 15:17:48 UTC
Created attachment 414583 [details]
dmidecode

Comment 3 Anton Arapov 2010-05-18 07:29:48 UTC
Duplicate of bug 573173.

Comment 4 Don Zickus 2010-06-03 20:15:23 UTC
Hi,

I have a kernel patch that walks the PCI tree to determine which device is causing the NMI you are seeing.  If you are comfortable with compiling a kernel, let me know and I will attach the patch.  If you can instead trigger the NMI after boot (though it doesn't seem like you can boot at all), I have a kernel module to try.

Cheers,
Don

Comment 5 Joshua Roys 2010-06-03 20:37:40 UTC
Yes, I can do that.  If you could make the patch so it applies cleanly to a f12 src rpm, that would be helpful.

Thanks,

Josh

Comment 6 Don Zickus 2010-06-03 21:22:54 UTC
Created attachment 421037 [details]
kernel patch for F-12 latest (2.6.32.14-127)

Hi,

I have created a kernel patch to locate the device causing the NMI.

In order to use it please follow the instructions below.
(slightly modified from my instructions for the kernel module)

- download the attached patch
- download the kernel src rpm
- add the patch to the kernel.spec file
- rpmbuild -ba <path to spec>/kernel.spec
- install / boot the new kernel
- try to generate the NMI

Once an NMI fires, some info should appear in the kernel
logs (dmesg and /var/log/messages).  Ignore the WARN for now; it is
misplaced.

Please run the following to gather data:

dmesg | grep RHNMI > /tmp/nmi.txt
echo "LSPCI OUTPUT" >> /tmp/nmi.txt
lspci >> /tmp/nmi.txt
lspci -t >> /tmp/nmi.txt

Then attach the /tmp/nmi.txt to this bugzilla so I can review the data.

It should have enough data to pinpoint the device that is causing the 
problem.  After that is determined, we can decide the next steps (most 
likely a firmware update if possible).
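
Before attaching the file, it may be worth checking that it actually contains RHNMI lines, since the grep above writes nothing if the NMI never fired. A minimal sketch, assuming the /tmp/nmi.txt path used above; the function name check_nmi_report is hypothetical:

```shell
# Hypothetical helper: report whether the collected file has any RHNMI lines.
# Defaults to /tmp/nmi.txt as used in the steps above.
check_nmi_report() {
    report=${1:-/tmp/nmi.txt}
    # grep -c prints the number of matching lines; empty if the file is missing
    count=$(grep -c RHNMI "$report" 2>/dev/null)
    if [ "${count:-0}" -gt 0 ]; then
        echo "$report: $count RHNMI line(s) found"
    else
        echo "$report: no RHNMI lines; the NMI may not have fired yet" >&2
        return 1
    fi
}
```

Running check_nmi_report with no argument checks /tmp/nmi.txt; a non-zero exit status means there is nothing useful to attach yet.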

Please let me know if you have issues with the above steps.

Thanks,
Don

Comment 7 Don Zickus 2010-06-03 21:24:57 UTC
Created attachment 421039 [details]
kernel patch, cruft removed

Had some extra cruft in the original patch.  Sorry about that.

Comment 8 Joshua Roys 2010-06-15 20:24:16 UTC
After a number of reboots, I was unable to reproduce the "Uhhuh" error message, although the kernel would still panic almost immediately after bootup.  I was unable to get logs for this...  however I found rhbz 548198 and followed 548198#c11 and updated the P410i to firmware version 3.30.  Early results indicate this has fixed the immediate hang that was seen before.

Thanks!

*** This bug has been marked as a duplicate of bug 548198 ***

