Bug 593003

Summary: intel VT-d bios option causes kernel chaos
Product: [Fedora] Fedora
Component: kernel
Version: 13
Hardware: All
OS: Linux
Status: CLOSED DUPLICATE
Severity: medium
Priority: low
Reporter: Joshua Roys <roysjosh>
Assignee: Kernel Maintainer List <kernel-maint>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: anton, dougsland, dzickus, gansalmon, itamar, jonathan, kernel-maint
Doc Type: Bug Fix
Last Closed: 2010-06-15 20:24:16 UTC
Attachments (description, flags):
- /var/log/messages (flags: none)
- lspci (flags: none)
- dmidecode (flags: none)
- kernel patch for F-12 latest (2.6.32.14-127) (flags: none)
- kernel patch, cruft removed (flags: none)

Description Joshua Roys 2010-05-17 15:16:57 UTC
Created attachment 414580
/var/log/messages

Description of problem:
If the Intel VT-d option is enabled in the BIOS, the system fails to boot properly. Numerous strange error messages are displayed:
DRHD: handling fault status reg 2
DMAR:[DMA Read] Request device [04:00.0] fault addr ffed9000
DMAR:[fault reason 06] PTE Read access is not set
and:
Uhhuh. NMI received for unknown reason b0 on CPU 0.
You have some hardware problem, likely on the PCI bus.
Dazed and confused, but trying to continue
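The messages above can be triaged from a saved log with two small shell helpers. This is a minimal sketch under stated assumptions: the function names `dmar_fault_devices` and `decode_nmi_reason` are hypothetical, the `sed` pattern assumes the DMAR message format quoted above, and the NMI decoding assumes the reason byte is the value the x86 kernel reads from legacy port 0x61, where bit 7 flags a memory parity or PCI SERR# error and bit 6 an I/O channel check.

```shell
#!/bin/sh
# Hypothetical helpers for triaging the boot messages quoted in this report.

# Print the unique PCI addresses (bus:device.function) named in DMAR fault
# lines read from stdin, e.g. "dmar_fault_devices < /var/log/messages".
# Assumes the format "DMAR:[DMA Read] Request device [04:00.0] fault addr ...".
dmar_fault_devices() {
    sed -n 's/.*DMAR:\[DMA [A-Za-z]*] Request device \[\([0-9a-fA-F:.]*\)].*/\1/p' | sort -u
}

# Decode the hex "reason" byte from "NMI received for unknown reason XX".
# Assumption: it is the value read from legacy x86 port 0x61, where bit 7
# flags a memory parity / PCI SERR# error and bit 6 an I/O channel check.
decode_nmi_reason() {
    reason=$((0x$1))
    if [ $((reason & 0x80)) -ne 0 ]; then echo "SERR/parity"; fi
    if [ $((reason & 0x40)) -ne 0 ]; then echo "IOCHK"; fi
    if [ $((reason & 0xc0)) -eq 0 ]; then echo "none"; fi
}
```

For the excerpt above, `dmar_fault_devices` should print `04:00.0`, which `lspci -s 04:00.0 -v` can then identify, and `decode_nmi_reason b0` reports only the SERR/parity bit, which fits the kernel's "likely on the PCI bus" hint.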


Version-Release number of selected component (if applicable):
Fedora 13 RC3
2.6.33.3-85.fc13.i686


How reproducible:
Boot this platform with VT-d enabled.

  
Actual results:
The system fails to boot and hangs at around the point where udev starts.


Expected results:
The system boots normally.

Comment 1 Joshua Roys 2010-05-17 15:17:30 UTC
Created attachment 414581
lspci

Comment 2 Joshua Roys 2010-05-17 15:17:48 UTC
Created attachment 414583
dmidecode

Comment 3 Anton Arapov 2010-05-18 07:29:48 UTC
Duplicate of bug 573173.

Comment 4 Don Zickus 2010-06-03 20:15:23 UTC
Hi,

I have a kernel patch that walks the PCI tree to determine which device is causing the NMI you are seeing. If you are comfortable compiling a kernel, let me know and I will attach the patch. If you could trigger the NMI after the system has booted, I would have a kernel module for you to try instead, but it does not look like the system boots far enough for that.

Cheers,
Don
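The PCI-tree walk mentioned in comment 4 can be roughly approximated from user space on a system that stays up. The sketch below is not Don's patch, whose contents are not shown in this report; it is an assumed alternative that reads each device's 16-bit PCI Status register with `setpci` (from pciutils; may require root) and reports devices whose Signaled System Error (bit 14) or Detected Parity Error (bit 15) bit is set. The function names `pci_status_errors` and `scan_pci_errors` are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: flag PCI devices whose Status register records an
# error that could have raised a SERR#-driven NMI. Not the patch from this bug.

# Decode a PCI Status register value given in hex (e.g. "4010").
pci_status_errors() {
    status=$((0x$1))
    if [ $((status & 0x4000)) -ne 0 ]; then echo "signaled-system-error"; fi
    if [ $((status & 0x8000)) -ne 0 ]; then echo "detected-parity-error"; fi
}

# Walk every PCI device known to sysfs and report those with error bits set.
# Requires pciutils' setpci; reading config space may require root.
scan_pci_errors() {
    for dev in /sys/bus/pci/devices/*; do
        bdf=${dev##*/}                       # e.g. 0000:04:00.0
        status=$(setpci -s "$bdf" STATUS) || continue
        errors=$(pci_status_errors "$status")
        if [ -n "$errors" ]; then
            echo "$bdf: $errors"
        fi
    done
}
```

Run `scan_pci_errors` after an NMI. It only checks the primary Status register, not bridge Secondary Status or PCI Express AER registers, so treat it as a coarse first pass.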

Comment 5 Joshua Roys 2010-06-03 20:37:40 UTC
Yes, I can do that. If you could make the patch apply cleanly to an F-12 source RPM, that would be helpful.

Thanks,

Josh

Comment 6 Don Zickus 2010-06-03 21:22:54 UTC
Created attachment 421037
kernel patch for F-12 latest (2.6.32.14-127)

Hi,

I have created a kernel patch to locate the device causing the NMI.

To use it, follow the instructions below (slightly modified from my instructions for the kernel module):

- download the attached patch
- download the kernel source RPM
- add the patch to the kernel.spec file
- rpmbuild -ba <path to spec>/kernel.spec
- install and boot the new kernel
- try to trigger the NMI

Once an NMI is generated, some information should appear in the kernel
logs (dmesg and /var/log/messages). Ignore the WARN for now; it is
misplaced.

Please run the following to gather data:

dmesg | grep RHNMI > /tmp/nmi.txt
echo "LSPCI OUTPUT" >> /tmp/nmi.txt
lspci >> /tmp/nmi.txt
lspci -t >> /tmp/nmi.txt

Then attach /tmp/nmi.txt to this bug so I can review the data.

It should contain enough data to pinpoint the device that is causing the
problem. Once that is determined, we can decide on the next steps (most
likely a firmware update, if possible).

Please let me know if you have issues with the above steps.

Thanks,
Don

Comment 7 Don Zickus 2010-06-03 21:24:57 UTC
Created attachment 421039
kernel patch, cruft removed

The original patch had some extra cruft in it. Sorry about that.

Comment 8 Joshua Roys 2010-06-15 20:24:16 UTC
After a number of reboots, I was unable to reproduce the "Uhhuh" error message, although the kernel would still panic almost immediately after booting. I was unable to get logs for this. However, I found rhbz 548198, followed 548198#c11, and updated the P410i to firmware version 3.30. Early results indicate that this has fixed the immediate hang seen before.

Thanks!

*** This bug has been marked as a duplicate of bug 548198 ***