Bug 1509909 - host rebooted with kernel 3.10.0-6* after some vm migration
Summary: host rebooted with kernel 3.10.0-6* after some vm migration
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: qemu-kvm-rhev
Version: 7.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: xianwang
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-06 10:21 UTC by boruvka.michal
Modified: 2017-12-21 02:23 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-12-21 02:23:35 UTC


Attachments (Terms of Use)
Console error (deleted)
2017-11-06 10:23 UTC, boruvka.michal
/var/log/messages (deleted)
2017-11-23 12:10 UTC, boruvka.michal
qemu log (deleted)
2017-11-23 12:17 UTC, boruvka.michal
qemu log 2 (deleted)
2017-11-23 12:17 UTC, boruvka.michal

Description boruvka.michal 2017-11-06 10:21:48 UTC
Description of problem:
HW: HP ProLiant BL460c Gen9
Kernel: 3.10.0-6*
Host is rebooted after some VM migration to this host.

Kernel 3.10.0-5*  - VM migration is OK on all Hardware
Kernel 3.10.0-6*  - VM migration is OK on HP ProLiant BL460c G7

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. In description

Actual results:
Reboot after some VM migration

Expected results:
No Reboot


Additional info:

Comment 1 boruvka.michal 2017-11-06 10:23:28 UTC
Created attachment 1348514 [details]
Console error

Comment 2 Yaniv Kaul 2017-11-07 06:11:52 UTC
Might be a HW issue (see the warning about MCE).
Can we get complete logs? Specifically, journalctl output from the relevant time?

Comment 3 boruvka.michal 2017-11-10 07:16:52 UTC
There is nothing in the log, only the messages from the next boot. It is probably not a HW issue: with kernel 3.10.0-5* everything is OK.
I will try another HW.

Comment 4 boruvka.michal 2017-11-10 08:53:03 UTC
I found this error in the "Integrated Management Log" (IML):

	System Error	11/10/2017 08:44	11/10/2017 08:44	1	An Unrecoverable System Error (NMI) has occurred (Service Information: 0x00000000, 0x00000000)
	ASR	11/10/2017 08:45	11/10/2017 08:45	1	ASR Detected by System ROM
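
Since journalctl output from the relevant time was requested, the IML timestamp above can be turned into a time window for journalctl. The following is a minimal sketch, not a tested procedure: the helper name `journal_window` and the 10-minute margin are illustrative choices, and it assumes the IML clock and the host clock use the same time zone. Note that on a host without a persistent journal, entries from before the reboot may not be available at all.

```python
import shlex
from datetime import datetime, timedelta


def journal_window(iml_timestamp, minutes=10):
    """Build a journalctl argument list covering +/- `minutes` around an IML entry.

    `iml_timestamp` uses the MM/DD/YYYY HH:MM layout seen in the IML rows.
    Assumption: the IML clock and the host clock share a time zone.
    """
    event = datetime.strptime(iml_timestamp, "%m/%d/%Y %H:%M")
    margin = timedelta(minutes=minutes)
    fmt = "%Y-%m-%d %H:%M:%S"
    return [
        "journalctl",
        "--no-pager",
        "--since", (event - margin).strftime(fmt),
        "--until", (event + margin).strftime(fmt),
    ]


if __name__ == "__main__":
    # Print a copy-and-paste friendly command line for the NMI entry.
    args = journal_window("11/10/2017 08:44")
    print(" ".join(shlex.quote(a) for a in args))
```

The printed command can be run as root on the affected hypervisor; the same window is also a reasonable range to search in /var/log/messages.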

Comment 5 Tomas Jelinek 2017-11-13 08:31:42 UTC
It reminds me of an old issue with IOMMU enabled on a host. Is it enabled by any chance?

Other than that, I think we will need to move this to a lower layer. Can you please try to collect as many logs as possible from the relevant time?
Especially journalctl and qemu logs. 

Thank you.
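
One way to answer the IOMMU question is to look at the kernel command line and at the IOMMU groups the kernel exposes in sysfs. This is a rough sketch only: the function names are made up for illustration, and an empty result does not strictly prove that the IOMMU is off, since platform defaults and firmware settings also play a role.

```python
import os


def iommu_kernel_args(cmdline):
    """Return the IOMMU-related parameters found in a kernel command line string."""
    prefixes = ("intel_iommu=", "amd_iommu=", "iommu=")
    return [tok for tok in cmdline.split() if tok.startswith(prefixes)]


def iommu_groups_present(path="/sys/kernel/iommu_groups"):
    """Return True if the kernel has populated at least one IOMMU group."""
    return os.path.isdir(path) and bool(os.listdir(path))


if __name__ == "__main__":
    # /proc/cmdline only exists on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("kernel args:", iommu_kernel_args(f.read()))
    print("iommu groups present:", iommu_groups_present())
```

If groups are present, the IOMMU is active on the host regardless of what the command line says.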

Comment 6 Tomas Jelinek 2017-11-23 09:07:25 UTC
Any news?

Comment 7 boruvka.michal 2017-11-23 12:10:13 UTC
Created attachment 1358169 [details]
/var/log/messages

/var/log/messages

Comment 8 boruvka.michal 2017-11-23 12:17:15 UTC
Created attachment 1358172 [details]
qemu log

This VM was migrated to a hypervisor, which then rebooted.

Comment 9 boruvka.michal 2017-11-23 12:17:47 UTC
Created attachment 1358173 [details]
qemu log 2

This VM was migrated to a hypervisor, which then rebooted.

Comment 10 boruvka.michal 2017-11-23 12:22:14 UTC
I tested another blade and the situation is the same. With kernel 3.10.0-5* everything is OK, but with kernel 3.10.0-6* the IML shows the "NMI error" described above. I added some logs as attachments.

Comment 12 Karen Noel 2017-11-27 18:47:31 UTC
(In reply to boruvka.michal from comment #0)

If this issue is critical or in any way time sensitive, please raise a
ticket through your regular Red Hat support channels to make certain it
receives the proper attention and prioritization that will result in a timely
resolution.

For information on how to contact the Red Hat production support team,
please visit: https://www.redhat.com/support/process/production/#howto

I strongly recommend you contact our support, but in the meanwhile, please
provide here:

* What version of RHEL is your host?
* What version of qemu-kvm-rhev and libvirt are installed?
* What is the exact kernel version that first fails? 3.10.0-6* is not precise enough. Is the kernel version you are using from Red Hat? Which errata stream?

Thanks.

Comment 13 boruvka.michal 2017-12-20 10:54:43 UTC
It is CentOS Linux release 7.4.1708, so I have no support.
Failing versions:
qemu-kvm-ev-2.9.0-16.el7_4.5.1.x86_64
libvirt-daemon-3.2.0-14.el7_4.3.x86_64
kernel-3.10.0-693.5.2.el7.x86_64

But after the update it looks OK:
qemu-kvm-ev-2.9.0-16.el7_4.8.1.x86_64
libvirt-daemon-3.2.0-14.el7_4.5.x86_64
kernel-3.10.0-693.11.1.el7.x86_64
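
To make the difference between the two package sets explicit, the version strings above can be compared segment by segment. This is a simplified sketch, not RPM's real comparison algorithm (rpmvercmp): `split_nvra`, `sort_key` and `is_newer` are illustrative helpers that only cover the plain name-version-release.arch strings listed in this comment. It also shows why a plain string comparison is misleading here: "693.5.2" sorts after "693.11.1" as text.

```python
import re

FAILING = [
    "qemu-kvm-ev-2.9.0-16.el7_4.5.1.x86_64",
    "libvirt-daemon-3.2.0-14.el7_4.3.x86_64",
    "kernel-3.10.0-693.5.2.el7.x86_64",
]
FIXED = [
    "qemu-kvm-ev-2.9.0-16.el7_4.8.1.x86_64",
    "libvirt-daemon-3.2.0-14.el7_4.5.x86_64",
    "kernel-3.10.0-693.11.1.el7.x86_64",
]


def split_nvra(pkg):
    """Split 'name-version-release.arch' into its four parts."""
    nvr, arch = pkg.rsplit(".", 1)
    name, version, release = nvr.rsplit("-", 2)
    return name, version, release, arch


def sort_key(text):
    """Turn a version or release string into a comparable list.

    Digit runs compare as integers and rank above letter runs, which is a
    simplification of what RPM does.
    """
    parts = re.findall(r"\d+|[A-Za-z]+", text)
    return [(1, int(p)) if p.isdigit() else (0, p) for p in parts]


def is_newer(old, new):
    """Return True if package string `new` has a higher version-release than `old`."""
    _, old_ver, old_rel, _ = split_nvra(old)
    _, new_ver, new_rel, _ = split_nvra(new)
    return (sort_key(new_ver), sort_key(new_rel)) > (sort_key(old_ver), sort_key(old_rel))


if __name__ == "__main__":
    for old, new in zip(FAILING, FIXED):
        name, _, old_rel, _ = split_nvra(old)
        new_rel = split_nvra(new)[2]
        status = "updated" if is_newer(old, new) else "not newer"
        print(f"{name}: {old_rel} -> {new_rel} ({status})")
```

All three packages changed between the failing and the working setup, so this comparison alone does not show which update resolved the reboots.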

