Bug 1235510 - [virtio-win][whql] Win2012 guest could not boot up while running multiple processor job (reboot case)
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: qemu-kvm
Version: 8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Ademar Reis
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On: 1682882
Blocks:
Reported: 2015-06-25 03:11 UTC by Yu Wang
Modified: 2019-03-27 20:01 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1513833 None VERIFIED whql job "Multiple processor group device test" fail when boot with cpu flag "hv_time/hv_relaxed/hv_vapic/hv_spinlocks=0... 2019-04-02 00:30:14 UTC

Internal Links: 1513833

Description Yu Wang 2015-06-25 03:11:08 UTC
Description of problem:
The Win2012 guest cannot boot up while running the multiple processor job (reboot case).

Version-Release number of selected component (if applicable):
virtio-win-prewhql-105
qemu-kvm-rhev-2.3.0-2.el7.x86_64
kernel-3.10.0-267.el7.x86_64
seabios-1.7.5-9.el7.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot the guest with:
/usr/libexec/qemu-kvm -name 105SCS2012645FH -enable-kvm -m 6G -smp 8 \
  -uuid 18052249-f65e-4592-afb6-639b6c8c3730 -nodefconfig -nodefaults \
  -chardev socket,id=charmonitor,path=/tmp/105SCS2012645FH,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=localtime,driftfix=slew -boot order=cd,menu=on \
  -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
  -drive file=105SCS2012645FH,if=none,id=drive-ide0-0-0,format=raw,serial=mike_cao,cache=none \
  -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 \
  -drive file=en_windows_server_2012_x64_dvd_915478.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw \
  -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 \
  -drive file=105SCS2012645FH.vfd,if=none,id=drive-fdc0-0-0,format=raw,cache=none \
  -global isa-fdc.driveA=drive-fdc0-0-0 \
  -netdev tap,script=/etc/qemu-ifup,downscript=no,id=hostnet0 \
  -device rtl8139,netdev=hostnet0,id=net0,mac=00:52:05:1f:cd:e0,bus=pci.0,addr=0x3 \
  -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=isa_serial0 \
  -device usb-tablet,id=input0 -vnc 0.0.0.0:2 -vga cirrus \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x7,num_queues=8 \
  -drive file=105SCS2012645FH_test.raw,if=none,id=drive-scsi-disk0,format=raw,serial=mike_cao,cache=none \
  -device scsi-hd,bus=scsi0.0,drive=drive-scsi-disk0,id=scsi-disk0 \
  -monitor stdio

2. Run the multiple processor job.

Actual results:

The guest hangs during the reboot case.

Expected results:
The guest reboots normally.

Additional info:
After changing -smp 8 to -smp 6, the guest reboots normally.
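
The -smp observation above suggests sweeping the vCPU count to see where the hang starts. A minimal Python sketch, assuming the command line from step 1 is available as a single string; `with_smp` is a hypothetical helper (not part of any QEMU tooling), and it only rewrites the argument list without launching a guest:

```python
import shlex


def with_smp(cmdline, vcpus):
    """Return the argv for cmdline with the value after -smp replaced."""
    argv = shlex.split(cmdline)
    idx = argv.index("-smp")      # raises ValueError if -smp is absent
    argv[idx + 1] = str(vcpus)
    return argv


# Shortened stand-in for the full command line in step 1.
base = "/usr/libexec/qemu-kvm -enable-kvm -m 6G -smp 8 -monitor stdio"
for n in (6, 7, 8):
    print(" ".join(with_smp(base, n)))
```

Each printed line can then be run by hand, repeating the reboot case for every vCPU count.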

Comment 2 Yu Wang 2015-06-25 03:36:36 UTC
The NMI dump file is located at http://fileshare.englab.nay.redhat.com/pub/section2/images_backup/virtio-win/bug1235510/


*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 80, {4f4454, 0, 0, 0}

Probably caused by : ntkrnlmp.exe ( nt!WheaReportHwError+249 )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

NMI_HARDWARE_FAILURE (80)
This is typically due to a hardware malfunction.  The hardware supplier should
be called.
Arguments:
Arg1: 00000000004f4454
Arg2: 0000000000000000
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------


DEFAULT_BUCKET_ID:  WIN8_DRIVER_FAULT

BUGCHECK_STR:  0x80

PROCESS_NAME:  System

CURRENT_IRQL:  f

ANALYSIS_VERSION: 6.3.9600.16384 (debuggers(dbg).130821-1623) amd64fre

LOCK_ADDRESS:  fffff8011d55b740 -- (!locks fffff8011d55b740)

Resource @ nt!PiEngineLock (0xfffff8011d55b740)    Exclusively owned
    Contention Count = 2
     Threads: fffffa80056ab040-01<*> 
1 total locks, 1 locks currently held

PNP_TRIAGE: 
	Lock address  : 0xfffff8011d55b740
	Thread Count  : 1
	Thread address: 0xfffffa80056ab040
	Thread wait   : 0x2ab

LAST_CONTROL_TRANSFER:  from fffff8011d2468de to fffff8011d300040

STACK_TEXT:  
fffff801`1c702c08 fffff801`1d2468de : 00000000`00000080 00000000`004f4454 00000000`00000000 00000000`00000000 : nt!KeBugCheckEx
fffff801`1c702c10 fffff801`1d3dec09 : 00000000`00000001 fffff801`1d25a030 00000000`00000000 fffffa80`14a27968 : hal!HalBugCheckSystem+0x9a
fffff801`1c702c50 fffff801`1d247204 : 00000000`000006c0 fffff801`1c702e20 fffff801`1d57b100 fffff801`1d25a030 : nt!WheaReportHwError+0x249
fffff801`1c702cb0 fffff801`1d4597a7 : fffff801`1c702e70 00000000`00000010 00000000`80000005 fffff801`1d29a27d : hal!HalHandleNMI+0x150
fffff801`1c702ce0 fffff801`1d2fcd02 : 00000000`b411dd56 fffff801`1c702ef0 00000000`00000000 fffff801`1d57b180 : nt! ?? ::FNODOBFM::`string'+0x13d6d
fffff801`1c702d30 fffff801`1d2fcb73 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x82
fffff801`1c702e70 fffff801`1d3330d1 : fffff880`00c6b080 00000000`8ebe4626 00000000`00000076 00000000`00000000 : nt!KiNmiInterrupt+0x173
fffff880`0373f5d0 fffff801`1d36de0c : fffff801`1d5d4a80 ffffffff`ffffffff fffff880`0373f8d9 00000000`00000000 : nt!KeFlushMultipleRangeTb+0x3c6
fffff880`0373f7d0 fffff801`1d2b3349 : fffffa80`056ab040 fffffa80`00000000 00000000`00000000 fffff880`00000000 : nt!MiFlushPteList+0x2c
fffff880`0373f800 fffff801`1d2b2e1e : fffff880`00fc0000 00000014`00000000 fffffa80`14acdd50 fffff801`1d5d4a80 : nt!MiRemoveMappedPtes+0x151
fffff880`0373f940 fffff801`1d64c147 : 00000000`0007ffff fffffa80`056ab040 00000000`00000001 00000000`00000000 : nt!MiRemoveFromSystemSpace+0x1ba
fffff880`0373f9c0 fffff801`1d64cc6a : fffffa80`14c13130 fffff880`00fb0000 00000000`00000000 00000000`00000001 : nt!MiUnmapImageInSystemSpace+0x4b
fffff880`0373f9f0 fffff801`1d6ee2df : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000002 : nt!MiValidateSectionCreate+0x2fa
fffff880`0373fb70 fffff801`1d6ea900 : fffffa80`14c13130 fffff880`01000002 00000000`00000222 00000000`00000220 : nt!MiCreateNewSection+0x20f
fffff880`0373fc70 fffff801`1d6fcf30 : fffff880`037401e0 00000000`00000020 fffff880`03740180 fffff801`1d3363de : nt!MiCreateSection+0x88c
fffff880`0373fe90 fffff801`1d2ff053 : fffffa80`056ab040 fffff880`03740148 fffff880`0373ff38 fffff880`03740280 : nt!NtCreateSection+0x1af
fffff880`0373ff20 fffff801`1d304230 : fffff801`1d70b3d8 fffff980`0141efd0 00000000`00000030 00000000`20206f49 : nt!KiSystemServiceCopyEnd+0x13
fffff880`03740128 fffff801`1d70b3d8 : fffff980`0141efd0 00000000`00000030 00000000`20206f49 fffff880`037403d8 : nt!KiServiceLinkage
fffff880`03740130 fffff801`1d7170ca : ffffffff`800000e0 fffff801`1d5fbc22 fffff801`1d54fa20 00000000`00000012 : nt!MiCreateSectionForDriver+0xe0
fffff880`037401e0 fffff801`1d716a58 : 00000000`00000000 fffff880`037402e9 00000000`00000000 00000000`20206f49 : nt!MiObtainSectionForDriver+0x8e
fffff880`03740230 fffff801`1d71743e : fffff880`03740390 00000000`00000000 00000000`00000001 00000000`00000000 : nt!MmLoadSystemImage+0x120
fffff880`03740340 fffff801`1d71179c : 00000000`00000000 00000000`00000000 fffff880`03740860 00000000`00000003 : nt!IopLoadDriver+0x2ca
fffff880`03740610 fffff801`1d70ffb6 : fffff8a0`003e7010 fffff880`0097b340 00000000`00000000 00000000`00000000 : nt!PipCallDriverAddDeviceQueryRoutine+0x22c
fffff880`03740730 fffff801`1d70dbc0 : 00000000`00000000 fffff880`00000002 00000000`00000000 00000000`000007ff : nt!PnpCallDriverQueryServiceHelper+0x13e
fffff880`037407b0 fffff801`1d70e11e : fffffa80`056c5c30 fffff880`03740a40 fffffa80`056f0d30 fffffa80`056f54a0 : nt!PipCallDriverAddDevice+0x400
fffff880`03740940 fffff801`1d781c07 : fffff801`1d309700 00000000`00000001 00000000`00000000 fffff801`1d608a52 : nt!PipProcessDevNodeTree+0x1ca
fffff880`03740bc0 fffff801`1d38b81f : 00000001`00000003 00000000`00000000 fffff801`1d5586e0 fffff8a0`000f6258 : nt!PiProcessStartSystemDevices+0x87
fffff880`03740c10 fffff801`1d338391 : fffffa80`056ab040 fffff801`1d38b4bc fffff801`1d5586e0 fffff801`1d309700 : nt!PnpDeviceActionWorker+0x363
fffff880`03740cc0 fffff801`1d2a7521 : 00000000`00000000 00000000`00000080 fffff801`1d338250 fffffa80`056ab040 : nt!ExpWorkerThread+0x142
fffff880`03740d50 fffff801`1d2e5dd6 : fffff801`1d57b180 fffffa80`056ab040 fffffa80`056ca040 fffffa80`056a4040 : nt!PspSystemThreadStartup+0x59
fffff880`03740da0 00000000`00000000 : fffff880`03741000 fffff880`0373b000 00000000`00000000 00000000`00000000 : nt!KiStartSystemThread+0x16


STACK_COMMAND:  kb

FOLLOWUP_IP: 
nt!WheaReportHwError+249
fffff801`1d3dec09 eb7c            jmp     nt!WheaReportHwError+0x2c7 (fffff801`1d3dec87)

SYMBOL_STACK_INDEX:  2

SYMBOL_NAME:  nt!WheaReportHwError+249

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: nt

IMAGE_NAME:  ntkrnlmp.exe

DEBUG_FLR_IMAGE_TIMESTAMP:  5010ac4b

IMAGE_VERSION:  6.2.9200.16384

BUCKET_ID_FUNC_OFFSET:  249

FAILURE_BUCKET_ID:  0x80_VRF_nt!WheaReportHwError

BUCKET_ID:  0x80_VRF_nt!WheaReportHwError

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:0x80_vrf_nt!wheareporthwerror

FAILURE_ID_HASH:  {cc66e451-8fe3-4b00-cfe6-2474d51c5874}

Followup: MachineOwner
---------
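
When dumps from several runs need to be compared, the summary line can be extracted mechanically. A minimal Python sketch, assuming the one-line `BugCheck NN, {a, b, c, d}` format shown in the output above; `parse_bugcheck` is a hypothetical helper for illustration, not a WinDbg API:

```python
import re

# Matches e.g. "BugCheck 80, {4f4454, 0, 0, 0}"; all values are hexadecimal.
BUGCHECK_RE = re.compile(r"BugCheck\s+([0-9A-Fa-f]+),\s*\{([^}]*)\}")


def parse_bugcheck(line):
    """Return (code, [args...]) as integers parsed from the hex text."""
    m = BUGCHECK_RE.search(line)
    if m is None:
        raise ValueError("no BugCheck summary found")
    code = int(m.group(1), 16)
    args = [int(a.strip(), 16) for a in m.group(2).split(",")]
    return code, args


code, args = parse_bugcheck("BugCheck 80, {4f4454, 0, 0, 0}")
print(hex(code), [hex(a) for a in args])
```

Here the code 0x80 corresponds to the NMI_HARDWARE_FAILURE (80) entry in the analysis above.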

Comment 4 lijin 2016-07-28 07:57:17 UTC
Is this bug a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1039469?

Comment 5 Vadim Rozenfeld 2016-07-28 12:03:11 UTC
(In reply to lijin from comment #4)
> Is this bug a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1039469?

They seem to be very close but probably not the same.

Comment 7 Peixiu Hou 2016-08-12 01:25:47 UTC
Hi Amnon,

For this issue, I ran the following tests:

1. Tried vioscsi build 102 (RHEL 7.2 released version) + qemu-kvm-rhev-2.6.0-17.el7.x86_64: could not reproduce this bug.  --rhel7.3
2. Tried vioscsi build 124 + qemu-kvm-rhev-2.6.0-17.el7.x86_64: reproduced this bug. --rhel7.3
3. According to the RHEL7 vioscsi whql report (https://mojo.redhat.com/docs/DOC-941688), testing with vioscsi build 102 + qemu-kvm-rhev-2.3.0-31.el7.x86_64 did not hit this bug.  --rhel7.2

According to the above test results, this bug isn't a regression in qemu-kvm-rhev.


Best Regards~
Peixiu Hou
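
The three runs above can be tabulated to make the reasoning explicit. A small Python sketch using only the data points from this comment; the variable names are illustrative. It only checks whether one qemu-kvm-rhev version produced both outcomes and says nothing about the root cause:

```python
from collections import defaultdict

# (vioscsi build, qemu-kvm-rhev version, reproduced?) as reported above.
results = [
    (102, "2.6.0-17.el7", False),
    (124, "2.6.0-17.el7", True),
    (102, "2.3.0-31.el7", False),
]

outcomes_by_qemu = defaultdict(set)
for build, qemu, reproduced in results:
    outcomes_by_qemu[qemu].add(reproduced)

# A qemu version with both outcomes means the qemu version alone
# does not decide whether the bug reproduces.
mixed = sorted(q for q, seen in outcomes_by_qemu.items() if len(seen) == 2)
print(mixed)
```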

Comment 8 Vadim Rozenfeld 2016-08-12 02:50:21 UTC
(In reply to Peixiu Hou from comment #7)
> Hi Amnon,
> 
> For this issue, I ran the following tests:
> 
> 1. Tried vioscsi build 102 (RHEL 7.2 released version) +
> qemu-kvm-rhev-2.6.0-17.el7.x86_64: could not reproduce this bug.  --rhel7.3
> 2. Tried vioscsi build 124 + qemu-kvm-rhev-2.6.0-17.el7.x86_64: reproduced
> this bug. --rhel7.3
> 3. According to the RHEL7 vioscsi whql report
> (https://mojo.redhat.com/docs/DOC-941688), testing with vioscsi build 102 +
> qemu-kvm-rhev-2.3.0-31.el7.x86_64 did not hit this bug.  --rhel7.2
> 
> According to the above test results, this bug isn't a regression in
> qemu-kvm-rhev.
> 
> 
> Best Regards~
> Peixiu Hou

I'm afraid we cannot make any assumptions by comparing these two builds (102 and 124). They are totally different in how they utilize NUMA facilities.

Comment 9 Vadim Rozenfeld 2016-12-27 06:57:32 UTC
Can we try virtio-win build 129?
Thanks,
Vadim.

Comment 10 Yu Wang 2016-12-27 09:42:15 UTC
(In reply to Vadim Rozenfeld from comment #9)
> Can we try virtio-win build 129?
> Thanks,
> Vadim.

Hi Vadim,

I tried build 129 (with a vioscsi or vioinput device) and still hit this issue.

virtio-win-prewhql-129
qemu-kvm-rhev-2.6.0-29.el7.x86_64
kernel-3.10.0-537.el7.x86_64
seabios-1.9.0-5.el7.x86_64

Thanks
Yu Wang

Comment 16 lijin 2018-07-04 06:37:57 UTC
Setting priority to high, as QE hits this issue on Win2012 100% of the time for all drivers.

Comment 19 Yu Wang 2019-03-27 07:45:08 UTC
Steps to reproduce:

1. Boot a win2012-64 guest (6G memory, 8 vCPUs).

2. In the guest, run:
bcdedit.exe /set groupsize 1 
bcdedit.exe /set maxgroup on 
bcdedit.exe /set groupaware on 

3. Reboot the guest.

The guest hangs at the beginning of the boot.

It still reproduces without any virtio devices, so I am changing the bug component to qemu-kvm.
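
For reference, `groupsize` caps the number of logical processors in a single processor group, so the settings above split the 8 vCPUs into many small groups. A rough Python sketch of the arithmetic, assuming groups are filled evenly and ignoring NUMA node boundaries and any OS limit on the number of groups; `group_count` is a hypothetical helper and the result is only an estimate of what Windows actually creates:

```python
import math


def group_count(vcpus, groupsize):
    """Estimate the number of processor groups as ceil(vcpus / groupsize)."""
    # bcdedit accepts a power of 2 between 1 and 64 for groupsize.
    if groupsize < 1 or groupsize > 64 or groupsize & (groupsize - 1):
        raise ValueError("groupsize must be a power of 2 between 1 and 64")
    return math.ceil(vcpus / groupsize)


for size in (1, 2, 4):
    print("groupsize", size, "->", group_count(8, size), "groups for 8 vCPUs")
```

Repeating step 2 with a larger groupsize would show whether the hang depends on the number of groups, though that is an untested suggestion.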

