Bug 891316 - ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs.
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: x86_64
OS: Linux
high
urgent
Target Milestone: ---
: 3.4.0
Assignee: Omer Frenkel
QA Contact: Yuri Obshansky
URL:
Whiteboard: virt
Depends On:
Blocks: 1060692 rhev3.4beta 1142926
Reported: 2013-01-02 15:09 UTC by Omri Hochman
Modified: 2015-09-22 13:09 UTC
15 users

Fixed In Version: is1
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1060692
Environment:
Last Closed: 2014-05-13 08:58:00 UTC
oVirt Team: ---
Target Upstream Version:


Attachments
engine.log (deleted)
2013-01-02 15:10 UTC, Omri Hochman
no flags
console_log (deleted)
2013-01-02 15:11 UTC, Omri Hochman
no flags


Links
System ID Priority Status Summary Last Updated
oVirt gerrit 13656 None None None Never
oVirt gerrit 13682 None None None Never

Description Omri Hochman 2013-01-02 15:09:25 UTC
ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs. 

This issue was found while investigating Bug 891270.

Description:
************ 
I started 10000 VMs with 256 MB memory (100 at a time) on 13 hosts.

Environment:
*************
rhevm3.2 (build sf2.1)
rhevm-3.2.0-2.el6ev.noarch 
rhevm-backend-3.2.0-2.el6ev.noarch
vdsm-cli-4.10.2-2.0.el6.noarch  (on hosts)
vdsm-4.10.2-2.0.el6.x86_64 (on hosts) 

Results:
********
Found one Java-level deadlock: <See Console.log>
=============================
"pool-3-thread-33":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"
"QuartzScheduler_Worker-99":
  waiting for ownable synchronizer 0x00000000c398c800, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "pool-3-thread-30"
"pool-3-thread-30":
  waiting to lock monitor 0x00007ff678002ee8 (object 0x00000000c398be80, a java.lang.Object),
  which is held by "QuartzScheduler_Worker-99"

Comment 1 Omri Hochman 2013-01-02 15:10:54 UTC
Created attachment 671471 [details]
engine.log

Comment 2 Omri Hochman 2013-01-02 15:11:51 UTC
Created attachment 671472 [details]
console_log

Comment 3 Omri Hochman 2013-01-02 19:24:30 UTC
Fixed Description:
***************** 
I started *1000* VMs with 256 MB memory (100 at a time) on 13 hosts.

Comment 4 Roy Golan 2013-01-02 21:01:19 UTC
The deadlock is caused by two threads acquiring the same locks in opposite order: the refresh thread holds the VdsManager lock while waiting on the decreasedPending lock, and the RunVm thread, while performing rerun(), holds the decreasedPending lock while waiting to perform UpdateVdsDynamicData (a VDS command that acquires the VdsManager lock).
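The cycle described above can be reproduced in isolation. The following is a minimal Java sketch, not ovirt-engine code: `vdsManagerLock` stands in for the VdsManager lock, `decreasePendingLock` for the decreasedPending lock, and the thread names `refresh` and `rerun` are hypothetical. It forces the two threads to take the locks in opposite order and then calls `ThreadMXBean.findDeadlockedThreads()`, which detects cycles through both object monitors and ownable synchronizers such as `ReentrantLock`, the same mix that appears in the thread dump in the Description.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the lock cycle; names are illustrative, not ovirt-engine code.
class DeadlockRepro {

    // Starts the two threads and returns how many threads the JVM reports as
    // deadlocked (0 if no deadlock is seen within about 5 seconds).
    static int reproduce() {
        // Stands in for the VdsManager lock (a plain java.lang.Object monitor in the dump).
        final Object vdsManagerLock = new Object();
        // Stands in for the decreasedPending lock (a ReentrantLock in the dump).
        final ReentrantLock decreasePendingLock = new ReentrantLock();
        // Ensures both threads hold their first lock before asking for the second.
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        // Refresh thread: holds the VdsManager lock, then waits for the ReentrantLock.
        Thread refresh = new Thread(() -> {
            synchronized (vdsManagerLock) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                decreasePendingLock.lock(); // parks forever: held by "rerun"
            }
        }, "refresh");

        // Rerun thread: holds the ReentrantLock, then waits for the VdsManager lock,
        // as the UpdateVdsDynamicData call would.
        Thread rerun = new Thread(() -> {
            decreasePendingLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (vdsManagerLock) { // blocks forever: held by "refresh"
                    // UpdateVdsDynamicData would run here
                }
            } finally {
                decreasePendingLock.unlock();
            }
        }, "rerun");

        // Daemon threads so the JVM can still exit with the cycle in place.
        refresh.setDaemon(true);
        rerun.setDaemon(true);
        refresh.start();
        rerun.start();

        // Unlike findMonitorDeadlockedThreads(), this also follows ownable synchronizers.
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int attempt = 0; attempt < 100; attempt++) {
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids.length;
            }
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return 0;
            }
        }
        return 0;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("deadlocked threads: " + reproduce());
    }
}
```

Running `main` should print `deadlocked threads: 2`: the two threads in the cycle, just as `QuartzScheduler_Worker-99` and `pool-3-thread-30` form the cycle in the original dump.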

I see two main ways to solve this:
1. Get rid of the decreasedPending lock and make it an AtomicInteger, to ensure atomicity and visibility without blocking.
2. Fix the order of lock acquisition in the decreasedPending() method: first get the VdsManager lock, and then perform decreasPending and call
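Both proposals can be sketched in Java. This is an illustration only, assuming the lock mainly protects a counter of pending VMs; `AtomicPending`, `OrderedPending`, `decreaseAndUpdate()` and `runConcurrently()` are hypothetical names, and the actual fix (see the gerrit links) may look quite different. Option 1 replaces the lock with an `AtomicInteger`; option 2 keeps both locks but makes the rerun path take the VdsManager lock first, the same order the refresh thread already uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the two proposed fixes; names are illustrative only.
class PendingFixes {

    // Option 1: drop the decrease-pending lock. AtomicInteger updates are atomic and
    // visible across threads without blocking, so they cannot join a lock cycle.
    static final class AtomicPending {
        private final AtomicInteger pendingVms;

        AtomicPending(int initial) { pendingVms = new AtomicInteger(initial); }

        void increase() { pendingVms.incrementAndGet(); }

        // Decrement without going below zero.
        int decrease() { return pendingVms.updateAndGet(n -> Math.max(0, n - 1)); }

        int get() { return pendingVms.get(); }
    }

    // Option 2: keep both locks, but every path takes the VdsManager lock first and
    // the decrease-pending lock second, so they are never held in opposite order.
    static final class OrderedPending {
        private final Object vdsManagerLock = new Object();
        private final ReentrantLock decreasePendingLock = new ReentrantLock();
        private int pendingVms;
        private int dynamicDataUpdates;

        OrderedPending(int initial) { pendingVms = initial; }

        // Refresh path: VdsManager lock, then decrease-pending lock.
        int refresh() {
            synchronized (vdsManagerLock) {
                decreasePendingLock.lock();
                try {
                    return pendingVms;
                } finally {
                    decreasePendingLock.unlock();
                }
            }
        }

        // Rerun path, reordered to match: VdsManager lock first, then decrease-pending lock.
        void decreaseAndUpdate() {
            synchronized (vdsManagerLock) {
                decreasePendingLock.lock();
                try {
                    if (pendingVms > 0) {
                        pendingVms--;
                    }
                    updateVdsDynamicData(); // re-enters the monitor; Java monitors are reentrant
                } finally {
                    decreasePendingLock.unlock();
                }
            }
        }

        private void updateVdsDynamicData() {
            synchronized (vdsManagerLock) {
                dynamicDataUpdates++;
            }
        }

        int dynamicDataUpdates() {
            synchronized (vdsManagerLock) {
                return dynamicDataUpdates;
            }
        }
    }

    // Runs every task `rounds` times on its own daemon thread and returns false if any
    // thread is still running after the timeout (for example because of a deadlock).
    static boolean runConcurrently(int rounds, long timeoutMs, Runnable... tasks) {
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(() -> {
                for (int i = 0; i < rounds; i++) {
                    task.run();
                }
            });
            t.setDaemon(true);
            threads.add(t);
            t.start();
        }
        long deadline = System.currentTimeMillis() + timeoutMs;
        for (Thread t : threads) {
            try {
                t.join(Math.max(1, deadline - System.currentTimeMillis()));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
            if (t.isAlive()) {
                return false;
            }
        }
        return true;
    }
}
```

With one global acquisition order, the two locks can never wait on each other in a cycle. Option 1 goes further by removing one lock entirely, which is usually the simpler choice when the lock only guards a counter.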

Comment 7 mkublin 2013-04-07 16:21:31 UTC
While working on the phantom VDS status issue, the deadlock will also be solved; a patch has been added to the bug.

Comment 14 Shai Revivo 2013-12-30 09:11:40 UTC
Currently we do not have the resources (lab) to test it.
We will have to push it forward to 3.4.

Comment 15 Shai Revivo 2014-01-15 15:11:22 UTC
QE cannot verify this bug in 3.3; we will verify it in 3.4.

Comment 19 Yuri Obshansky 2014-05-13 08:58:00 UTC
This bug is identical to Bug 1060692 <https://bugzilla.redhat.com/show_bug.cgi?id=1060692> - ovirt-engine-backend [scalability]: Deadlock occurred during mass startup of VMs, which was fixed and verified in 3.3.2.
Closing.

