Bug 1176407 - JSON-RPC | engine gets stuck randomly | ERROR (SSL Stomp Reactor) Unable to process messages
Summary: JSON-RPC | engine gets stuck randomly | ERROR (SSL Stomp Reactor) Unable to process messages
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.5.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.5.0
Assignee: Piotr Kliczewski
QA Contact: Meni Yakove
URL:
Whiteboard: infra
Depends On:
Blocks: rhev35rcblocker rhev35gablocker 1176417
 
Reported: 2014-12-21 13:26 UTC by Meni Yakove
Modified: 2016-02-10 19:40 UTC
CC List: 18 users

Fixed In Version: vt13.5
Doc Type: Bug Fix
Doc Text:
Clone Of:
Cloned to: 1176528
Environment:
Last Closed: 2015-02-17 17:13:39 UTC
oVirt Team: Infra
Target Upstream Version:


Attachments
vnic_profile_job_logs (deleted), 2014-12-21 13:26 UTC, Meni Yakove
new logs (deleted), 2014-12-30 13:57 UTC, Raz Tamir


Links
oVirt gerrit 36332 (master, MERGED): NPE when removing empty tracked request
oVirt gerrit 36335 (ovirt-3.5, MERGED): NPE when removing empty tracked request
oVirt gerrit 36340 (ovirt-engine-3.5, MERGED): jsonrpc version bump

Description Meni Yakove 2014-12-21 13:26:36 UTC
Created attachment 971693 [details]
vnic_profile_job_logs

Description of problem:
Our automation jobs fail randomly with:
Engine:
2014-12-21 10:59:14,282 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
2014-12-21 10:59:14,284 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
2014-12-21 10:59:14,284 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
2014-12-21 10:59:14,285 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing

Server:
2014-12-21 10:43:51,129 ERROR [stderr] (Response tracker) Exception in thread "Response tracker" java.lang.NullPointerException
2014-12-21 10:43:51,130 ERROR [stderr] (Response tracker)       at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.removeRequestFromTracking(ResponseTracker.java:58)
2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker)       at org.ovirt.vdsm.jsonrpc.client.internal.ResponseTracker.run(ResponseTracker.java:93)
2014-12-21 10:43:51,131 ERROR [stderr] (Response tracker)       at java.lang.Thread.run(Thread.java:745)

Switching to xml-rpc solves the issue.
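
The server-side trace above suggests the "Response tracker" thread is killed by the uncaught NullPointerException in removeRequestFromTracking; once that loop exits, pending JSON-RPC requests are never retired, which would match the engine appearing stuck. A minimal sketch of that failure shape, using hypothetical names rather than the actual org.ovirt.vdsm.jsonrpc.client code:

import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Illustrative sketch only -- hypothetical names, not the real ResponseTracker.
class TrackingLoopSketch implements Runnable {
    // requestId -> hosts still waiting for a response (hypothetical layout)
    private final Map<Long, List<String>> hostsByRequest = new ConcurrentHashMap<>();
    private final Queue<Long> timedOut = new ConcurrentLinkedQueue<>();

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            Long id;
            while ((id = timedOut.poll()) != null) {
                // If the entry was already cleaned up (e.g. the call failed on
                // the vdsm side first), removal hits a null entry. There is no
                // catch-all around this loop, so the exception propagates out
                // of run(), the thread exits, and nothing retires requests any more.
                removeRequestFromTracking(id);
            }
            sleepQuietly(1000);
        }
    }

    private void removeRequestFromTracking(Long id) {
        List<String> hosts = hostsByRequest.get(id); // may be null
        hosts.clear();                               // NullPointerException when null
        hostsByRequest.remove(id);
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}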

Version-Release number of selected component (if applicable):
rhevm-3.5.0-0.26.el6ev.noarch

How reproducible:


Steps to Reproduce:
1. Automation jobs:
Port mirroring 
vNIC Profiles
Network migration

Comment 1 Piotr Kliczewski 2014-12-21 16:13:08 UTC
I checked the vdsm logs and this is the issue found on the vdsm side:

Thread-2882::ERROR::2014-12-19 23:08:33,431::API::1499::vds::(_rollback) Cannot delete network u'sw162': It doesn't exist in the system
Traceback (most recent call last):
  File "/usr/share/vdsm/API.py", line 1497, in _rollback
    yield rollbackCtx
  File "/usr/share/vdsm/API.py", line 1385, in setupNetworks
    supervdsm.getProxy().setupNetworks(networks, bondings, options)
  File "/usr/share/vdsm/supervdsm.py", line 50, in __call__
    return callMethod()
  File "/usr/share/vdsm/supervdsm.py", line 48, in <lambda>
    **kwargs)
  File "<string>", line 2, in setupNetworks
  File "/usr/lib64/python2.7/multiprocessing/managers.py", line 773, in _callmethod
    raise convert_to_error(kind, result)
ConfigNetworkError: (27, "Cannot delete network u'sw162': It doesn't exist in the system")


which causes an error on the engine side.

I found an additional issue in the engine logs which seems to be unrelated to the networking issue:

2014-12-19 21:55:19,167 INFO  [org.ovirt.engine.core.utils.transaction.TransactionSupport] (DefaultQuartzScheduler_Worker-30) [371dfc3f] transaction rolled back
2014-12-19 21:55:19,168 ERROR [org.ovirt.engine.core.bll.AddVmFromScratchCommand] (DefaultQuartzScheduler_Worker-30) [371dfc3f] Command org.ovirt.engine.core.bll.AddVmFromScratchCommand throw exception: java.lang.NullPointerException
	at org.ovirt.engine.core.bll.AddVmCommand.addDiskPermissions(AddVmCommand.java:1247) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:825) [bll.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand$2.runInTransaction(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInNewTransaction(TransactionSupport.java:210) [utils.jar:]
	at org.ovirt.engine.core.bll.AddVmCommand.executeVmCommand(AddVmCommand.java:820) [bll.jar:]
	at org.ovirt.engine.core.bll.VmCommand.executeCommand(VmCommand.java:104) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeWithoutTransaction(CommandBase.java:1191) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeActionInTransactionScope(CommandBase.java:1330) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.runInTransaction(CommandBase.java:1955) [bll.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInSuppressed(TransactionSupport.java:174) [utils.jar:]
	at org.ovirt.engine.core.utils.transaction.TransactionSupport.executeInScope(TransactionSupport.java:116) [utils.jar:]
	at org.ovirt.engine.core.bll.CommandBase.execute(CommandBase.java:1354) [bll.jar:]
	at org.ovirt.engine.core.bll.CommandBase.executeAction(CommandBase.java:352) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runAction(Backend.java:430) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runActionImpl(Backend.java:411) [bll.jar:]
	at org.ovirt.engine.core.bll.Backend.runInternalAction(Backend.java:617) [bll.jar:]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.7.0_71]

Comment 2 Oved Ourfali 2014-12-22 07:28:15 UTC
Dan - can you look at this as well?

Comment 3 Piotr Kliczewski 2014-12-22 08:27:07 UTC
I pushed a patch to remove the side effect (the NPE) of the setupNetworks failure. If the test case covers that failure, then we have a fix for this bug.

We still need to understand the nature of the setupNetworks failure.
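
For illustration only (the actual change is in gerrit 36332/36335, "NPE when removing empty tracked request", which is not quoted here), a null/empty guard of roughly this shape in the tracker keeps a failed setupNetworks call from taking down the "Response tracker" thread; the names below are hypothetical:

import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only -- not the actual oVirt patch.
class GuardedTrackerSketch {
    private final Map<Long, List<String>> hostsByRequest = new ConcurrentHashMap<>();

    void removeRequestFromTracking(Long id) {
        List<String> hosts = hostsByRequest.get(id);
        if (hosts == null || hosts.isEmpty()) {
            // Empty or already-removed tracked request: nothing left to clean
            // up, so just drop the entry instead of dereferencing null.
            hostsByRequest.remove(id);
            return;
        }
        hosts.clear();
        hostsByRequest.remove(id);
    }
}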

Comment 4 Allon Mureinik 2014-12-22 09:05:04 UTC
(In reply to Piotr Kliczewski from comment #1)
> I found additional issue in the engine logs which seems to be unrelated to
> networking issue:
Agreed. 
I cloned it to bug 1176528 - let's track it there.

Comment 5 Nikolai Sednev 2014-12-22 09:25:29 UTC
Might be connected to 1162774 ?

Comment 6 Oved Ourfali 2014-12-22 09:40:00 UTC
(In reply to Nikolai Sednev from comment #5)
> Might be connected to 1162774 ?

Doesn't seem related.

Comment 7 Allon Mureinik 2014-12-22 09:41:13 UTC
(In reply to Oved Ourfali from comment #6)
> (In reply to Nikolai Sednev from comment #5)
> > Might be connected to 1162774 ?
> 
> Doesn't seem related.
Agreed.

Comment 8 Nikolai Sednev 2014-12-22 16:36:55 UTC
I was also unable to add two RHEL 7.0 hosts to RHEVM vt13.4 with JSON-RPC enabled on both hosts by default, until I disabled it on them.
Used components:
vdsm-4.16.8.1-4.el7ev.x86_64
qemu-kvm-rhev-1.5.3-60.el7_0.11.x86_64
sanlock-3.2.2-2.el7.x86_64
libvirt-client-1.2.8-10.el7.x86_64
rhevm-3.5.0-0.26.el6ev.noarch

2014-12-22 11:01:11,587 INFO  [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (SSL Stomp Reactor) Connecting to brown-vdsc.qa.lab.tlv.redhat.com/10.35.106.12
2014-12-22 11:01:11,590 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
2014-12-22 11:01:12,672 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.GetCapabilitiesVDSCommand] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Command GetCapabilitiesVDSCommand(HostName = purple-vds1.qa.lab.tlv.redhat.com, HostId = 90d25f9c-0e64-4c8e-9165-78e80b18219c, vds=Host[purple-vds1.qa.lab.tlv.redhat.com,90d25f9c-0e64-4c8e-9165-78e80b18219c]) execution failed. Exception: VDSNetworkException: java.net.SocketTimeoutException: connect timed out
2014-12-22 11:01:12,673 WARN  [org.ovirt.engine.core.vdsbroker.VdsManager] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Host purple-vds1.qa.lab.tlv.redhat.com is not responding. It will stay in Connecting state for a grace period of 120 seconds and after that an attempt to fence the host will be issued.
2014-12-22 11:01:12,686 ERROR [org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo] (DefaultQuartzScheduler_Worker-6) [58aefbe4] Failure to refresh Vds runtime info: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSNetworkException: java.net.SocketTimeoutException: connect timed out
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.createNetworkException(VdsBrokerCommand.java:126) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.vdsbroker.VdsBrokerCommand.executeVDSCommand(VdsBrokerCommand.java:101) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VDSCommandBase.executeCommand(VDSCommandBase.java:56) [vdsbroker.jar:]
        at org.ovirt.engine.core.dal.VdcCommandBase.execute(VdcCommandBase.java:31) [dal.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.refreshCapabilities(VdsManager.java:571) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refreshVdsRunTimeInfo(VdsUpdateRunTimeInfo.java:648) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsUpdateRunTimeInfo.refresh(VdsUpdateRunTimeInfo.java:494) [vdsbroker.jar:]
        at org.ovirt.engine.core.vdsbroker.VdsManager.onTimer(VdsManager.java:236) [vdsbroker.jar:]
        at sun.reflect.GeneratedMethodAccessor83.invoke(Unknown Source) [:1.7.0_71]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.7.0_71]
        at java.lang.reflect.Method.invoke(Method.java:606) [rt.jar:1.7.0_71]
        at org.ovirt.engine.core.utils.timer.JobWrapper.execute(JobWrapper.java:60) [scheduler.jar:]
        at org.quartz.core.JobRunShell.run(JobRunShell.java:213) [quartz.jar:]
        at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557) [quartz.jar:]
Caused by: java.net.SocketTimeoutException: connect timed out
        at java.net.PlainSocketImpl.socketConnect(Native Method) [rt.jar:1.7.0_71]
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339) [rt.jar:1.7.0_71]
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200) [rt.jar:1.7.0_71]
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182) [rt.jar:1.7.0_71]
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) [rt.jar:1.7.0_71]
        at java.net.Socket.connect(Socket.java:579) [rt.jar:1.7.0_71]
        at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:618) [jsse.jar:1.7.0_71]
        at org.ovirt.engine.core.utils.ssl.AuthSSLProtocolSocketFactory.createSocket(AuthSSLProtocolSocketFactory.java:293) [utils.jar:]
        at org.apache.commons.httpclient.HttpConnection.open(HttpConnection.java:707) [commons-httpclient.jar:]
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.open(MultiThreadedHttpConnectionManager.java:1361) [commons-httpclient.jar:]
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:387) [commons-httpclient.jar:]
:

Comment 9 Meni Yakove 2014-12-29 08:48:12 UTC
vt13.5

Comment 10 Raz Tamir 2014-12-30 13:56:38 UTC
Still occurs - vt13.5

ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
11:59:26 2014-12-30 11:59:05,216 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.Reactor] (SSL Stomp Reactor) Unable to process messages
11:59:26 2014-12-30 11:59:05,220 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
11:59:26 2014-12-30 11:59:07,745 ERROR [org.ovirt.vdsm.jsonrpc.client.reactors.ReactorClient] (DefaultQuartzScheduler_Worker-56) Exception during connection
11:59:26 2014-12-30 11:59:07,752 WARN  [org.ovirt.vdsm.jsonrpc.client.internal.ResponseWorker] (ResponseWorker) Exception thrown during message processing
11:59:26 2014-12-30 11:59:07,758 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStatusVDSCommand] (DefaultQuartzScheduler_Worker-56) [7e2170c7] Command SpmStatusVDSCommand(HostName = host_mixed_2, HostId = 825c1a69-19d0-4d64-8824-af76990d8c9e, storagePoolId = d83e1463-29dc-48a0-86e2-3fac3a41b15c) execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused

logs attached

Comment 11 Raz Tamir 2014-12-30 13:57:13 UTC
Created attachment 974407 [details]
new logs

Comment 12 Oved Ourfali 2014-12-30 17:45:35 UTC
Can you specify the flow in which it happened? Was the host responsive afterwards? Does it reproduce?

Comment 13 Piotr Kliczewski 2014-12-30 18:44:13 UTC
I checked the logs and the resulting issue is the same, but the cause is different:

Thread-64528::DEBUG::2014-12-30 11:59:04,007::API::545::vds::(migrationCreate) Migration create
Thread-64528::INFO::2014-12-30 11:59:04,007::clientIF::394::vds::(createVm) vmContainerLock acquired by vm 777439ba-28fe-4bd4-bad2-d602ea13b24d
Thread-64528::WARNING::2014-12-30 11:59:04,008::libvirtconnection::135::root::(wrapper) connection to libvirt broken. ecode: 1 edom: 7
Thread-64528::CRITICAL::2014-12-30 11:59:04,009::libvirtconnection::137::root::(wrapper) taking calling process down.
MainThread::DEBUG::2014-12-30 11:59:04,010::vdsm::58::vds::(sigtermHandler) Received signal 15
Thread-64528::DEBUG::2014-12-30 11:59:04,010::libvirtconnection::143::root::(wrapper) Unknown libvirterror: ecode: 1 edom: 7 level: 2 message: internal error client socket is closed
MainThread::DEBUG::2014-12-30 11:59:04,010::protocoldetector::144::vds.MultiProtocolAcceptor::(stop) Stopping Acceptor
Thread-64528::INFO::2014-12-30 11:59:04,010::vm::6098::root::(_getNetworkIp) network rhevm: using 0
MainThread::INFO::2014-12-30 11:59:04,011::__init__::565::jsonrpc.JsonRpcServer::(stop) Stopping JsonRPC Server
Detector thread::DEBUG::2014-12-30 11:59:04,012::protocoldetector::115::vds.MultiProtocolAcceptor::(_cleanup) Cleaning Acceptor
MainThread::INFO::2014-12-30 11:59:04,014::vmchannels::188::vds::(stop) VM channels listener was stopped.
MainThread::INFO::2014-12-30 11:59:04,014::momIF::91::MOM::(stop) Shutting down MOM


It seems that during VM migration vdsm stopped or crashed. I suggest opening another bug.

Comment 14 Oved Ourfali 2014-12-30 19:04:21 UTC
Based on comment #13, moving back to on_qa. Raz, if not reproduced with the scenario in the description, please move to verified. Also, please open a new bug on the new scenario, and attach the relevant logs there.

Comment 15 Raz Tamir 2014-12-31 10:17:20 UTC
Based on comment #14, moving to verified; I will open a new bug if needed.

Comment 16 Eyal Edri 2015-02-17 17:13:39 UTC
RHEV 3.5.0 was released. Closing.

