Bug 1365242 - 3.4->3.5->3.6->4.0 SHE migration: ovirt-ha-agent not working correctly / state=AgentStopped
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-ha
Classification: oVirt
Component: Agent
Version: 2.0.1
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.0.4
Target Release: 2.0.4
Assignee: Simone Tiraboschi
QA Contact: Jiri Belka
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-08 17:04 UTC by Jiri Belka
Modified: 2016-09-26 12:35 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
With 4.0 we moved to the jsonrpc protocol; this fix adds additional checks on jsonrpc responses.
Clone Of:
Environment:
Last Closed: 2016-09-26 12:35:39 UTC
oVirt Team: Integration
rule-engine: ovirt-4.0.z+
rule-engine: exception+
ylavi: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Links
System | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 62162 | master | MERGED | jsonrpc: safely parsing empty responses | 2016-08-30 10:06:40 UTC
oVirt gerrit | 63001 | v2.0.z | MERGED | jsonrpc: safely parsing empty responses | 2016-09-02 10:01:00 UTC

Description Jiri Belka 2016-08-08 17:04:44 UTC
Description of problem:

After following the SHE migration path 3.4->3.5->3.6->4.0 and ending global maintenance, the HE VM was not started automatically, and hosted-engine --vm-status showed the agent in 'state=AgentStopped'.

Manually starting the HE VM with hosted-engine --vm-start worked fine.

~~~
# hosted-engine --vm-status | sed 's/rhev.lab.eng.brq.redhat/example.com/'
/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/storage_backends.py:15: DeprecationWarning: vdscli uses xmlrpc. since ovirt 3.6 xmlrpc is deprecated, please use vdsm.jsonrpcvdscli
  import vdsm.vdscli


--== Host 1 status ==--

Status up-to-date                  : False
Hostname                           : 10-34-60-151.example.com.com
Host ID                            : 1
Engine status                      : unknown stale-data
Score                              : 0
stopped                            : True
Local maintenance                  : False
crc32                              : e9f3cf55
Host timestamp                     : 231104
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=231104 (Mon Aug  8 15:53:34 2016)
        host-id=1
        score=0
        maintenance=False
        state=AgentStopped
        stopped=True


--== Host 2 status ==--

Status up-to-date                  : True
Hostname                           : 10-34-60-215.example.com.com
Host ID                            : 2
Engine status                      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score                              : 0
stopped                            : False
Local maintenance                  : True
crc32                              : 2e113351
Host timestamp                     : 234997
Extra metadata (valid at timestamp):
        metadata_parse_version=1
        metadata_feature_version=1
        timestamp=234997 (Mon Aug  8 18:39:40 2016)
        host-id=2
        score=0
        maintenance=True
        state=LocalMaintenance
        stopped=False
~~~
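To check this kind of output for stopped agents without reading it by eye, the text can be split into per-host fields. The following is only a rough sketch, assuming the text layout shown above ("--== Host N status ==--" headers, "key : value" lines, and "key=value" extra metadata lines); the function names `parse_vm_status` and `stopped_agents` are made up for illustration and are not part of ovirt-hosted-engine-ha.

```python
import re

# Matches the per-host header lines seen in the output above.
HOST_HEADER = re.compile(r"^--== Host (\d+) status ==--$")


def parse_vm_status(text):
    """Split `hosted-engine --vm-status` text output into one dict per host ID.

    Hypothetical helper: it assumes the plain-text layout shown in this report.
    """
    hosts = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = HOST_HEADER.match(line)
        if header:
            current = hosts.setdefault(int(header.group(1)), {})
            continue
        if current is None or not line:
            # Skip warnings printed before the first host block and blank lines.
            continue
        if " : " in line:
            # Summary fields, e.g. "Score   : 0".
            key, value = line.split(" : ", 1)
            current[key.strip()] = value.strip()
        elif "=" in line:
            # Extra metadata fields, e.g. "state=AgentStopped".
            key, value = line.split("=", 1)
            current[key.strip()] = value.strip()
    return hosts


def stopped_agents(hosts):
    """Return the sorted host IDs whose metadata reports state=AgentStopped."""
    return sorted(hid for hid, fields in hosts.items()
                  if fields.get("state") == "AgentStopped")
```

Fed the output above, `stopped_agents(parse_vm_status(output))` would be expected to single out host 1, since host 2 reports `state=LocalMaintenance`.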

The log contains many "ovirt-ha-agent ovirt_hosted_engine_ha.agent.agent.Agent ERROR Error: ''items'' - trying to restart agent" entries.

Both hosts were EL7 with 4.0 rpms.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.0.1-1.el7ev.noarch

How reproducible:
hard to reproduce, if at all possible

Steps to Reproduce:
1. discovered as part of 3.4->3.5->3.6->4.0 SHE migration

Actual results:
HE VM was not started after ending global maintenance

Expected results:
HE VM should be started automatically.

Additional info:

Comment 2 Simone Tiraboschi 2016-08-09 15:43:45 UTC
The issue is here:

MainThread::WARNING::2016-08-08 15:51:03,712::hosted_engine::480::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Unexpected error
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 445, in start_monitoring
    self._initialize_storage_images()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/agent/hosted_engine.py", line 667, in _initialize_storage_images
    img.prepare_images()
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/image.py", line 141, in prepare_images
    for volUUID in vm_vol_uuid_list['items']:
KeyError: 'items'
MainThread::INFO::2016-08-08 15:51:05,328::hosted_engine::496::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(start_monitoring) Sleeping 60 seconds
MainThread::INFO::2016-08-08 15:52:05,455::brokerlink::111::ovirt_hosted_engine_ha.lib.brokerlink.BrokerLink::(notify) Trying: notify time=1470664325.46 type=state_transition detail=GlobalMaintenance-ReinitializeFSM hostname='10-34-60-151.rhev.lab.eng.brq.redhat.com'

It seems that at a certain point you had an image without a volume (still not sure how), and our code failed while scanning it.
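
The failing line indexes the response with `['items']` directly, so a response without that key raises the KeyError seen in the traceback. A defensive way to read such a response could look roughly like the sketch below. This is not the merged patch (gerrit 62162), only an illustration of the idea; the helper name `volume_uuids` is hypothetical, and the response is assumed to be a plain dict.

```python
def volume_uuids(response):
    """Return the volume UUIDs from a response, or [] if none are present.

    Hypothetical helper: it treats a missing, empty, or malformed 'items'
    entry as "no volumes" instead of raising KeyError.
    """
    if not isinstance(response, dict):
        return []
    items = response.get("items")
    if not isinstance(items, (list, tuple)):
        return []
    return list(items)
```

A loop such as `for volUUID in volume_uuids(vm_vol_uuid_list):` then simply does nothing for an image without volumes, rather than aborting start_monitoring.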

Comment 3 Jiri Belka 2016-09-19 23:27:31 UTC
ok, ovirt-hosted-engine-ha-2.0.4-1.el7ev.noarch

can't see the issue anymore as described in #2.

