Bug 1689838 - Ansible upgrade for RHHI cluster fails on the host running HE
Keywords:
Status: NEW
Alias: None
Product: ovirt-ansible-roles
Classification: oVirt
Component: cluster-upgrade
Version: unspecified
Hardware: x86_64
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.3.4
Target Release: ---
Assignee: Ondra Machacek
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks: 1689853
 
Reported: 2019-03-18 09:25 UTC by bipin
Modified: 2019-03-26 08:38 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Cloned to: 1689853
Environment:
Last Closed:
oVirt Team: Gluster



Description bipin 2019-03-18 09:25:09 UTC
Description of problem:
======================
While upgrading the RHHI cluster using the Ansible roles, the host that is running the HE gets stuck in the "Preparing for Maintenance" state and the upgrade fails. This was seen twice.


Version-Release number of selected component
============================================
ovirt-ansible-infra-1.1.12-1.el7ev.noarch
ovirt-ansible-cluster-upgrade-1.1.12-1.el7ev.noarch
ansible-2.7.9-1.el7ae.noarch
ovirt-ansible-shutdown-env-1.0.3-1.el7ev.noarch
ovirt-ansible-roles-1.1.6-1.el7ev.noarch
ovirt-engine-4.3.2.1-0.1.el7.noarch


How reproducible:
================
Twice

Steps to Reproduce:
==================
1. Try upgrading the RHV hosts from 4.2 to 4.3 using the ansible playbook
2. Create an upgrade yaml with the required details
3. Check the upgrade status on the host which has the HE running
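For reference, the upgrade yaml from step 2 might look roughly like the sketch below. This is a hypothetical example, not the exact playbook used in this reproduction: the engine URL, password variable, and cluster name are placeholders, and the variable names are taken from the ovirt.cluster-upgrade role's README, so they should be checked against the installed role version.

```yaml
# upgrade.yml -- hypothetical sketch; engine URL, credentials and
# cluster name below are placeholders, not the values from this setup.
- name: Rolling upgrade of an RHV/RHHI cluster
  hosts: localhost
  connection: local
  vars:
    engine_url: https://engine.example.com/ovirt-engine/api
    engine_user: admin@internal
    engine_password: "{{ vault_engine_password }}"
    engine_cafile: /etc/pki/ovirt-engine/ca.pem
    cluster_name: Default
    # mirror the options visible in the ansible-ovirt_host_28 syslog line below
    check_upgrade: true
    reboot_after_upgrade: true
  roles:
    - ovirt.cluster-upgrade
```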

Actual results:
==============
The upgrade fails

Expected results:
================
The HE VM should be migrated to another host and the host should move to Maintenance

Additional info:
===============
Once we manually migrate the HE, the ansible upgrade works fine.
But after that, the issue described in https://bugzilla.redhat.com/show_bug.cgi?id=1685951#c8 is still seen.

Comment 1 bipin 2019-03-18 09:34:49 UTC
Ansible log:
===========
TASK [ovirt.cluster-upgrade : Upgrade host] ***********************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/ovirt.cluster-upgrade/tasks/upgrade.yml:1
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964 `" && echo ansible-tmp-1552894166.44-226733322406964="` echo /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964 `" ) && sleep 0'
Using module file /usr/share/ansible/roles/ovirt.cluster-upgrade/library/ovirt_host_28.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-8098X14nFa/tmp3E7lR2 TO /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/ /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/AnsiballZ_ovirt_host_28.py && sleep 0'

<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1552894166.44-226733322406964/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/__main__.py", line 531, in main
    reboot=module.params['reboot_after_upgrade'],
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/ansible_ovirt_host_28_payload.zip/ansible/module_utils/ovirt.py", line 749, in action
    poll_interval=self._module.params['poll_interval'],
  File "/tmp/ansible_ovirt_host_28_payload__UxiYv/ansible_ovirt_host_28_payload.zip/ansible/module_utils/ovirt.py", line 341, in wait
    raise Exception("Timeout exceed while waiting on result state of the entity.")
Exception: Timeout exceed while waiting on result state of the entity.
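The "Timeout exceed" exception is raised by the generic wait() helper in ansible's module_utils/ovirt.py, which polls the entity's state every poll_interval seconds (3 s here) until a condition holds or the timeout (3600 s here) elapses; because the HE host never leaves "Preparing for Maintenance", the condition never becomes true. A simplified sketch of that polling pattern (not the actual module code; the function name and parameters are illustrative):

```python
import time

def wait_for(condition, timeout=3600, poll_interval=3,
             clock=time.monotonic, sleep=time.sleep):
    """Poll `condition` every `poll_interval` seconds until it returns a
    truthy value or `timeout` seconds elapse.  Simplified sketch of the
    behaviour of wait() in ansible's module_utils/ovirt.py; `clock` and
    `sleep` are injectable here only to make the sketch testable."""
    deadline = clock() + timeout
    while clock() < deadline:
        result = condition()
        if result:
            return result
        sleep(poll_interval)
    # This is the error surfaced in the traceback above.
    raise Exception("Timeout exceed while waiting on result state of the entity.")
```

In the failing run, the condition amounts to "host status is Maintenance", which is never reached while the HE VM stays on the host, so the loop runs for the full hour (12:59 to 13:59 in the logs) and then raises.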


Engine log:
==========
2019-03-18 12:59:27,402+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SetVdsStatusVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason=''}), log id: 1609dee2
2019-03-18 12:59:27,116+05 INFO  [org.ovirt.engine.core.bll.hostdeploy.UpgradeHostCommand] (default task-11) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: UpgradeHostCommand internal: false. Entities affected :  ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDSAction group EDIT_HOST_CONFIGURATION with role type ADMIN
2019-03-18 12:59:27,265+05 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Lock Acquired to object 'EngineLock:{exclusiveLocks='', sharedLocks='[7e686ac4-4933-11e9-ac3a-004755204901=POOL]'}'
2019-03-18 12:59:27,352+05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-11) [] EVENT_ID: HOST_UPGRADE_STARTED(840), Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com upgrade was started (User: admin@internal-authz).
2019-03-18 12:59:27,397+05 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: MaintenanceNumberOfVdssCommand internal: true. Entities affected :  ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDSAction group MANIPULATE_HOST with role type ADMIN
2019-03-18 12:59:27,402+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SetVdsStatusVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SetVdsStatusVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', status='PreparingForMaintenance', nonOperationalReason='NONE', stopSpmFailureLogged='true', maintenanceReason=''}), log id: 1609dee2
2019-03-18 12:59:27,402+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] VDS 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com' is spm and moved from up calling resetIrs.
2019-03-18 12:59:27,404+05 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, ResetIrsVDSCommand( ResetIrsVDSCommandParameters:{storagePoolId='7e686ac4-4933-11e9-ac3a-004755204901', ignoreFailoverLimit='false', vdsId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', ignoreStopFailed='false'}), log id: 1b071a98
2019-03-18 12:59:27,409+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SpmStopVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SpmStopVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd', storagePoolId='7e686ac4-4933-11e9-ac3a-004755204901'}), log id: 4d4a51b5
2019-03-18 12:59:27,415+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] SpmStopVDSCommand::Stopping SPM on vds 'rhsqa-grafton7-nic2.lab.eng.blr.redhat.com', pool id '7e686ac4-4933-11e9-ac3a-004755204901'
2019-03-18 12:59:27,423+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStopVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SpmStopVDSCommand, return: , log id: 4d4a51b5
2019-03-18 12:59:27,428+05 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.ResetIrsVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, ResetIrsVDSCommand, return: , log id: 1b071a98
2019-03-18 12:59:27,438+05 INFO  [org.ovirt.engine.core.vdsbroker.SetVdsStatusVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SetVdsStatusVDSCommand, return: , log id: 1609dee2
2019-03-18 12:59:27,441+05 INFO  [org.ovirt.engine.core.bll.MaintenanceNumberOfVdssCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Lock freed to object 'EngineLock:{exclusiveLocks='', sharedLocks='[7e686ac4-4933-11e9-ac3a-004755204901=POOL]'}'
2019-03-18 12:59:27,524+05 INFO  [org.ovirt.engine.core.bll.MaintenanceVdsCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Running command: MaintenanceVdsCommand internal: true. Entities affected :  ID: 6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd Type: VDS
2019-03-18 12:59:27,563+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] START, SetHaMaintenanceModeVDSCommand(HostName = rhsqa-grafton7-nic2.lab.eng.blr.redhat.com, SetHaMaintenanceModeVDSCommandParameters:{hostId='6fcdc52a-5ad6-41d2-b70a-35d9b2a721dd'}), log id: 40632e1f
2019-03-18 12:59:27,566+05 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SetHaMaintenanceModeVDSCommand] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] FINISH, SetHaMaintenanceModeVDSCommand, return: , log id: 40632e1f
2019-03-18 12:59:27,834+05 INFO  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-commandCoordinator-Thread-1) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] EVENT_ID: VDS_MAINTENANCE(15), Host rhsqa-grafton7-nic2.lab.eng.blr.redhat.com was switched to Maintenance Mode.
2019-03-18 12:59:28,057+05 INFO  [org.ovirt.engine.core.bll.ConcurrentChildCommandsExecutionCallback] (EE-ManagedThreadFactory-engineScheduled-Thread-45) [1c051922-5b29-4bd9-abb6-0e83bb06ec8b] Command 'MaintenanceNumberOfVdss' (id: 'a76d7e9f-1d92-4600-87e8-7966db533e02') waiting on child command id: '9744c2b4-7b7d-4be4-addb-e65ff6f4d645' type:'MaintenanceVds' to complete


Messages:
========
Mar 18 12:59:26 hostedenginesm3 python2: ansible-ovirt_host_28 Invoked with comment=None activate=True force=False power_management_enabled=None cluster=None fetch_nested=False hosted_engine=None id=None check_upgrade=True kdump_integration=None iscsi=None state=upgraded reboot_after_upgrade=True auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} nested_attributes=[] address=None override_iptables=None password=NOT_LOGGING_PARAMETER wait=True public_key=False name=rhsqa-grafton7-nic2.lab.eng.blr.redhat.com spm_priority=None poll_interval=3 kernel_params=None timeout=3600 override_display=None
Mar 18 13:01:01 hostedenginesm3 systemd: Started Session 7 of user root.
Mar 18 13:59:32 hostedenginesm3 python2: ansible-ovirt_event_28 Invoked with origin=cluster_upgrade custom_id=320069251 storage_domain=None description=Upgrade of cluster Default failed. state=present severity=error user=None poll_interval=3 vm=None auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} cluster=7e69dcec-4933-11e9-ac17-004755204901 fetch_nested=False nested_attributes=[] timeout=180 data_center=None host=None template=None id=None wait=True
Mar 18 13:59:32 hostedenginesm3 python2: ansible-ovirt_cluster Invoked with comment=None ha_reservation=None fence_skip_if_connectivity_broken=None mac_pool=None virt=None threads_as_cores=None gluster=None vm_reason=None fetch_nested=False migration_bandwidth_limit=None switch_type=None data_center=None ksm_numa=None scheduling_policy_properties=[{'name': 'HighUtilization', 'value': '80'}, {'name': 'CpuOverCommitDurationMinutes', 'value': '2'}] description=None cpu_arch=None rng_sources=None network=None state=present ksm=None external_network_providers=None migration_compressed=None ballooning=None migration_auto_converge=None fence_enabled=None migration_policy=None auth={'timeout': 0, 'url': 'https://hostedenginesm3.lab.eng.blr.redhat.com/ovirt-engine/api', 'insecure': True, 'kerberos': False, 'compress': True, 'headers': None, 'token': 'dC9d1hXBkFgfz-bgBkEI3rt4eF58m4qmyeD5pMBQIfYQ76zV9YPi-C8YyGbKYGgpke9N-NCMertX6M8be_pv3A', 'ca_file': '/etc/pki/ovirt-engine/ca.pem'} resilience_policy=None fence_connectivity_threshold=None spice_proxy=None nested_attributes=[] memory_policy=None migration_bandwidth=None fence_skip_if_sd_active=None scheduling_policy=none wait=True compatibility_version=None serial_policy_value=None name=Default host_reason=None poll_interval=3 cpu_type=None timeout=180 serial_policy=None trusted_service=None

