Bug 1685951 - [RFE] HC prerequisites are not carried out before cluster upgrade [NEEDINFO]
Summary: [RFE] HC prerequisites are not carried out before cluster upgrade
Keywords:
Status: ASSIGNED
Alias: None
Product: ovirt-ansible-roles
Classification: oVirt
Component: cluster-upgrade
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.3.4
Assignee: Ondra Machacek
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1500728
 
Reported: 2019-03-06 11:45 UTC by SATHEESARAN
Modified: 2019-03-21 11:32 UTC
CC: 11 users

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Clone Of: 1500728
Environment:
Last Closed: 2019-03-08 14:17:55 UTC
oVirt Team: Gluster
bshetty: needinfo? (omachace)
sasundar: ovirt-4.3?


Attachments

Description SATHEESARAN 2019-03-06 11:45:30 UTC
Description of problem:
-----------------------
For customers that have multiple RHHI clusters, an Ansible-based upgrade path would be easier. The requirement is to provide an Ansible role that can be used to upgrade a cluster.

Version-Release number of selected component (if applicable):


How reproducible:
NA

--- Additional comment from Sahina Bose on 2018-11-29 07:52:58 UTC ---

We already have an oVirt role to upgrade a cluster. This needs to be tested. Moving to ON_QA to test this - https://github.com/oVirt/ovirt-ansible-cluster-upgrade/blob/master/README.md

--- Additional comment from bipin on 2019-02-26 09:04:28 UTC ---

Assigning the bug back since verification failed. While running the playbook, I could see that the Gluster-specific roles were absent.
During the upgrade, none of the gluster bricks were stopped and their PIDs were still active, even though the rhev mounts were unmounted.
There should be a way to stop the gluster bricks before upgrading.


Filesystem                                                           Type            Size  Used Avail Use% Mounted on
/dev/mapper/rhvh_rhsqa--grafton7--nic2-rhvh--4.3.0.5--0.20190221.0+1 ext4            786G  2.6G  744G   1% /
devtmpfs                                                             devtmpfs        126G     0  126G   0% /dev
tmpfs                                                                tmpfs           126G   16K  126G   1% /dev/shm
tmpfs                                                                tmpfs           126G  566M  126G   1% /run
tmpfs                                                                tmpfs           126G     0  126G   0% /sys/fs/cgroup
/dev/mapper/rhvh_rhsqa--grafton7--nic2-var                           ext4             15G  4.2G  9.8G  31% /var
/dev/mapper/rhvh_rhsqa--grafton7--nic2-tmp                           ext4            976M  3.9M  905M   1% /tmp
/dev/mapper/rhvh_rhsqa--grafton7--nic2-home                          ext4            976M  2.6M  907M   1% /home
/dev/mapper/gluster_vg_sdc-gluster_lv_engine                         xfs             100G  6.9G   94G   7% /gluster_bricks/engine
/dev/sda1                                                            ext4            976M  253M  657M  28% /boot
/dev/mapper/gluster_vg_sdb-gluster_lv_vmstore                        xfs             4.0T   11G  3.9T   1% /gluster_bricks/vmstore
/dev/mapper/gluster_vg_sdb-gluster_lv_data                           xfs              12T  1.5T   11T  13% /gluster_bricks/data
rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:/engine                   fuse.glusterfs  100G  7.9G   93G   8% /rhev/data-center/mnt/glusterSD/rhsqa-grafton7-nic2.lab.eng.blr.redhat.com:_engine
tmpfs                                                                tmpfs            26G     0   26G   0% /run/user/0


[root@rhsqa-grafton7 ~]# pidof glusterfs
41191 38408 38286 38000

Comment 1 SATHEESARAN 2019-03-06 11:48:17 UTC
HC pre-requisites include:
1. Stop any geo-replication session that is in progress.
2. Check self-heal progress; if self-heal is in progress, fail the upgrade.
3. Check that brick quorum is met for the volume.
4. Stop the glusterfs processes and the glusterd service.
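A sketch of what these checks might look like as pre-upgrade tasks (hypothetical task list, not part of the current ovirt.cluster-upgrade role; MASTERVOL, SLAVEHOST, SLAVEVOL and VOLNAME are placeholders, and the gluster CLI calls assume they run on a Gluster host):

```yaml
# Hypothetical HC pre-upgrade tasks; volume and session names are placeholders.
- name: Stop geo-replication session if one is in progress
  command: gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL stop

- name: Fail if self-heal is still in progress
  shell: gluster volume heal VOLNAME info | grep 'Number of entries:'
  register: heal_info
  failed_when: "'Number of entries: 0' not in heal_info.stdout"

- name: Check brick status / quorum for the volume
  command: gluster volume status VOLNAME

- name: Stop the glusterd service
  service:
    name: glusterd
    state: stopped

- name: Stop remaining glusterfs brick processes
  command: pkill glusterfs
```

The heal check above is simplified; a real implementation would need to parse the per-brick entry counts rather than look for a single "Number of entries: 0" line.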

Comment 2 Gobinda Das 2019-03-07 10:42:46 UTC
Is it easy to fix or needs more time?

Comment 3 Gobinda Das 2019-03-08 06:04:00 UTC
Hi sas,
Looks like a problem with the HE FQDN.
In the log I can see: Error: Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]
So I think the API call failed because the HE FQDN could not be resolved.

Here is full error:

2019-02-26 12:52:46,070 p=29986 u=root |  TASK [ovirt.cluster-upgrade : Get hosts] **************************************************************************************************************************************************************************
2019-02-26 12:52:46,070 p=29986 u=root |  task path: /usr/share/ansible/roles/ovirt.cluster-upgrade/tasks/main.yml:24
2019-02-26 12:52:46,276 p=29986 u=root |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/ovirt_host_facts.py
2019-02-26 12:52:46,517 p=29986 u=root |  The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_host_facts_payload_N4_GxY/__main__.py", line 88, in main
    all_content=module.params['all_content'],
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py", line 11714, in list
    return self._internal_get(headers, query, wait)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 211, in _internal_get
    return future.wait() if wait else future
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py", line 54, in wait
    response = self._connection.wait(self._context)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 496, in wait
    return self.__wait(context, failed_auth)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 510, in __wait
    raise Error("Failed to read response: {}".format(err_list))
Error: Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]

2019-02-26 12:52:46,518 p=29986 u=root |  fatal: [localhost]: FAILED! => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "all_content": false, 
            "fetch_nested": false, 
            "nested_attributes": [], 
            "pattern": "cluster=Default  name=* status=up"
        }
    }, 
    "msg": "Failed to read response: [(<pycurl.Curl object at 0x7fa4270569d8>, 6, 'Could not resolve host: hostedenginesm3.lab.eng.blr.********.com; Unknown error')]"
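A quick pre-flight check along these lines can catch an unresolvable HE FQDN before the role starts issuing API calls (hypothetical standalone check, not part of the role; HE_FQDN is a placeholder, defaulting to localhost here so the snippet runs anywhere):

```shell
# Hypothetical pre-flight check: verify the Hosted Engine FQDN resolves
# before running the cluster-upgrade role. HE_FQDN is a placeholder.
HE_FQDN="${HE_FQDN:-localhost}"
if getent hosts "$HE_FQDN" > /dev/null; then
    echo "resolvable"
else
    echo "NOT resolvable" >&2
    exit 1
fi
```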

Comment 4 Ondra Machacek 2019-03-08 10:39:00 UTC
Not that easy, it needs more time. Should the RHV infra team work on this, or do you (the gluster team) work on it?

Your issue is that you are using a password which is also contained in the hostname, so the hostname gets obfuscated. For more info see:
https://github.com/ansible/ansible/issues/19278

Comment 5 Martin Perina 2019-03-08 14:17:55 UTC
(In reply to Ondra Machacek from comment #4)
> Not that easy, need more time. Should RHV infra team work on this or do you
> (gluster team) work on this?
> 
> Your issue is that you are using password which is also contained in
> hostname. So it's obfuscated for more info see:
> https://github.com/ansible/ansible/issues/19278

This is a known issue with the Ansible no_log implementation; I don't think we should do anything about it in the cluster-upgrade role. This needs to be fixed in Ansible itself:

https://github.com/ansible/ansible/issues/19278

My recommendation is to use safe passwords instead of well-known strings which can be part of FQDNs, domains, etc.
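The masking behaviour described above can be reproduced standalone (illustration only, not Ansible's actual implementation; the secret and hostname are made up): every occurrence of the secret value is masked in the output, even when it happens to appear inside a hostname.

```shell
# Illustration of no_log-style censoring with a hypothetical secret that
# also appears inside the hostname. Both values are made-up examples.
SECRET="lab123"
MSG="Could not resolve host: he.lab123.example.com"
echo "$MSG" | sed "s/$SECRET/********/g"
# -> Could not resolve host: he.********.example.com
```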

Comment 6 SATHEESARAN 2019-03-11 11:28:34 UTC
(In reply to Martin Perina from comment #5)
> (In reply to Ondra Machacek from comment #4)
> > Not that easy, need more time. Should RHV infra team work on this or do you
> > (gluster team) work on this?
> > 
> > Your issue is that you are using password which is also contained in
> > hostname. So it's obfuscated for more info see:
> > https://github.com/ansible/ansible/issues/19278
> 
> This is already known issue within Ansible no_log implementation, I don't
> think we should do anything about it with cluster-upgrade role, this needs
> to be fixed in Ansible itself:
> 
> https://github.com/ansible/ansible/issues/19278
> 
> My recommendation is to use safe passwords instead of well-known strings
> which can be part of FQDNS, domains, ...


Thanks Martin & Ondra,
Yes, initially the password was part of the hostname used.

But that's not the problem here. RHHI-V needs a set of pre-requisites to be carried out, and that has been taken care of while testing with this cluster-upgrade role.

Comment 7 SATHEESARAN 2019-03-11 11:29:48 UTC
Please check comment 1 for the set of pre-requisites.
As the gluster team is aware of these pre-requisites, this cluster-upgrade role should be updated for the HC environment.

Comment 8 bipin 2019-03-14 10:24:54 UTC
While testing the upgrade, I see an exception error when the host goes for a reboot.
But once the host comes up, it is updated to the latest image and all the services are running.


Error:
=====
2019-03-14 14:14:56,893 p=61390 u=root |  ok: [localhost]
2019-03-14 14:14:56,955 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Upgrade host] ***********************************************************************************************************************************************************************
2019-03-14 14:28:24,163 p=61390 u=root |  An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Exception: Error while waiting on result state of the entity.
2019-03-14 14:28:24,163 p=61390 u=root |  fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error while waiting on result state of the entity."}
2019-03-14 14:28:24,225 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Log event about cluster upgrade failed] *********************************************************************************************************************************************
2019-03-14 14:28:24,654 p=61390 u=root |  changed: [localhost]
2019-03-14 14:28:24,716 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Set original cluster policy] ********************************************************************************************************************************************************
2019-03-14 14:28:25,224 p=61390 u=root |  changed: [localhost]
2019-03-14 14:28:25,287 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Start again stopped VMs] ************************************************************************************************************************************************************
2019-03-14 14:28:25,363 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Start again pin to host VMs] ********************************************************************************************************************************************************
2019-03-14 14:28:25,442 p=61390 u=root |  TASK [ovirt.cluster-upgrade : Logout from oVirt] ******************************************************************************************************************************************************************
2019-03-14 14:28:25,457 p=61390 u=root |  skipping: [localhost]
2019-03-14 14:28:25,458 p=61390 u=root |  PLAY RECAP ********************************************************************************************************************************************************************************************************
2019-03-14 14:28:25,459 p=61390 u=root |  localhost                  : ok=22   changed=5    unreachable=0    failed=1   


Ondra,

Could you please take a look? Attaching the required files.

