Bug 1685072 - Upgrade playbook fails at [openshift_node : stop docker to kill static pods] task in node with CRI-O
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 3.11.0
Hardware: Unspecified
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 3.11.z
Assignee: Scott Dodson
QA Contact: Weihua Meng
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-04 10:09 UTC by Joel Rosental R.
Modified: 2019-04-11 05:38 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Previously, the upgrade attempted to stop docker on nodes that had been configured to run only CRI-O, which caused the playbook to fail. The upgrade no longer attempts to stop docker on nodes that are configured only for CRI-O, so upgrades on those nodes complete successfully.
Clone Of:
Environment:
Last Closed: 2019-04-11 05:38:34 UTC
Target Upstream Version:


Links
System                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2019:0636  None      None    None     2019-04-11 05:38:43 UTC

Description Joel Rosental R. 2019-03-04 10:09:20 UTC
Description of problem:
While trying to upgrade from OCP 3.11.69 to 3.11.82 in a cluster running CRI-O instead of docker, the upgrade playbook fails with the following error:


2019-02-22 12:53:53,687 p=742 u=sys.openshift |  TASK [openshift_node : stop docker to kill static pods] **************************************************************************************************************
2019-02-22 12:53:53,687 p=742 u=sys.openshift |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/stop_services.yml:10
2019-02-22 12:53:53,687 p=742 u=sys.openshift |  Friday 22 February 2019  12:53:53 +0100 (0:00:00.657)       0:31:44.988 ******* 
2019-02-22 12:53:53,745 p=742 u=sys.openshift |  Running systemd
2019-02-22 12:53:53,820 p=742 u=sys.openshift |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/systemd.py
2019-02-22 12:53:53,966 p=742 u=sys.openshift |  Escalation succeeded
2019-02-22 12:53:54,121 p=742 u=sys.openshift |  FAILED - RETRYING: stop docker to kill static pods (3 retries left).Result was: {
    "attempts": 1, 
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": "docker", 
            "no_block": false, 
            "state": "stopped", 
            "user": false
        }
    }, 
    "msg": "Could not find the requested service docker: host", 
    "retries": 4
}

As per xx, the task appears to assume that masters are always running docker:

- name: stop docker to kill static pods
  service:
    name: docker
    state: stopped
  register: l_openshift_node_upgrade_docker_stop_result
  until: not (l_openshift_node_upgrade_docker_stop_result is failed)
  retries: 3
  delay: 30
  when: >
        inventory_hostname in groups['oo_masters_to_config']
        or (l_docker_upgrade is defined and l_docker_upgrade | bool)


Version-Release number of the following components:
openshift-ansible-playbooks-3.11.82-3.git.0.9718d0a.el7.noarch
ansible --version
ansible 2.6.13
config file = /etc/ansible/ansible.cfg
configured module search path = [u'/home/sys.openshift/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
ansible python module location = /usr/lib/python2.7/site-packages/ansible
executable location = /usr/bin/ansible
python version = 2.7.5 (default, Sep 12 2018, 05:31:16) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]


How reproducible:
Always

Steps to Reproduce:
1. ansible-playbook -i <hosts-file> playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade.yml


Actual results:

2019-02-22 12:53:53,687 p=742 u=sys.openshift |  TASK [openshift_node : stop docker to kill static pods] **************************************************************************************************************
2019-02-22 12:53:53,687 p=742 u=sys.openshift |  task path: /usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/stop_services.yml:10
2019-02-22 12:53:53,687 p=742 u=sys.openshift |  Friday 22 February 2019  12:53:53 +0100 (0:00:00.657)       0:31:44.988 ******* 
2019-02-22 12:53:53,745 p=742 u=sys.openshift |  Running systemd
2019-02-22 12:53:53,820 p=742 u=sys.openshift |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/systemd.py
2019-02-22 12:53:53,966 p=742 u=sys.openshift |  Escalation succeeded
2019-02-22 12:53:54,121 p=742 u=sys.openshift |  FAILED - RETRYING: stop docker to kill static pods (3 retries left).Result was: {
    "attempts": 1, 
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": "docker", 
            "no_block": false, 
            "state": "stopped", 
            "user": false
        }
    }, 
    "msg": "Could not find the requested service docker: host", 
    "retries": 4
}
2019-02-22 12:54:24,154 p=742 u=sys.openshift |  Running systemd
2019-02-22 12:54:24,243 p=742 u=sys.openshift |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/systemd.py
2019-02-22 12:54:24,523 p=742 u=sys.openshift |  Escalation succeeded
2019-02-22 12:54:24,707 p=742 u=sys.openshift |  FAILED - RETRYING: stop docker to kill static pods (2 retries left).Result was: {
    "attempts": 2, 
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": "docker", 
            "no_block": false, 
            "state": "stopped", 
            "user": false
        }
    }, 
    "msg": "Could not find the requested service docker: host", 
    "retries": 4
}
2019-02-22 12:54:54,740 p=742 u=sys.openshift |  Running systemd
2019-02-22 12:54:54,825 p=742 u=sys.openshift |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/systemd.py
2019-02-22 12:54:55,190 p=742 u=sys.openshift |  Escalation succeeded
2019-02-22 12:54:55,357 p=742 u=sys.openshift |  FAILED - RETRYING: stop docker to kill static pods (1 retries left).Result was: {
    "attempts": 3, 
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": "docker", 
            "no_block": false, 
            "state": "stopped", 
            "user": false
        }
    }, 
    "msg": "Could not find the requested service docker: host", 
    "retries": 4
}
2019-02-22 12:55:25,360 p=742 u=sys.openshift |  Running systemd
2019-02-22 12:55:25,443 p=742 u=sys.openshift |  Using module file /usr/lib/python2.7/site-packages/ansible/modules/system/systemd.py
2019-02-22 12:55:25,744 p=742 u=sys.openshift |  Escalation succeeded
2019-02-22 12:55:25,989 p=742 u=sys.openshift |  fatal: [tux172.xyz.com]: FAILED! => {
    "attempts": 3, 
    "changed": false, 
    "invocation": {
        "module_args": {
            "daemon_reload": false, 
            "enabled": null, 
            "force": null, 
            "masked": null, 
            "name": "docker", 
            "no_block": false, 
            "state": "stopped", 
            "user": false
        }
    }, 
    "msg": "Could not find the requested service docker: host"
}
2019-02-22 12:55:25,993 p=742 u=sys.openshift |  	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_11/upgrade.retry

2019-02-22 12:55:25,993 p=742 u=sys.openshift |  PLAY RECAP ***********************************************************************************************************************************************************
2019-02-22 12:55:25,993 p=742 u=sys.openshift |  localhost                  : ok=36   changed=0    unreachable=0    failed=0   
2019-02-22 12:55:25,993 p=742 u=sys.openshift |  tux123.xyz.com  : ok=431  changed=77   unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux172.xyz.com  : ok=255  changed=48   unreachable=0    failed=1   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux173.xyz.com  : ok=242  changed=39   unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux174.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux175.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux176.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux177.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux178.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux179.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,994 p=742 u=sys.openshift |  tux180.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,995 p=742 u=sys.openshift |  tux181.xyz.com  : ok=24   changed=1    unreachable=0    failed=0   
2019-02-22 12:55:25,995 p=742 u=sys.openshift |  INSTALLER STATUS *****************************************************************************************************************************************************
2019-02-22 12:55:25,997 p=742 u=sys.openshift |  Initialization  : Complete (0:03:58)
2019-02-22 12:55:25,997 p=742 u=sys.openshift |  Friday 22 February 2019  12:55:25 +0100 (0:01:32.309)       0:33:17.298 ******* 
2019-02-22 12:55:25,997 p=742 u=sys.openshift |  =============================================================================== 
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_node : update package meta data to speed install later. ------------------------------------------------------------------------------------------- 134.13s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade_pre.yml:13 ----------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_node : stop docker to kill static pods ------------------------------------------------------------------------------------------------------------- 92.31s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/stop_services.yml:10 ------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  Run variable sanity checks ----------------------------------------------------------------------------------------------------------------------------------- 59.98s
/usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:14 --------------------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_node : Wait for master API to come back online ----------------------------------------------------------------------------------------------------- 59.88s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/restart.yml:65 ------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_excluder : Get available excluder version ---------------------------------------------------------------------------------------------------------- 52.80s
/usr/share/ansible/openshift-ansible/roles/openshift_excluder/tasks/verify_excluder.yml:4 ---------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_node : Clean up cri-o pods ------------------------------------------------------------------------------------------------------------------------- 39.42s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade/stop_services.yml:31 ------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_node : Ensure cri-o is updated --------------------------------------------------------------------------------------------------------------------- 38.18s
/usr/share/ansible/openshift-ansible/roles/openshift_node/tasks/upgrade.yml:36 --------------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  Gathering Facts ---------------------------------------------------------------------------------------------------------------------------------------------- 34.59s
/usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/registry_auth.yml:4 ---------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  Gathering Facts ---------------------------------------------------------------------------------------------------------------------------------------------- 34.48s
/usr/share/ansible/openshift-ansible/playbooks/init/basic_facts.yml:7 -----------------------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  openshift_excluder : Install docker excluder - yum ----------------------------------------------------------------------------------------------------------- 33.99s
/usr/share/ansible/openshift-ansible/roles/openshift_excluder/tasks/install.yml:9 -----------------------------------------------------------------------------------
2019-02-22 12:55:26,001 p=742 u=sys.openshift |  Gathering Facts ---------------------------------------------------------------------------------------------------------------------------------------------- 33.26s
/usr/share/ansible/openshift-ansible/playbooks/openshift-node/private/registry_auth.yml:25 --------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  openshift_excluder : Install openshift excluder - yum -------------------------------------------------------------------------------------------------------- 32.70s
/usr/share/ansible/openshift-ansible/roles/openshift_excluder/tasks/install.yml:34 ----------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  openshift_control_plane : Check status of control plane image pre-pull --------------------------------------------------------------------------------------- 31.31s
/usr/share/ansible/openshift-ansible/roles/openshift_control_plane/tasks/pre_pull_poll.yml:2 ------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  openshift_node_group : Wait for the sync daemonset to become ready and available ----------------------------------------------------------------------------- 22.41s
/usr/share/ansible/openshift-ansible/roles/openshift_node_group/tasks/sync.yml:65 -----------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Set fact of no_proxy_internal_hostnames ---------------------------------------------------------------------------------------------------------------------- 19.67s
/usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml:42 --------------------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Run variable sanity checks ----------------------------------------------------------------------------------------------------------------------------------- 19.49s
/usr/share/ansible/openshift-ansible/playbooks/init/sanity_checks.yml:14 --------------------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Gathering Facts ---------------------------------------------------------------------------------------------------------------------------------------------- 18.13s
/usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml:2 ---------------------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Gathering Facts ---------------------------------------------------------------------------------------------------------------------------------------------- 18.13s
/usr/share/ansible/openshift-ansible/playbooks/init/version.yml:12 --------------------------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Initialize openshift.node.sdn_mtu ---------------------------------------------------------------------------------------------------------------------------- 17.21s
/usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml:60 --------------------------------------------------------------------------------------------
2019-02-22 12:55:26,002 p=742 u=sys.openshift |  Gather Cluster facts ----------------------------------------------------------------------------------------------------------------------------------------- 17.05s
/usr/share/ansible/openshift-ansible/playbooks/init/cluster_facts.yml:27 --------------------------------------------------------------------------------------------
2019-02-22 12:55:26,003 p=742 u=sys.openshift |  Failure summary:


  1. Hosts:    tux172.xyz.com
     Play:     Update master nodes
     Task:     stop docker to kill static pods
     Message:  Could not find the requested service docker: host


Expected results:
The upgrade should check which container runtime a node uses (for example, CRI-O) rather than assuming that docker is installed.
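
One way the requested check could look is sketched here. It is an illustration only and not necessarily the change that shipped in the errata. It reuses the task quoted in the description and adds one condition on openshift_use_crio_only, the variable named in the verification comment. The default(false) filter and the list form of "when" are assumptions made for this sketch.

```yaml
# Sketch only: the extra condition is an assumption, not the shipped fix.
- name: stop docker to kill static pods
  service:
    name: docker
    state: stopped
  register: l_openshift_node_upgrade_docker_stop_result
  until: not (l_openshift_node_upgrade_docker_stop_result is failed)
  retries: 3
  delay: 30
  when:
    # Skip nodes that run only CRI-O, where no docker service exists.
    - not (openshift_use_crio_only | default(false) | bool)
    # Original condition from the task quoted in the description.
    - >
      inventory_hostname in groups['oo_masters_to_config']
      or (l_docker_upgrade is defined and l_docker_upgrade | bool)
```

Ansible combines the items of a "when" list with a logical AND, so the original master/docker-upgrade condition is still required and the task is additionally skipped on CRI-O-only nodes.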

Additional info:

Comment 5 Weihua Meng 2019-03-25 09:56:51 UTC
Fixed.

openshift-ansible-3.11.98-1.git.0.3cfa7c3.el7


The task is now skipped on nodes with openshift_use_crio_only=True:

TASK [openshift_node : stop docker to kill static pods] ************************
skipping: [qe-wmeng3r31169-np-1.0325-5g4.qe.rhcloud.com] => {
    "changed": false, 
    "skip_reason": "Conditional result was False"
}

Comment 7 errata-xmlrpc 2019-04-11 05:38:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:0636

