Bug 1695991 - During "undercloud images" stage, Undercloud 100% CPU/hang/disconnect after starting neutron_api_healthcheck
Summary: During "undercloud images" stage, Undercloud 100% CPU/hang/disconnect after ...
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo
Version: 15.0 (Stein)
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Bernard Cafarelli
QA Contact: Arik Chernetsky
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-04 01:38 UTC by Alistair Tonner
Modified: 2019-04-15 13:56 UTC
CC List: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
shell script to deploy openstack nodes in virt env. (deleted)
2019-04-04 01:40 UTC, Alistair Tonner
no flags

Description Alistair Tonner 2019-04-04 01:38:58 UTC
Description of problem:

   During the OC deployment step "undercloud images" -> "Push repository to overcloud image",
   Undercloud-0 goes to 100% CPU utilization and becomes unresponsive.  Tailing /var/log/messages shows that this occurs immediately after neutron_api_healthcheck startup.
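
   For anyone trying to catch the hang live, a rough sketch of how the symptom can be observed
   (illustrative commands only, not taken from the attached deploy script; the stack@ login is an assumption):

      # second terminal: follow the undercloud log; the interactive SSH session itself
      # drops with a broken pipe once the node pegs at 100% CPU (see log excerpt below)
      ssh stack@undercloud-0 'tail -f /var/log/messages'
      # optionally record CPU samples so there is still data after the SSH session dies
      ssh stack@undercloud-0 'top -b -d 5 -n 720 > /tmp/undercloud-top.log'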


Version-Release number of selected component (if applicable):
RHEL 8 - 1845
OSP15 -> RHOS_TRUNK-15.0-RHEL-8-20190403.n.0


openstack-tripleo-common.noarch               10.6.1-0.20190403010356.ee080a6.el8ost               @rhelosp-15.0-trunk
openstack-tripleo-common-containers.noarch    10.6.1-0.20190403010356.ee080a6.el8ost               @rhelosp-15.0-trunk
openstack-tripleo-heat-templates.noarch       10.4.1-0.20190403000356.0596bda.el8ost               @rhelosp-15.0-trunk
openstack-tripleo-image-elements.noarch       10.3.1-0.20190325204940.253fe88.el8ost               @rhelosp-15.0-trunk
openstack-tripleo-puppet-elements.noarch      10.2.1-0.20190327211339.0f6cacb.el8ost               @rhelosp-15.0-trunk
openstack-tripleo-validations.noarch          10.3.1-0.20190403071532.1803506.el8ost               @rhelosp-15.0-trunk
puppet-tripleo.noarch                         10.3.1-0.20190402230344.8ba5ae4.el8ost               @rhelosp-15.0-trunk
python3-tripleo-common.noarch                 10.6.1-0.20190403010356.ee080a6.el8ost               @rhelosp-15.0-trunk
python3-tripleoclient.noarch                  11.3.1-0.20190402150355.0132e7d.el8ost               @rhelosp-15.0-trunk
python3-tripleoclient-heat-installer.noarch   11.3.1-0.20190402150355.0132e7d.el8ost               @rhelosp-15.0-trunk
ansible.noarch                                2.7.6-1.el8                                          @rhelosp-15.0-trunk
ansible-pacemaker.noarch                      1.0.4-0.20190129114541.0e4d7c0.el8ost                @rhelosp-15.0-trunk
ansible-role-atos-hsm.noarch                  0.1.1-0.20190306173142.f6f9c3f.el8ost                @rhelosp-15.0-trunk
ansible-role-chrony.noarch                    0.0.1-0.20190327040343.068668b.el8ost                @rhelosp-15.0-trunk
ansible-role-container-registry.noarch        1.0.1-0.20190219021249.d6a749a.el8ost                @rhelosp-15.0-trunk
ansible-role-redhat-subscription.noarch       1.0.2-0.20190215212927.13bf86d.el8ost                @rhelosp-15.0-trunk
ansible-role-thales-hsm.noarch                0.2.1-0.20190306204553.08b5efa.el8ost                @rhelosp-15.0-trunk
ansible-role-tripleo-modify-image.noarch      1.0.1-0.20190402220346.012209a.el8ost                @rhelosp-15.0-trunk
ansible-tripleo-ipsec.noarch                  9.0.1-0.20190220162047.f60ad6c.el8ost                @rhelosp-15.0-trunk
python3-heat-agent-ansible.noarch             1.8.1-0.20190402070337.ad2a5d1.el8ost                @rhelosp-15.0-trunk


How reproducible:

Consistent (4 occurrences on sealusa17.mobius.eng.lab.rdu2.redhat.com) over two days, on both the
0403.n and 0329.n composes.


Steps to Reproduce:

Run attached shell script to deploy full overcloud stack
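
The stage that is executing when the hang starts is the repo push into the overcloud image.
Judging from the ansible-command entry in the log excerpt below, the equivalent manual step is
roughly the following (paths are taken from that log line, not re-verified here):

   # inject the undercloud's yum repo definitions into the overcloud image
   # (this is the virt-copy-in call that ansible invokes at 21:50:02 per the log)
   virt-copy-in -a overcloud-full.qcow2 /tmp/oc_repos/yum.repos.d /etc/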



Actual results:

Excerpt from /var/log/messages, captured to find out what last ran:

Apr  3 21:50:02 undercloud-0 platform-python[114729]: ansible-command Invoked with _raw_params=virt-copy-in -a overcloud-full.qcow2 /tmp/oc_repos/yum.repos.d /etc/ warn=True _uses_shell=False argv=None chdir=None executable=None creates=None removes=None stdin=None
Apr  3 21:50:02 undercloud-0 kvm[114770]: 1 guest now active
Apr  3 21:50:02 undercloud-0 kvm[114773]: 0 guests now active
Apr  3 21:50:02 undercloud-0 systemd[792]: Started D-Bus User Message Bus.
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 set 4d0646d8802d16ec6ebcc1ca934d1bc7 2 60 220
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 STORED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 set 4d0646d8802d16ec6ebcc1ca934d1bc7 2 60 220
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 STORED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 set 4d0646d8802d16ec6ebcc1ca934d1bc7 2 60 220
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 STORED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 get 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 sending key 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 END
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 set 4adc4b7ca562a31314e6a907164f9a8c 2 60 277
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 STORED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 get 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 sending key 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 END
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 set c8f03b51e128817fd9c484a320466064 2 60 277
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 STORED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 get 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 sending key 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 END
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 delete 4adc4b7ca562a31314e6a907164f9a8c
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 DELETED
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 get 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 sending key 4d0646d8802d16ec6ebcc1ca934d1bc7
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 END
Apr  3 21:50:02 undercloud-0 podman[29465]: <50 delete c8f03b51e128817fd9c484a320466064
Apr  3 21:50:02 undercloud-0 podman[29465]: >50 DELETED
Apr  3 21:50:06 undercloud-0 kvm[114984]: 1 guest now active
Apr  3 21:50:06 undercloud-0 systemd[1]: Starting neutron_api healthcheck...
Apr  3 21:50:07 undercloud-0 podman[114988]: 200 192.168.24.1:9696 0.005388 seconds
Apr  3 21:50:07 undercloud-0 systemd[1]: Started neutron_api healthcheck.
packet_write_wait: Connection to 172.16.0.32 port 22: Broken pipe
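
The packet_write_wait line is the SSH session to 172.16.0.32 being dropped when the undercloud
pegs its CPU. One possible way to keep visibility past that point (assuming a libvirt-based virt
setup on the hypervisor and a guest domain actually named undercloud-0) is the serial console
instead of SSH:

   # on the hypervisor (sealusa17), attach to the undercloud's console so the
   # session survives the SSH drop; the domain name here is an assumption
   virsh console undercloud-0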



Expected results:

  Overcloud deploy completes successfully



Additional info:


   Attaching: -> shell script to deploy, 
              -> neutron_api_healthcheck logs
              -> podman logs.

Comment 1 Alistair Tonner 2019-04-04 01:40:14 UTC
Created attachment 1551619 [details]
shell script to deploy openstack nodes in virt env.

Comment 3 Alistair Tonner 2019-04-04 02:25:36 UTC
Changed component, as I originally chose neutron *only* because it was the last thing I saw in the logs when I opened the bug.

   At 21:49:56 there is a systemd reload executed; why, I am not sure:

Apr  3 21:49:56 undercloud-0 systemd[1]: Started /usr/bin/systemctl start man-db-cache-update.
Apr  3 21:49:56 undercloud-0 systemd[1]: Starting man-db-cache-update.service...
Apr  3 21:49:56 undercloud-0 systemd[1]: Reloading.
Apr  3 21:49:57 undercloud-0 systemd-tmpfiles[112609]: [/usr/lib/tmpfiles.d/certmonger.conf:3] Line references path below legacy directory /var/run/, updating /var/run/certmonger → /run/certmonger; please update the tmpfiles.d/ drop-in file accordingly.
Apr  3 21:49:57 undercloud-0 systemd-tmpfiles[112609]: [/usr/lib/tmpfiles.d/mdadm.conf:1] Line references path below legacy directory /var/run/, updating /var/run/mdadm → /run/mdadm; please update the tmpfiles.d/ drop-in file accordingly.
Apr  3 21:49:57 undercloud-0 systemd-tmpfiles[112609]: [/usr/lib/tmpfiles.d/radvd.conf:1] Line references path below legacy directory /var/run/, updating /var/run/radvd → /run/radvd; please update the tmpfiles.d/ drop-in file accordingly.
Apr  3 21:49:57 undercloud-0 systemd-tmpfiles[112609]: [/usr/lib/tmpfiles.d/subscription-manager.conf:1] Line references path below legacy directory /var/run/, updating /var/run/rhsm → /run/rhsm; please update the tmpfiles.d/ drop-in file accordingly.
Apr  3 21:49:58 undercloud-0 systemd[1]: Started man-db-cache-update.service.
Apr  3 21:49:59 undercloud-0 platform-python[114296]: ansible-systemd Invoked with name=libvirtd state=restarted daemon_reload=False no_block=False enabled=None force=None masked=None user=None scope=None
Apr  3 21:49:59 undercloud-0 systemd[1]: Listening on Virtual machine lock manager socket.
Apr  3 21:49:59 undercloud-0 systemd[1]: Listening on Virtual machine log manager socket.
Apr  3 21:49:59 undercloud-0 systemd[1]: Starting Virtual Machine and Container Registration Service...
Apr  3 21:49:59 undercloud-0 systemd[1]: Started Virtual Machine and Container Registration Service.
Apr  3 21:49:59 undercloud-0 systemd[1]: Starting Virtualization daemon...
Apr  3 21:49:59 undercloud-0 systemd[1]: Started Virtualization daemon.
Apr  3 21:49:59 undercloud-0 systemd-udevd[547]: Network interface NamePolicy= disabled on kernel command line, ignoring.
Apr  3 21:49:59 undercloud-0 kvm[114355]: 1 guest now active
Apr  3 21:49:59 undercloud-0 kvm[114358]: 0 guests now active

  I cannot at the moment see exactly what happens after 21:50, but I suspect that systemd reload is where the problem is coming from.
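
  A quick way to test that suspicion once the node is reachable again (illustrative commands only,
  time window taken from the log above) would be to line the reload up against what started around it:

     # pull everything around the reload window out of the journal and look for
     # the reload, the man-db cache job, and the healthchecks firing together
     journalctl --since '2019-04-03 21:49:00' --until '2019-04-03 21:55:00' \
         | grep -E 'Reloading|man-db-cache-update|healthcheck'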

