Bug 1354669 - osp-director-9: Upgrade from OSP8 async -> OSP9 causes failed resources for 'httpd_monitor_60000' on controller nodes after attempting AODH migration as part of upgrade.
Summary: osp-director-9: Upgrade from OSP8 async -> OSP9 causes failed resources for 'httpd_monitor_60000' on controller nodes after attempting AODH migration as part of upgrade.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: rhosp-director
Version: 9.0 (Mitaka)
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: async
Target Release: 9.0 (Mitaka)
Assignee: Sofer Athlan-Guyot
QA Contact: Amit Ugol
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-11 22:23 UTC by mlammon
Modified: 2017-02-27 23:52 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-27 23:52:37 UTC
Target Upstream Version:



Description mlammon 2016-07-11 22:23:32 UTC
osp-director-9: Upgrade from OSP8 async -> OSP9 causes failed resources for 'httpd_monitor_60000' on controller nodes after attempting AODH migration as part of upgrade.

Environment:
openstack-heat-common-6.0.0-7.el7ost.noarch
openstack-heat-api-cfn-6.0.0-7.el7ost.noarch
python-heat-tests-6.0.0-7.el7ost.noarch
openstack-tripleo-heat-templates-kilo-2.0.0-14.el7ost.noarch
openstack-tripleo-heat-templates-2.0.0-14.el7ost.noarch
openstack-heat-engine-6.0.0-7.el7ost.noarch
heat-cfntools-1.3.0-2.el7ost.noarch
openstack-tripleo-heat-templates-liberty-2.0.0-14.el7ost.noarch
openstack-heat-api-cloudwatch-6.0.0-7.el7ost.noarch
openstack-heat-templates-0-0.3.96a0b0bgit.el7ost.noarch
python-heatclient-1.2.0-1.el7ost.noarch
openstack-heat-api-6.0.0-7.el7ost.noarch
pcs-0.9.143-15.el7.x86_64
httpd-tools-2.4.6-40.el7_2.1.x86_64
httpd-2.4.6-40.el7_2.1.x86_64

Description:
After one of the upgrade steps, pcs reported failed resource actions such as:
httpd_monitor_60000 on overcloud-controller-0 'not running' (7): call=251, status=complete, exitreason='none',last-rc-change='Mon Jul 11 21:17:51 2016', queued=0ms, exec=0ms

Workaround:
Running a pcs resource cleanup on the httpd resource clears the problem:

[root@overcloud-controller-0 heat-admin]# pcs resource cleanup httpd
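A minimal verification sketch, assuming standard pcs behavior: once the cleanup succeeds, the Failed Actions section disappears from pcs status, so a grep for it (illustrative pattern) prints nothing:

[root@overcloud-controller-0 heat-admin]# pcs status | grep -A 4 'Failed Actions'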


The upgrade command:
openstack overcloud deploy --templates --control-scale 3 --compute-scale 1 \
  --neutron-network-type vxlan --neutron-tunnel-types vxlan \
  --ntp-server clock.redhat.com --timeout 90 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/network-isolation.yaml \
  -e network-environment.yaml \
  --ceph-storage-scale 1 \
  -e /usr/share/openstack-tripleo-heat-templates/environments/major-upgrade-keystone-liberty-mitaka.yaml


[stack@instack ~]$ ssh heat-admin@192.0.2.9
Last login: Mon Jul 11 21:31:16 2016 from 192.0.2.1
[heat-admin@overcloud-controller-0 ~]$ sudo -s
[root@overcloud-controller-0 heat-admin]# pcs status
Cluster name: tripleo_cluster
Last updated: Mon Jul 11 21:53:07 2016          Last change: Mon Jul 11 21:15:44 2016 by root via cibadmin on overcloud-controller-0
Stack: corosync
Current DC: overcloud-controller-2 (version 1.1.13-10.el7_2.2-44eb2dd) - partition with quorum
3 nodes and 118 resources configured

Online: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Full list of resources:

 ip-192.0.2.6   (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 Clone Set: haproxy-clone [haproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 ip-192.168.200.180     (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-192.168.100.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 ip-192.168.110.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-0
 ip-192.168.100.11      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-1
 ip-192.168.120.10      (ocf::heartbeat:IPaddr2):       Started overcloud-controller-2
 Master/Slave Set: redis-master [redis]
     Masters: [ overcloud-controller-0 ]
     Slaves: [ overcloud-controller-1 overcloud-controller-2 ]
 Master/Slave Set: galera-master [galera]
     Masters: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: mongod-clone [mongod]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: rabbitmq-clone [rabbitmq]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: memcached-clone [memcached]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-scheduler-clone [openstack-nova-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-l3-agent-clone [neutron-l3-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-engine-clone [openstack-heat-engine]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-api-clone [openstack-ceilometer-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-metadata-agent-clone [neutron-metadata-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-ovs-cleanup-clone [neutron-ovs-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-netns-cleanup-clone [neutron-netns-cleanup]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-clone [openstack-heat-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-scheduler-clone [openstack-cinder-scheduler]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-api-clone [openstack-nova-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cloudwatch-clone [openstack-heat-api-cloudwatch]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-collector-clone [openstack-ceilometer-collector]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-consoleauth-clone [openstack-nova-consoleauth]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-registry-clone [openstack-glance-registry]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-notification-clone [openstack-ceilometer-notification]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-cinder-api-clone [openstack-cinder-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-dhcp-agent-clone [neutron-dhcp-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-glance-api-clone [openstack-glance-api]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-openvswitch-agent-clone [neutron-openvswitch-agent]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-nova-novncproxy-clone [openstack-nova-novncproxy]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: delay-clone [delay]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: neutron-server-clone [neutron-server]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: httpd-clone [httpd]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-ceilometer-central-clone [openstack-ceilometer-central]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-heat-api-cfn-clone [openstack-heat-api-cfn]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 openstack-cinder-volume        (systemd:openstack-cinder-volume):      Started overcloud-controller-0
 Clone Set: openstack-nova-conductor-clone [openstack-nova-conductor]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 my-stonith-xvm-controller0     (stonith:fence_xvm):    Started overcloud-controller-1
 my-stonith-xvm-controller1     (stonith:fence_xvm):    Started overcloud-controller-1
 my-stonith-xvm-controller2     (stonith:fence_xvm):    Started overcloud-controller-0
 Clone Set: openstack-aodh-listener-clone [openstack-aodh-listener]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-aodh-notifier-clone [openstack-aodh-notifier]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-aodh-evaluator-clone [openstack-aodh-evaluator]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]
 Clone Set: openstack-core-clone [openstack-core]
     Started: [ overcloud-controller-0 overcloud-controller-1 overcloud-controller-2 ]

Failed Actions:
* httpd_monitor_60000 on overcloud-controller-0 'not running' (7): call=251, status=complete, exitreason='none',
    last-rc-change='Mon Jul 11 21:17:51 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-2 'not running' (7): call=248, status=complete, exitreason='none',
    last-rc-change='Mon Jul 11 21:18:10 2016', queued=0ms, exec=0ms
* httpd_monitor_60000 on overcloud-controller-1 'not running' (7): call=251, status=complete, exitreason='none',
    last-rc-change='Mon Jul 11 21:18:03 2016', queued=0ms, exec=0ms


PCSD Status:
  overcloud-controller-0: Online
  overcloud-controller-1: Online
  overcloud-controller-2: Online

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@overcloud-controller-0 heat-admin]# pcs resource cleanup httpd_monitor_60000
Error: Unable to cleanup resource: httpd_monitor_60000
Resource 'httpd_monitor_60000' not found: No such device or address
Error performing operation: No such device or address

[root@overcloud-controller-0 heat-admin]# pcs resource cleanup httpd_monitor
Error: Unable to cleanup resource: httpd_monitor
Resource 'httpd_monitor' not found: No such device or address
Error performing operation: No such device or address
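The two attempts above fail because httpd_monitor_60000 is not a resource name: Pacemaker identifies a failed action as <resource>_<operation>_<interval-in-ms>, so

    httpd_monitor_60000  ->  resource 'httpd', operation 'monitor', interval 60000 ms (60 s)

and the cleanup has to target the httpd resource itself: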

[root@overcloud-controller-0 heat-admin]# pcs resource cleanup httpd
Waiting for 3 replies from the CRMd... OK
Cleaning up httpd:0 on overcloud-controller-0, removing fail-count-httpd
Cleaning up httpd:0 on overcloud-controller-1, removing fail-count-httpd
Cleaning up httpd:0 on overcloud-controller-2, removing fail-count-httpd

Comment 2 Jaromir Coufal 2017-01-25 19:45:39 UTC
Concerning upgrades to 9: fixing the DFG, priority, and flags.

A workaround is available and no critical procedure is blocked; we will investigate as soon as possible.

Comment 3 Sofer Athlan-Guyot 2017-01-26 10:08:45 UTC
Hi,

This is an old story:

 - https://bugzilla.redhat.com/show_bug.cgi?id=1382170 -> check pcs status between steps;

 - https://bugzilla.redhat.com/show_bug.cgi?id=1397918#c13 -> I asked for the documentation to explicitly instruct the user to run a pcs resource cleanup whenever a resource is not in a clean state.

This will keep users from being alarmed by these transient errors. If the error persists after the cleanup, then a BZ is required.
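
A minimal sketch of such a between-steps check, assuming only standard pcs commands (the exact wording that lands in the documentation may differ):

# Run on one controller after each upgrade step.
if pcs status | grep -q 'Failed Actions'; then
    pcs resource cleanup    # with no resource argument, cleans up every resource
    pcs status              # re-check; a failure that persists warrants a new BZ
fi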

Would that be a correct solution for you, Lammon? I mean, the upgrade step did not fail: you had an UPDATE_COMPLETE after the aodh step, and you could continue after the cleanup?

Regards,

Comment 4 Jaromir Coufal 2017-02-27 23:52:37 UTC
Based on the information we received, this seems to be a duplicate of two already-resolved documentation issues (see the previous comment). Please re-open if the defect is different, and add more information.

