Bug 1690787 - [OSP13] Controller-replacement fails (controller-removal) because : /var/log/containers/nova/nova-manage.log is owned by root:root
Summary: [OSP13] Controller-replacement fails (controller-removal) because : /var/log/...
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: z6
Target Release: 13.0 (Queens)
Assignee: Martin Schuppert
QA Contact: Joe H. Rahme
URL:
Whiteboard:
Depends On: 1685506
Blocks: 1690784
Reported: 2019-03-20 09:02 UTC by Martin Schuppert
Modified: 2019-04-11 16:31 UTC (History)

Fixed In Version: openstack-tripleo-heat-templates-8.2.0-16.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1685506
Environment:
Last Closed:
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Launchpad 1820590 None None None 2019-03-20 09:02:52 UTC
OpenStack gerrit 644548 None None None 2019-03-20 09:02:52 UTC

Description Martin Schuppert 2019-03-20 09:02:52 UTC
+++ This bug was initially created as a clone of Bug #1685506 +++

Description of problem:
Controller-replacement fails (controller-removal) because : /var/log/containers/nova/nova-manage.log is owned by root:root

Version-Release number of selected component (if applicable):
OSP14 2019-02-27.1

How reproducible:
always

Steps to Reproduce:

Via automation:
Run: https://rhos-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/view/DFG/view/network/view/octavia/job/DFG-network-octavia-14_director-rhel-virthost-3cont_2comp-ipv4-vxlan-controller_replacement-normal/

Manually:
1. Deploy an HA OSP14 overcloud.
2. Try to remove one controller:
Rerun overcloud_deploy.sh with this additional argument:
-e /home/stack/remove-controller.yaml \

cat /home/stack/remove-controller.yaml
parameters:
  ControllerRemovalPolicies:
    [{'resource_list': ['0']}]


Actual results:
Overcloud controller removal fails with:
http://pastebin.test.redhat.com/731244
The nova_api container on controller-1 fails to start because of this error (from the deployment log file):

  "IOError: [Errno 13] Permission denied: '/var/log/nova/nova-manage.log'",

Expected results:
Controller removal succeeds, finishes without errors, and all overcloud
agents are up and operational.

--- Additional comment from  on 2019-03-05 11:05:18 UTC ---

sos reports and stack home are at : 
http://rhos-release.virt.bos.redhat.com/log/pkomarov_sosreports/BZ1685506/

--- Additional comment from  on 2019-03-05 11:07:24 UTC ---

As shown below, the rest of nova's container logs are owned by the Kolla user (uid 42436), as they should be,
but nova-manage.log is owned by root:

[root@controller-1 ~]# ls -l /var/log/containers/nova
total 33072
-rw-r--r--. 1 42436 42436  6120842 Mar  5 10:41 nova-api.log
-rw-r--r--. 1 42436 42436 10828856 Mar  5 09:00 nova-api.log.1
[...]
-rw-r--r--. 1 root  root         0 Mar  4 17:23 nova-manage.log
-rw-r--r--. 1 42436 42436        0 Mar  5 00:01 nova-metadata-api.log
-rw-r--r--. 1 42436 42436   761548 Mar  5 00:01 nova-metadata-api.log.1


[stack@undercloud-0 ~]$ ansible controller-1 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'

controller-1 | SUCCESS | rc=0 >>
-rw-r--r--. 1 root  root         0 Mar  4 17:23 nova-manage.log

[stack@undercloud-0 ~]$ ansible controller-2 -mshell -b -a'ls -l /var/log/containers/nova|grep manage'

controller-2 | SUCCESS | rc=0 >>
-rw-r--r--. 1 42436 42436        0 Mar  5 00:01 nova-manage.log
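
The ownership check done by hand above could also be scripted. The following is only a minimal sketch: the function name find_misowned is hypothetical, and it assumes that uid/gid 42436 (taken from the listings above) is the owner every log file in the directory should have.

```python
import os
from pathlib import Path

# uid/gid seen on the correctly owned logs in the listings above (assumption:
# this is the owner every file in the directory is expected to have).
EXPECTED_UID = 42436
EXPECTED_GID = 42436


def find_misowned(log_dir, expected_uid=EXPECTED_UID, expected_gid=EXPECTED_GID):
    """Return the sorted names of regular files in log_dir whose owner
    or group differs from the expected uid/gid."""
    misowned = []
    for entry in Path(log_dir).iterdir():
        if not entry.is_file():
            continue  # skip sub-directories and other non-regular entries
        st = entry.stat()
        if st.st_uid != expected_uid or st.st_gid != expected_gid:
            misowned.append(entry.name)
    return sorted(misowned)
```

Calling find_misowned("/var/log/containers/nova") on a controller would list the files with unexpected ownership, which on controller-1 in this report would include nova-manage.log.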

--- Additional comment from Artem Hrechanychenko on 2019-03-06 15:00:15 UTC ---

Hello Pini,
reproduced on my env too - https://rhos-ci-staging-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/DFG-df-controller_replacement-14-virthost-3cont_3comp_3ceph-yes_UC_SSL-yes_OC_SSL-ceph-ipv4-vxlan-replace_controller-RHELOSP-31864/

OSP14 puddle - 2019-02-27.1

--- Additional comment from Martin Schuppert on 2019-03-15 11:01:31 UTC ---

The sosreports lack system logs, so I tried to reproduce the issue with 2019-02-27.1, but I don't see the wrong ownership on the nova-manage log.

After deploy:

The only nova-manage log on controller-0:

[root@controller-0 ~]#  ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--.  1 42436 42436        0 Mar 15 00:00 nova-manage.log
-rw-r--r--.  1 42436 42436   274848 Mar 15 00:00 nova-manage.log.1

After replacement:

(undercloud) [stack@undercloud-0 ~]$ nova list
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 0ab774a5-233f-46e9-a429-0949b969d6db | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.10 |
| 51344592-2b97-455f-a936-b7826eae7b30 | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.8  |
| 0c5fe3a4-8a66-4f45-b85b-0855b02f277f | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.21 |
| 4a046882-093e-4e00-8bed-46fcf6e72603 | controller-3 | ACTIVE | -          | Running     | ctlplane=192.168.24.18 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+

[root@controller-0 ~]# ls -la /var/log/containers/nova/ |grep manage
-rw-r--r--.  1 42436 42436        0 Mar 15 00:00 nova-manage.log
-rw-r--r--.  1 42436 42436   274848 Mar 15 00:00 nova-manage.log.1

Does that job run any nova-manage commands as root outside the TripleO workflow? If one was initially triggered on controller-1 as root, the nova-manage log would be created owned by root, as shown in the description, and the reported issue could then occur.

In any case, we'll submit a patch to chown the logs in case something gets triggered manually as root.
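
To illustrate the idea of chowning the logs, here is a rough sketch. It is not the tripleo-heat-templates change itself: the function name chown_logs and the injectable chown parameter are hypothetical, and uid/gid 42436 is assumed from the listings in the earlier comments.

```python
import os

# Owner seen on the correctly owned nova logs in this report (assumption).
NOVA_UID = 42436
NOVA_GID = 42436


def chown_logs(log_dir, uid=NOVA_UID, gid=NOVA_GID, chown=os.chown):
    """Reset ownership of regular files in log_dir that are not owned by
    uid:gid, and return the list of paths that were changed.

    The chown callable is injectable so the logic can be exercised
    without root privileges; by default it is os.chown.
    """
    changed = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue  # leave directories and special files alone
        st = os.stat(path)
        if (st.st_uid, st.st_gid) != (uid, gid):
            chown(path, uid, gid)  # os.chown needs root to change the owner
            changed.append(path)
    return changed
```

Run as root against /var/log/containers/nova before the containers start, something like this would hand a root-owned nova-manage.log back to the expected user, so that a stray manual nova-manage run no longer blocks nova_api.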

