Bug 1692608 - restarting nova_virtlogd prevents console logs from being updated [NEEDINFO]
Summary: restarting nova_virtlogd prevents console logs from being updated
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 13.0 (Queens)
Hardware: Unspecified
OS: All
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: nova-maint
QA Contact: nova-maint
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-26 02:52 UTC by Meiyan Zheng
Modified: 2019-04-12 01:45 UTC
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
sbaker: needinfo? (beagles)


Attachments

Description Meiyan Zheng 2019-03-26 02:52:15 UTC
Description of problem:
After restarting nova_virtlogd on a compute node while an instance is running,
the instance on that compute node can no longer write logs to its console.log file.


Version-Release number of selected component (if applicable):
container image version: openstack-nova-libvirt:13.0-79.1548959794

How reproducible:

Steps to Reproduce:
1. Restart nova_virtlogd with "docker restart nova_virtlogd"
2. Reboot the instance running on that compute node by running the command "reboot" inside the instance
3. Monitor /var/lib/nova/instances/<uuid>/console.log 
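
A consolidated reproduction sketch (instance UUID is a placeholder; run the docker command on the compute node and the reboot inside the guest):

~~~
# On the compute node: restart the virtlogd container
docker restart nova_virtlogd

# Inside the instance: trigger a reboot so new console output is produced
reboot

# Back on the compute node: watch the console log; with the bug present,
# no new lines appear after the virtlogd restart
tail -f /var/lib/nova/instances/<uuid>/console.log
~~~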

Actual results:
No additional console log output appears in /var/lib/nova/instances/<uuid>/console.log

Expected results:
Additional boot logs should be recorded in /var/lib/nova/instances/<uuid>/console.log

Additional info:

Comment 2 Martin Schuppert 2019-03-29 11:47:24 UTC
[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# lsof -n 2>/dev/null |grep console 
virtlogd   26776               root   16w      REG              252,2      25156   61000435 /var/lib/nova/instances/909ca1c3-a036-4b76-ab3c-2ebbb63290ac/console.log

[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# docker restart nova_virtlogd
nova_virtlogd

[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# lsof -n 2>/dev/null |grep console 
-> log closed

* hard reboot
[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# ll
total 2344
-rw-------. 1 root  root    25156 Mar 27 13:34 console.log

(overcloud) [stack@undercloud ~]$ nova reboot --hard test1
Request to reboot server test1 (909ca1c3-a036-4b76-ab3c-2ebbb63290ac) has been accepted.

[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# ll
total 2400
-rw-------. 1 root  root    19000 Mar 29 09:04 console.log

[root@overcloud-novacompute-0 909ca1c3-a036-4b76-ab3c-2ebbb63290ac]# lsof -n 2>/dev/null |grep console 
virtlogd  414619               root   16w      REG              252,2         0   61000435 /var/lib/nova/instances/909ca1c3-a036-4b76-ab3c-2ebbb63290ac/console.log


* Same issue on non container env:
[root@localhost ~(keystone_admin)]# lsof -n |grep console.log
virtlogd  15270              root   16w      REG              253,1     17012   75503680 /var/lib/nova/instances/6ffb1c72-2dbb-4601-a71a-f48561418bd4/console.log
[root@localhost ~(keystone_admin)]# nova list
+--------------------------------------+------+--------+------------+-------------+-------------------+
| ID                                   | Name | Status | Task State | Power State | Networks          |
+--------------------------------------+------+--------+------------+-------------+-------------------+
| 6ffb1c72-2dbb-4601-a71a-f48561418bd4 | test | ACTIVE | -          | Running     | private=10.0.0.16 |
+--------------------------------------+------+--------+------------+-------------+-------------------+
[root@localhost ~(keystone_admin)]# systemctl restart virtlogd
[root@localhost ~(keystone_admin)]# lsof -n |grep console.log

To reopen the log, a full qemu process stop/start is required; this can be achieved either by a hard reboot or by stopping and starting the instance.
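
For example (instance name matches the transcripts above; either approach recreates the qemu process so virtlogd reopens console.log):

~~~
# Hard reboot recreates the qemu process and reopens console.log
nova reboot --hard test1

# Alternatively, a full stop/start has the same effect
nova stop test1
nova start test1
~~~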

Comment 3 Martin Schuppert 2019-03-29 11:57:36 UTC
The behavior is expected when you restart virtlogd in general. If there are changes to virtlogd on a live system, a signal can be sent to the process to keep the current logs open instead of doing a full restart:

~~~
On receipt of SIGUSR1 virtlogd will re-exec() its binary, while maintaining all current logs and clients. This allows for live upgrades of the virtlogd service.
~~~
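
For a non-container environment, a minimal sketch of sending that signal (assuming virtlogd runs directly on the host):

~~~
# Send SIGUSR1 so virtlogd re-exec()s itself while keeping the current
# log file descriptors and client connections open
kill -USR1 "$(pidof virtlogd)"
~~~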

What is the use case? Updating a compute node where we get a new container version with virtlogd? If so, then the instances should be migrated off the compute node before upgrading it.

Comment 4 Martin Schuppert 2019-03-29 16:02:55 UTC
Dan,

from the virtlogd/libvirtd side, is there anything that can be done to reopen the console logs when virtlogd gets restarted? For non-container environments you could send the signal mentioned in my last update, but when updating containers this won't work.

Thanks!

Comment 5 Daniel Berrange 2019-03-29 16:12:51 UTC
No, this is explicitly *NOT* supported. The virtlogd daemon must *never* be stopped while there are running VMs. It is explicitly split out into a separate daemon from libvirtd so that it can upgrade its own software while keeping FDs open by re-exec'ing itself. If running virtlogd in a container, this container must never be restarted while VMs are running.

AFAIK this shouldn't be a problem on OSP in general, as the recommended software upgrade process involves live migrating all VMs off to a new host, before upgrading any of the containers / software on the original host.
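
For illustration, evacuating instances from a compute node before updating its containers could look like this (instance name is a placeholder; with no destination given, the scheduler picks the target host):

~~~
# Live migrate the instance off the compute node that is about to be updated
nova live-migration test1

# Confirm the instance now runs on a different hypervisor
nova show test1 | grep hypervisor_hostname
~~~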

Comment 8 Steve Baker 2019-04-12 01:45:39 UTC
To me it sounds like virtlogd should not be managed by paunch at all, and it should be treated in a similar way to the neutron l3[1] and dhcp[2] agents.

I'll set a NEEDINFO for beagles to provide an opinion on whether the wrapper approach is appropriate for virtlogd, and how it might be done.

[1] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/l3_agent_wrappers.pp
    https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/neutron/neutron-l3-container-puppet.yaml#L172-L186
[2] https://github.com/openstack/puppet-tripleo/blob/master/manifests/profile/base/neutron/dhcp_agent_wrappers.pp
    https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/neutron/neutron-dhcp-container-puppet.yaml
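
For illustration only, a wrapper in the spirit of the neutron agent wrappers above might hand virtlogd off to a long-lived side container so that restarting the paunch-managed containers does not kill it. Everything below (image name, container name, bind mounts) is a hypothetical sketch, not an implemented or agreed-upon solution:

~~~
#!/bin/bash
# Hypothetical virtlogd wrapper: run virtlogd in its own side container
# that is not restarted when the paunch-managed containers are updated.
# Image name, container name and mounts are placeholders.
if ! docker ps --format '{{.Names}}' | grep -q '^virtlogd_wrapper$'; then
    docker run --detach --name virtlogd_wrapper \
        --net host --pid host --privileged \
        -v /var/lib/nova:/var/lib/nova \
        -v /var/run/libvirt:/var/run/libvirt \
        <nova-libvirt-image> \
        /usr/sbin/virtlogd --config /etc/libvirt/virtlogd.conf
fi
~~~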

