Bug 1600472 - Possible issue in services starting order after boot
Summary: Possible issue in services starting order after boot
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-nova
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 10.0 (Newton)
Assignee: nova-maint
QA Contact: nova-maint
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-12 11:00 UTC by ojanas
Modified: 2019-04-11 13:17 UTC
CC List: 21 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Comment 2 Alan Bishop 2018-07-12 12:49:00 UTC
I looked at the cinder logs available in the customer case, and things look OK from cinder's perspective. If there is a startup sequence issue on the compute node then this needs to be looked at by that team (cinder services do not run on compute nodes).

Comment 5 Artom Lifshitz 2018-07-24 17:32:57 UTC
Going to figure out myself how we can get Dell/EMC to look at this.

Comment 6 Dave Cain 2018-07-27 19:33:23 UTC
Adding Rajini Karthik from Dell EMC for help.

Comment 8 Artom Lifshitz 2018-07-27 19:37:17 UTC
Thanks Dave!

Hi Rajini,

So, we think this would be best solved by adding a pre-dependency on nova-compute to the scini packaging, to force scini to be started before nova-compute starts. Would that make sense to you as well?

Cheers!

Comment 9 Rajini Karthik 2018-07-27 20:05:20 UTC
Is this related to the bug we reported earlier?
https://bugzilla.redhat.com/show_bug.cgi?id=1600641

Comment 11 Artom Lifshitz 2018-07-27 20:11:07 UTC
I don't think they're related. This bug was raised to us by a customer - and writing that, I'm realising that you might not be seeing the original description because it's private. I'll reproduce the bug description here:

Description of problem:

Some instances that use the ScaleIO driver as a cinder backend do not come up after the hypervisor boots.

We suspect the cause is the ScaleIO cinder backend driver.

The corresponding driver, scinia, probably takes too long to be activated/loaded:

Jul 10 19:09:52 cpt0-dpdk-rmctl scini: scinia is not ready yet...    <<<<====
Jul 10 19:09:53 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:54 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:55 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:56 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:57 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:58 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:09:59 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:10:00 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:10:01 cpt0-dpdk-rmctl scini: scinia is not ready yet...
Jul 10 19:10:01 cpt0-dpdk-rmctl scini: Success configuring module    <<<<==== 9 seconds

which possibly results in (nova-compute log):

~~~
2018-07-11 13:17:55.708 3140 DEBUG os_brick.utils [req-b0ddaa85-0837-4d18-ba25-94dbc45b1644 - - - - -] Failed attempt 15 _print_stop /usr/lib/python2.7/site-packages/os_brick/utils.py:45
2018-07-11 13:17:55.708 3140 DEBUG os_brick.utils [req-b0ddaa85-0837-4d18-ba25-94dbc45b1644 - - - - -] Have been at this for 14.017 seconds _print_stop /usr/lib/python2.7/site-packages/os_brick/utils.py:47
2018-07-11 13:17:55.708 3140 DEBUG oslo_concurrency.lockutils [req-b0ddaa85-0837-4d18-ba25-94dbc45b1644 - - - - -] Lock "scaleio" released by "os_brick.initiator.connectors.scaleio.connect_volume" :: held 14.649s inner /usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py:282
2018-07-11 13:17:55.708 3140 DEBUG os_brick.initiator.connectors.scaleio [req-b0ddaa85-0837-4d18-ba25-94dbc45b1644 - - - - -] <== connect_volume: exception (14649ms) BrickException(u'ScaleIO volume 85bddbfe000000a1 not found.',) trace_logging_wrapper /usr/lib/python2.7/site-packages/os_brick/utils.py:151
2018-07-11 13:17:55.708 3140 WARNING nova.compute.manager [req-b0ddaa85-0837-4d18-ba25-94dbc45b1644 - - - - -] [instance: 99914920-f023-4bcf-8100-96fc787d2d84] Failed to resume instance
~~~

If the instance is stopped/started once the hypervisor is already booted up and running, it starts without issues.
----

Is it possible to specify this dependency in the nova-compute systemd service somehow?

The catch here could be that scini is not managed by systemd.

Is there anything we can do about this?

Comment 12 Rajini Karthik 2018-07-30 14:21:33 UTC
We couldn't find anything in the logs that points to why the SIO driver takes so long to load.
Could we start with the suggestion from the previous emails: set a dependency so that Nova starts after the scini driver has started?

Comment 13 Artom Lifshitz 2018-07-30 14:26:40 UTC
> Could we start with the suggestion from the previous emails to set
> dependency that the Nova will start after the scini driver started.

Well, I would agree, but that would set a precedent Nova doesn't really want to set: depending (in the systemd service init sense) on every possible backend/driver that needs to be started before nova-compute, even if the nova-compute package doesn't depend (in the package dependency sense) on those packages.

Would it be really difficult to add a Before rule to scini's systemd unit files (see [1], 'Unit order') to make it fully start before nova-compute is started?

[1] https://fedoramagazine.org/systemd-unit-dependencies-and-order/
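As a rough sketch of what such a Before rule could look like: this assumes scini has (or is given) a systemd unit named scini.service, which the description above suggests may not be the case today, and that the compute service is openstack-nova-compute.service, the name used in comment 18. The file path and unit name are assumptions, not taken from scini's actual packaging.

~~~ini
# Hypothetical excerpt from a scini unit file, e.g. a scini.service shipped
# by the scini package (path and unit name are assumptions).
[Unit]
# Order scini's start job ahead of openstack-nova-compute.service when both
# are started in the same boot transaction. This is ordering only: it does
# not pull nova-compute in, and nova-compute waits only until scini's start
# job has completed.
Before=openstack-nova-compute.service
~~~

Before= in scini's unit and After=scini.service in nova-compute's unit express the same ordering, so either side can carry it. Unit file changes take effect after `systemctl daemon-reload`.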

Comment 14 Daniel Berrange 2018-07-30 14:29:40 UTC
Note that even if nova's default unit file doesn't include the dependency, it is possible to augment the standard rules by creating files in /etc/systemd/system/nova-compute.service.d/XXXX.conf. Everything in this $UNITFILE.d directory would be treated as if it were part of the main $UNITFILE. IOW, there's no need to modify scini's unit file either. See

https://www.freedesktop.org/software/systemd/man/systemd.unit.html

Heading "Example 2. Overriding vendor settings"

Comment 15 Artom Lifshitz 2018-07-30 14:34:09 UTC
(In reply to Daniel Berrange from comment #14)
> Note that even if nova's default unit file doesn't include the dependency,
> it is possible to augment the standard rules by creating files in
> /etc/systemd/system/nova-compute.service.d/XXXX.conf. Everything in this
> $UNITFILE.d directory would be treated as if it were part of the main
> $UNITFILE. IOW, there's no need to modify scini's unit file either. See
> 
> https://www.freedesktop.org/software/systemd/man/systemd.unit.html
> 
> Heading "Example 2. Overriding vendor settings"

Maybe I'm misunderstanding, but wouldn't this have to be applied as a local fix/workaround on every system where this bug manifests itself? In other words, it's not a permanent, systematic fix. To get that, either scini's or nova's unit files would have to be modified.

Comment 16 Daniel Berrange 2018-07-30 14:36:44 UTC
I was thinking about it from the POV of your point that we don't want to hardcode the dependency because it's not applicable to all deployment scenarios. It is the kind of thing that something like OSP-Director could dynamically create on hosts that need the dependency, rather than having to modify nova's default unit file.

Comment 18 Alan Bishop 2018-08-31 18:09:34 UTC
If I'm reading [1] correctly, then this is what I imagine.

[1] https://www.freedesktop.org/software/systemd/man/systemd.unit.html

Leave /usr/lib/systemd/system/openstack-nova-compute.service alone. Instead, have the scini package install this file:

/etc/systemd/system/openstack-nova-compute.service.d/scini.conf

With contents:

  [Unit]
  After=scini.service

systemd will merge this into the openstack-nova-compute.service settings. That way neither package touches the other's stuff, so upgrades shouldn't be an issue.

