Bug 1367525 - Instance HA - invalid nova host name, must be <short> for instance recovery to function
Keywords:
Status: CLOSED DUPLICATE of bug 1380314
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: resource-agents
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: pre-dev-freeze
Target Release: ---
Assignee: Andrew Beekhof
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-16 16:03 UTC by Andreas Karis
Modified: 2016-10-06 11:42 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-05 09:34:07 UTC



Description Andreas Karis 2016-08-16 16:03:56 UTC
Description of problem:
The nova-compute-wait resource fails to start when the option domain=localdomain is set. The error is "Invalid Nova host name, must be XXXX in order for instance recovery to function".


Version-Release number of selected component (if applicable):
OSP 8.0

How reproducible:
Reproducible in any environment where nova.conf sets the `host=` parameter to the FQDN. In that case NOVA_HOST != `uname -n | awk -F. '{print $1}'`, which causes nova-compute-wait to fail.

Steps to Reproduce:
1. Configure `host=` in nova.conf with the FQDN
2. Configure instance HA according to https://access.redhat.com/articles/1544823#comment-1045511
3. (see private comment by Chen Chen from May 7th)

Actual results:


Expected results:


Additional info:
~~~
NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ "x${OCF_RESKEY_domain}" != x ]; then
    short_host=$(uname -n | awk -F. '{print $1}')
    if [ "x$NOVA_HOST" != "x${short_host}" ]; then
        ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
        rc=$OCF_ERR_CONFIGURED
    fi
fi
~~~
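
For illustration, the failing comparison can be reproduced outside the agent with a small sketch. The function name check_short_host and the host name overcloud-compute-0.localdomain are hypothetical; only the comparison itself is taken from the agent snippet.

```shell
#!/bin/sh
# Hypothetical helper that mirrors the agent's short-name check.
#   $1 = value of host= from nova.conf
#   $2 = output of `uname -n` on the node
check_short_host() {
    nova_host=$1
    # Same derivation as the agent: everything before the first dot.
    short_host=$(printf '%s\n' "$2" | awk -F. '{print $1}')
    if [ "x$nova_host" != "x$short_host" ]; then
        echo "Invalid Nova host name, must be $short_host in order for instance recovery to function"
        return 1
    fi
    return 0
}

# nova.conf holds the FQDN (illustrative value), so the check rejects it.
check_short_host "overcloud-compute-0.localdomain" "overcloud-compute-0.localdomain" \
    || echo "start would fail"
```

With `host=` set to the short name instead, the same call returns 0 and prints nothing.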

Comment 2 Andreas Karis 2016-08-16 16:07:50 UTC
~~~
# pcs resource show nova-compute-checkevacuate
 Resource: nova-compute-checkevacuate (class=ocf provider=openstack type=nova-compute-wait)
  Attributes: auth_url=https://xyz.domainname.com:5000/v2.0 username=admin password=XXXXXXXXXXX tenant_name=admin domain=abc.domainname.com
  Operations: stop interval=0s timeout=300 (nova-compute-checkevacuate-stop-interval-0s)
              monitor interval=10 timeout=20 (nova-compute-checkevacuate-monitor-interval-10)
              start interval=0s timeout=300 (nova-compute-checkevacuate-start-interval-0s)
~~~
So the domain is correctly configured in the above.

Compare that to the description of the resource
~~~
[root@overcloud-controller-0 lib]# pcs resource describe ocf:openstack:nova-compute-wait
ocf:openstack:nova-compute-wait - OpenStack Nova Compute Server

OpenStack Nova Compute Server.

Resource options:
  auth_url (required): Authorization URL for connecting to keystone in admin context
  username (required): Username for connecting to keystone in admin context
  password (required): Password for connecting to keystone in admin context
  tenant_name (required): Tenant name for connecting to keystone in admin context. Note that with Keystone V3 tenant names are only unique within a domain.
  domain: DNS domain in which hosts live, useful when the cluster uses short names and nova uses FQDN
  endpoint_type: Nova API location (internal, public or admin URL)
  no_shared_storage: Disable shared storage recovery for instances. Use at your own risk!
  evacuation_delay: How long to wait for nova to finish evacuating instances elsewhere before starting nova-compute. Only used when the agent detects evacuations might be in progress. You may need to increase the start timeout when
                    increasing this value.
[root@overcloud-controller-0 lib]# 
~~~

Compare that to the code
~~~
    # we take a chance here and hope that host is either not configured
    # or configured in nova.conf

    NOVA_HOST=$(openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
    if [ $? = 1 ]; then
(... we don't care, this won't be executed ...)
    fi

# We only need to check a configured value, calculated ones are fine
    openstack-config --get /etc/nova/nova.conf DEFAULT host 2>/dev/null
    if [ $? = 0 ]; then
        if [ "x${OCF_RESKEY_domain}" != x ]; then        
            short_host=$(uname -n | awk -F. '{print $1}')
            if [ "x$NOVA_HOST" != "x${short_host}" ]; then
                ocf_exit_reason "Invalid Nova host name, must be ${short_host} in order for instance recovery to function"
                rc=$OCF_ERR_CONFIGURED
            fi

        elif [ "x$NOVA_HOST" != "x$(uname -n)" ]; then
            ocf_exit_reason "Invalid Nova host name, must be $(uname -n) in order for instance recovery to function"
            rc=$OCF_ERR_CONFIGURED
        fi
    fi

~~~

First of all, the above is a bit ugly. What about if / else?
~~~
  NOVA_HOST=$(crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
    if [ $? = 1 ]; then
(...)
   fi

    # We only need to check a configured value, calculated ones are fine
    crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null
    if [ $? = 0 ]; then
(...)
~~~

Could simply be written as 
~~~
NOVA_HOST=$(crudini --get /etc/nova/nova.conf DEFAULT host 2>/dev/null)
if [ $? = 1 ]; then
(...)
else
(...)
~~~
Which means that the code would be way more readable.
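
A minimal runnable sketch of that single-read if / else shape follows. get_conf_host is a hypothetical stand-in for the crudini call (it ignores sections and just takes the first `host =` line), and falling back to `uname -n` in the unconfigured branch is an assumption, since the original elides that branch.

```shell
#!/bin/sh
# Hypothetical stand-in for `crudini --get <file> DEFAULT host`.
# Prints the first "host = value" found in the file given as $1,
# or returns 1 if the file is unreadable or has no such line.
get_conf_host() {
    value=$(sed -n 's/^host[[:space:]]*=[[:space:]]*//p' "$1" 2>/dev/null | head -n 1)
    [ -n "$value" ] || return 1
    printf '%s\n' "$value"
}

# Read the value once and branch on the result of that single read.
resolve_nova_host() {
    if NOVA_HOST=$(get_conf_host "$1"); then
        # Configured value: this is the case the agent needs to validate.
        HOST_SOURCE=configured
    else
        # Assumption: fall back to the node name when host= is not set.
        NOVA_HOST=$(uname -n)
        HOST_SOURCE=calculated
    fi
}
```

The exit status of `VAR=$(command)` is the exit status of the command, so the second lookup is not needed.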

And then, the following check doesn't look very logical. If the domain is set (`"x${OCF_RESKEY_domain}" != x`), NOVA_HOST is compared against the short host name. This can't be right! We want to compare NOVA_HOST against the full uname -n, no? Because the domain name is set, we _know_ that NOVA_HOST will contain an FQDN.
~~~
 if [ "x${OCF_RESKEY_domain}" != x ]; then        
            short_host=$(uname -n | awk -F. '{print $1}')
            if [ "x$NOVA_HOST" != "x${short_host}" ]; then
~~~
We know that we use a domain name, because we configured it, and we know that we are using the value in NOVA_HOST, which will hence likely contain the same domain name. So this check is either unnecessary or needs to be modified.

If we want to keep everything "short", then let's strip the domain name from NOVA_HOST with sed, something like this:
~~~
elif [ "$(echo "x$NOVA_HOST" | sed -e "s/\.${OCF_RESKEY_domain}$//")" != "x$(uname -n)" ]; then
~~~
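
As a rough sketch of the same idea outside the agent: the helpers strip_domain and host_matches are hypothetical names, and this version swaps sed for shell parameter expansion so that the dots in the domain are matched literally rather than as regex wildcards. It also strips the domain from both sides, which is an assumption about the desired behaviour, not something the agent does today.

```shell
#!/bin/sh
# Hypothetical helper: remove a trailing ".<domain>" from a host name.
#   $1 = host name, $2 = domain (may be empty)
strip_domain() {
    if [ -n "$2" ]; then
        # Quoting the suffix makes it a literal string, not a pattern.
        stripped=${1%".$2"}
    else
        stripped=$1
    fi
    printf '%s\n' "$stripped"
}

# Hypothetical helper: compare nova's host with the node name,
# ignoring the configured domain on either side.
#   $1 = host= from nova.conf, $2 = domain, $3 = output of `uname -n`
host_matches() {
    nova_short=$(strip_domain "$1" "$2")
    node_short=$(strip_domain "$3" "$2")
    [ "x$nova_short" = "x$node_short" ]
}
```

For example, `host_matches node1.example.com example.com node1` succeeds, while a different node name on either side makes it fail.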

I'll try to look into this and provide a patch.

Comment 3 Sven Anderson 2016-08-19 14:10:33 UTC
@akaris: Is this still an issue, and is it in fact nova related? Because the Customer Portal ticket has been closed.

Comment 6 Sadique Puthen 2016-10-04 11:18:06 UTC
There are more customers adopting instance HA, and it's broken in OSP 8 and above due to this bug. Releasing a fix for this should be treated as high priority.

Comment 8 Andreas Karis 2016-10-04 22:10:03 UTC
This is in /usr/lib/ocf/resource.d/openstack/nova-compute-wait

Comment 10 Fabio Massimo Di Nitto 2016-10-05 09:34:07 UTC

*** This bug has been marked as a duplicate of bug 1380314 ***

