Bug 1357292 - [ceph-ansible] : unable to add mon after upgrade from ceph 1.3 to ceph 2.0 as it generates different keyring [NEEDINFO]
Summary: [ceph-ansible] : unable to add mon after upgrade from ceph 1.3 to ceph 2.0 as it generates different keyring
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Ansible
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 3.0
Assignee: Andrew Schoen
QA Contact: ceph-qe-bugs
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Duplicates: 1357291
Depends On:
Blocks: 1322504 1383917 1412948
 
Reported: 2016-07-17 21:26 UTC by Rachana Patel
Modified: 2017-06-29 13:52 UTC
CC List: 10 users

Fixed In Version: ceph-ansible-1.0.5-30.el7scon
Doc Type: Known Issue
Doc Text:
.Ansible fails to add a monitor to an upgraded cluster

An attempt to add a monitor to a cluster by using the Ansible automation application after upgrading the cluster from Red Hat Ceph Storage 1.3 to 2 fails on the following task:

----
TASK: [ceph-mon | collect admin and bootstrap keys]
----

This happens because the original monitor keyring was created with the `mds "allow"` capability while the newly added monitor requires a keyring with the `mds "allow *"` capability. To work around this issue, after installing the `ceph-mon` package, manually copy the administration keyring from an already existing monitor node to the new monitor node:

----
scp /etc/ceph/<cluster_name>.client.admin.keyring <target_host_name>:/etc/ceph
----

For example:

----
# scp /etc/ceph/ceph.client.admin.keyring node4:/etc/ceph
----

Then use Ansible to add the monitor as described in the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide#adding_a_monitor_with_ansible[Adding a Monitor with Ansible] section of the https://access.redhat.com/documentation/en/red-hat-ceph-storage/2/single/administration-guide[Administration Guide] for Red Hat Ceph Storage 2.
Clone Of:
Environment:
Last Closed: 2017-06-29 13:52:50 UTC
kdreyer: needinfo? (racpatel)




Links:
Ceph Project Bug Tracker 16255 (last updated 2016-07-25 14:22:02 UTC)

Description Rachana Patel 2016-07-17 21:26:46 UTC
Description of problem:
=======================
After upgrading the cluster from Ceph 1.3 to Ceph 2.0 and then adding a MON using ceph-ansible, the playbook hangs at the task
TASK: [ceph-mon | collect admin and bootstrap keys]


Version-Release number of selected component (if applicable):
=============================================================
ceph-ansible-1.0.5-27.el7scon.noarch
ceph-mon-10.2.2-16.el7cp.x86_64


How reproducible:
=================
always

Steps to Reproduce:
====================
1. Follow the document - https://access.redhat.com/documentation/en/red-hat-ceph-storage/version-1.3/installation-guide-for-red-hat-enterprise-linux/
to create a Ceph cluster with 3 MONs, 3 OSDs, 1 admin node/Calamari, and one RGW node.

2. Upgrade Ceph 1.3 to Ceph 2.0 - follow the document
https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-red-hat-enterprise-linux
(chown method)

3. After the upgrade, install the ceph-ansible-1.0.5-27.el7scon version of ceph-ansible. The ceph-ansible files should be installed at /usr/share/ceph-ansible.
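
A minimal sketch of that install, assuming the repository that ships ceph-ansible is already enabled on the admin node:

# on the admin/ansible node
yum install ceph-ansible
rpm -q ceph-ansible            # expect ceph-ansible-1.0.5-27.el7scon
ls /usr/share/ceph-ansible     # playbooks and group_vars live here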

4. Copy the sample `group_vars/all.sample` to `group_vars/all`:
`cp /usr/share/ceph-ansible/group_vars/all.sample /usr/share/ceph-ansible/group_vars/all`

5. Set `generate_fsid: false` in `group_vars/all`.
Get your current cluster fsid with `ceph fsid` and set `fsid` accordingly in `group_vars/all`.
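
A minimal sketch of this step (paths are the defaults; the fsid value is whatever your cluster reports):

# on a node that already has a working admin keyring
ceph fsid                                # prints the existing cluster fsid

# then edit /usr/share/ceph-ansible/group_vars/all and set:
#   generate_fsid: false
#   fsid: <output of `ceph fsid`>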

6. Modify the ansible inventory at /etc/ansible/hosts to include your ceph hosts. Add monitors under a [mons] section, and OSDs under an [osds] section to identify their roles to Ansible.
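
An illustrative /etc/ansible/hosts layout for this setup (host names are placeholders):

[mons]
mon1
mon2
mon3

[osds]
osd1
osd2
osd3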

7. From the Ansible node, you should have passwordless SSH to all nodes in the cluster.
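
One way to set that up from the Ansible node, sketched with the placeholder host names from the inventory above and root as the remote user:

ssh-keygen                                     # accept defaults, empty passphrase
for host in mon1 mon2 mon3 osd1 osd2 osd3; do
    ssh-copy-id root@$host
done
ansible all -i /etc/ansible/hosts -m ping      # verify Ansible can reach every node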

8. From the `/usr/share/ceph-ansible` directory, run the playbook like so: `ansible-playbook take-over-existing-cluster.yml` (changes were made to the playbook to remove a syntax error)

9. Now add one more node to the hosts file under the [mons] section. Do all preflight operations on that node.

10. Modify group_vars/all and group_vars/osds as described in https://access.qa.redhat.com/documentation/en/red-hat-ceph-storage/2/installation-guide-for-ubuntu/#installing_ceph_ansible
(except fetch_directory and fsid - do not set fetch_directory; fsid was already set in a previous step)
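
Illustrative values only; the variable names below are the usual ceph-ansible ones for this release, but the linked installation guide is authoritative and all values are placeholders:

group_vars/all (excerpt):
monitor_interface: eth0
public_network: 10.0.0.0/24
journal_size: 5120

group_vars/osds (excerpt):
devices:
  - /dev/sdb
  - /dev/sdc
journal_collocation: true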

11. Run `ansible-playbook site.yml -i /etc/ansible/hosts`


Actual results:
===============
Installation hangs at the task
TASK: [ceph-mon | collect admin and bootstrap keys]


Expected results:
=================
The MON should be installed successfully.


Additional info:
================
1. All MON nodes that were part of the upgrade have the same value in
/var/lib/ceph/mon/ceph-<ID>/keyring

while the newly added MON has a different value in that file (ceph-ansible generates a new keyring for it).


2. `ceph -s` or `mon_status` never shows the newly added MON as part of the quorum.

3. Once we overwrite that file with the keyring file from another MON, the new MON becomes part of the cluster and joins the quorum.
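
A rough sketch of that workaround (host names and monitor IDs are examples; mon1 is an upgraded MON, node4 the newly added one):

# compare the keyrings
cat /var/lib/ceph/mon/ceph-mon1/keyring        # run on the existing MON
cat /var/lib/ceph/mon/ceph-node4/keyring       # run on the new MON; the key differs

# on the new MON, pull the keyring from the existing MON and restart the monitor
scp mon1:/var/lib/ceph/mon/ceph-mon1/keyring /var/lib/ceph/mon/ceph-node4/keyring
chown ceph:ceph /var/lib/ceph/mon/ceph-node4/keyring
systemctl restart ceph-mon@node4               # new MON should now join the quorum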

Comment 2 Ken Dreyer (Red Hat) 2016-07-18 13:33:48 UTC
*** Bug 1357291 has been marked as a duplicate of this bug. ***

Comment 3 Andrew Schoen 2016-07-18 15:13:53 UTC
PR opened upstream: https://github.com/ceph/ceph-ansible/pull/887

Comment 13 Gregory Meno 2016-07-22 19:37:08 UTC
Moving out of 2.0 because this only affects adding mons to an upgraded cluster and we have a set of steps to work around it.

Comment 18 Ken Dreyer (Red Hat) 2017-03-03 16:26:47 UTC
The upstream ticket at http://tracker.ceph.com/issues/16255 says the Ceph bug was fixed in Ceph v10.2.4. Would you please retest with the latest ceph-ansible and ceph packages?

