
Bug 1366807

Summary: [RFE] ceph-ansible: remove MON and OSD nodes
Product: Red Hat Ceph Storage
Reporter: Federico Lucifredi <flucifre>
Component: Ceph-Ansible
Assignee: seb
Status: CLOSED ERRATA
QA Contact: Vasishta <vashastr>
Severity: urgent
Docs Contact: Bara Ancincova <bancinco>
Priority: urgent
Version: 3.0
CC: adeza, anharris, aschoen, ceph-eng-bugs, flucifre, hnallurv, kdreyer, nlevine, nthomas, racpatel, sankarshan, seb, shan, vashastr
Target Milestone: rc
Keywords: FutureFeature
Target Release: 3.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version: RHEL: ceph-ansible-3.0.0-0.1.rc6.el7cp Ubuntu: ceph-ansible_3.0.0~rc6-2redhat1
Doc Type: Enhancement
Doc Text:
.Ansible now supports removing Monitors and OSDs
You can use the `ceph-ansible` utility to remove Monitors and OSDs from a Ceph cluster. For details, see the link:[Removing Monitors with Ansible] and link:[Removing OSDs with Ansible] sections in the Red Hat Ceph Storage 3 Administration Guide. The same procedures also apply to removing Monitors and OSDs from a containerized Ceph cluster.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-12-05 23:31:14 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Bug Depends On:
Bug Blocks: 1322504, 1383917, 1412948, 1494421
Description Flags
File contains contents ansible-playbook log, conf file after removing a monitor
File contains contents ansible-playbook log and conf file from different nodes none

Description Federico Lucifredi 2016-08-12 22:29:59 UTC
Description of problem:

The current version of ceph-ansible does not support removal of MON and OSD nodes. 

This is a regression against ceph-deploy functionality.

Shrinking a cluster is not supported by Console, but we need to provide a way to remove nodes from the cluster at least on the CLI.


Sébastien is implementing this in ceph-ansible. The latest upstream version will be able to remove nodes, and we should package it as an async.

Comment 2 Federico Lucifredi 2016-08-12 22:31:46 UTC
This should be targeted at the first Async — but I only see targets 2 and 3....

Comment 5 seb 2016-10-07 08:38:59 UTC
Yup fixed in v1.0.8

Comment 6 Federico Lucifredi 2016-10-07 16:58:56 UTC
This will ship concurrently with RHCS 2.1.

Comment 7 Ken Dreyer (Red Hat) 2017-03-03 16:29:23 UTC
What automated tests cover this feature as implemented today?

From discussion with Andrew, it sounds like the current implementation requires the admin to run Ansible *on* the Ceph cluster nodes (it runs local commands?). If so, we need to change that.

Comment 9 Ken Dreyer (Red Hat) 2017-03-03 16:50:03 UTC
*** Bug 1335569 has been marked as a duplicate of this bug. ***

Comment 13 Drew Harris 2017-06-29 14:01:26 UTC
*** Bug 1414092 has been marked as a duplicate of this bug. ***

Comment 16 Vasishta 2017-09-11 08:35:08 UTC
Created attachment 1324368 [details]
File contains contents ansible-playbook log, conf file after removing a monitor

Hi all,

I worked on shrinking a MON from the cluster. The playbook run was successful, but:
1) The monitor was still in the cluster, even though the "verify the monitor is out of the cluster" step completed without any errors.
2) The configuration file still had an entry for the removed monitor.

Based on the steps in the Admin Doc for removing a monitor from the cluster, I expect Ansible to remove the mon from the cluster and to modify and redistribute the config file, which would make the feature more usable.

I'm moving the BZ back to the ASSIGNED state; please let me know if my expectation is not appropriate. I've attached a file containing the ansible log and the conf file after removing a mon.

(Terminal log after removing a MON from node magna051)

$ sudo ceph -s --cluster 12_3a
    health: HEALTH_WARN
            1/3 mons down, quorum magna033,magna040
    mon: 3 daemons, quorum magna033,magna040, out of quorum: magna051
$ sudo ceph mon stat --cluster 12_3a
e2: 3 mons at {magna033=,magna040=,magna051=}, election epoch 12, leader 0 magna033, quorum 0,1 magna033,magna040

$ sudo ceph mon remove magna051 --cluster 12_3a
removing mon.magna051 at, there will be 2 monitors

$ sudo ceph mon stat --cluster 12_3a
e3: 2 mons at {magna033=,magna040=}, election epoch 14, leader 0 magna033, quorum 0,1 magna033,magna040
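
The manual check above (is the removed monitor still listed in the monmap?) can also be scripted. The following is a minimal sketch, not the check that ceph-ansible itself performs. It assumes that `ceph mon dump --format json` prints the monmap as JSON with a `mons` list whose entries carry a `name` field; the helper names `monitor_names`, `monitor_in_monmap`, and `dump_monmap` are hypothetical.

```python
import json
import subprocess


def monitor_names(monmap_json):
    """Return the monitor names listed in a JSON monmap dump."""
    monmap = json.loads(monmap_json)
    # Assumption: the dump has a "mons" list of objects with a "name" key.
    return [mon["name"] for mon in monmap.get("mons", [])]


def monitor_in_monmap(name, monmap_json):
    """True if the named monitor is still part of the monmap."""
    return name in monitor_names(monmap_json)


def dump_monmap(cluster="ceph"):
    """Fetch the monmap as JSON text from a live cluster.

    Assumption: the ceph CLI is installed and the caller has the
    privileges needed to query the cluster.
    """
    result = subprocess.run(
        ["ceph", "--cluster", cluster, "mon", "dump", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For the cluster in this log, `monitor_in_monmap("magna051", dump_monmap("12_3a"))` would be expected to return True before the manual `ceph mon remove` and False afterwards.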


Comment 17 seb 2017-09-12 17:46:17 UTC
That's weird. Can you retry and run ansible in debug mode (with -vvvv), please?
I need to make sure the command was issued properly.


Comment 18 seb 2017-09-13 22:24:16 UTC
FYI I haven't been able to reproduce.

Comment 19 Vasishta 2017-09-14 04:26:51 UTC
Created attachment 1325704 [details]
File contains contents ansible-playbook log and conf file from different nodes

Hi Sebastien,

This time it worked partially. The mon was removed from the cluster as expected, but the conf files in the rest of the cluster were not updated.

I've copied those conf files and the ansible log with verbose output enabled. Can you please check this?

Comment 20 seb 2017-09-14 05:38:18 UTC
It is expected that the user will update ceph.conf. It's difficult for us to do the update and redistribute the file because this means modifying their inventory.

Modifying the inventory is not possible; even if we override it, the next ansible run will override it again.

Comment 21 seb 2017-09-15 13:13:24 UTC
Since you've eventually been able to make it work, I'm moving this back to POST.
Also, as described in my earlier comment, I don't think we can do much more than what we currently do.


Comment 22 Ken Dreyer (Red Hat) 2017-09-18 14:48:19 UTC
Vasishta is this still an issue in rc7?

Comment 26 Federico Lucifredi 2017-10-13 20:27:53 UTC
It is acceptable, and yes, let's please add this step to the docs.

A prompt indicating ceph.conf needs to be updated may also be in order (Seb's call).

Comment 27 leseb 2017-10-16 07:59:10 UTC
At the end of the play, we prompt the user with a message saying:

"The monitor has been successfully removed from the cluster.
 Please remove the monitor entry from the rest of your ceph configuration files, cluster wide."
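
The manual cleanup that the message asks for could look roughly like the sketch below. It is an illustration only and not part of ceph-ansible. It assumes the monitor is referenced in comma-separated `mon initial members` and `mon host` settings (with spaces or underscores in the key); the function name `drop_monitor` and the addresses in the usage note are hypothetical. Any dedicated `[mon.<name>]` section would still need to be removed by hand.

```python
import re

# Assumption: these are the settings that list monitors by name or address.
MON_LIST_KEYS = {"mon initial members", "mon host"}


def drop_monitor(conf_text, mon_name, mon_addr):
    """Return conf_text with the monitor's name and address removed
    from the comma-separated monitor lists."""
    out = []
    for line in conf_text.splitlines():
        key, sep, value = line.partition("=")
        # Treat "mon_host" and "mon host" as the same key.
        normalized = re.sub(r"[\s_]+", " ", key.strip().lower())
        if sep and normalized in MON_LIST_KEYS:
            kept = [
                item.strip()
                for item in value.split(",")
                if item.strip() and item.strip() not in (mon_name, mon_addr)
            ]
            line = f"{key.rstrip()} = {','.join(kept)}"
        out.append(line)
    return "\n".join(out) + "\n"
```

For example, calling `drop_monitor(text, "magna051", "192.0.2.51")` on a file containing `mon initial members = magna033,magna040,magna051` leaves `mon initial members = magna033,magna040`; the address here is a placeholder. The edited file still has to be copied to every node, since the playbook does not redistribute it.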

Comment 28 Vasishta 2017-10-16 10:52:28 UTC
Hi Ken,

Can you please move this BZ to ON_QA ?


Comment 30 Vasishta 2017-10-17 11:08:42 UTC
Tried with ceph-ansible-3.0.2-1.el7cp.noarch and observed that a message is displayed asking the user to remove the monitor entry from the rest of the Ceph configuration files, cluster wide.

Looks good to me, moving to VERIFIED state.

Comment 34 errata-xmlrpc 2017-12-05 23:31:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.