Bug 1360972 - [RHCS 2.0 container]: Rebooting OSD node doesn't respawn osd containers
Summary: [RHCS 2.0 container]: Rebooting OSD node doesn't respawn osd containers
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Build
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 2.3
Assignee: Daniel Gryniewicz
QA Contact: Rachana Patel
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-07-28 05:47 UTC by krishnaram Karthick
Modified: 2017-06-19 13:22 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-06-19 13:22:26 UTC



Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1498 normal SHIPPED_LIVE updated rhceph-2-rhel7 container image 2017-06-19 17:22:05 UTC

Description krishnaram Karthick 2016-07-28 05:47:04 UTC
Description of problem:
After deploying and configuring RHCS 2.0 containers with 1 mon and 3 OSD nodes, rebooting an OSD node does not automatically respawn the OSD disk containers on it. As a result, the containers hosted on that node become unusable.

Version-Release number of selected component (if applicable):

docker images 
REPOSITORY                        TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
docker.io/hchen/rhceph2           latest              fc7e7ea69710        4 weeks ago         671.4 MB

rpm -qa | grep 'atomic'
redhat-release-atomic-host-7.2-20150928.0.atomic.el7.2.x86_64
ostree-2015.9-2.atomic.el7.x86_64
libgsystem-2015.1-1.atomic.el7.x86_64
rpm-ostree-client-2015.9-2.atomic.el7.1.x86_64
atomic-1.6-6.gitca1e384.el7.x86_64
glusterfs-fuse-3.7.1-17.atomic.1.el7.x86_64
nss-altfiles-0-2.atomic.git20131217gite2a80593.el7.x86_64
glusterfs-client-xlators-3.7.1-17.atomic.1.el7.x86_64
stub-redhat-lsb-core-only-for-ceph-2015.1-1.atomic.el7.noarch
tuned-profiles-atomic-2.5.1-4.el7.noarch
ostree-grub2-2015.9-2.atomic.el7.x86_64
glusterfs-libs-3.7.1-17.atomic.1.el7.x86_64
glusterfs-3.7.1-17.atomic.1.el7.x86_64

ceph --version
ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)

How reproducible:
Always

Steps to Reproduce:
1. Deploy RHCS 2.0 manually (1 mon and 3 OSD nodes should be enough) using this guide: https://docs.google.com/document/d/1Ef5a_-Yjozy5Ue3C0M7mMQNn6zWZe0-514bhxKwFHI8/edit?ts=576a3d95#heading=h.f29ffgu8vo7g
2. Create at least 4 OSD containers on each OSD node, so that there are at least 12 OSD containers in total.
3. Create a block device and run I/O against it.
4. Reboot one of the OSD nodes. No interruption to I/O was seen.
5. Wait for the rebooted node to come back up.
6. Check whether the containers on that node were started automatically.

Actual results:
containers aren't started

Expected results:
containers should be started automatically
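
The final check in the steps to reproduce can be scripted. The following is a minimal sketch, not part of the original report: it assumes the docker CLI is available on the OSD node, and the helper names are hypothetical. It compares the names printed by `docker ps --format '{{.Names}}'` (one running container per line) against the containers expected on that node.

```python
import subprocess


def missing_containers(expected, docker_ps_output):
    """Return the expected container names absent from `docker ps` output.

    `docker_ps_output` is the text printed by
    `docker ps --format '{{.Names}}'`: one running container name per line.
    """
    running = {line.strip() for line in docker_ps_output.splitlines() if line.strip()}
    return sorted(name for name in set(expected) if name not in running)


def running_container_names():
    """Ask the local docker daemon for the names of running containers."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def check_node(expected):
    """Return the expected containers that are not running on this node."""
    return missing_containers(expected, running_container_names())
```

On the rebooted node, calling `check_node([...])` with the names of the OSD containers created there (whatever names were used at deployment) returns the ones that did not come back; an empty list means all of them are running.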

Additional info:

Comment 3 Daniel Gryniewicz 2016-09-22 12:13:31 UTC
I do not believe that reboot handling is within the scope of the manual instructions. They are intended for people to try things out, not for deploying production servers.

Adding Jim to get his opinion.

Comment 5 Daniel Gryniewicz 2017-04-05 12:24:40 UTC
Aren't manual instructions being removed?

Comment 6 Gregory Meno 2017-04-06 15:56:13 UTC
They will be replaced with instructions that use ceph-ansible AND I would expect that this functionality will need to work in that context.

Andrew do we have any tests that cover this?
If not Dan would you be willing to create some tests like that?

Comment 7 Andrew Schoen 2017-04-06 16:06:37 UTC
> Andrew do we have any tests that cover this?

We have tests that test the deployment of containers, but not ones that restart the daemons and verify that they come back up.

However, ceph-ansible does use systemd to control the containers and it looks like we do enable those. https://github.com/ceph/ceph-ansible/blob/master/roles/ceph-osd/tasks/docker/start_docker_osd.yml#L88
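
Whether a given unit is enabled can also be checked directly on a node. The following is a minimal sketch under stated assumptions: it assumes one systemd unit per OSD container, and the helper names are hypothetical rather than taken from ceph-ansible. It relies on `systemctl is-enabled`, which prints the unit's enablement state, and it treats only the state `enabled` as meaning the unit is started at boot, which is a simplification (states such as `static` are reported as not enabled).

```python
import subprocess


def starts_at_boot(is_enabled_output):
    """Interpret the text printed by `systemctl is-enabled <unit>`.

    Simplification: only the exact state "enabled" counts as starting at boot.
    """
    return is_enabled_output.strip() == "enabled"


def unit_state(unit):
    """Return the enablement state systemd reports for `unit`."""
    # No check=True: systemctl exits non-zero for states such as "disabled".
    result = subprocess.run(
        ["systemctl", "is-enabled", unit],
        capture_output=True, text=True,
    )
    return result.stdout.strip()


def units_not_enabled(units):
    """Return the units that systemd would not start at boot."""
    return [unit for unit in units if not starts_at_boot(unit_state(unit))]
```

Passing the OSD unit names used on the node to `units_not_enabled` lists any that would stay down after a reboot.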

Comment 8 Daniel Gryniewicz 2017-04-06 16:21:44 UTC
My understanding is that this issue doesn't apply to systemd runs.

Comment 9 Ken Dreyer (Red Hat) 2017-04-06 23:13:40 UTC
Andrew, Dan, what is the next step for this BZ? Do we need QE to re-test this with the latest ceph docker image?

Comment 10 Gregory Meno 2017-04-07 19:25:08 UTC
Yes let's dev ack and test that it works with systemd
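
A test along these lines would reboot the node and then poll until the OSD containers are back. The following is a minimal sketch of the polling part only, with hypothetical names; `check` is whatever callable the test uses to decide that the containers are up, and the clock and sleep functions are injectable so the helper can be exercised without actually waiting.

```python
import time


def wait_until(check, timeout=300.0, interval=5.0,
               clock=time.monotonic, sleep=time.sleep):
    """Call `check()` until it returns True or `timeout` seconds elapse.

    Returns True if `check()` succeeded in time, False otherwise.
    `check()` is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The default timeout and interval are arbitrary placeholders and would need tuning to how long a node takes to reboot.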

Comment 14 Rachana Patel 2017-05-19 13:10:42 UTC
Verified with: ceph-2-rhel-7-docke-candidate-20170516014056

Comment 16 errata-xmlrpc 2017-06-19 13:22:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1498

