Bug 1364427 - ceph osd tree shows incorrect status after stopping osd processes
Summary: ceph osd tree shows incorrect status after stopping osd processes
Keywords:
Status: CLOSED DUPLICATE
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 2.0
Assignee: Samuel Just
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-08-05 10:28 UTC by shilpa
Modified: 2017-07-30 15:12 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-05 14:29:28 UTC



Description shilpa 2016-08-05 10:28:14 UTC
Description of problem:
Stopped the OSD services on all the nodes. The last two OSDs whose services I stopped still show as up in "ceph osd tree", even though the services are down and the processes are not running.

Version-Release number of selected component (if applicable):
ceph-base-10.2.2-32.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Stop the OSD services on each node.
2. Check that the OSD processes have stopped. Once confirmed, run "ceph osd tree" to see whether all the OSDs are marked down (a minimal command sketch follows).
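
For concreteness, a minimal command sketch of the above, assuming systemd-managed ceph-osd@<id> units (RHCS 2.0 on RHEL 7); the OSD IDs and hostname below are the ones from this cluster and would need adjusting elsewhere.

On each OSD node (e.g. magna113, which hosts osd.0, osd.2 and osd.5):
# systemctl stop ceph-osd@0 ceph-osd@2 ceph-osd@5   # stop the OSD daemons on this host
# systemctl is-active ceph-osd@0                    # expect "inactive"
# pgrep ceph-osd                                    # expect no ceph-osd processes left

Then, from any node with an admin keyring:
# ceph osd tree                                     # every osd.* should be reported as down
# ceph osd stat                                     # one-line osdmap summary: "N osds: X up, Y in"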

Actual results:

The last two OSDs that were stopped, osd.0 and osd.5 on magna113, are still reported as up in "ceph osd tree" even though their processes are no longer running. mon_osd_min_down_reporters is set to 2:

# ceph --admin-daemon `pwd`/ceph-mon.magna062.asok config show  | grep mon_osd_min_down_reporters 
"mon_osd_min_down_reporters": "2"

# ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 8.09541 root default                                        
-2 2.69847     host magna113                                   
 0 0.89949         osd.0          up  1.00000          1.00000 
 2 0.89949         osd.2        down        0          1.00000 
 5 0.89949         osd.5          up  1.00000          1.00000 
-3 2.69847     host magna096                                   
 1 0.89949         osd.1        down        0          1.00000 
 3 0.89949         osd.3        down        0          1.00000 
 4 0.89949         osd.4        down        0          1.00000 
-4 2.69847     host magna065                                   
 6 0.89949         osd.6        down        0          1.00000 
 7 0.89949         osd.7        down        0          1.00000 
 8 0.89949         osd.8        down        0          1.00000 



2016-08-05 10:03:27.357647 7f5a7ce14700  0 log_channel(cluster) log [INF] : osd.2 out (down for 302.771209)
2016-08-05 10:03:27.422622 7f5a7e510700  1 mon.magna062@0(leader).osd e163 e163: 9 osds: 2 up, 2 in
2016-08-05 10:03:27.464328 7f5a7e510700  0 log_channel(cluster) log [INF] : osdmap e163: 9 osds: 2 up, 2 in
2016-08-05 10:03:27.498321 7f5a7e510700  0 log_channel(cluster) log [INF] : pgmap v1125658: 200 pgs: 1 stale+undersized+degraded+inconsistent+peered, 82 stale+undersized+degraded+peered, 117 undersized+degraded+peered; 169 GB data, 127 GB used, 1714 GB / 1842 GB avail; 116294/174441 objects degraded (66.667%)

Comment 4 Samuel Just 2016-08-05 14:29:28 UTC
This is not a blocker.  OSDs do failure detection for one another; once none are alive, none are around to mark the remaining ones down.  This isn't really important since no I/O is happening anyway.  This is also a dup.

*** This bug has been marked as a duplicate of bug 1358928 ***
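
For anyone hitting this while testing: as the comment above explains, with every peer OSD stopped there is nothing left to send failure reports for the last OSDs, so their stale "up" entries can linger; they can be cleared by hand if needed. A hedged sketch from a monitor/admin node, using the stragglers from the output above:

# ceph osd down 0 5   # mark osd.0 and osd.5 down in the osdmap
# ceph osd tree       # both should now show as down

Because the daemons are genuinely stopped, nothing will mark them back up; a running OSD, by contrast, would simply re-assert itself as up.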

