
Bug 1364427

Summary: ceph osd tree shows incorrect status after stopping osd processes

Product: Red Hat Ceph Storage
Component: RADOS
Version: 2.0
Status: CLOSED DUPLICATE
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 2.0
Hardware: Unspecified
OS: Unspecified

Reporter: shilpa <smanjara>
Assignee: Samuel Just <sjust>
QA Contact: ceph-qe-bugs <ceph-qe-bugs>
Docs Contact:
CC: ceph-eng-bugs, dzafman, kchai, mbenjamin, owasserm, sweil, tserlin

Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-08-05 14:29:28 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description shilpa 2016-08-05 10:28:14 UTC
Description of problem:
Stopped the OSD services on all the nodes. The last two OSD services that I stopped still show as "up" in "ceph osd tree", even though the services are down and the processes are not running.

Version-Release number of selected component (if applicable):
ceph-base-10.2.2-32.el7cp.x86_64

How reproducible:
Always

Steps to Reproduce:
1. Stop the OSD services on each node.
2. Check that the OSD processes have stopped. Once confirmed, run "ceph osd tree" to see whether all the OSDs are marked down (see the sketch after these steps).
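
A minimal reproduction sketch, assuming systemd-managed OSDs (ceph-osd@<id> units, as used by this release on RHEL 7); the OSD IDs per node are placeholders, take them from "ceph osd tree":

# systemctl stop ceph-osd@<id>        (repeat for every OSD ID hosted on the node)
# ps -ef | grep '[c]eph-osd'          (should print nothing once all local OSDs are stopped)

Then, from a monitor node:

# ceph osd tree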

Actual results:

mon_osd_min_down_reporters is set to 2:

# ceph --admin-daemon `pwd`/ceph-mon.magna062.asok config show  | grep mon_osd_min_down_reporters 
"mon_osd_min_down_reporters": "2"

# ceph osd tree
ID WEIGHT  TYPE NAME         UP/DOWN REWEIGHT PRIMARY-AFFINITY 
-1 8.09541 root default                                        
-2 2.69847     host magna113                                   
 0 0.89949         osd.0          up  1.00000          1.00000 
 2 0.89949         osd.2        down        0          1.00000 
 5 0.89949         osd.5          up  1.00000          1.00000 
-3 2.69847     host magna096                                   
 1 0.89949         osd.1        down        0          1.00000 
 3 0.89949         osd.3        down        0          1.00000 
 4 0.89949         osd.4        down        0          1.00000 
-4 2.69847     host magna065                                   
 6 0.89949         osd.6        down        0          1.00000 
 7 0.89949         osd.7        down        0          1.00000 
 8 0.89949         osd.8        down        0          1.00000 



2016-08-05 10:03:27.357647 7f5a7ce14700  0 log_channel(cluster) log [INF] : osd.2 out (down for 302.771209)
2016-08-05 10:03:27.422622 7f5a7e510700  1 mon.magna062@0(leader).osd e163 e163: 9 osds: 2 up, 2 in
2016-08-05 10:03:27.464328 7f5a7e510700  0 log_channel(cluster) log [INF] : osdmap e163: 9 osds: 2 up, 2 in
2016-08-05 10:03:27.498321 7f5a7e510700  0 log_channel(cluster) log [INF] : pgmap v1125658: 200 pgs: 1 stale+undersized+degraded+inconsistent+peered, 82 stale+undersized+degraded+peered, 117 undersized+degraded+peered; 169 GB data, 127 GB used, 1714 GB / 1842 GB avail; 116294/174441 objects degraded (66.667%)

Comment 4 Samuel Just 2016-08-05 14:29:28 UTC
This is not a blocker. OSDs do the failure detection for each other; once none are alive, none are around to mark the remaining ones down. This isn't really important since no I/O is happening anyway. This is also a duplicate.
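
If the stale "up" entries get in the way of testing, a possible workaround (standard Ceph CLI; the OSD IDs are the ones still shown "up" in the tree above) is to mark them down manually from a monitor node:

# ceph osd down 0 5

The monitor will only flip them back to "up" if the daemons actually report in again.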

*** This bug has been marked as a duplicate of bug 1358928 ***