Bug 1597048 - ceph osd df not showing correct disk size and causing cluster to go to full state
Summary: ceph osd df not showing correct disk size and causing cluster to go to full s...
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RADOS
Version: 3.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: 3.*
Assignee: Brad Hubbard
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-01 21:37 UTC by Vikhyat Umrao
Modified: 2018-07-18 21:59 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-04 00:53:53 UTC
Target Upstream Version:



Description Vikhyat Umrao 2018-07-01 21:37:56 UTC
Description of problem:
ceph osd df not showing correct disk size and causing the cluster to go to full state

[root@storage-004 ~]# df -h /var/lib/ceph/osd/ceph-0
Filesystem                           Size  Used Avail Use% Mounted on
/dev/nvme0n1p1                       3.7T  9.8G  3.7T   1% /var/lib/ceph/osd/ceph-0

[root@storage-004 ~]# ceph -s
  cluster:
    id:     03e3321d-071f-4b28-a3f9-0256f384bdca
    health: HEALTH_ERR
            full flag(s) set
            1 full osd(s)

  services:
    mon: 3 daemons, quorum storage-004,storage-005,storage-009
    mgr: storage-009(active), standbys: storage-005, storage-004
    osd: 102 osds: 96 up, 96 in; 103 remapped pgs
         flags full
    rgw: 2 daemons active


From ceph osd df:
=======================

  0   ssd 3.63199  1.00000 10240M   9467M  772M 92.45 20.78 131 <===
                              ^^
    
  5   ssd 3.63199  1.00000  3719G   1025G 2693G 27.57  6.20 419
 10   ssd 3.63199  1.00000  3719G   1220G 2498G 32.81  7.38 458
 16   ssd 3.63199  1.00000  3719G   1114G 2604G 29.98  6.74 428
 21   ssd 3.63199  1.00000  3719G   1004G 2714G 27.02  6.07 417
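The %USE figure for osd.0 is internally consistent with the bogus 10240M total rather than the real ~3.7T capacity: 9467M used of a 10240M device is 92.45%, which is what pushed the OSD over the full ratio. A quick arithmetic check, using the values shown above:

```shell
# %USE as reported for osd.0: used / total * 100
# (9467M used, 10240M total, from the ceph osd df excerpt above)
awk 'BEGIN { printf "%.2f\n", 9467 / 10240 * 100 }'
```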

From ceph osd tree:
========================

 -9        18.15994     host storage-004
  0   ssd   3.63199         osd.0            up  1.00000 1.00000
  5   ssd   3.63199         osd.5            up  1.00000 1.00000
 10   ssd   3.63199         osd.10           up  1.00000 1.00000
 16   ssd   3.63199         osd.16           up  1.00000 1.00000
 21   ssd   3.63199         osd.21           up  1.00000 1.00000



Version-Release number of selected component (if applicable):
Red Hat Ceph Storage 3

How reproducible:
Always at the customer site.

Comment 2 Brad Hubbard 2018-07-02 00:25:44 UTC
Assuming this is filestore can we see the output of "stat -f /var/lib/ceph/osd/ceph-0" please?

Comment 8 Vikhyat Umrao 2018-07-18 21:55:36 UTC
Resolution - this disk was deployed as BlueStore by mistake, and even as BlueStore it was not deployed properly.

[root@storage-004 ~]# cat /var/lib/ceph/osd/ceph-*/type
bluestore
filestore
filestore
filestore
filestore

So only OSD.0 was bluestore.
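The plain `cat` over the glob above does not say which line belongs to which OSD. A loop like the following pairs each OSD directory with its backend; this sketch uses a temporary directory to stand in for `/var/lib/ceph/osd` so it is runnable anywhere (on a real node you would loop over `/var/lib/ceph/osd/ceph-*` directly):

```shell
# Simulated OSD layout matching the node above: ceph-0 is bluestore,
# the rest are filestore.
base=$(mktemp -d)
mkdir "$base"/ceph-0 "$base"/ceph-5 "$base"/ceph-10 "$base"/ceph-16 "$base"/ceph-21
echo bluestore > "$base/ceph-0/type"
for i in 5 10 16 21; do echo filestore > "$base/ceph-$i/type"; done
# Print "directory: backend" for each OSD
for d in "$base"/ceph-*; do
  printf '%s: %s\n' "${d##*/}" "$(cat "$d/type")"
done
rm -rf "$base"
```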

[root@storage-004 ~]# blockdev --getsize64 /dev/nvme0n1p1
3995417255424
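That byte count is consistent with the ~3.7T capacity the partition should provide: CRUSH weights in `ceph osd tree` are expressed in TiB, and 3995417255424 bytes is roughly 3.63 TiB, lining up with the 3.63199 weight shown earlier. A quick conversion:

```shell
# Convert the blockdev --getsize64 result to TiB (1 TiB = 1024^4 bytes);
# this should line up with the ~3.63 CRUSH weight from "ceph osd tree".
awk 'BEGIN { printf "%.2f\n", 3995417255424 / (1024 ^ 4) }'
```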

[root@storage-004 ceph-0]# ls -l
total 10421404
-rw-r--r--. 1 ceph ceph         447 Mar 21 15:44 activate.monmap
-rw-r--r--. 1 ceph ceph           3 Mar 21 15:44 active
-rw-r--r--. 1 ceph ceph 10737418240 Jul  2 16:51 block <==============
-rw-r--r--. 1 ceph ceph           2 Mar 21 15:44 bluefs
-rw-r--r--. 1 ceph ceph          37 Mar 21 15:43 ceph_fsid
-rw-r--r--. 1 ceph ceph          37 Mar 21 15:43 fsid
-rw-------. 1 ceph ceph          56 Mar 21 15:44 keyring
-rw-r--r--. 1 ceph ceph           8 Mar 21 15:44 kv_backend
-rw-r--r--. 1 ceph ceph          21 Mar 21 15:43 magic
-rw-r--r--. 1 ceph ceph           4 Mar 21 15:44 mkfs_done
-rw-r--r--. 1 ceph ceph           6 Mar 21 15:44 ready
-rw-r--r--. 1 ceph ceph           0 Jul  1 21:52 systemd
-rw-r--r--. 1 ceph ceph          10 Mar 21 15:43 type
-rw-r--r--. 1 ceph ceph           2 Mar 21 15:44 whoami 


The BlueStore block device was a regular file named "block" rather than a symlink to a block device partition on this disk, and that file was 10G in size, hence the OSD capacity was reported as 10G.
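The broken state can be illustrated with a self-contained sketch (a temp directory and a sparse file stand in for the real OSD directory and partition; paths are illustrative). BlueStore sizes the OSD from whatever `block` resolves to, so a 10 GiB regular file yields a 10 GiB OSD:

```shell
# Recreate the misconfiguration: "block" is a regular 10 GiB file rather
# than a symlink to the real partition. stat reports exactly the
# 10737418240 bytes seen in the ls -l output above.
osd_dir=$(mktemp -d)
truncate -s 10G "$osd_dir/block"     # sparse 10 GiB file, as on the broken OSD
stat -c '%F %s' "$osd_dir/block"
# A correctly deployed BlueStore OSD would instead have:
#   block -> /dev/nvme0n1p1   (a symlink to the data partition)
rm -rf "$osd_dir"
```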

Redeploying the OSD with filestore fixed the issue.

