Bug 1358346 - gdisk does not recognize ceph journal partition types
Summary: gdisk does not recognize ceph journal partition types
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: Ceph-Disk
Version: 1.3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: rc
Target Release: 2.2
Assignee: Loic Dachary
QA Contact: ceph-qe-bugs
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-20 14:45 UTC by Tomas Rusnak
Modified: 2017-07-30 14:58 UTC
CC: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-12-01 07:39:29 UTC



Description Tomas Rusnak 2016-07-20 14:45:43 UTC
Description of problem:
When ceph-deploy tries to change the journal disk partition type to 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Ceph journal), it fails on RHEL7 because of an outdated gdisk.

Version-Release number of selected component (if applicable):
gdisk-0.8.6-5.el7.x86_64

How reproducible:
always

Steps to Reproduce:

with gdisk-0.8.6-5:

# sgdisk --info=1 /dev/sdb | grep 45B0969E-9B03-4F30-B4C6-B4B80CEFF106
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Unknown)

# sgdisk -t 1:45B0969E-9B03-4F30-B4C6-B4B80CEFF106 /dev/sdb

with gdisk-1.0.1:

# sgdisk --info=1 /dev/sdb | grep 45B0969E-9B03-4F30-B4C6-B4B80CEFF106
Partition GUID code: 45B0969E-9B03-4F30-B4C6-B4B80CEFF106 (Ceph journal)

Setting the partition type to Ceph journal also fails with the old version of gdisk.

Actual results:
gdisk does not recognize the Ceph partition type GUIDs

Expected results:
rebase gdisk to a newer version that recognizes the Ceph partition types (e.g. 1.0.1)
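
For reference (not part of the original report), a quick way to check whether the installed gdisk build knows the Ceph type codes at all is sgdisk's built-in type listing:

# rpm -q gdisk
# sgdisk --list-types | grep -i ceph

With gdisk-0.8.6 the grep should come back empty; with gdisk-1.0.1 the Ceph entries should be listed.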

Comment 2 Ken Dreyer (Red Hat) 2016-07-20 14:47:25 UTC
What version of ceph-deploy and ceph are you using?

Comment 3 Tomas Rusnak 2016-08-02 13:33:20 UTC
ceph-deploy-1.5.34
ceph-10.2.2

Comment 6 Loic Dachary 2016-09-27 07:21:56 UTC
It's worth a bug report in gdisk indeed: the reproducer seems easy to provide. How come this did not cause trouble previously?

Comment 7 Ken Dreyer (Red Hat) 2016-09-28 02:50:17 UTC
Loic, would you please open a new bug in RHEL's gdisk component to describe exactly what we need there? https://bugzilla.redhat.com/enter_bug.cgi?product=Red%20Hat%20Enterprise%20Linux%207

Comment 8 Loic Dachary 2016-09-29 09:46:58 UTC
@Tomas could you please provide me with a reproducer so that I can report a bug against gdisk? Thanks :-)

Comment 9 Tomas Rusnak 2016-10-03 11:07:48 UTC
The main issue here is the somewhat complex interaction between gdisk, ceph-disk, ceph-osd and the udev rules.

Imagine you are creating partitions with ceph-disk. With the old gdisk, those partitions cannot be marked with the Ceph journal type - UUID 45b0969e-9b03-4f30-b4c6-b4b80ceff106. ceph-disk will only end up with a warning.

As the journal needs to be owned by ceph:ceph, the following udev rules are used:

# JOURNAL_UUID
ACTION=="add", SUBSYSTEM=="block", \
  ENV{DEVTYPE}=="partition", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER:="ceph", GROUP:="ceph", MODE:="660", \
  RUN+="/usr/sbin/ceph-disk --log-stdout -v trigger /dev/$name"
ACTION=="change", SUBSYSTEM=="block", \
  ENV{ID_PART_ENTRY_TYPE}=="45b0969e-9b03-4f30-b4c6-b4b80ceff106", \
  OWNER="ceph", GROUP="ceph", MODE="660"

With the old gdisk you are unable to get the journal working after a reboot, because the partition type UUID doesn't match the udev rule and ceph-osd goes down since the device has the wrong owner.
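
A quick way to see what udev actually observes for the journal partition (a sketch; /dev/sdb1 is just an example device):

# udevadm info --query=property --name=/dev/sdb1 | grep ID_PART_ENTRY_TYPE
# ls -l /dev/sdb1     (the owner should be ceph:ceph once the rule matches)

If ID_PART_ENTRY_TYPE does not report 45b0969e-9b03-4f30-b4c6-b4b80ceff106, the rules above never fire.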

This was somehow solved upstream: http://tracker.ceph.com/issues/13833

I think this should be part of a wider discussion on how to activate journals properly on MBR, as you can read in the upstream bug.

Comment 10 Loic Dachary 2016-10-03 15:00:26 UTC
@Tomas thanks for the additional information. I did not realize that it was a combination of actions that leads to a problem. Since a pure sgdisk-based reproducer is not possible, could you write down a reproducer with ceph? As soon as I'm able to successfully run the reproducer, I'll find a solution or a workaround. Thanks!

Comment 11 Harish NV Rao 2016-10-04 06:54:44 UTC
@Loic, will this be fixed in 2.1?

Comment 12 Loic Dachary 2016-10-04 07:15:35 UTC
@Harish I'll be able to tell after running the reproducer. It's unlikely to be a big issue, just a tricky one ;-)

Comment 13 Tomas Rusnak 2016-10-04 08:19:15 UTC
The pure sgdisk-based reproducer is in Comment #1. All you need to check is whether the Ceph journal UUID is supported by sgdisk or not.

A more complex reproducer is:
1. use ceph-deploy to create ceph cluster with journal (GPT needed)
2. reboot
3. ceph-osd is down due to wrong owner on /dev/sdX (journal cannot be activated)
   $ ceph osd tree
4. check with sgdisk if UUID of the partition is set to Ceph journal
   $ sgdisk --info=1 /dev/sdX
5. check udev rules, if journal device owner is changing
   $ udevadm trigger --verbose | grep sd
   $ ls -la /dev/sdX (owner should be ceph.ceph)

Comment 14 Loic Dachary 2016-10-04 17:52:21 UTC
@Tomas I'm sorry to be needy, really. It would help a lot if you could describe the reproducer as a sequence of commands to run, as well as the versions of the operating system, ceph-deploy and ceph. Thanks for your understanding.

Comment 15 Harish NV Rao 2016-10-06 07:18:14 UTC
@Loic and @Ken, this bug is targeted for 2.1. In 2.x we don't support ceph-deploy. The steps to reproduce in comment 13 refer to ceph-deploy. Shouldn't this be re-targeted to 1.3.x?

Comment 16 Loic Dachary 2016-10-06 07:35:57 UTC
@Tomas here is what I have, I'd be grateful if you could fill in the blanks:

* On a pristine RHEL7 machine (which version exactly?) named foo with a spare, unformatted disk /dev/sdd
* Install ceph-deploy-1.5.34 (the installation method does not matter)
* Install ceph-10.2.2 (via ceph-deploy? from which repository?)
* gdisk-0.8.6-5.el7.x86_64 has been installed as a dependency of Ceph
* sudo ceph-deploy new foo
* sudo ceph-deploy mon create foo
* sudo ceph-deploy gatherkeys foo
* sudo ceph-deploy osd create /dev/sdd
* the OSD is up and running (please confirm?)
* reboot
* the OSD is not running

With that I should be able to repeat the problem.

Thanks!

Comment 17 Loic Dachary 2016-10-06 07:37:07 UTC
@Harish ceph-deploy is unlikely to be the cause of the problem here; it's only used as a means to reproduce the problem.

Comment 18 Loic Dachary 2016-10-06 10:01:58 UTC
I have run the above sequence on CentOS 7.2 with ceph-deploy-1.5.36, gdisk.x86_64 0.8.6-5.el7, and ceph-10.2.3 installed via ceph-deploy, and the OSD comes back up as it should. Either ceph-10.2.3 has a patch that solves the problem which ceph-10.2.2 does not have, or I'm missing something in the reproducer. Trying with 10.2.2 now.

Comment 19 Tomas Rusnak 2016-10-06 11:02:09 UTC
You are missing the journal. Create a ceph cluster with OSDs that use a separate journal partition.

* On a pristine RHEL7 machine (which version exactly?) named foo with a spare, unformatted disk /dev/sdd

you need at least 2 disks, one for data and another for the journal!

* Install ceph-deploy-1.5.34 (the installation method does not matter)
 - that doesn't matter
* Install ceph-10.2.2 (via ceph-deploy ? from which repository ?)
 - I used jewel
* gdisk-0.8.6-5.el7.x86_64 has been installed as a dependency of Ceph
* sudo ceph-deploy new foo
* sudo ceph-deploy mon create foo
* sudo ceph-deploy gatherkeys foo
* sudo ceph-deploy osd create /dev/sdd

 - to enable a separate journal, please use this:
$ ceph-deploy osd prepare host1:/dev/sdc1:/dev/sdb1 host2:/dev/sdd1:/dev/sdb2 host3:/dev/sde1:/dev/sdb3
$ ceph-deploy osd activate host1:/dev/sdc1:/dev/sdb1 host2:/dev/sdd1:/dev/sdb2 host3:/dev/sde1:/dev/sdb3

* the OSD is up and running (please confirm ?)
 - yes
* reboot
* the OSD is not running
 - yes, that's our problem: the OSD is down because the Ceph journal partition was not activated

Comment 20 Loic Dachary 2016-10-06 11:19:00 UTC
Jewel is now ceph-10.2.3, which is why I had that installed. I tried --dev v10.2.2 but ran into http://tracker.ceph.com/issues/17523 and found a workaround. I'll try with the provided information. Thanks!

Comment 21 Loic Dachary 2016-10-06 13:16:29 UTC
> ceph-deploy osd prepare host1:/dev/sdc1:/dev/sdb1 host2:/dev/sdd1:/dev/sdb2 host3:/dev/sde1:/dev/sdb3

Here you are specifying existing partitions instead of whole disks, which is an edge case where bugs may exist :-) The most common way to provision disks with a separate journal is with something like:

> ceph-deploy osd prepare host1:/dev/sdc:/dev/sdb

And ceph-disk will create the partitions as well as the partition table if needed.

In order to repeat the problem, I would need to know exactly how the partition tables for /dev/sdc and /dev/sdb on host1 were created. Just tell me which commands to type; that will be enough, no need to explain it in plain English.

Thanks!

P.S. I ran ceph-deploy osd prepare host1:/dev/sdc:/dev/sdb and the osd came back up after a reboot (still using ceph-10.2.3 though, but it's probably not the issue here).

Comment 22 Loic Dachary 2016-10-06 13:19:14 UTC
For the record here is what I get when trying to run the command above with unformatted disks:

[ubuntu@mira121 ~]$ sudo ceph-disk list
...
/dev/sde other, unknown
/dev/sdf other, unknown
/dev/sdg other, unknown
/dev/sdh other, unknown
[ubuntu@mira121 ~]$ sudo ceph-deploy osd create mira121:/dev/sde1:/dev/sdf1
[ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
[ceph_deploy.cli][INFO  ] Invoked (1.5.36): /bin/ceph-deploy osd create mira121:/dev/sde1:/dev/sdf1
[ceph_deploy.cli][INFO  ] ceph-deploy options:
[ceph_deploy.cli][INFO  ]  username                      : None
[ceph_deploy.cli][INFO  ]  disk                          : [('mira121', '/dev/sde1', '/dev/sdf1')]
[ceph_deploy.cli][INFO  ]  dmcrypt                       : False
[ceph_deploy.cli][INFO  ]  verbose                       : False
[ceph_deploy.cli][INFO  ]  bluestore                     : None
[ceph_deploy.cli][INFO  ]  overwrite_conf                : False
[ceph_deploy.cli][INFO  ]  subcommand                    : create
[ceph_deploy.cli][INFO  ]  dmcrypt_key_dir               : /etc/ceph/dmcrypt-keys
[ceph_deploy.cli][INFO  ]  quiet                         : False
[ceph_deploy.cli][INFO  ]  cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf instance at 0x250cc68>
[ceph_deploy.cli][INFO  ]  cluster                       : ceph
[ceph_deploy.cli][INFO  ]  fs_type                       : xfs
[ceph_deploy.cli][INFO  ]  func                          : <function osd at 0x24fed70>
[ceph_deploy.cli][INFO  ]  ceph_conf                     : None
[ceph_deploy.cli][INFO  ]  default_release               : False
[ceph_deploy.cli][INFO  ]  zap_disk                      : False
[ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks mira121:/dev/sde1:/dev/sdf1
[mira121][DEBUG ] connected to host: mira121 
[mira121][DEBUG ] detect platform information from remote host
[mira121][DEBUG ] detect machine type
[mira121][DEBUG ] find the location of an executable
[ceph_deploy.osd][INFO  ] Distro info: CentOS Linux 7.2.1511 Core
[ceph_deploy.osd][DEBUG ] Deploying osd to mira121
[mira121][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
[ceph_deploy.osd][DEBUG ] Preparing host mira121 disk /dev/sde1 journal /dev/sdf1 activate True
[mira121][DEBUG ] find the location of an executable
[mira121][INFO  ] Running command: /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sde1 /dev/sdf1
[mira121][WARNIN] Traceback (most recent call last):
[mira121][WARNIN]   File "/usr/sbin/ceph-disk", line 9, in <module>
[mira121][WARNIN]     load_entry_point('ceph-disk==1.0.0', 'console_scripts', 'ceph-disk')()
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 5011, in run
[mira121][WARNIN]     main(sys.argv[1:])
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 4962, in main
[mira121][WARNIN]     args.func(args)
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1791, in main
[mira121][WARNIN]     Prepare.factory(args).prepare()
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1787, in factory
[mira121][WARNIN]     return PrepareFilestore(args)
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 1799, in __init__
[mira121][WARNIN]     self.data = PrepareFilestoreData(args)
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2389, in __init__
[mira121][WARNIN]     self.set_type()
[mira121][WARNIN]   File "/usr/lib/python2.7/site-packages/ceph_disk/main.py", line 2397, in set_type
[mira121][WARNIN]     dmode = os.stat(self.args.data).st_mode
[mira121][WARNIN] OSError: [Errno 2] No such file or directory: '/dev/sde1'
[mira121][ERROR ] RuntimeError: command returned non-zero exit status: 1
[ceph_deploy.osd][ERROR ] Failed to execute command: /usr/sbin/ceph-disk -v prepare --cluster ceph --fs-type xfs -- /dev/sde1 /dev/sdf1
[ceph_deploy][ERROR ] GenericError: Failed to create 1 OSDs

Comment 23 RHEL Product and Program Management 2016-10-06 15:55:04 UTC
Development Management has reviewed and declined this request.
You may appeal this decision by reopening this request.

Comment 24 Loic Dachary 2016-10-12 12:59:25 UTC
@Tomas, thanks for providing details on how the disk was partitioned prior to being used by Ceph.


--------- Snippet of your mail ------------
$ sudo gdisk -l /dev/sdb...
Found valid GPT with protective MBR; using GPT.
Disk /dev/sdb: 234441648 sectors, 111.8 GiB
…
Total free space is 45697901 sectors (21.8 GiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048        62916607   30.0 GiB    8300  Linux filesystem
   2        62916608       125831167   30.0 GiB    8300  Linux filesystem
   3       125831168       188745727   30.0 GiB    8300  Linux filesystem

// create one partition, full size, on all other drives and format it
$ mkfs.xfs -f /dev/sdc1
$ mkfs.xfs -f /dev/sdd1
$ mkfs.xfs -f /dev/sde1 

$ ceph-deploy osd prepare node1:/dev/sdc1:/dev/sdb1 node1:/dev/sdd1:/dev/sdb2 node1:/dev/sde1:/dev/sdb3 

---------------------------------------------

I'm digging into why this fails, but I'm pretty sure this is a real bug. The feature implemented in ceph-disk to re-use an existing partition was designed with another use case in mind: re-using a partition that was previously formatted and tagged (uuid) for ceph. There are no tests (and therefore most probably a bug) for re-using a partition that has been created by something that's not ceph.

The proper course of action (let me know if you agree with this) is to document this limitation and provide a workaround. For instance:

-----------------------------------
When specifying an existing partition for Ceph to use (for instance ceph-deploy osd prepare node1:/dev/sdc1:/dev/sdb1 or ceph-disk osd prepare /dev/sdc1 /dev/sdb1), it must be a partition previously created by Ceph. If it was created manually or by something else, it will fail to be formatted correctly.

If the whole disk is to be used for data, it can be zapped so that ceph-disk will recreate the partition:

ceph-disk zap /dev/sdc
ceph-disk osd prepare /dev/sdc /dev/sdb1

If there are other partitions on the data disk and zapping the disk is out of the question, the target data partition should be manually set with the desired name, uuid and type, using the sgdisk command. This requires an intimate knowledge of what an OSD expects.
------------------------------------

Alternatively, the course of action could be to document the fact that re-using non-ceph partitions is not supported, because it is not implemented at the moment.
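
For illustration only (a sketch, not part of the original comment; partition numbers are hypothetical), the manual sgdisk tagging mentioned above could look roughly like this, using the type GUIDs from the Ceph udev rules (4fbd7e29-9d25-41b8-afd0-062c0ceff05d for OSD data, 45b0969e-9b03-4f30-b4c6-b4b80ceff106 for the journal):

# sgdisk --typecode=1:4fbd7e29-9d25-41b8-afd0-062c0ceff05d /dev/sdc
# sgdisk --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 /dev/sdb

Setting the type code alone is not sufficient: the OSD also expects a matching partition GUID and journal symlink, which is the "intimate knowledge" referred to above.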

Comment 25 Tomas Rusnak 2016-10-18 12:16:25 UTC
Yes, this is definitely something that should be documented.

There is still one related problem with ceph-disk. If you let ceph-disk re-partition the journal for you, it will use the default size - currently 5 GB. As a result, each journal takes just a few GB of the total disk space on the SSD and the rest remains unused, depending on how many OSDs you want to use. That's why it is a good idea to let the customer prepare the journal partitions beforehand and use all the space available on the SSD. From my point of view this should be supported.

Bumping gdisk to a newer version should solve the problem for us. As the journal UUID is supported in 1.0.1, ceph-disk will not complain anymore.

Comment 26 Loic Dachary 2016-10-18 14:31:18 UTC
@Tomas note that the following:

ceph-disk prepare /dev/sdb /dev/sdf
ceph-disk prepare /dev/sdc /dev/sdf

will actually lead to the creation of two partitions on /dev/sdf, one for the journal of /dev/sdb and one for the journal of /dev/sdc.
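
As a quick check (assuming the two prepare calls above succeeded), something like the following should list the two journal partitions, each 5 GB by default:

# sgdisk --print /dev/sdf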

Comment 27 Tomas Rusnak 2016-10-19 11:59:48 UTC
Yes, it will create two journal partitions on /dev/sdf, but both will be only 5 GB in size by default. Imagine you are creating journals for 8 OSDs and you have a 200 GB SSD. In such a case you want to use all the available space, so each journal should be 200/8 = 25 GB to fill the whole SSD.
That's the reason to create them by hand.

Comment 28 Loic Dachary 2016-10-19 16:07:58 UTC
@Tomas an alternative is to set osd_journal_size in /etc/ceph/ceph.conf to the desired journal size. If the only motivation is to not waste space, this is not required since wear leveling will eventually use all there is.
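
For example (a sketch only; the value is illustrative and expressed in MB, as osd_journal_size expects), a larger journal could be requested in /etc/ceph/ceph.conf before preparing the disks:

[osd]
osd journal size = 25600    # ~25 GB per journal, matching the 200 GB / 8 OSDs example above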

Comment 29 Tomas Rusnak 2016-10-24 13:17:33 UTC
I was not aware of the osd_journal_size option. Thank you for that. That can be considered supported. The other question is whether manual repartitioning is, or could be, a supported way too, especially if the solution is just to bump the gdisk version.
Wear leveling is not the concern here.

