Bug 1511868 - Non-Homogeneous Distribution of Bricks across drives in Backend
Summary: Non-Homogeneous Distribution of Bricks across drives in Backend
Keywords:
Status: NEW
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: heketi
Version: cns-3.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Michael Adam
QA Contact: Rachael
URL:
Whiteboard:
Depends On:
Blocks: OCS-3.11.1-devel-triage-done 1543779
 
Reported: 2017-11-10 10:30 UTC by Shekhar Berry
Modified: 2019-04-11 08:25 UTC (History)
CC List: 13 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:



Description Shekhar Berry 2017-11-10 10:30:14 UTC
Description of problem:

A CNS environment can have a large number of small volumes. In that case, the bricks corresponding to each volume end up landing on only one or two drives instead of being spread across all the backend drives.

In our 3-node CNS environment, when we scale from 100 to 1000 5GB volumes, the corresponding 100 to 1000 bricks created on each CNS node are not distributed across all 12 drives in the backend.

For the 100- and 200-volume cases, only 2 out of 12 HDDs were utilized, whereas in the 500- and 1000-volume tests only 6 out of 12 HDDs were utilized. The fewer HDDs used in the backend, the lower the performance, so it is imperative that we change the Heketi code to make sure bricks are distributed homogeneously.

The following link points to utilization of HDDs for 100, 200, 500 and 1000 volume case:

http://perf1.perf.lab.eng.bos.redhat.com/pub/shberry/disk_utilization/

The images at the link above show how many drives are actually working while write IO is being performed on them, which indicates how many drives are actually hosting bricks.
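The reported pattern (bricks piling onto one or two drives) is what a naive first-fit allocator would produce. The following toy model is an illustration of the observed skew, not Heketi's actual placement code; the device count and sizes are taken from the environment described below:

```python
# Toy first-fit placement: each brick goes to the first device with room.
# This illustrates the reported skew; it is NOT Heketi's real algorithm.
DEVICES = 12      # HDDs per CNS node
DEVICE_GB = 930   # capacity per HDD
BRICK_GB = 5      # one brick per 5GB volume

free = [DEVICE_GB] * DEVICES
placed = [0] * DEVICES

for _ in range(100):                  # 100 small volumes -> 100 bricks per node
    for dev in range(DEVICES):
        if free[dev] >= BRICK_GB:     # first device with enough space wins
            free[dev] -= BRICK_GB
            placed[dev] += 1
            break

print(placed)  # -> all 100 bricks land on the first drive; 11 drives stay idle
```

Under first-fit, 100 bricks of 5GB (500GB total) fit entirely on the first 930GB drive, so the remaining 11 drives host nothing, matching the "1-2 HDDs out of 12" observation.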


Version-Release number of selected component (if applicable):

8 servers were used to set up Openshift, with 1 of them being the master server. The master node was schedulable.
3 of the 8 servers were dedicated to the CNS deployment. These 3 servers were non-schedulable, i.e. they hosted only storage pods and no application pods.
All 8 servers had 48 GB RAM and 2 CPU sockets with 6 cores each, 12 processors in total.
The 3 CNS nodes each comprised 12 7200 RPM hard drives of 930GB capacity. All were part of the CNS topology, giving it a total capacity of ~11TB (replica 3 setup).
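The ~11TB figure can be checked with a quick calculation (a sketch using the drive counts and sizes stated above):

```python
# Usable capacity check for the replica-3 CNS setup described above.
drives_per_node = 12
drive_gb = 930
nodes = 3
replica = 3

raw_gb = drives_per_node * drive_gb * nodes   # total raw capacity across 3 nodes
usable_gb = raw_gb / replica                  # replica 3 stores each brick 3x

print(usable_gb)         # 11160 GB
print(usable_gb / 1024)  # ≈ 10.9 TB, i.e. the ~11TB quoted above
```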


kernel                : 3.10.0-693.el7.x86_64
Openshift Version     : v3.6.173.0.7
Kubernetes            : v1.6.1+5115d708d7
Docker                : 1.12.6 , docker-1.12.6-31.1.git97ba2c0.el7.x86_64
rhgs server image     : rhgs3/rhgs-server-rhel7:3.3.0-24
volmanager            : rhgs3/rhgs-volmanager-rhel7:3.3.0-27
heketi                : 5.0.0-11.el7rhgs.x86_64 and heketi-client-5.0.0-11.el7rhgs.x86_64
cns-deploy            : cns-deploy-5.0.0-41.el7rhgs.x86_64



How reproducible:

Always


Steps to Reproduce:
1. Scale to 100 small-size volumes in a 3-node CNS environment (make sure there is a significant number of HDDs in the config)
2. Check where the bricks land

Actual results:

Bricks land on only 1-2 of the 12 HDDs

Expected results:

Bricks should be homogeneously distributed across all the HDDs in the backend. 


Additional info:

Comment 3 Michael Adam 2018-01-11 10:05:54 UTC
Thanks for reporting! We will look into improving this soon.

Apart from the situation of multiple disks per node on a 3-node cluster, there is also the situation of more than three nodes, where inhomogeneity could occur.

Comment 7 Niels de Vos 2018-05-08 08:19:10 UTC
John, IIRC you re-modelled the brick allocation logic. Do you see a chance to improve the distribution so that the devices host the same number of bricks?

I am not sure how the algorithm is implemented now, but it may be more random and a round-robin way could result in a more 'equal' distribution.
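A round-robin placer along the lines suggested here could be sketched as follows (a hypothetical illustration, not Heketi's implementation; device counts match the environment above):

```python
# Toy round-robin placement: rotate through devices in a fixed order,
# skipping any device without enough free space. Hypothetical sketch only.
import itertools

DEVICES = 12
DEVICE_GB = 930
BRICK_GB = 5

free = [DEVICE_GB] * DEVICES
placed = [0] * DEVICES
rr = itertools.cycle(range(DEVICES))   # endless 0..11, 0..11, ... rotation

for _ in range(100):                   # 100 bricks, one per small volume
    for _ in range(DEVICES):           # try each device at most once per brick
        dev = next(rr)
        if free[dev] >= BRICK_GB:
            free[dev] -= BRICK_GB
            placed[dev] += 1
            break

print(placed)  # every device ends up with 8 or 9 bricks
```

With 100 bricks over 12 devices, round-robin leaves every device with 8 or 9 bricks, i.e. the 'equal' distribution suggested above, instead of the 1-2 busy drives reported in the description.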

Comment 8 John Mulligan 2018-05-08 13:59:08 UTC
It can be done, and certainly should be done eventually. However, it's not a small job IMO. Recent refactoring should make this easier but all current "placers" still rely on the same basic code that can produce these uneven layouts.

Comment 9 Niels de Vos 2018-05-08 14:45:08 UTC
Moving this out to cns-3.11.0, might become a glusterd2 enhancement later.

