Bug 1596401 - garbage collection activity drops client performance in half
Summary: garbage collection activity drops client performance in half
Keywords:
Status: ASSIGNED
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 4.0
Assignee: Mark Kogan
QA Contact: Tiffany Nguyen
Docs Contact: Bara Ancincova
URL:
Whiteboard:
Depends On:
Blocks: 1629656 1641792 1584264
Reported: 2018-06-28 20:14 UTC by John Harrigan
Modified: 2019-04-11 07:32 UTC (History)
13 users (show)

Fixed In Version:
Doc Type: Known Issue
Doc Text:
.Ceph Object Gateway garbage collection decreases client performance by up to 50% during mixed workload
In testing with a mixed workload of 60% read operations, 16% write operations, 14% delete operations, and 10% list operations, client throughput and bandwidth drop to half their earlier levels 18 hours into the testing run.
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Attachments
PERF CHART COSbench (deleted), 2018-06-28 20:14 UTC, John Harrigan
garbage collection logfile (deleted), 2018-06-28 20:18 UTC, John Harrigan
ioWorkload.xml jobfile (deleted), 2018-06-28 20:21 UTC, John Harrigan
reference large ceph deployment information (deleted), 2018-11-13 10:06 UTC, Mark Kogan


Links
System            ID       Priority  Status  Summary                                     Last Updated
Red Hat Bugzilla  1601068  None      CLOSED  rgw_gc_list may fail to report end of list  2019-02-20 14:43:41 UTC

Internal Links: 1601068

Description John Harrigan 2018-06-28 20:14:31 UTC
Created attachment 1455401
PERF CHART COSbench

Description of problem:
While running a mixed-operation, customer-representative RGW workload, client
performance (throughput and bandwidth) decreases significantly at a time that aligns with RGW garbage collection activity.

Version-Release number of selected component (if applicable):
RHCEPH-3.1-RHEL-7-20180530.ci.0

Steps to Reproduce:
1. Run the COSbench workload, which contains 60% reads, 16% writes, 14% deletes, and 10% lists (see the attached ioWorkload.xml).
2. Let the workload run for the specified runtime of 24 hours.
3. At 18 hours into the run, client throughput and bandwidth drop to half
   their earlier levels (a sharp cliff). See the attached PERF CHART.
4. Review the attached 'garbage collection logfile'.

Job starts at timestamp:
   2018/06/27:16:54:38: Pending GC's == 55106

Pending GC count climbs steadily until timestamp (18 hours into the run):
   2018/06/28:10:47:05: %RAW USED 55.81; Pending GCs 3404570
From that point the 'Pending GC' count starts to fall, indicating increased RGW garbage collection activity. On the PERF CHART there is a sharp decline in performance at that same time (sample #12937), and the earlier performance levels do not return. COSbench uses a 5-second sampling interval, so sample #12937 falls roughly 18 hours into the run.
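The timeline above can be cross-checked with a little arithmetic. The following is a minimal sketch, not the script that produced the logfile: it parses only the two log lines quoted in this report, and it assumes the COSbench sample counter starts at the beginning of the run. The regular expression and the names parse_line, LOG_LINES, SAMPLE_INDEX, and SAMPLE_INTERVAL_S are illustrative, not taken from the attachment.

```python
import re
from datetime import datetime

# The two log lines quoted in this report, copied verbatim.
LOG_LINES = [
    "2018/06/27:16:54:38: Pending GC's == 55106",
    "2018/06/28:10:47:05: %RAW USED 55.81; Pending GCs 3404570",
]

# Assumed line shape: a timestamp, then a "Pending GC" label followed by a count.
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2}:\d{2}:\d{2}:\d{2}):.*Pending GC'?s\D*(\d+)\s*$"
)

def parse_line(line):
    """Return (timestamp, pending_gc_count) for one quoted log line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized log line: %r" % line)
    ts = datetime.strptime(m.group(1), "%Y/%m/%d:%H:%M:%S")
    return ts, int(m.group(2))

(start_ts, start_gc), (peak_ts, peak_gc) = [parse_line(l) for l in LOG_LINES]

# Time between job start and the peak of the pending GC count.
elapsed_s = int((peak_ts - start_ts).total_seconds())
elapsed_h = elapsed_s / 3600.0

# Average growth of the pending GC backlog over that window.
gc_growth = peak_gc - start_gc
gc_growth_per_s = gc_growth / elapsed_s

# COSbench sample where the performance cliff appears, at 5 s per sample.
SAMPLE_INDEX = 12937
SAMPLE_INTERVAL_S = 5
sample_s = SAMPLE_INDEX * SAMPLE_INTERVAL_S
sample_h = sample_s / 3600.0

print("log window: %d s (%.2f h)" % (elapsed_s, elapsed_h))
print("pending GC growth: %d entries (%.1f per second)" % (gc_growth, gc_growth_per_s))
print("cliff sample: %d s (%.2f h)" % (sample_s, sample_h))
```

Both estimates land just under 18 hours, which supports the reading that the performance cliff and the turn in the pending GC count occur together. The two values differ by a few minutes, which would be expected if the logfile and the COSbench sample counter did not start at exactly the same moment.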

Actual results:
Cluster performance is cut in half during a long-running mixed-operation workload.

Expected results:
The cluster sustains reasonably consistent performance for a long-running mixed-operation workload.


Attachments
1) PERF CHART COSbench
2) ioWorkload.xml
3) garbage collection logfile

Comment 3 John Harrigan 2018-06-28 20:18:09 UTC
Created attachment 1455402
garbage collection logfile

Comment 4 John Harrigan 2018-06-28 20:21:28 UTC
Created attachment 1455403
ioWorkload.xml jobfile

Comment 27 Mark Kogan 2018-11-13 10:06:40 UTC
Created attachment 1505182
reference large ceph deployment information

