Bug 1358129 - Multisite: some of the object sync operations were skipped
Summary: Multisite: some of the object sync operations were skipped
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Ceph Storage
Classification: Red Hat
Component: RGW
Version: 2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: 2.0
Assignee: Casey Bodley
QA Contact: shilpa
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-07-20 06:41 UTC by shilpa
Modified: 2017-07-31 14:15 UTC
CC: 10 users

Fixed In Version: RHEL: ceph-10.2.2-26.el7cp Ubuntu: ceph_10.2.2-20redhat1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-08-23 19:44:48 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:1755 normal SHIPPED_LIVE Red Hat Ceph Storage 2.0 bug fix and enhancement update 2016-08-23 23:23:52 UTC
Ceph Project Bug Tracker 16742 None None None 2016-07-20 06:41:23 UTC

Description shilpa 2016-07-20 06:41:24 UTC
Description of problem:
Uploaded multipart objects on both zones. A few objects were skipped from being synced on one of the zones, and no retry was attempted.
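
A minimal reproduction sketch of the upload (boto 2, the client visible in the request logs below); the endpoint host, credentials, bucket/key names, and part size are assumptions, and one copy should be run against each zone's RGW endpoint:

# Reproduction sketch (boto 2). Endpoint, credentials, and part size are
# assumptions; 'bucket3' is assumed to already exist on both zones.
import math
import os

import boto
import boto.s3.connection

conn = boto.connect_s3(
    aws_access_key_id='ACCESS_KEY',
    aws_secret_access_key='SECRET_KEY',
    host='magna115', port=80, is_secure=False,
    calling_format=boto.s3.connection.OrdinaryCallingFormat(),
)

bucket = conn.get_bucket('bucket3')
path = 'big.txt'
part_size = 5 * 1024 * 1024        # 5 MiB minimum multipart part size
size = os.path.getsize(path)

mp = bucket.initiate_multipart_upload('big.txt')
with open(path, 'rb') as fp:
    for i in range(int(math.ceil(size / float(part_size)))):
        fp.seek(i * part_size)
        mp.upload_part_from_file(fp, part_num=i + 1,
                                 size=min(part_size, size - i * part_size))
mp.complete_upload()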

Version-Release number of selected component (if applicable):
ceph-radosgw-10.2.2-23.el7cp.x86_64

How reproducible:
Saw this issue only on the new build ceph-radosgw-10.2.2-23.el7cp.x86_64.


Actual results:

The initial multipart upload of bucket3/big.txt started on magna115 here:

2016-07-19 10:07:35.218618 7f0acbfff700  1 ====== starting new request
req=0x7f0acbff9710 =====
2016-07-19 10:07:35.218629 7f0acbfff700  2 req 3236:0.000011::PUT
/bucket3/big.txt::initializing for trans_id =
tx000000000000000000ca4-00578dfbe7-5e46-us-1

and finished here:

2016-07-19 10:08:17.049985 7f0ac47f0700  1 ====== req done
req=0x7f0ac47ea710 op status=0 http_status=200 ======
2016-07-19 10:08:17.050015 7f0ac47f0700  1 civetweb: 0x7f0b780009b0:
10.8.128.74 - - [19/Jul/2016:10:08:16 +0000] "PUT /bucket3/big.txt
HTTP/1.1" 200 0 - Boto/2.41.0 Python/2.7.5 Linux/3.10.0-327.el7.x86_64


magna059 sees it in the sync log:

2016-07-19 10:08:41.390609 7f33daffd700 20 bucket sync single entry
(source_zone=f5717851-2682-475a-b24b-7bcdec728cbe)
b=bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]
log_entry=00000000204.11489.3 op=0 op_state=1
2016-07-19 10:08:41.390699 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 10:08:41.390706 7f33daffd700  5 bucket sync: sync obj:
f5717851-2682-475a-b24b-7bcdec728cbe/bucket3(@{i=us-2.rgw.buckets.index,e=us-1.rgw.buckets.non-ec}us-2.rgw.buckets.data[f5717851-2682-475a-b24b-7bcdec728cbe.14122.41])/big.txt[0]
2016-07-19 10:08:41.390711 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:fetch
...
2016-07-19 10:08:41.397525 7f33f47e8700 20 sending request to
http://magna115:80/bucket3/big.txt?rgwx-zonegroup=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95&rgwx-prepend-metadata=0bf0fc77-43ce-4a44-8b16-8f5fcfa84c95


magna115 sees GET request from magna059:

2016-07-19 10:08:37.179355 7f0b31ffb700  2 req 3332:0.001339:s3:GET
/bucket3/big.txt:get_obj:executing

... (almost 50 minutes later) ...

2016-07-19 10:57:59.221172 7f0b31ffb700  0 ERROR: flush_read_list():
d->client_c->handle_data() returned -5
2016-07-19 10:57:59.221193 7f0b31ffb700 20 get_obj_data::cancel_all_io()
2016-07-19 10:57:59.221629 7f0b31ffb700  0 WARNING: set_req_state_err
err_no=5 resorting to 500
2016-07-19 10:57:59.221737 7f0b31ffb700  2 req 3332:2962.043722:s3:GET
/bucket3/big.txt:get_obj:completing
2016-07-19 10:57:59.221748 7f0b31ffb700  2 req 3332:2962.043733:s3:GET
/bucket3/big.txt:get_obj:op status=-5
2016-07-19 10:57:59.221752 7f0b31ffb700  2 req 3332:2962.043737:s3:GET
/bucket3/big.txt:get_obj:http status=500
2016-07-19 10:57:59.221759 7f0b31ffb700  1 ====== req done
req=0x7f0b31ff5710 op status=-5 http_status=500 ======
2016-07-19 10:57:59.221786 7f0b31ffb700 20 process_request() returned -5


magna059 gets error reply:

2016-07-19 10:59:52.175208 7f33f47e8700  0 store->fetch_remote_obj()
returned r=-5
2016-07-19 10:59:52.175604 7f33f47e8700  1 heartbeat_map reset_timeout
'RGWAsyncRadosProcessor::m_tp thread 0x7f33f47e8700' had timed out after 600
...
2016-07-19 10:59:52.177590 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate()
2016-07-19 10:59:52.177596 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f3354473800:19RGWFetchRemoteObjCR: operate()
returned r=-5
...
2016-07-19 10:59:52.178023 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 10:59:52.178028 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:done,
retcode=-5
2016-07-19 10:59:52.178031 7f33daffd700  0 ERROR: failed to sync object:
bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt
...
2016-07-19 11:00:05.774443 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate()
2016-07-19 11:00:05.774445 7f33daffd700 20
cr:s=0x7f3354197230:op=0x7f335463bf40:26RGWBucketSyncSingleEntryCRISs11rgw_obj_keyE:
operate() returned r=-5
2016-07-19 11:00:05.774448 7f33daffd700  5
Sync:f5717851:data:Object:bucket3:f5717851-2682-475a-b24b-7bcdec728cbe.14122.41/big.txt[0]:finish
2016-07-19 11:00:05.774451 7f33daffd700 20 stack->operate() returned ret=-5

Despite the error, we still update the incremental bucket sync position, so the skipped object is never retried.
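
One hedged way to confirm which objects were skipped is to list the bucket on both zones' endpoints and diff the key sets; the host names and credentials below are assumptions:

# Verification sketch: keys present on only one side were not synced.
import boto
import boto.s3.connection

def list_keys(host):
    conn = boto.connect_s3(
        aws_access_key_id='ACCESS_KEY',
        aws_secret_access_key='SECRET_KEY',
        host=host, port=80, is_secure=False,
        calling_format=boto.s3.connection.OrdinaryCallingFormat(),
    )
    return set(k.name for k in conn.get_bucket('bucket3').list())

zone1 = list_keys('magna115')   # us-1 endpoint (assumption)
zone2 = list_keys('magna059')   # us-2 endpoint (assumption)
print('missing on us-2:', sorted(zone1 - zone2))
print('missing on us-1:', sorted(zone2 - zone1))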

Comment 6 John Poelstra 2016-07-20 15:47:57 UTC
Will get pulled into the next build.

Comment 11 shilpa 2016-08-01 12:28:32 UTC
The issue has not been seen since ceph-10.2.2-26. Moving to VERIFIED.

Comment 13 errata-xmlrpc 2016-08-23 19:44:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1755.html

