Bug 1062138 - Dist-geo-rep : too many "connection to peer is broken" which resulted in failures in removing from slave.
Summary: Dist-geo-rep : too many "connection to peer is broken" which resulted in failures in removing from slave.
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: geo-replication
Version: 2.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2014-02-06 09:41 UTC by Vijaykumar Koppad
Modified: 2015-11-25 08:51 UTC
CC List: 5 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-11-25 08:49:25 UTC


Attachments
sosreports of all the nodes of master and slave cluster (deleted)
2014-02-06 09:49 UTC, Vijaykumar Koppad

Description Vijaykumar Koppad 2014-02-06 09:41:15 UTC
Description of problem: The geo-rep session between the master and slave keeps getting disconnected for unknown reasons. Each disconnection results in a restart of gsyncd and kicks in a hybrid crawl. Since the hybrid crawl can't sync deletes and renames to the slave, these disconnections could become a major problem if the master is going through a lot of renames and deletes.

Logs from the geo-rep log file:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-02-06 14:43:56.596056] E [syncdutils(/bricks/brick11):223:log_raise_exception] <top>: connection to peer is broken
[2014-02-06 14:43:56.610052] E [resource(/bricks/brick11):204:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-LtOvaT/0a2c0d8cd2752e32a15bade57111bc93.sock root@10.70.37.141 /nonexistent/gsyncd --session-owner f8c73824-3fa5-4439-bfbb-50760f8773c8 -N --listen --timeout 120 gluster://localhost:imaster" returned with 255, saying:
[2014-02-06 14:43:56.610382] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.161084] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-06 14:43:56.610667] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.161111] I [socket.c:3520:socket_init] 0-glusterfs: using system polling thread
[2014-02-06 14:43:56.610968] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.161735] I [socket.c:3505:socket_init] 0-glusterfs: SSL support is NOT enabled
[2014-02-06 14:43:56.611215] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.161752] I [socket.c:3520:socket_init] 0-glusterfs: using system polling thread
[2014-02-06 14:43:56.611485] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.353283] I [socket.c:2235:socket_event_handler] 0-transport: disconnecting now
[2014-02-06 14:43:56.611813] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.354623] I [cli-rpc-ops.c:5338:gf_cli_getwd_cbk] 0-cli: Received resp to getwd
[2014-02-06 14:43:56.612189] E [resource(/bricks/brick11):207:logerr] Popen: ssh> [2014-02-06 09:04:26.354680] I [input.c:36:cli_batch] 0-: Exiting with: 0
[2014-02-06 14:43:56.612432] E [resource(/bricks/brick11):207:logerr] Popen: ssh> Killed by signal 15.
[2014-02-06 14:43:56.613003] I [syncdutils(/bricks/brick11):192:finalize] <top>: exiting.
[2014-02-06 14:43:56.616291] E [syncdutils(/bricks/brick3):223:log_raise_exception] <top>: connection to peer is broken
[2014-02-06 14:43:56.617547] I [monitor(monitor):81:set_state] Monitor: new state: faulty
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
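
For triage, one way to quantify the disconnections and confirm the fallback to hybrid crawl is to grep the worker logs and check the session status. This is only a minimal sketch, assuming the default geo-replication log location; MASTERVOL, SLAVEHOST and SLAVEVOL are placeholders, and the exact CLI form and log paths may vary by release:

# Count how often the peer connection broke on this master node
# (adjust the log directory if geo-rep logs live elsewhere).
grep -c "connection to peer is broken" \
    /var/log/glusterfs/geo-replication/MASTERVOL/*.log

# Check worker state and crawl type (hybrid vs changelog crawl)
# for the geo-rep session; volume and host names are placeholders.
gluster volume geo-replication MASTERVOL SLAVEHOST::SLAVEVOL status detail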


Version-Release number of selected component (if applicable): glusterfs-3.4.0.59rhs-1


How reproducible: Doesn't happen every time.


Steps to Reproduce:
No exact steps; this can happen at any time.

1. Create and start a geo-rep session between the master (6x2) and the slave (6x2).
2. Keep creating and deleting files on the master (see the sketch after this list).
3. Check the geo-rep logs for disconnection messages.
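
A rough sketch of the file churn in step 2, assuming the master volume is mounted at /mnt/master; the mount point, node/volume names and file counts are illustrative and not from the original report:

# Mount the master volume (MASTERNODE and MASTERVOL are placeholders).
mount -t glusterfs MASTERNODE:/MASTERVOL /mnt/master

# Repeatedly create and then delete a batch of small files to generate
# create/delete churn on the master.
while true; do
    for i in $(seq 1 1000); do
        dd if=/dev/urandom of=/mnt/master/file.$i bs=10k count=1 2>/dev/null
    done
    rm -f /mnt/master/file.*
done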

Actual results: Too many disconnections between the master and the slave.


Expected results: There shouldn't be so many disconnections without a reason.


Additional info:

Comment 1 Vijaykumar Koppad 2014-02-06 09:49:50 UTC
Created attachment 860080 [details]
sosreports of all the nodes of master and slave cluster

Comment 4 Aravinda VK 2015-11-25 08:49:25 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

Comment 5 Aravinda VK 2015-11-25 08:51:05 UTC
Closing this bug since RHGS 2.1 release reached EOL. Required bugs are cloned to RHGS 3.1. Please re-open this issue if found again.

