Bug 990330 - geo-replication fails for longer fqdn's
Summary: geo-replication fails for longer fqdn's
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: geo-replication
Version: 3.3.2
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Harshavardhana
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 990331
Reported: 2013-07-30 23:53 UTC by Harshavardhana
Modified: 2015-03-23 01:04 UTC (History)
3 users (show)

Fixed In Version: glusterfs-3.5.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 990331 (view as bug list)
Environment:
Last Closed: 2014-04-17 11:44:38 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Harshavardhana 2013-07-30 23:53:40 UTC
Description of problem:

Geo-replication fails when the slave host has a long FQDN. The workaround is to use the IP address instead, but this case should be handled properly, or at least documented.

[2013-07-30 19:21:42.168776] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-%r@%h:%p root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:42.168994] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-JKeqLt/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.x9XCkMnhGWYQjcg7" too long for Unix domain socket
[2013-07-30 19:21:42.169449] I [syncdutils:142:finalize] <top>: exiting.
[2013-07-30 19:21:52.181953] I [monitor(monitor):80:monitor] Monitor: ------------------------------------------------------------
[2013-07-30 19:21:52.182389] I [monitor(monitor):81:monitor] Monitor: starting gsyncd worker
[2013-07-30 19:21:52.277359] I [gsyncd:354:main_i] <top>: syncing: gluster://localhost:iso -> ssh://root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:/mnt/geoslave
[2013-07-30 19:21:52.528315] E [syncdutils:173:log_raise_exception] <top>: connection to peer is broken
[2013-07-30 19:21:52.529310] E [resource:191:errlog] Popen: command "ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -oControlMaster=auto -S /tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-%r@%h:%p root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com /nonexistent/gsyncd --session-owner 1dfa13ca-0db1-4c4b-b7ee-2cc6d031e737 -N --listen --timeout 120 file:///mnt/geoslave" returned with 127, saying:
[2013-07-30 19:21:52.529525] E [resource:194:logerr] Popen: ssh> ControlPath "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" too long for Unix domain socket
[2013-07-30 19:21:52.530002] I [syncdutils:142:finalize] <top>: exiting.


Version-Release number of selected component (if applicable):
3.3.2

How reproducible:
Always, whenever the ssh control socket path is this long:

$ echo "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:22.lb5rK1GpczmxJSDb" | wc -c
109

The actual limit is:

/usr/include/linux/un.h:#define UNIX_PATH_MAX   108

To reproduce, use a slave host whose FQDN is 45 characters or longer (the host in the log above is 45 characters).
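
The arithmetic can be checked with a short Python sketch. It assumes the path from the log above and the 108-byte size of sun_path on Linux, which has to hold the terminating NUL byte as well; the helper name fits_in_sun_path is made up for illustration. Note that `wc -c` reports 109 because echo appends a newline, so the path itself is 108 characters.

```python
# Control socket path copied from the ssh error message in the log above.
HOST = "hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com"
CONTROL_PATH = (
    "/tmp/gsyncd-aux-ssh-YBDwod/gsycnd-ssh-"
    "root@" + HOST + ":22"
    ".lb5rK1GpczmxJSDb"  # temporary suffix seen in the log
)

# Size of sun_path on Linux (UNIX_PATH_MAX in linux/un.h).
# The buffer must also hold the trailing NUL byte.
UNIX_PATH_MAX = 108


def fits_in_sun_path(path, limit=UNIX_PATH_MAX):
    """Hypothetical helper: True if path plus its NUL terminator fits."""
    return len(path.encode("utf-8")) + 1 <= limit


print("hostname length:", len(HOST))
print("control path length:", len(CONTROL_PATH))
print("fits:", fits_in_sun_path(CONTROL_PATH))
```

With this layout (user root, port 22, the same temporary directory pattern), a path one character shorter would fit, which is why the failure starts at a 45-character host name.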

Expected results:
Handle this issue and document it.

Comment 2 Anand Avati 2013-08-02 07:49:31 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#1) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 3 Anand Avati 2013-08-02 07:54:26 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#2) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 4 Anand Avati 2013-08-02 08:47:26 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#3) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 5 Anand Avati 2013-08-09 03:09:54 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#4) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 6 Anand Avati 2013-08-10 01:04:33 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#5) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 7 Anand Avati 2013-08-14 00:49:38 UTC
REVIEW: http://review.gluster.org/5470 (geo-replication: Use a simple control file instead of long control_path) posted (#6) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 8 Anand Avati 2013-08-21 23:36:13 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#1) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 9 Anand Avati 2013-08-22 18:49:11 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#2) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 10 Anand Avati 2013-08-22 19:39:43 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#3) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 11 Anand Avati 2013-08-27 23:19:23 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#4) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 12 Anand Avati 2013-08-28 14:10:55 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#5) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 13 Anand Avati 2013-09-03 09:32:49 UTC
REVIEW: http://review.gluster.org/5681 (geo-replication: Use a md5 based unique control path) posted (#6) for review on master by Harshavardhana (harsha@harshavardhana.net)

Comment 14 Anand Avati 2013-09-04 19:29:46 UTC
COMMIT: http://review.gluster.org/5681 committed in master by Anand Avati (avati@redhat.com) 
------
commit fa095c24979db2d0a3a6413aa431fe7256be5206
Author: Harshavardhana <harsha@harshavardhana.net>
Date:   Wed Aug 21 16:28:41 2013 -0700

    geo-replication: Use a md5 based unique control path
    
    A hostname fqdn can be of length 255 according to RFC1123
    ------------------------->
    /usr/include/bits/posix1_lim.h:#define _POSIX_HOST_NAME_MAX  255
    <-------------------------
    On linux this length is 64
    ------------------------->
    /usr/include/bits/local_lim.h:#define HOST_NAME_MAX 64
    <-------------------------
    
    When a given hostname is > 45 (characters) - SSH fails with
    
    -------------------------->
    "ControlPath too long for Unix domain socket".
    <--------------------------
    
    Indicating that the total length of ControlPath which is
    on linux should be 108
    
    ------------------------->
    /usr/include/linux/un.h:#define UNIX_PATH_MAX   108
    <-------------------------
    
    This leads to "faulty" geo-replication status.
    
    This patch brings in a new file called manifest which carries
    given a geo-rep session some unique information - with which
    a unique `md5` is generated in a 32length digest, this ensures
    that we don't exceed UNIX_PATH_MAX limitations instead we use
    a conservative approach and still be able to provide a unique
    socket path.
    
    Change-Id: I3a6a27d605d751a86e7c82eace4561d9b0134fe1
    BUG: 990330
    Signed-off-by: Harshavardhana <harsha@harshavardhana.net>
    Reviewed-on: http://review.gluster.org/5681
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Csaba Henk <csaba@redhat.com>
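
The idea described in the commit message can be sketched roughly as follows. This is not the code from the committed patch: the function name control_path, the choice of hashing the master and slave URLs, and the ".sock" extension are assumptions made for illustration. The only point is that an md5 hex digest is always 32 characters, so the socket path length no longer depends on the host name.

```python
import hashlib
import os

UNIX_PATH_MAX = 108   # size of sun_path on Linux, including the NUL byte
SSH_SUFFIX_LEN = 17   # "." plus 16 characters, as seen appended in the log


def control_path(tmpdir, master_url, slave_url):
    """Hypothetical helper: build a fixed-length ssh ControlPath.

    The session details are hashed instead of being embedded verbatim,
    so a long FQDN cannot push the path past the socket path limit.
    """
    session_info = (master_url + ":" + slave_url).encode("utf-8")
    digest = hashlib.md5(session_info).hexdigest()  # always 32 hex characters
    return os.path.join(tmpdir, digest + ".sock")


path = control_path(
    "/tmp/gsyncd-aux-ssh-YBDwod",
    "gluster://localhost:iso",
    "ssh://root@hp-dl380pgen8-05.osas.lab.eng.rdu2.redhat.com:/mnt/geoslave",
)
# Leave room for the temporary suffix and the NUL terminator.
print(len(path), len(path) + SSH_SUFFIX_LEN + 1 <= UNIX_PATH_MAX)
```

md5 is used here only to derive a short, unique name, not for security. Distinct sessions still get distinct paths because the digest differs whenever the hashed session details differ.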

Comment 15 Niels de Vos 2014-04-17 11:44:38 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.0, please reopen this bug report.

glusterfs-3.5.0 has been announced on the Gluster Developers mailinglist [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6137
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user

