
Bug 1519315

Summary: glusterfs 3.12.3 crashes with segmentation fault in glusterd_submit_request or rpcsvc_dump
Product: [Community] GlusterFS
Reporter: Erik Zscheile <erik.zscheile.ytrizja>
Component: rpc
Assignee: Mohit Agrawal <moagrawa>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.12
CC: bugs, chewi, kkeithle, kvigor, rgowdapp
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: x86_64
OS: Linux
Fixed In Version: glusterfs-4.1.3 (or later)
Doc Type: If docs needed, set a value
Last Closed: 2018-08-29 03:36:09 UTC
Type: Bug
Bug Depends On: 1521004    
Bug Blocks:    
Attachments:
output of `emerge --info` (flags: none)

Description Erik Zscheile 2017-11-30 15:22:07 UTC
Description of problem:
GlusterFS version 3.12.3 crashes with a segmentation fault in xdr_gf_dump_req
on Gentoo Linux (this is the latest version available on Gentoo).
However, I think the bug is not in xdr_gf_dump_req itself; it is being called with wrong arguments.
One problem is that GlusterFS 3.12.3 is the only version currently available in Gentoo, as the older ones (3.6.5) were removed from the repository for being vulnerable.
This bug does not occur in GlusterFS 3.6.5, which works.

Version-Release number of selected component (if applicable):
3.12.3 on Gentoo Linux

How reproducible:
Install glusterfs version 3.12.3 on Gentoo Linux

Steps to Reproduce:
1. emerge =sys-cluster/glusterfs-3.12.3
2. /etc/init.d/glusterd restart

Actual results:
glusterd is killed with SIGSEGV

Expected results:
glusterd starts

Additional info:

gentoo package info page:

initial post:

Archive of coredump, strace and gdb backtrace:

#0  __GI_xdr_uint64_t (xdrs=0x7fda46ac5b20, uip=0x7fda46ac5c60) at xdr_intXX_t.c:71
#1  0x00007fda504e6a29 in xdr_gf_dump_req (xdrs=<optimized out>, objp=<optimized out>) at rpc-common-xdr.c:167
#2  0x00007fda5070fa83 in xdr_sizeof () from /lib64/
#3  0x00007fda4a9057aa in glusterd_submit_request (rpc=0x1495450, req=req@entry=0x7fda46ac5c60, frame=frame@entry=0x7fda38001ec0, prog=prog@entry=0x7fda4ac4e2c0 <glusterd_dump_prog>, procnum=procnum@entry=1, iobref=iobref@entry=0x0, this=0x142a680,
    cbkfn=0x7fda4a942040 <glusterd_peer_dump_version_cbk>, xdrproc=0x7fda504e6a20 <xdr_gf_dump_req>) at glusterd-utils.c:428
#4  0x00007fda4a9473ca in glusterd_peer_dump_version (this=this@entry=0x142a680, rpc=rpc@entry=0x1495450, peerctx=peerctx@entry=0x1494400) at glusterd-handshake.c:2319
#5  0x00007fda4a8ed516 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x1495450, mydata=mydata@entry=0x1494400, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:6295
#6  0x00007fda4a8e404d in glusterd_big_locked_notify (rpc=0x1495450, mydata=0x1494400, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7fda4a8ed200 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:70
#7  0x00007fda50933f7c in rpc_clnt_notify (trans=<optimized out>, mydata=0x1495480, event=<optimized out>, data=0x1495680) at rpc-clnt.c:1004
#8  0x00007fda50930143 in rpc_transport_notify (this=this@entry=0x1495680, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x1495680) at rpc-transport.c:538
#9  0x00007fda47954f8f in socket_connect_finish (this=this@entry=0x1495680) at socket.c:2404
#10 0x00007fda47959511 in socket_event_handler (fd=fd@entry=13, idx=idx@entry=4, gen=gen@entry=1, data=data@entry=0x1495680, poll_in=0, poll_out=4, poll_err=0) at socket.c:2456
#11 0x00007fda50bc23da in event_dispatch_epoll_handler (event=0x7fda46ac5e7c, event_pool=0x1417770) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x1496e60) at event-epoll.c:659
#13 0x00007fda500b7839 in start_thread (arg=0x7fda46ac6700) at pthread_create.c:456
#14 0x00007fda4fdf5adf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

XDRS x_ops:
*(xdrs->x_ops) = {x_getlong = 0x7fda5070f900, x_putlong = 0x7fda5070f880, x_getbytes = 0x7fda5070f900, x_putbytes = 0x7fda5070f8a0, x_getpostn = 0x7fda5070f8c0, x_setpostn = 0x7fda5070f8e0, x_inline = 0x7fda5070f960, x_destroy = 0x7fda5070f920, x_getint32 = 0x0,
  x_putint32 = 0x165296f147c52f00}

Comment 1 Erik Zscheile 2017-11-30 15:26:27 UTC
possibly related to:

Comment 2 Erik Zscheile 2017-11-30 15:40:52 UTC
Created attachment 1360977 [details]
output of `emerge --info`

Comment 3 Erik Zscheile 2017-11-30 19:57:11 UTC
compiled sources (with applied gentoo patches):

Comment 4 Erik Zscheile 2017-11-30 21:17:28 UTC
libtirpc version: 1.0.2-r1 (Gentoo).
There might be a bug in xdr_sizeof, which sets up the x_ops structure
but doesn't set x_ops->x_getint32.
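
Illustration of the hypothesis above: a minimal C sketch of an operations table in which only some members are assigned. All names here (demo_ops, demo_ops_init, demo_call) are hypothetical stand-ins and not the real struct xdr_ops from glibc or libtirpc. The sketch zero-initialises the table first so that an unassigned member is a detectable NULL instead of whatever happened to be on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an XDR operations table; NOT the real
   struct xdr_ops from glibc or libtirpc. */
typedef int (*demo_op_fn)(int *value);

struct demo_ops {
    demo_op_fn put_long;
    demo_op_fn get_long;
    demo_op_fn get_int32;
    demo_op_fn put_int32;
};

/* Counts four bytes per call, like a size-only encoder would. */
static int demo_put(int *value) { *value += 4; return 1; }

/* Placeholder for operations that are never expected to run. */
static int demo_harmless(int *value) { (void) value; return 0; }

/* Assign only some members, as the hypothesis describes.
   Zeroing the whole table first means the members that are
   skipped below end up NULL rather than indeterminate. */
static void demo_ops_init(struct demo_ops *ops)
{
    assert(ops != NULL);
    *ops = (struct demo_ops) {0};
    ops->put_long = demo_put;
    ops->get_long = demo_harmless;
    /* get_int32 and put_int32 are deliberately left unassigned */
}

/* Call an operation only if it is set; return -1 otherwise. */
static int demo_call(demo_op_fn fn, int *value)
{
    return fn != NULL ? fn(value) : -1;
}
```

Without the zeroing step, reading an unassigned member of a stack-allocated table is undefined behaviour in C, which would be consistent with the odd x_getint32 and x_putint32 values in the dump, assuming the hypothesis holds.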

Comment 5 Erik Zscheile 2017-11-30 21:41:17 UTC
relevant code snippets

parts from glibc xdr_intXX_t.c:

---- __GI_xdr_uint64_t

/* XDR 64bit integers */
bool_t
xdr_int64_t (XDR *xdrs, int64_t *ip)
{
  int32_t t1, t2;

  switch (xdrs->x_op)
    {
    case XDR_ENCODE:
      t1 = (int32_t) ((*ip) >> 32);
      t2 = (int32_t) (*ip);
      return (XDR_PUTINT32(xdrs, &t1) && XDR_PUTINT32(xdrs, &t2));
    case XDR_DECODE: /*** SEGFAULT HERE ***/
      if (!XDR_GETINT32(xdrs, &t1) || !XDR_GETINT32(xdrs, &t2))
        return FALSE;
      *ip = ((int64_t) t1) << 32;
      *ip |= (uint32_t) t2;        /* Avoid sign extension.  */
      return TRUE;
    case XDR_FREE:
      return TRUE;
    default:
      return FALSE;
    }
}
libc_hidden_nolink_sunrpc (xdr_int64_t, GLIBC_2_1_1)

bool_t
xdr_quad_t (XDR *xdrs, quad_t *ip)
{
  return xdr_int64_t (xdrs, (int64_t *) ip);
}
libc_hidden_nolink_sunrpc (xdr_quad_t, GLIBC_2_3_4)
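
For readers unfamiliar with the arithmetic in the snippet above: a 64-bit value travels as two 32-bit words, high word first, and the low word must be widened as unsigned when the value is rebuilt. The following is a minimal standalone sketch of that split and join. The helper names split_int64 and join_int64 are hypothetical and not part of glibc. It does the shifting on unsigned types, a small deviation from the quoted code, to avoid shifting negative values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: split a signed 64-bit value into the two
   32-bit words that go on the wire, most significant word first.
   The narrowing conversions assume the usual two's complement
   behaviour of mainstream compilers. */
static void split_int64(int64_t v, int32_t *hi, int32_t *lo)
{
    assert(hi != NULL && lo != NULL);
    uint64_t u = (uint64_t) v;
    *hi = (int32_t) (uint32_t) (u >> 32);
    *lo = (int32_t) (uint32_t) u;
}

/* Hypothetical helper: rebuild the 64-bit value. Casting the low
   word to uint32_t before widening is what avoids sign extension:
   a negative low word must not overwrite the high word with ones. */
static int64_t join_int64(int32_t hi, int32_t lo)
{
    uint64_t u = ((uint64_t) (uint32_t) hi << 32) | (uint32_t) lo;
    return (int64_t) u;
}
```

For example, joining a high word of 1 with a low word of -1 gives 0x1FFFFFFFF; without the unsigned cast on the low word, the result would be -1.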


parts from

---- xdr_sizeof

unsigned long
xdr_sizeof (xdrproc_t func, void *data)
{
        XDR x;
        struct xdr_ops ops;
        bool_t stat;

        typedef bool_t (*dummyfunc1) (XDR *, int *);
        typedef bool_t (*dummyfunc1) (XDR *, long *);
        typedef bool_t (*dummyfunc2) (XDR *, caddr_t, u_int);

        ops.x_putlong = x_putlong;
        ops.x_putbytes = x_putbytes;
        ops.x_inline = x_inline;
        ops.x_getpostn = x_getpostn;
        ops.x_setpostn = x_setpostn;
        ops.x_destroy = x_destroy;

        /* the other harmless ones */
        ops.x_getlong = (dummyfunc1) harmless;
        ops.x_getbytes = (dummyfunc2) harmless;
        /*** ops.x_getint32 NOT SET ***/

        x.x_op = XDR_ENCODE;
        x.x_ops = &ops;
        x.x_handy = 0;
        x.x_private = (caddr_t) NULL;
        x.x_base = (caddr_t) 0;

        stat = func (&x, data, 0);
        if (x.x_private)
                free (x.x_private);
        return (stat == TRUE ? (unsigned) x.x_handy : 0);
}
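
The technique xdr_sizeof uses is a "size pass": it runs the normal encode routine against a stream whose put operations only add up byte counts in x_handy. The sketch below shows that idea in isolation. Every name in it (count_stream, count_put, demo_req, demo_req_encode, demo_sizeof) is hypothetical and much simpler than the real XDR types. It is an illustration of the approach, not the libtirpc or glibc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct count_stream;
typedef int (*count_put_fn)(struct count_stream *s,
                            const void *src, size_t len);

/* Hypothetical, stripped-down stream: one operation and a counter. */
struct count_stream {
    count_put_fn put;   /* the only operation a size pass needs */
    size_t       handy; /* running byte count, like x_handy above */
};

/* Writes nothing; only accounts for the bytes, padded to a
   multiple of four as XDR requires. */
static int count_put(struct count_stream *s, const void *src, size_t len)
{
    (void) src;
    s->handy += (len + 3u) & ~(size_t) 3u;
    return 1;
}

/* Hypothetical request type used only for this illustration. */
struct demo_req {
    uint64_t id;
    uint32_t flags;
};

typedef int (*demo_encode_fn)(struct count_stream *s, const void *obj);

/* Encoder for demo_req: emits each field through the stream. */
static int demo_req_encode(struct count_stream *s, const void *obj)
{
    const struct demo_req *req = obj;
    return s->put(s, &req->id, sizeof req->id)
        && s->put(s, &req->flags, sizeof req->flags);
}

/* Size pass: run the encoder against a counting stream. Every
   member of the stream is initialised explicitly before use. */
static size_t demo_sizeof(demo_encode_fn encode, const void *obj)
{
    struct count_stream s = { .put = count_put, .handy = 0 };
    assert(encode != NULL);
    return encode(&s, obj) ? s.handy : 0;
}
```

Because the encoder reaches the stream only through its operations table, any operation the encoder may call has to be assigned before the pass starts. That is the property the reporter suspects is violated for x_getint32.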


Comment 6 Erik Zscheile 2017-11-30 21:42:17 UTC
NOTE: glibc source taken from

Comment 7 Erik Zscheile 2017-12-01 19:47:03 UTC
glusterd compiled without libtirpc works.

Comment 8 Erik Zscheile 2017-12-01 20:21:05 UTC
This bug only happens when glusterfs-3.12.3 is compiled against libtirpc-1.0.2-r1. It works with libtirpc-1.0.1-r1. So this is a bug in libtirpc.

Comment 9 Erik Zscheile 2017-12-06 21:54:19 UTC

Comment 10 James Le Cuirot 2017-12-09 16:44:34 UTC
I contributed the patch for explicitly using libtirpc to master and backported it to 3.12.3 for Gentoo. I had been under the impression that libtirpc is just a drop-in replacement for glibc's RPC but after investigating this report, I have found that it segfaults unless you give --with-ipv6-default, which is new in 3.13.0.

More specifically, the crash is avoided if I change addr_family in rpc_transport_inet_options_build (rpc/rpc-lib/src/rpc-transport.c) from inet to inet6. I don't understand why the former causes a crash. Is it not possible to use libtirpc without IPv6? Our libtirpc package allows you to build it with IPv6 support disabled, though this doesn't seem to make any difference to the crash. Do I also have to change the other instances of inet to inet6 or is the new --with-libtirpc flag effectively redundant because all the --with-ipv6-default code is actually required?

Sorry for being slightly clueless here. I have used Gluster occasionally but I'm not the official Gentoo package maintainer, just a dev who thought he'd give the package some attention. I'm not yet familiar with IPv6 either. I am concerned that the flag I've added to master is effectively only good for causing segfaults so I'd like to resolve this before it ends up in a release.

CC'ing Kevin Vigor because he added the --with-ipv6-default flag.

Comment 11 Erik Zscheile 2017-12-09 20:22:07 UTC
backlink to a new backtrace, happens with applied patch:

The bug is probably in the part that uses RPC / XDR to communicate with peers, and it depends on libtirpc, the availability of peers, and IPv4/IPv6.

Comment 12 James Le Cuirot 2017-12-10 19:09:28 UTC
Following Erik's testing, we have determined that changing rpc_transport_inet_options_build alone is not sufficient. Still looking for guidance here.

Comment 13 Worker Ant 2018-01-25 22:16:08 UTC
REVIEW: (build: Fix redefinitions when using libtirpc without IPv6 by default) posted (#1) for review on master by James Le Cuirot

Comment 14 James Le Cuirot 2018-01-25 22:23:18 UTC
I initially thought the above commit would fix this issue. It doesn't but it is still related.

Comment 15 James Le Cuirot 2018-01-25 22:34:42 UTC
This is actually fixed by

Comment 16 Amar Tumballi 2018-08-29 03:36:09 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.