Bug 1519315 - glusterfs 3.12.3 crashes with segmentation fault in glusterd_submit_request or rpcsvc_dump
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: rpc
Version: 3.12
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
Assignee: Mohit Agrawal
QA Contact:
URL:
Whiteboard:
Depends On: 1521004
Blocks:
Reported: 2017-11-30 15:22 UTC by Erik Zscheile
Modified: 2018-08-29 03:36 UTC (History)

Fixed In Version: glusterfs-4.1.3 (or later)
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-29 03:36:09 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments
output of `emerge --info` (deleted)
2017-11-30 15:40 UTC, Erik Zscheile


Links
System | ID | Priority | Status | Summary | Last Updated
Gentoo | 639838 | None | None | None | 2017-12-06 21:57:45 UTC
Red Hat Bugzilla | 1504255 | None | CLOSED | CVE-2017-15096 glusterfs: Null pointer dereference in send_brick_req function in glusterfsd/src/gf_attach.c | 2019-01-30 02:36:39 UTC
Red Hat Bugzilla | 1536186 | None | CLOSED | build: glibc has removed legacy rpc headers and rpcgen in Fedora28, use libtirpc | 2019-01-30 02:36:38 UTC

Internal Links: 1536186

Description Erik Zscheile 2017-11-30 15:22:07 UTC
Description of problem:
GlusterFS version 3.12.3 crashes with a segmentation fault in xdr_gf_dump_req
on Gentoo Linux (3.12.3 is the latest version available on Gentoo).
I don't think the bug is in xdr_gf_dump_req itself; it appears to be called with wrong arguments.
This is a problem because 3.12.3 is currently the only GlusterFS version available in Gentoo; the older ones (3.6.5) were removed from the repository for being vulnerable.
The bug is not present in GlusterFS version 3.6.5, which works.

Version-Release number of selected component (if applicable):
3.12.3 on Gentoo Linux

How reproducible:
Install glusterfs version 3.12.3 on Gentoo Linux

Steps to Reproduce:
1. emerge =sys-cluster/glusterfs-3.12.3
2. /etc/init.d/glusterd restart

Actual results:
glusterd is killed with SIGSEGV

Expected results:
glusterd starts

Additional info:

gentoo package info page:
https://packages.gentoo.org/packages/sys-cluster/glusterfs

initial post:
https://twitter.com/EZscheile/status/934595665283428354

Archive of coredump, strace and gdb backtrace:
http://ezscheile.bplaced.net/glusterd-segv-pack.tar.gz

Backtrace:
#0  __GI_xdr_uint64_t (xdrs=0x7fda46ac5b20, uip=0x7fda46ac5c60) at xdr_intXX_t.c:71
#1  0x00007fda504e6a29 in xdr_gf_dump_req (xdrs=<optimized out>, objp=<optimized out>) at rpc-common-xdr.c:167
#2  0x00007fda5070fa83 in xdr_sizeof () from /lib64/libtirpc.so.3
#3  0x00007fda4a9057aa in glusterd_submit_request (rpc=0x1495450, req=req@entry=0x7fda46ac5c60, frame=frame@entry=0x7fda38001ec0, prog=prog@entry=0x7fda4ac4e2c0 <glusterd_dump_prog>, procnum=procnum@entry=1, iobref=iobref@entry=0x0, this=0x142a680,
    cbkfn=0x7fda4a942040 <glusterd_peer_dump_version_cbk>, xdrproc=0x7fda504e6a20 <xdr_gf_dump_req>) at glusterd-utils.c:428
#4  0x00007fda4a9473ca in glusterd_peer_dump_version (this=this@entry=0x142a680, rpc=rpc@entry=0x1495450, peerctx=peerctx@entry=0x1494400) at glusterd-handshake.c:2319
#5  0x00007fda4a8ed516 in __glusterd_peer_rpc_notify (rpc=rpc@entry=0x1495450, mydata=mydata@entry=0x1494400, event=event@entry=RPC_CLNT_CONNECT, data=data@entry=0x0) at glusterd-handler.c:6295
#6  0x00007fda4a8e404d in glusterd_big_locked_notify (rpc=0x1495450, mydata=0x1494400, event=RPC_CLNT_CONNECT, data=0x0, notify_fn=0x7fda4a8ed200 <__glusterd_peer_rpc_notify>) at glusterd-handler.c:70
#7  0x00007fda50933f7c in rpc_clnt_notify (trans=<optimized out>, mydata=0x1495480, event=<optimized out>, data=0x1495680) at rpc-clnt.c:1004
#8  0x00007fda50930143 in rpc_transport_notify (this=this@entry=0x1495680, event=event@entry=RPC_TRANSPORT_CONNECT, data=data@entry=0x1495680) at rpc-transport.c:538
#9  0x00007fda47954f8f in socket_connect_finish (this=this@entry=0x1495680) at socket.c:2404
#10 0x00007fda47959511 in socket_event_handler (fd=fd@entry=13, idx=idx@entry=4, gen=gen@entry=1, data=data@entry=0x1495680, poll_in=0, poll_out=4, poll_err=0) at socket.c:2456
#11 0x00007fda50bc23da in event_dispatch_epoll_handler (event=0x7fda46ac5e7c, event_pool=0x1417770) at event-epoll.c:583
#12 event_dispatch_epoll_worker (data=0x1496e60) at event-epoll.c:659
#13 0x00007fda500b7839 in start_thread (arg=0x7fda46ac6700) at pthread_create.c:456
#14 0x00007fda4fdf5adf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

XDRS x_ops:
*(xdrs->x_ops) = {x_getlong = 0x7fda5070f900, x_putlong = 0x7fda5070f880, x_getbytes = 0x7fda5070f900, x_putbytes = 0x7fda5070f8a0, x_getpostn = 0x7fda5070f8c0, x_setpostn = 0x7fda5070f8e0, x_inline = 0x7fda5070f960, x_destroy = 0x7fda5070f920, x_getint32 = 0x0,
  x_putint32 = 0x165296f147c52f00}

Comment 1 Erik Zscheile 2017-11-30 15:26:27 UTC
possibly related to: https://bugs.gentoo.org/635172

Comment 2 Erik Zscheile 2017-11-30 15:40:52 UTC
Created attachment 1360977 [details]
output of `emerge --info`

Comment 3 Erik Zscheile 2017-11-30 19:57:11 UTC
compiled sources (with applied gentoo patches):
http://ezscheile.bplaced.net/glusterd-segv-work.tar.gz

Comment 4 Erik Zscheile 2017-11-30 21:17:28 UTC
libtirpc version: 1.0.2-r1 (Gentoo).
There might be a bug in xdr_sizeof, which sets up the x_ops structure
but doesn't set x_ops->x_getint32.
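
To illustrate the concern about an ops table with an unset slot, here is a minimal sketch. It is not libtirpc or GlusterFS code: `demo_ops`, `demo_make_ops`, `demo_put_long` and `demo_encode_int32` are hypothetical names invented for this example. It only shows the general C behaviour that a struct built with designated initializers has its unnamed members zeroed, so a missing callback shows up as NULL and can be rejected instead of being called through a garbage pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ops table, loosely modelled on the x_ops dump in this report. */
struct demo_ops {
    bool (*put_long)(long *sink, long value);
    bool (*put_int32)(int32_t *sink, int32_t value);
};

/* Hypothetical callback that stores the value in the sink. */
static bool demo_put_long(long *sink, long value)
{
    *sink = value;
    return true;
}

/* Only put_long is named; C zero-initializes every member left out,
 * so put_int32 is NULL rather than leftover stack contents. */
static struct demo_ops demo_make_ops(void)
{
    struct demo_ops ops = { .put_long = demo_put_long };
    return ops;
}

/* Refuse to call through an unset slot instead of crashing. */
static bool demo_encode_int32(const struct demo_ops *ops, int32_t *sink,
                              int32_t value)
{
    assert(ops != NULL);
    if (ops->put_int32 == NULL)
        return false;
    return ops->put_int32(sink, value);
}
```

With a table from `demo_make_ops()`, calling `demo_encode_int32` returns false and leaves the sink untouched. A table declared as a plain local and filled member by member, as in the xdr_sizeof excerpt quoted in this report, gives no such guarantee for members that are never assigned.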

Comment 5 Erik Zscheile 2017-11-30 21:41:17 UTC
relevant code snippets

parts from glibc xdr_intXX_t.c:

---- __GI_xdr_uint64_t

/* XDR 64bit integers */
bool_t
xdr_int64_t (XDR *xdrs, int64_t *ip)
{
  int32_t t1, t2;
  switch (xdrs->x_op)
    {
    case XDR_ENCODE:
      t1 = (int32_t) ((*ip) >> 32);
      t2 = (int32_t) (*ip);
      return (XDR_PUTINT32(xdrs, &t1) && XDR_PUTINT32(xdrs, &t2));
    case XDR_DECODE: /*** SEGFAULT HERE ***/
      if (!XDR_GETINT32(xdrs, &t1) || !XDR_GETINT32(xdrs, &t2))
        return FALSE;
      *ip = ((int64_t) t1) << 32;
      *ip |= (uint32_t) t2;        /* Avoid sign extension.  */
      return TRUE;
    case XDR_FREE:
      return TRUE;
    default:
      return FALSE;
    }
}
libc_hidden_nolink_sunrpc (xdr_int64_t, GLIBC_2_1_1)
bool_t
xdr_quad_t (XDR *xdrs, quad_t *ip)
{
  return xdr_int64_t (xdrs, (int64_t *) ip);
}
libc_hidden_nolink_sunrpc (xdr_quad_t, GLIBC_2_3_4)

----

parts from
  libtirpc-1.0.2/src/
  glusterfs-3.12.3/contrib/sunrpc/
xdr_sizeof.c

---- xdr_sizeof

unsigned long
xdr_sizeof (xdrproc_t func, void *data)
{
        XDR x;
        struct xdr_ops ops;
        bool_t stat;

#ifdef GF_DARWIN_HOST_OS
        typedef bool_t (*dummyfunc1) (XDR *, int *);
#else
        typedef bool_t (*dummyfunc1) (XDR *, long *);
#endif
        typedef bool_t (*dummyfunc2) (XDR *, caddr_t, u_int);

        ops.x_putlong = x_putlong;
        ops.x_putbytes = x_putbytes;
        ops.x_inline = x_inline;
        ops.x_getpostn = x_getpostn;
        ops.x_setpostn = x_setpostn;
        ops.x_destroy = x_destroy;

        /* the other harmless ones */
        ops.x_getlong = (dummyfunc1) harmless;
        ops.x_getbytes = (dummyfunc2) harmless;
        /*** ops.x_getint32 NOT SET ***/

        x.x_op = XDR_ENCODE;
        x.x_ops = &ops;
        x.x_handy = 0;
        x.x_private = (caddr_t) NULL;
        x.x_base = (caddr_t) 0;

        stat = func (&x, data, 0);
        if (x.x_private)
                free (x.x_private);
        return (stat == TRUE ? (unsigned) x.x_handy : 0);
}

----
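
The two excerpts above come from different libraries (glibc and libtirpc), and the gdb dump shows a trailing slot holding garbage. One possible explanation, which this report does not confirm, is that the two libraries declare ops tables of different lengths, so the callee reads a slot the caller never allocated. The sketch below uses hypothetical stand-in structs (`demo_ops_short`, `demo_ops_long`), not the real declarations, to show how such a mismatch can be measured without invoking undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*demo_fn)(void);

/* Hypothetical 9-slot table, standing in for what the caller allocates. */
struct demo_ops_short {
    demo_fn getlong, putlong, getbytes, putbytes,
            getpostn, setpostn, inline_, destroy, control;
};

/* Hypothetical 10-slot table, standing in for what the callee expects. */
struct demo_ops_long {
    demo_fn getlong, putlong, getbytes, putbytes,
            getpostn, setpostn, inline_, destroy, getint32, putint32;
};

/* Number of bytes the callee would read past the end of the caller's
 * table when it accesses its last slot. */
static size_t demo_missing_bytes(void)
{
    size_t need = offsetof(struct demo_ops_long, putint32) + sizeof(demo_fn);
    size_t have = sizeof(struct demo_ops_short);
    assert(need >= have); /* holds here: the long table has one more slot */
    return need - have;
}
```

Under these assumptions `demo_missing_bytes()` equals the size of one function pointer: the last slot of the longer table lies entirely outside the shorter one, which would be consistent with a garbage value like the x_putint32 entry in the dump.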

Comment 6 Erik Zscheile 2017-11-30 21:42:17 UTC
NOTE: glibc source taken from https://code.woboq.org/userspace/glibc/sunrpc/xdr_intXX_t.c.html

Comment 7 Erik Zscheile 2017-12-01 19:47:03 UTC
glusterd compiled without libtirpc works.

Comment 8 Erik Zscheile 2017-12-01 20:21:05 UTC
This bug only happens when glusterfs-3.12.3 is compiled against libtirpc-1.0.2-r1. It works with libtirpc-1.0.1-r1. So this is a bug in libtirpc.

Comment 9 Erik Zscheile 2017-12-06 21:54:19 UTC
see https://bugzilla.redhat.com/show_bug.cgi?id=1521004#c1

Comment 10 James Le Cuirot 2017-12-09 16:44:34 UTC
I contributed the patch for explicitly using libtirpc to master and backported it to 3.12.3 for Gentoo. I had been under the impression that libtirpc is just a drop-in replacement for glibc's RPC but after investigating this report, I have found that it segfaults unless you give --with-ipv6-default, which is new in 3.13.0.

More specifically, the crash is avoided if I change addr_family in rpc_transport_inet_options_build (rpc/rpc-lib/src/rpc-transport.c) from inet to inet6. I don't understand why the former causes a crash. Is it not possible to use libtirpc without IPv6? Our libtirpc package allows you to build it with IPv6 support disabled, though this doesn't seem to make any difference to the crash. Do I also have to change the other instances of inet to inet6 or is the new --with-libtirpc flag effectively redundant because all the --with-ipv6-default code is actually required?

Sorry for being slightly clueless here. I have used Gluster occasionally but I'm not the official Gentoo package maintainer, just a dev who thought he'd give the package some attention. I'm not yet familiar with IPv6 either. I am concerned that the flag I've added to master is effectively only good for causing segfaults so I'd like to resolve this before it ends up in a release.

CC'ing Kevin Vigor because he added the --with-ipv6-default flag.

Comment 11 Erik Zscheile 2017-12-09 20:22:07 UTC
Link to a new backtrace, which occurs with the patch applied:

https://bugs.gentoo.org/639838#c16

The bug is probably in the part that uses RPC / XDR to communicate with peers, and it depends on libtirpc, the availability of peers, and IPv4/IPv6.

Comment 12 James Le Cuirot 2017-12-10 19:09:28 UTC
Following Erik's testing, we have determined that changing rpc_transport_inet_options_build alone is not sufficient. Still looking for guidance here.

Comment 13 Worker Ant 2018-01-25 22:16:08 UTC
REVIEW: https://review.gluster.org/19334 (build: Fix redefinitions when using libtirpc without IPv6 by default) posted (#1) for review on master by James Le Cuirot

Comment 14 James Le Cuirot 2018-01-25 22:23:18 UTC
I initially thought the above commit would fix this issue. It doesn't, but it is still related.

Comment 15 James Le Cuirot 2018-01-25 22:34:42 UTC
This is actually fixed by https://review.gluster.org/#/c/19330.

Comment 16 Amar Tumballi 2018-08-29 03:36:09 UTC
This update is done in bulk based on the state of the patch and the time since last activity. If the issue is still seen, please reopen the bug.

