Bug 1600451 - crash on glusterfs_handle_brick_status of the glusterfsd
Summary: crash on glusterfs_handle_brick_status of the glusterfsd
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact: hari gowtham
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-07-12 09:45 UTC by hari gowtham
Modified: 2018-10-23 15:13 UTC

Fixed In Version: glusterfs-5.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1600057
Environment:
Last Closed: 2018-10-23 15:13:48 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Comment 1 hari gowtham 2018-07-12 09:45:55 UTC
Description of problem:
On a WA setup, the glusterfsd processes crash at random points, possibly because of a race.
This bug only aims to keep the crash from happening. The RCA is tracked separately.

They crash at:

Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
[New LWP 3816]
[New LWP 3817]
[New LWP 3823]
[New LWP 3813]
[New LWP 3814]
[New LWP 3815]
[New LWP 3812]

warning: Could not load shared library symbols for /lib64/libnss_sss.so.2.
Do you need "set solib-search-path" or "set sysroot"?

warning: Could not load shared library symbols for SINFO:      0x.
Do you need "set solib-search-path" or "set sysroot"?
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfsd -s bxts470192.eu.rabonet.com --volfile-id prod_xvavol.bxts'.
Program terminated with signal 11, Segmentation fault.
#0  glusterfs_handle_brick_status (req=0x7fae38001910) at glusterfsd-mgmt.c:1029
1029            any = active->first;
(gdb) bt
#0  glusterfs_handle_brick_status (req=0x7fae38001910) at glusterfsd-mgmt.c:1029
#1  0x00007fae4a5444f2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#2  0x00007fae48b87d40 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()


Version-Release number of selected component (if applicable):
3.3

How reproducible:
Rarely, on a WA cluster.

Steps to Reproduce:
1. Create a gluster volume.
2. Import it to WA.
3. Use the volume. The glusterfsd processes crash after some time.

Actual results:
Crashes once in a while.

Expected results:
glusterfsd should not crash.

Additional info:
The exact way to reproduce this is unknown. From the details we have so far, it looks like a race between the get-state detail and profile commands, issued in varying order.
These are the commands the WA setup issues on gluster at the time of the crash.

Most of the time these commands work well in any order. Once in a while they lead to a crash.

The RCA for this is tracked in the following bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1596371

A similar bug also tracks the RCA:
https://bugzilla.redhat.com/show_bug.cgi?id=1576726

Both end up in a state that is not supposed to happen, in the same environment.

Comment 2 Worker Ant 2018-07-12 09:48:35 UTC
REVIEW: https://review.gluster.org/20498 (core: dereference check on the variables in glusterfs_handle_brick_status) posted (#1) for review on master by hari gowtham

Comment 3 Worker Ant 2018-07-17 11:12:35 UTC
COMMIT: https://review.gluster.org/20498 committed in master by "Amar Tumballi" <amarts@redhat.com> with a commit message- core: dereference check on the variables in glusterfs_handle_brick_status

problem: In a race condition, active->first, which is supposed to be filled,
is NULL, and trying to dereference it crashes.

back trace:
Core was generated by `/usr/sbin/glusterfsd -s bxts470192.eu.rabonet.com --volfile-id prod_xvavol.bxts'.
Program terminated with signal 11, Segmentation fault.
1029            any = active->first;
(gdb) bt

Change-Id: Ia6291865319a9456b8b01a5251be2679c4985b7c
fixes: bz#1600451
Signed-off-by: Hari Gowtham <hgowtham@redhat.com>
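
For illustration, the kind of dereference check described in the commit message can be sketched as below. This is a minimal sketch, not the committed patch: the struct names, field layout, and both function names are hypothetical stand-ins, and only the ctx -> active -> first pointer chain is taken from the report. As the description says, a check like this only avoids the crash. It does not remove the underlying race.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the real GlusterFS types; only the
 * fields needed to show the pointer chain are modelled here. */
struct xlator { const char *name; };
struct graph  { struct xlator *first; };
struct ctx    { struct graph *active; };

/* Return the first xlator of the active graph, or NULL if any link
 * in the chain ctx -> active -> first has not been populated yet. */
static struct xlator *
first_xlator_or_null(const struct ctx *ctx)
{
    if (ctx == NULL)
        return NULL;
    if (ctx->active == NULL)
        return NULL;
    return ctx->active->first;
}

/* Hypothetical handler: report failure (-1) to the caller instead of
 * dereferencing a NULL pointer when the graph is not ready. */
static int
handle_brick_status_sketch(const struct ctx *ctx)
{
    struct xlator *any = first_xlator_or_null(ctx);

    if (any == NULL)
        return -1;
    return 0;
}
```

With this shape, a status request that arrives before the graph is fully set up fails with an error code that the caller can handle, rather than terminating the process with a segmentation fault.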

Comment 4 Shyamsundar 2018-10-23 15:13:48 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-5.0, please open a new bug report.

glusterfs-5.0 has been announced on the Gluster mailing lists [1]. Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2018-October/000115.html
[2] https://www.gluster.org/pipermail/gluster-users/

