Bug 1593865 - shd crash on startup
Summary: shd crash on startup
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: RHGS 3.4.0
Assignee: Ravishankar N
QA Contact: nchilaka
URL:
Whiteboard:
Duplicates: 1519105 (view as bug list)
Depends On: 1596513 1597229 1597230
Blocks: 1598340 1503137 1582526 1597663
 
Reported: 2018-06-21 17:27 UTC by John Strunk
Modified: 2018-10-22 06:04 UTC
CC List: 9 users

Fixed In Version: glusterfs-3.12.2-14
Doc Type: Bug Fix
Doc Text:
glusterd can send heal-related requests to the self-heal daemon before the latter's graph is fully initialized. Previously, the self-heal daemon would crash when trying to access certain data structures in this state. With the fix, if the self-heal daemon receives a request before its graph is initialized, it ignores the request.
Clone Of:
Environment:
Last Closed: 2018-09-04 06:49:14 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1595752 None CLOSED [GSS] Core dump getting created inside gluster pods 2019-04-09 06:16:46 UTC
Red Hat Product Errata RHSA-2018:2607 None None None 2018-09-04 06:50:42 UTC

Internal Links: 1595752

Description John Strunk 2018-06-21 17:27:20 UTC
Description of problem:
When gluster starts up after a reboot, the self-heal daemon sometimes crashes. The result is that volumes do not heal until shd is manually restarted.
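
The manual intervention usually amounts to making gluster respawn the missing self-heal daemon. A minimal sketch, assuming the standard gluster CLI behaviour that a forced volume start respawns daemons that are down; the volume name is a placeholder:

VOL=myvol   # placeholder volume name

# Check whether the self-heal daemon is listed with a real PID or N/A
gluster volume status "$VOL" | grep -i "self-heal"

# If shd is down, a forced start should respawn it
gluster volume start "$VOL" force

# Confirm shd is back and trigger a heal
gluster volume status "$VOL" | grep -i "self-heal"
gluster volume heal "$VOL"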


Version-Release number of selected component (if applicable):
rhgs 3.3.1

$ rpm -aq | grep gluster
gluster-nagios-common-0.2.4-1.el7rhgs.noarch
glusterfs-cli-3.8.4-54.10.el7rhgs.x86_64
glusterfs-geo-replication-3.8.4-54.10.el7rhgs.x86_64
glusterfs-client-xlators-3.8.4-54.10.el7rhgs.x86_64
glusterfs-api-3.8.4-54.10.el7rhgs.x86_64
python-gluster-3.8.4-54.10.el7rhgs.noarch
gluster-nagios-addons-0.2.10-2.el7rhgs.x86_64
pcp-pmda-gluster-4.1.0-0.201805281909.git68ab4b18.el7.x86_64
glusterfs-libs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-fuse-3.8.4-54.10.el7rhgs.x86_64
vdsm-gluster-4.17.33-1.2.el7rhgs.noarch
libvirt-daemon-driver-storage-gluster-3.9.0-14.el7_5.5.x86_64
glusterfs-3.8.4-54.10.el7rhgs.x86_64
glusterfs-server-3.8.4-54.10.el7rhgs.x86_64
glusterfs-rdma-3.8.4-54.10.el7rhgs.x86_64



How reproducible:
Happens approximately 10% of the time on reboot


Steps to Reproduce:
1. Stop glusterd, bricks, and mounts as per the admin guide (an approximate script is sketched after this list)
2. shutdown -r now
3. check gluster vol status post reboot
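
A rough sketch of the steps above, under the assumption that the admin-guide shutdown amounts to unmounting clients, stopping glusterd, and killing the remaining gluster processes (the mount point is a placeholder):

umount /mnt/glustervol        # placeholder client mount point
systemctl stop glusterd       # stop the management daemon
pkill glusterfsd              # stop brick processes
pkill glusterfs               # stop shd/nfs and other helper processes
shutdown -r now               # step 2: reboot the node

# Step 3, after the node comes back up: check whether shd started
gluster volume status | grep -i "self-heal daemon"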

Actual results:
Approximately 10% of the time, the self-heal daemon will not be running, and its PID will show as N/A in gluster vol status.


Expected results:
shd should start up and run properly after reboot


Additional info:

Comment 5 Atin Mukherjee 2018-07-02 04:01:28 UTC
Upstream patch: https://review.gluster.org/20422

Comment 11 nchilaka 2018-07-25 15:00:15 UTC
Test version: 3.12.2-14

TC#1 (Polarion RHG3-13523) --> PASS
1. create a replica 3 volume and start it.
2. `while true; do gluster volume heal <volname>;sleep 0.5; done` in one terminal.
3. In another terminal, keep running `service glusterd restart` (a combined sketch of this test follows the list)
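
A combined sketch of the test above, assuming an already started replica 3 volume; the volume name is a placeholder and the one-hour duration matches the verification run:

VOL=testvol                       # placeholder; use an existing replica 3 volume
end=$(( $(date +%s) + 3600 ))     # run for roughly an hour

# Terminal 1 equivalent: issue heal requests in a tight loop
while [ "$(date +%s)" -lt "$end" ]; do
    gluster volume heal "$VOL"
    sleep 0.5
done &

# Terminal 2 equivalent: keep restarting glusterd so shd is respawned while
# heal requests are still arriving (the window in which shd used to crash)
while [ "$(date +%s)" -lt "$end" ]; do
    service glusterd restart
    sleep 2                       # brief pause between restarts
done

wait    # let the backgrounded heal loop finish

# Confirm shd is still running and its PID is not N/A
gluster volume status "$VOL" | grep -i "self-heal daemon"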


I was seeing the crash frequently before the fix, but with the fix I did not see this problem after running the test for an hour.

Hence, moving to VERIFIED.


However, note that I hit other issues, for which bugs have been reported:
BZ#1608352 - brick (glusterfsd) crashed at in pl_trace_flush
BZ#1607888 - backtrace seen in glusterd log when triggering glusterd restart on issuing of index heal (TC#RHG3-13523)


I also retried the steps in the description and did not hit the shd crash.

Comment 14 Srijita Mukherjee 2018-09-03 13:34:06 UTC
Doc text looks good to me.

Comment 15 errata-xmlrpc 2018-09-04 06:49:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2607

Comment 16 Ravishankar N 2018-10-22 06:04:48 UTC
*** Bug 1519105 has been marked as a duplicate of this bug. ***

