Bug 1595752 - [GSS] Core dump getting created inside gluster pods
Summary: [GSS] Core dump getting created inside gluster pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.3
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHGS 3.3.1 Async
Assignee: Ravishankar N
QA Contact: nchilaka
URL:
Whiteboard:
Depends On: 1596513 1597229 1597230
Blocks:
 
Reported: 2018-06-27 13:26 UTC by Abhishek Kumar
Modified: 2018-07-19 06:01 UTC
CC List: 15 users

Fixed In Version: glusterfs-3.8.4-54.14
Doc Type: Bug Fix
Doc Text:
Previously, glusterd could not check whether the daemons it started were fully initialized before sending them requests. Hence, if glusterd forwarded an index heal request from the CLI to the self-heal daemon before the daemon had fully initialized its graph, the self-heal daemon would crash. With this update, the self-heal daemon ignores requests it receives from glusterd before its graph is initialized, and the CLI fails the command when the user launches index heal via the gluster CLI.
Clone Of:
Environment:
Last Closed: 2018-07-19 06:00:07 UTC
Target Upstream Version:




Links
System ID Priority Status Summary Last Updated
Red Hat Bugzilla 1593865 None CLOSED shd crash on startup 2019-04-09 06:16:47 UTC
Red Hat Product Errata RHBA-2018:2222 None None None 2018-07-19 06:01:41 UTC

Internal Links: 1593865

Description Abhishek Kumar 2018-06-27 13:26:50 UTC
Description of problem:

Randomly, a glusterfs pod crashes, creates a core.* file, and fills up the / partition. At this point the gluster pod stops working and we have to kill it manually.

Version-Release number of selected component (if applicable):

CNS 3.9

How reproducible:

Customer environment



Additional info:

Comment 3 Abhishek Kumar 2018-06-27 13:30:28 UTC
# gdb /usr/sbin/glusterfs core.36187 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...
warning: the debug information found in "/usr/lib/debug//usr/sbin/glusterfsd.debug" does not match "/usr/sbin/glusterfsd" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/sbin/glusterfsd.debug" does not match "/usr/sbin/glusterfsd" (CRC mismatch).

Missing separate debuginfo for /usr/sbin/glusterfsd
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e7/fa7c0b09c86663966ceeb6320e43e760a521ba.debug
Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 36191]
[New LWP 36187]
[New LWP 36192]
[New LWP 36188]
[New LWP 36189]
[New LWP 36193]
[New LWP 36190]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gl'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000557e5ca70051 in glusterfs_handle_translator_op ()
(gdb) thread apply all bt

Thread 7 (Thread 0x7fa5ff44d700 (LWP 36190)):
#0  0x00007fa6010154fd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fa601015394 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007fa60294c3fd in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007fa601785dd5 in start_thread (arg=0x7fa5ff44d700) at pthread_create.c:308
#4  0x00007fa60104eb3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7fa5fc18d700 (LWP 36193)):
#0  0x00007fa60104f113 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fa6029806d2 in event_dispatch_epoll_worker (data=0x557e5ebf74e0) at event-epoll.c:638
#2  0x00007fa601785dd5 in start_thread (arg=0x7fa5fc18d700) at pthread_create.c:308
#3  0x00007fa60104eb3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7fa5ffc4e700 (LWP 36189)):
#0  0x00007fa60178d411 in do_sigwait (sig=0x7fa5ffc4de1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1  __sigwait (set=0x7fa5ffc4de20, sig=0x7fa5ffc4de1c) at ../sysdeps/unix/sysv/linux/sigwait.c:99
#2  0x0000557e5ca6c07b in glusterfs_sigwaiter ()
#3  0x00007fa601785dd5 in start_thread (arg=0x7fa5ffc4e700) at pthread_create.c:308
#4  0x00007fa60104eb3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 4 (Thread 0x7fa60044f700 (LWP 36188)):
#0  0x00007fa60178ceed in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fa602932f2e in gf_timer_proc (data=0x557e5ebb9250) at timer.c:176
#2  0x00007fa601785dd5 in start_thread (arg=0x7fa60044f700) at pthread_create.c:308
#3  0x00007fa60104eb3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7fa5fe44b700 (LWP 36192)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007fa60295e9d8 in syncenv_task (proc=proc@entry=0x557e5ebba770) at syncop.c:603
#2  0x00007fa60295f820 in syncenv_processor (thdata=0x557e5ebba770) at syncop.c:695
#3  0x00007fa601785dd5 in start_thread (arg=0x7fa5fe44b700) at pthread_create.c:308
#4  0x00007fa60104eb3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7fa602e05780 (LWP 36187)):
#0  0x00007fa601786f47 in pthread_join (threadid=140350875817728, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007fa602980b90 in event_dispatch_epoll (event_pool=0x557e5ebb2f40) at event-epoll.c:732
#2  0x0000557e5ca68ea3 in main ()

Thread 1 (Thread 0x7fa5fec4c700 (LWP 36191)):
#0  0x0000557e5ca70051 in glusterfs_handle_translator_op ()
#1  0x00007fa60295c4a2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#2  0x00007fa600f97fc0 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()

Comment 4 Abhishek Kumar 2018-06-27 13:31:49 UTC
# gdb /usr/sbin/glusterfs core.121082 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/sbin/glusterfsd...
warning: the debug information found in "/usr/lib/debug//usr/sbin/glusterfsd.debug" does not match "/usr/sbin/glusterfsd" (CRC mismatch).


warning: the debug information found in "/usr/lib/debug/usr/sbin/glusterfsd.debug" does not match "/usr/sbin/glusterfsd" (CRC mismatch).

Missing separate debuginfo for /usr/sbin/glusterfsd
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/e7/fa7c0b09c86663966ceeb6320e43e760a521ba.debug
Reading symbols from /usr/sbin/glusterfsd...(no debugging symbols found)...done.
(no debugging symbols found)...done.

warning: core file may not match specified executable file.
[New LWP 121086]
[New LWP 121087]
[New LWP 121088]
[New LWP 121083]
[New LWP 121082]
[New LWP 121084]
[New LWP 121085]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gl'.
Program terminated with signal 11, Segmentation fault.
#0  0x000055c8eebda051 in glusterfs_handle_translator_op ()
(gdb) thread apply all bt

Thread 7 (Thread 0x7f39a1d1f700 (LWP 121085)):
#0  0x00007f39a38e74fd in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f39a38e7394 in __sleep (seconds=0, seconds@entry=30) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x00007f39a521e3fd in pool_sweeper (arg=<optimized out>) at mem-pool.c:464
#3  0x00007f39a4057dd5 in start_thread (arg=0x7f39a1d1f700) at pthread_create.c:308
#4  0x00007f39a3920b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 6 (Thread 0x7f39a2520700 (LWP 121084)):
#0  0x00007f39a405f411 in do_sigwait (sig=0x7f39a251fe1c, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61
#1  __sigwait (set=0x7f39a251fe20, sig=0x7f39a251fe1c) at ../sysdeps/unix/sysv/linux/sigwait.c:99
#2  0x000055c8eebd607b in glusterfs_sigwaiter ()
#3  0x00007f39a4057dd5 in start_thread (arg=0x7f39a2520700) at pthread_create.c:308
#4  0x00007f39a3920b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 5 (Thread 0x7f39a56d7780 (LWP 121082)):
#0  0x00007f39a4058f47 in pthread_join (threadid=139885451540224, thread_return=thread_return@entry=0x0) at pthread_join.c:92
#1  0x00007f39a5252b90 in event_dispatch_epoll (event_pool=0x55c8eee26f40) at event-epoll.c:732
#2  0x000055c8eebd2ea3 in main ()

Thread 4 (Thread 0x7f39a2d21700 (LWP 121083)):
#0  0x00007f39a405eeed in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f39a5204f2e in gf_timer_proc (data=0x55c8eee2d250) at timer.c:176
#2  0x00007f39a4057dd5 in start_thread (arg=0x7f39a2d21700) at pthread_create.c:308
#3  0x00007f39a3920b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7f399ea5f700 (LWP 121088)):
#0  0x00007f39a3921113 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f39a52526d2 in event_dispatch_epoll_worker (data=0x55c8eee6b4e0) at event-epoll.c:638
#2  0x00007f39a4057dd5 in start_thread (arg=0x7f399ea5f700) at pthread_create.c:308
#3  0x00007f39a3920b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 2 (Thread 0x7f39a0d1d700 (LWP 121087)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00007f39a52309d8 in syncenv_task (proc=proc@entry=0x55c8eee2e770) at syncop.c:603
#2  0x00007f39a5231820 in syncenv_processor (thdata=0x55c8eee2e770) at syncop.c:695
#3  0x00007f39a4057dd5 in start_thread (arg=0x7f39a0d1d700) at pthread_create.c:308
#4  0x00007f39a3920b3d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 1 (Thread 0x7f39a151e700 (LWP 121086)):
#0  0x000055c8eebda051 in glusterfs_handle_translator_op ()
#1  0x00007f39a522e4a2 in synctask_wrap (old_task=<optimized out>) at syncop.c:375
#2  0x00007f39a3869fc0 in ?? () from /lib64/libc.so.6
#3  0x0000000000000000 in ?? ()

Comment 13 Atin Mukherjee 2018-07-03 08:31:58 UTC
upstream patch : https://review.gluster.org/20422
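For context, the fix described in the Doc Text amounts to an early-exit guard in the self-heal daemon's translator-op handler: reject requests from glusterd until the volfile graph is fully initialized, so the daemon no longer dereferences an uninitialized graph. A minimal sketch of that idea (all names below are illustrative, not the actual glusterfs identifiers):

```c
#include <stdio.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of the shd-side guard. The real code
 * lives in glusterfs_handle_translator_op() (see the backtraces above),
 * which crashed when invoked before graph initialization finished. */
typedef struct {
    void *active_graph;   /* NULL until graph initialization completes */
} shd_ctx_t;

/* Returns 0 when the op can be dispatched, -1 when it is rejected
 * because the graph is not yet up; the CLI command then fails instead
 * of the daemon segfaulting on a NULL graph dereference. */
int shd_handle_translator_op(shd_ctx_t *ctx)
{
    if (ctx == NULL || ctx->active_graph == NULL) {
        fprintf(stderr, "graph not yet initialized, rejecting op\n");
        return -1;
    }
    /* ... look up the target xlator in ctx->active_graph and dispatch ... */
    return 0;
}
```

This matches the observed window: glusterd restarts respawn shd and immediately forward the pending `gluster volume heal` request, so the guard closes the race without requiring glusterd to track daemon readiness.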

Comment 16 Ravishankar N 2018-07-04 06:13:25 UTC
Downstream patch on rhgs-3.3.1 branch: https://code.engineering.redhat.com/gerrit/#/c/143109/

Comment 20 Ravishankar N 2018-07-05 11:13:55 UTC
I'm not an expert on the CNS workflow, so I cannot comment on that. But if you have a consistent reproducer that gives the same shd crash and backtrace, I suppose it should be fine. FWIW, the steps I carried out on a plain glusterfs setup (no CNS) are described here: https://bugzilla.redhat.com/show_bug.cgi?id=1596513#c0.

Comment 21 errata-xmlrpc 2018-07-06 03:09:53 UTC
Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2018:34436-01
https://errata.devel.redhat.com/advisory/34436

Comment 24 nchilaka 2018-07-10 14:07:06 UTC
I have run the steps as mentioned in comment #20, i.e. as below:

1. create a replica 2 volume and start it.
2. `while true; do gluster volume heal <volname>;sleep 0.5; done` in one terminal.
3. In another terminal, keep running `service glusterd restart`.


I was seeing the crash frequently before the fix, but with the fix I did not see this problem after running the test for an hour.

Hence, moving to verified.


test version: 3.8.4-54.14

Comment 27 Ravishankar N 2018-07-17 14:20:51 UTC
LGTM.

Comment 29 errata-xmlrpc 2018-07-19 06:00:07 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2222

