Bug 590898 - corosync blocks on exit with debug: on enabled
Summary: corosync blocks on exit with debug: on enabled
Alias: None
Product: Fedora
Classification: Fedora
Component: corosync
Version: rawhide
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Steven Dake
QA Contact: Fedora Extras Quality Assurance
Depends On:
Reported: 2010-05-10 21:58 UTC by Steven Dake
Modified: 2016-04-26 21:50 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2010-07-07 16:46:31 UTC


Description Steven Dake 2010-05-10 21:58:18 UTC
Description of problem:
corosync gets stuck in shutdown

Version-Release number of selected component (if applicable):

How reproducible:
opensuse dependent

Steps to Reproduce:
Actual results:
locks up

Expected results:
doesn't lock up

Additional info:

User attached to process and found this backtrace of all threads:

Thread 3 (Thread 0x7f679067e910 (LWP 19541)):
#0  0x00007f6792c41da6 in logsys_worker_thread (data=<value optimized out>) at logsys.c:766
#1  0x00007f679261865d in start_thread () from /lib64/
#2  0x00007f6792183e1d in clone () from /lib64/
#3  0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f679317dfb0 (LWP 19542)):
#0  0x00007f679261d965 in ?? () from /lib64/
#1  0x00000000004091b8 in prioritized_timer_thread (data=<value optimized out>) at timer.c:135
#2  0x00007f679261865d in start_thread () from /lib64/
#3  0x00007f6792183e1d in clone () from /lib64/
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f67932756f0 (LWP 19540)):
#0  0x00007f679261996d in pthread_join () from /lib64/
#1  0x0000000000407595 in _corosync_exit_error (err=AIS_DONE_EXIT, file=<value optimized out>, line=<value optimized out>) at util.c:97
#2  0x0000000000406d3b in unlink_all_completed () at main.c:160
#3  0x0000000000408aa3 in service_exit_schedwrk_handler (data=0x7f679067e9e0) at service.c:614
#4  0x000000000040c64b in schedwrk_do (type=<value optimized out>, context=<value optimized out>) at schedwrk.c:77
#5  0x00007f6792e5b561 in token_callbacks_execute (type=<value optimized out>, instance=<value optimized out>) at totemsrp.c:3209
#6  message_handler_orf_token (type=<value optimized out>, instance=<value optimized out>) at totemsrp.c:3601
#7  0x00007f6792e51cd3 in rrp_deliver_fn (context=0x63e790, msg=0x661cd8, msg_len=70) at totemrrp.c:1393
#8  0x00007f6792e50cf2 in net_deliver_fn (handle=<value optimized out>, fd=<value optimized out>, revents=<value optimized out>, data=<value optimized out>) at totemudp.c:1223
#9  0x00007f6792e4cdda in poll_run (handle=2240235047305084928) at coropoll.c:396
#10 0x0000000000405c44 in main (argc=4, argv=<value optimized out>) at main.c:1556

Comment 1 Steven Dake 2010-05-10 22:06:35 UTC
logsys.c:766 is
                        log_rec_idx = record_read (buf, log_rec_idx, &log_msg);

What if this function is spinning?

In that case logsys.c:785 would never call pthread_exit, so the pthread_join in the main thread would never collect the thread's exit status and would block indefinitely on exit.

A possible fix: a break statement that is taken when no messages are waiting to be flushed.
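
To illustrate the suspected hang and the break idea, here is a minimal sketch. It is not the corosync logsys code: the struct, the pending-message counter, and the names demo_log, demo_worker_thread and demo_flush_and_exit are all hypothetical stand-ins. It only shows a worker that leaves its drain loop once nothing is pending, so that the pthread_join on the exit path can return.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the log buffer: a counter of pending
 * messages guarded by a mutex. None of these names come from corosync. */
struct demo_log {
    pthread_mutex_t lock;
    pthread_cond_t wake;
    int pending;          /* messages waiting to be flushed */
    int flushed;          /* messages written out so far */
    bool exit_requested;  /* set by the main thread on shutdown */
};

static void *demo_worker_thread(void *data)
{
    struct demo_log *log = data;

    pthread_mutex_lock(&log->lock);
    for (;;) {
        /* Sleep until there is work or shutdown has been requested. */
        while (log->pending == 0 && !log->exit_requested) {
            pthread_cond_wait(&log->wake, &log->lock);
        }

        /* Drain loop. Without the break, a reader that never reports
         * "empty" would spin here and the thread would never exit. */
        for (;;) {
            if (log->pending == 0) {
                break; /* no messages waiting to be flushed */
            }
            log->pending--;
            log->flushed++;
        }

        if (log->exit_requested) {
            break; /* leave the outer loop so the thread can terminate */
        }
    }
    pthread_mutex_unlock(&log->lock);
    return NULL;
}

/* Queue `count` messages, request exit, and join the worker.
 * Returns the number of messages the worker flushed. */
int demo_flush_and_exit(int count)
{
    struct demo_log log = { .pending = 0, .flushed = 0, .exit_requested = false };
    pthread_t worker;
    int flushed;
    int rc;

    pthread_mutex_init(&log.lock, NULL);
    pthread_cond_init(&log.wake, NULL);

    rc = pthread_create(&worker, NULL, demo_worker_thread, &log);
    assert(rc == 0);

    pthread_mutex_lock(&log.lock);
    log.pending += count;
    log.exit_requested = true;
    pthread_cond_signal(&log.wake);
    pthread_mutex_unlock(&log.lock);

    /* Blocks until the worker returns; this is where the main thread
     * would hang if the worker never left its loop. */
    rc = pthread_join(worker, NULL);
    assert(rc == 0);

    flushed = log.flushed;
    pthread_cond_destroy(&log.wake);
    pthread_mutex_destroy(&log.lock);
    return flushed;
}
```

Calling demo_flush_and_exit(3) from a main function (built with -pthread) should return after the worker has drained the three queued messages. Removing both break statements reproduces the shape of the backtrace above: the worker keeps looping and the caller stays in pthread_join.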

Comment 2 Steven Dake 2010-05-10 22:20:59 UTC
steps to reproduce
place debug: on in config file
service corosync start
wait 10 seconds
service corosync stop

generates exact stack trace above.

Comment 3 Jan Friesse 2010-05-11 08:46:28 UTC
From my debugging, this really is a problem in logsys (it overwrites its own memory).

Given sdake's remark that he has "about got logsys rewritten", reassigning back to Steve.
