Bug 1025321 - Corosync crash running cpg-init-load test
Summary: Corosync crash running cpg-init-load test
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: corosync
Version: 6.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: rc
: ---
Assignee: Jan Friesse
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On: 1055584
Blocks:
Reported: 2013-10-31 13:36 UTC by Jan Friesse
Modified: 2014-10-14 07:11 UTC (History)

Fixed In Version: corosync-1.4.1-18.el6
Doc Type: Bug Fix
Doc Text:
Cause: An application calls cpg_finalize (corosync cpg API). Consequence: Corosync can, in very rare circumstances, segfault. Fix: The finalize function is called from a different thread than the init and exit functions, so on a busy system the cpg list can become corrupted. The solution is to handle cpg list removal in the same thread as cpg_init. Result: Calling cpg_finalize should no longer result in a corosync segfault.
Clone Of:
Environment:
Last Closed: 2014-10-14 07:11:44 UTC
Target Upstream Version:


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2014:1508 normal SHIPPED_LIVE corosync bug fix update 2014-10-14 01:22:31 UTC

Description Jan Friesse 2013-10-31 13:36:53 UTC
Description of problem:
Running https://github.com/jfriesse/csts/blob/master/apps/cpg-init-load.c occasionally causes corosync to segfault. Usually the error is *** glibc detected *** corosync: free(): invalid pointer: 0x0000000001c9ae10 *** but it can also be a double free, ...

Version-Release number of selected component (if applicable):
Upstream flatiron + RHEL 6.5

How reproducible:
0.000000000001%

Steps to Reproduce:
1. Run cpg-init-load in a loop

Actual results:
Corosync segfault

Expected results:
No segfault

Additional info:
I tried to find out what is happening by using:
- valgrind - no results. After 24 hours of running, valgrind didn't show any errors
- DUMA (Electric Fence) - works without any problem
- MALLOC_CHECK_=3 - shows the problem, usually with the following backtrace:
#0  0x00007ffd8d39e8a5 in raise () from /lib64/libc.so.6
#1  0x00007ffd8d3a0085 in abort () from /lib64/libc.so.6
#2  0x00007ffd8d3dc7b7 in __libc_message () from /lib64/libc.so.6
#3  0x00007ffd8d3e20e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007ffd8b744819 in _clear_object (instance=0x1c9a740) at objdb.c:687
#5  0x00007ffd8b7449a2 in object_destroy (object_handle=3522962077188620419) at objdb.c:745
#6  0x00000000004075ba in corosync_stats_destroy_connection (handle=3522962077188620419) at main.c:1259
#7  0x00007ffd8dd2a5af in conn_info_destroy (conn_info=0x21a8bd0) at coroipcs.c:521
#8  0x00007ffd8dd2cce4 in coroipcs_handler_dispatch (fd=75, revent=17, context=0x21a8bd0) at coroipcs.c:1642
#9  0x00000000004071ef in corosync_poll_handler_dispatch (handle=3698059312501882880, fd=75, revent=17, context=0x21a8bd0)
    at main.c:1135
#10 0x00007ffd8e1406cc in poll_run (handle=3698059312501882880) at coropoll.c:513
#11 0x0000000000408e86 in main (argc=2, argv=0x7fff69719188, envp=0x7fff697191a0) at main.c:1941

My theory (for now) is that either object_destroy is called multiple times or (more probably) memory is overwritten somewhere else.

Comment 2 Christine Caulfield 2014-01-07 15:42:29 UTC
commit 3c11ea7b84c109e6f8451229437351c5a14c7168
Author: Christine Caulfield <ccaulfie@redhat.com>
Date:   Tue Jan 7 15:38:41 2014 +0000

    cpg: Avoid list corruption
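
For readers following the Doc Text, the idea of keeping list removal in the thread that did the insertion can be sketched roughly as below. This is not the actual patch and does not use corosync's own list or cpg code: the names `struct conn`, `conn_init`, `conn_request_finalize`, `conn_reap` and `conn_count` are hypothetical, chosen only for illustration. The assumption is that one owner thread links and unlinks entries, while a finalize request arriving on any other thread merely sets a flag.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list (hypothetical, not corosync's list.h). */
struct node {
    struct node *prev, *next;
};

struct conn {
    struct node link;          /* first member, so a node* can be cast to conn* */
    atomic_bool finalize_req;  /* set by any thread, read by the owner thread */
};

/* Empty circular list: the head points at itself. */
static struct node conn_list = { &conn_list, &conn_list };

/* Owner thread only: link a new connection at the tail of the list. */
static void conn_init(struct conn *c)
{
    atomic_init(&c->finalize_req, false);
    c->link.prev = conn_list.prev;
    c->link.next = &conn_list;
    conn_list.prev->next = &c->link;
    conn_list.prev = &c->link;
}

/* Any thread: record the finalize request without touching the list. */
static void conn_request_finalize(struct conn *c)
{
    atomic_store(&c->finalize_req, true);
}

/* Owner thread only: unlink every connection flagged for finalize.
 * Returns the number of entries removed. */
static size_t conn_reap(void)
{
    size_t removed = 0;
    struct node *n = conn_list.next;

    while (n != &conn_list) {
        struct node *next = n->next;   /* saved before n is unlinked */
        struct conn *c = (struct conn *)n;

        if (atomic_load(&c->finalize_req)) {
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->prev = n->next = n;     /* leave the node self-linked */
            removed++;
        }
        n = next;
    }
    return removed;
}

/* Owner thread only: count linked connections. */
static size_t conn_count(void)
{
    size_t count = 0;
    for (struct node *n = conn_list.next; n != &conn_list; n = n->next)
        count++;
    return count;
}
```

In this sketch a finalize request leaves the list untouched until the owner thread calls `conn_reap`, so the list pointers are only ever modified from one thread and no lock around the list is needed.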

Comment 6 errata-xmlrpc 2014-10-14 07:11:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-1508.html

