Bug 456454 - qpidd segfault during RHTS run
Summary: qpidd segfault during RHTS run
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: qpid-cpp
Version: 1.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: 1.0.1
Target Release: ---
Assignee: Kim van der Riet
QA Contact: Kim van der Riet
URL:
Whiteboard:
Depends On:
Blocks: 460113
Reported: 2008-07-23 19:36 UTC by Jeff Needle
Modified: 2008-10-06 19:08 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-10-06 19:08:21 UTC


Attachments
qpidd core dump (deleted)
2008-07-23 19:36 UTC, Jeff Needle
Core from 5.2 i386 run (deleted)
2008-07-24 13:24 UTC, Jeff Needle
Second qpidd core from 5.2 i386 run (deleted)
2008-07-24 13:25 UTC, Jeff Needle


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0640 normal SHIPPED_LIVE Red Hat Enterprise MRG bug fix and enhancement update 2008-10-06 19:08:07 UTC

Description Jeff Needle 2008-07-23 19:36:43 UTC
Got this on the console:

qpidd[9845]: segfault at 00000000063cc000 rip 00002b0f7687844b rsp
0000000043145698 error 4

and caught the core dump, which is attached.  Off to find a qpidd with symbols
so I can get a meaningful backtrace.

qpidd-0.2.676581-1.el5

Comment 1 Jeff Needle 2008-07-23 19:36:43 UTC
Created attachment 312515 [details]
qpidd core dump

Comment 2 Jeff Needle 2008-07-23 19:42:09 UTC
qpidd.debug doesn't give me much more.  Deferring to the experts.

Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir
/tmp/rhts_qpidd/qpid-data/pt_broker.8'.

Comment 4 David Sommerseth 2008-07-24 08:42:00 UTC
This seems to happen only on RHEL5 (i386 and x86_64).  RHEL4 does not seem to
run into this trouble.  It happens on boxes with 8 CPU cores.

Wild guess (based on earlier chat with Andrew): Could it be connected to pthread
libraries?  Different pthread versions on RHEL4 and RHEL5?

Comment 5 Gordon Sim 2008-07-24 08:58:58 UTC
Stack trace from david:

#0  0x0053876c in memcpy () from /lib/libc.so.6
#1  0x001e2e54 in std::string::_Rep::_M_clone () from /usr/lib/libstdc++.so.6
#2  0x001e37b7 in std::basic_string<char, std::char_traits<char>,
std::allocator<char> >::basic_string () from /usr/lib/libstdc++.so.6
#3  0x0098972c in qpid::management::Journal::getPackageName () from
/usr/lib/qpidd/libbdbstore.so
#4  0x00e39726 in ?? ()
#5  0xb6ba0c6c in ?? ()
#6  0x09355308 in ?? ()
#7  0x00000001 in ?? ()
#8  0x00002b9c in ?? ()
#9  0xb6ba0c98 in ?? ()
#10 0x00534030 in free () from /lib/libc.so.6
#11 0x00982691 in qpid::management::Journal::writeStatistics () from
/usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()


Comment 6 David Sommerseth 2008-07-24 09:04:47 UTC
On this last run, we got 2 core files.  Only one of them gave as much as the
comment #5 backtrace.

Both cores are equal on #0, and that's the only similarity.  

#0  0x005cf76c in memcpy () from /lib/libc.so.6
#1  0x00504874 in ?? ()
#2  0xa7a73014 in ?? ()
#3  0x0888b724 in ?? ()
#4  0x0888ba30 in ?? ()
#5  0x00557ff4 in ?? ()
#6  0x0888b718 in ?? ()
#7  0x08892c98 in ?? ()
#8  0xb6b8fb48 in ?? ()
#9  0x005051d7 in ?? ()
#10 0x0888b718 in ?? ()
#11 0xb6b8fb3f in ?? ()
#12 0x00000000 in ?? ()


Comment 7 Jeff Needle 2008-07-24 13:24:51 UTC
Created attachment 312556 [details]
Core from 5.2 i386 run

Comment 8 Jeff Needle 2008-07-24 13:25:43 UTC
Created attachment 312557 [details]
Second qpidd core from 5.2 i386 run

Comment 9 Kim van der Riet 2008-07-24 13:30:53 UTC
A partial backtrace of the attached core file (from comment #1 above) shows:
#0  0x00002b0f7687844b in ?? ()
#1  0x00002b0f761082b0 in ?? ()
#2  0x0000000005ed0100 in ?? ()
#3  0x0000000043145700 in ?? ()
#4  0x00000000431459e0 in ?? ()
#5  0x00002b0f761089af in ?? ()
#6  0x000000000000003d in ?? ()
#7  0x0000000000000024 in ?? ()
#8  0x0000000000610838 in std::string::_Rep::_S_empty_rep_storage ()
#9  0x0000000043145700 in ?? ()
#10 0x0000000005f2b600 in ?? ()
#11 0x00002b0f7725ea90 in qpid::management::Journal::getPackageName () from
/usr/lib64/qpidd/libbdbstore.so
#12 0x00002b0f741728aa in qpid::management::ManagementObject::writeTimestamps
(this=0x2aaab4000010, buf=@0x0) at qpid/management/ManagementObject.cpp:32
#13 0x00002b0f7725851d in qpid::management::Journal::writeStatistics () from
/usr/lib64/qpidd/libbdbstore.so
#14 0x00002b0f7416735b in qpid::management::ManagementBroker::PeriodicProcessing
(this=0x2aaaaaaab010) at qpid/management/ManagementBroker.cpp:314
#15 0x00002b0f74167938 in qpid::management::ManagementBroker::Periodic::fire
(this=0x2aaaac0563d0) at qpid/management/ManagementBroker.cpp:181
#16 0x00002b0f741575f5 in qpid::broker::Timer::run (this=0x2aaaaaaab138) at
qpid/broker/Timer.cpp:64
#17 0x00002b0f74500cda in qpid::sys::Thread::runRunnable (p=0x2aaab44fbea8) at
qpid/sys/posix/Thread.cpp:27
#18 0x00002b0f76b572f7 in ?? ()
#19 0x0000000000000000 in ?? ()

On the surface, this looks like a thread timing issue in management, i.e. a
timer is firing and making a call on a non-existent or deleted journal
management object (or part of an object; the crash seems to be happening on a
std::string operation of some sort) through
qpid::management::Journal::writeStatistics().

Comment 10 Jeff Needle 2008-07-24 14:28:23 UTC
Playing the "Let's randomly install debuginfo packages until this is useful"
game (added gcc-debuginfo, glibc-debuginfo, and glibc-debuginfo-common) yields
this somewhat more useful trace for core.11161.  Fingers are starting to point
in Ted's general direction here...

Core was generated by `/usr/sbin/qpidd --num-jfiles 8 --data-dir
/tmp/rhts_qpidd/qpid-data/pt_broker.1'.
Program terminated with signal 11, Segmentation fault.
#0  0x0053876c in memcpy () from /lib/libc.so.6
(gdb) bt
#0  0x0053876c in memcpy () from /lib/libc.so.6
#1  0x001e2e54 in std::string::_Rep::_M_clone (this=0x9306718, 
    __alloc=@0xb6ba0c1f, __res=0)
    at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/char_traits.h:269
#2  0x001e37b7 in basic_string (this=0xb6ba0c6c, __str=@0x9a9cec)
    at
/usr/src/debug/gcc-4.1.2-20080102/obj-i386-redhat-linux/i386-redhat-linux/libstdc++-v3/include/bits/basic_string.h:219
#3  0x0098972c in qpid::management::Journal::getPackageName ()
   from /usr/lib/qpidd/libbdbstore.so
#4  0x00e39726 in ?? ()
#5  0xb6ba0c6c in ?? ()
#6  0x09355308 in ?? ()
#7  0x00000001 in ?? ()
#8  0x00002b9c in ?? ()
#9  0xb6ba0c98 in ?? ()
#10 0x00534030 in *__GI___libc_free (mem=0x9355308) at malloc.c:3545
#11 0x00982691 in qpid::management::Journal::writeStatistics ()
   from /usr/lib/qpidd/libbdbstore.so
#12 0x00e2afa3 in ?? ()
#13 0x09355308 in ?? ()
#14 0xb6ba0e50 in ?? ()
#15 0x00000000 in ?? ()
Current language:  auto; currently c


Comment 11 Gordon Sim 2008-07-25 14:58:48 UTC
Interestingly, packageName is a static string, and it seems to be when copying
it that this problem occurs...

Comment 12 Kim van der Riet 2008-07-28 12:06:13 UTC
(In reply to comment #11)
> Interestingly, packageName is a static string, and it seems to be when copying
> it that this problem occurs...
But the function getPackageName() is itself not static (but it could be).
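
To illustrate the distinction being discussed: whether the accessor is a static member function or not, it copies from the same static data member, so either form is only safe while that string is still alive. A minimal sketch with a hypothetical class (JournalLike, the accessor names and the placeholder value are assumptions for illustration, not the real qpid::management::Journal code):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a management object with a static package name.
class JournalLike {
public:
    // Non-static accessor: needs an instance, but reads the shared static.
    std::string getPackageName() const { return packageName; }

    // Static accessor: no instance needed, but reads the very same static.
    static std::string packageNameStatic() { return packageName; }

private:
    // One string for the whole process; it is destroyed during static
    // destruction at exit, after which both accessors would be invalid.
    static std::string packageName;
};

std::string JournalLike::packageName = "example.package";  // placeholder value
```

Both accessors return a copy of the same string, so changing the function to static would not by itself change when the underlying data goes away.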



Comment 13 Gordon Sim 2008-07-28 13:29:27 UTC
It appears that the statics in the qpid::management::Journal are deleted before
the destructor of qpid::management::ManagementBroker is called. As the timer
controlled by the ManagementBroker instance is not stopped until that instance
is deleted, this means the thread could still invoke methods on the Journal
instance it has registered and some of these, notably getPackageName, access now
deleted statics.

Either we need to ensure that the ManagementBroker instance is always deleted
before the statics or at least we must ensure that the thread it controls is
stopped before those statics are deleted.
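
The second requirement (the thread is stopped before the data it reads goes away) can be sketched in isolation. This is an illustration under assumptions, not the qpid Timer: PeriodicTimer and ticksAfterStop are hypothetical names, and std::thread is used in place of whatever threading layer the broker has. The key property is that stop() does not return until the worker thread has exited, so nothing can be invoked from that thread afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical periodic timer: runs `task` on its own thread every `period`.
class PeriodicTimer {
public:
    PeriodicTimer(std::chrono::milliseconds period, std::function<void()> task)
        : period_(period), task_(std::move(task)), worker_([this] { loop(); }) {}

    ~PeriodicTimer() { stop(); }

    // Blocks until the worker thread has exited; a second call is a no-op.
    void stop() {
        // Joining from the timer thread itself would deadlock.
        assert(std::this_thread::get_id() != worker_.get_id());
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        if (worker_.joinable()) worker_.join();
    }

private:
    void loop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (!stopping_) {
            // wait_for returns true when stopping_ was set before the timeout.
            if (wake_.wait_for(lock, period_, [this] { return stopping_; })) break;
            lock.unlock();
            task_();  // run the task without holding the lock
            lock.lock();
        }
    }

    std::chrono::milliseconds period_;
    std::function<void()> task_;
    std::mutex mutex_;
    std::condition_variable wake_;
    bool stopping_ = false;
    std::thread worker_;  // declared last so it starts after the other members exist
};

// Returns how many times the task ran after stop() had returned.
int ticksAfterStop() {
    std::atomic<int> ticks{0};
    PeriodicTimer timer(std::chrono::milliseconds(1), [&ticks] { ++ticks; });
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    timer.stop();
    const int atStop = ticks.load();
    std::this_thread::sleep_for(std::chrono::milliseconds(20));
    return ticks.load() - atStop;
}
```

Once stop() has joined the worker, anything the task used to touch can be destroyed safely; without the join, the task could still be running while those objects are torn down.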

Comment 14 Gordon Sim 2008-07-28 13:30:21 UTC
Suggest either:

Index: src/qpidd.cpp
===================================================================
--- src/qpidd.cpp       (revision 680266)
+++ src/qpidd.cpp       (working copy)
@@ -272,6 +272,7 @@
             if (options->broker.port == 0)
                 cout << uint16_t(brokerPtr->getPort()) << endl;
             brokerPtr->run();
+            brokerPtr.reset();
             QPID_LOG(notice, "Shutting down.");
         }
         return 0;


or:

Index: src/qpid/management/ManagementBroker.cpp
===================================================================
--- src/qpid/management/ManagementBroker.cpp    (revision 680266)
+++ src/qpid/management/ManagementBroker.cpp    (working copy)
@@ -125,6 +125,7 @@

         broker->mExchange.reset ();
         broker->dExchange.reset ();
+        broker->timer.stop();
         agent.reset ();
     }
 }
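
For comparison, the idea behind the first suggestion (an explicit reset() right after run() returns) can be sketched as follows. This is a simplified illustration, not the qpidd source: Broker here is a hypothetical stand-in, std::shared_ptr is used for self-containment, events() and runDaemon() are invented for the example, and the sketch assumes the pointer has static storage duration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Records the order of events; intentionally leaked so it outlives everything.
std::vector<std::string>& events() {
    static std::vector<std::string>* log = new std::vector<std::string>;
    return *log;
}

// Hypothetical stand-in, not the real qpid::broker::Broker.
struct Broker {
    void run() { events().push_back("run"); }
    ~Broker() { events().push_back("broker destroyed"); }
};

// Assumption: the pointer has static storage duration. Without an explicit
// reset, the Broker would only be destroyed during static destruction,
// in an order relative to other libraries' statics that is hard to control.
std::shared_ptr<Broker> brokerPtr;

int runDaemon() {
    brokerPtr = std::make_shared<Broker>();
    brokerPtr->run();
    brokerPtr.reset();  // last owner released: the destructor runs here
    assert(!brokerPtr);
    events().push_back("shutting down");
    return 0;
}
```

With the reset in place, the destructor (and whatever it stops) runs at a known point before the function returns, instead of at some point during process exit.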



Comment 15 Gordon Sim 2008-07-28 14:00:38 UTC
Latter patch from above applied to qpid.0-10 as r680362.

Comment 17 Frantisek Reznicek 2008-09-08 10:08:19 UTC
No more qpidd segfaults observed during MRG_Messaging/qpid_testmatrix1 runs.
No more qpidd segfaults at all observed during RHTS testing.
See RHTS jobs 28372, 28374, 28425-9, 28432.

Comment 19 errata-xmlrpc 2008-10-06 19:08:21 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0640.html

