Bug 1366232 - qdrouterd segfault with "double free or corruption" in pn_class_decref
Summary: qdrouterd segfault with "double free or corruption" in pn_class_decref
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Satellite 6
Classification: Red Hat
Component: katello-agent
Version: 6.2.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: high
Target Milestone: 6.2
Assignee: Ted Ross
QA Contact: Perry Gagne
URL:
Whiteboard:
Duplicates: 1366231 1385890
Depends On:
Blocks:
Reported: 2016-08-11 10:38 UTC by Pavel Moravec
Modified: 2018-12-06 20:49 UTC (History)

Fixed In Version: qpid-dispatch-0.4-17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-10 08:13:35 UTC


Attachments
(gdb) thread apply all bt (deleted), 2016-08-25 07:54 UTC, Jan Hutař


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2016:2699 normal SHIPPED_LIVE Satellite 6.2.4 Async Bug Release 2016-11-10 13:12:22 UTC

Description Pavel Moravec 2016-08-11 10:38:48 UTC
Description of problem:
Under unknown circumstances (some related events are noted below), qdrouterd segfaulted while many clients were connecting to it.



Version-Release number of selected component (if applicable):
libqpid-dispatch-0.4-13.el7sat.x86_64
qpid-dispatch-router-0.4-13.el7sat.x86_64
qpid-proton-c-0.9-16.el7.x86_64


How reproducible:
???


Steps to Reproduce:
???


Actual results:
segfault with backtrace:

(gdb) bt
#0  0x00007f9a412f95f7 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f9a412face8 in __GI_abort () at abort.c:90
#2  0x00007f9a41339327 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x7f9a41443488 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/unix/sysv/linux/libc_fatal.c:196
#3  0x00007f9a41341053 in malloc_printerr (ar_ptr=0x7f99b4000020, ptr=<optimized out>, 
    str=0x7f9a41443588 "double free or corruption (!prev)", action=3) at malloc.c:5022
#4  _int_free (av=0x7f99b4000020, p=<optimized out>, have_lock=0) at malloc.c:3842
#5  0x00007f9a4208d806 in pn_class_decref (clazz=0x7f9a422c12e0 <clazz.4933>, object=0x7f99b402af60)
    at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/object.c:103
#6  0x00007f9a4209b580 in pn_event_finalize (event=0x7f99d40847f0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/events/event.c:190
#7  pn_event_finalize_cast (object=0x7f99d40847f0) at /usr/src/debug/qpid-proton-0.9/proton-c/src/events/event.c:235
#8  0x00007f9a4208d7e8 in pn_class_decref (clazz=0x7f9a422c1460 <clazz.2272>, object=0x7f99d40847f0)
    at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/object.c:97
#9  0x00007f9a4208da12 in pn_decref (object=<optimized out>) at /usr/src/debug/qpid-proton-0.9/proton-c/src/object/object.c:252
#10 0x00007f9a4209b722 in pn_collector_pop (collector=collector@entry=0x20dad80)
    at /usr/src/debug/qpid-proton-0.9/proton-c/src/events/event.c:167
#11 0x00007f9a422daf00 in process_handler (unused=<optimized out>, qd_conn=0x7f9a2800cb30, container=0x1fd3e20)
    at /usr/src/debug/qpid-dispatch-0.4/src/container.c:422
#12 handler (handler_context=0x1fd3e20, conn_context=<optimized out>, event=event@entry=QD_CONN_EVENT_PROCESS, qd_conn=0x7f9a2800cb30)
    at /usr/src/debug/qpid-dispatch-0.4/src/container.c:486
#13 0x00007f9a422edb9c in process_connector (cxtr=0x7f9a28010270, qd_server=0x1fe37d0)
    at /usr/src/debug/qpid-dispatch-0.4/src/server.c:398
#14 thread_run (arg=<optimized out>) at /usr/src/debug/qpid-dispatch-0.4/src/server.c:626
#15 0x00007f9a41e5fdc5 in start_thread (arg=0x7f9a227f4700) at pthread_create.c:308
#16 0x00007f9a413baced in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113


Expected results:
no segfault


Additional info:
/var/log/messages from relevant time:

Aug 10 08:29:10 ip-10-1-1-2 qdrouterd: Wed Aug 10 08:29:10 2016 ROUTER_LS (info) Router Link Lost - link_id=3
Aug 10 08:29:10 ip-10-1-1-2 qpidd: 2016-08-10 08:29:10 [Protocol] error Error on attach: Node not found: pulp.agent.bb79963e-92e2-4020-a0db-d34d082b0eb7

(The "Error on attach" message repeated multiple times, until:)
Aug 10 08:29:11 ip-10-1-1-2 qdrouterd: *** Error in `/usr/sbin/qdrouterd': double free or corruption (!prev): 0x00007f99b402af50 ***

Comment 3 Pavel Moravec 2016-08-11 11:00:30 UTC
Standalone reproducer:

1) Configure link routing to qpidd to route pulp.*
2) Run the script below 10 times in parallel. It tries to create a receiver to qdrouterd/qpidd, but the broker does not have such a queue (i.e. qpidd prints a "Node not found" error):

#!/usr/bin/python

from time import sleep
from uuid import uuid4
from proton.utils import BlockingConnection, LinkDetached

routerURL = "proton+amqp://0.0.0.0:5648"

conn = BlockingConnection(routerURL, ssl_domain=None, heartbeat=2)

# Keep attaching receivers to random pulp.* addresses that have no queue on
# the broker, so every attach is rejected with "Node not found".
while True:
  sleep(0.05)
  try:
    rcv = conn.create_receiver("pulp."+str(uuid4()), name=str(uuid4()))
    rcv.close()
  except LinkDetached as e:
    print(e)
    # Drop the connection and reconnect after each rejected attach.
    if conn:
      conn.close()
      conn = BlockingConnection(routerURL, ssl_domain=None, heartbeat=2)

<end-of-the-script>
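
Step 2 asks for 10 parallel copies of the script. One way to start them is sketched below. This is only an illustration: the helper names `launch_parallel` and `stop_all` are made up for this example, and the command to run is supplied by the caller.

```python
import subprocess
import sys


def launch_parallel(command, copies=10):
    """Start `copies` instances of `command` and return their Popen handles."""
    return [subprocess.Popen(command) for _ in range(copies)]


def stop_all(processes):
    """Terminate any instance that is still running, then reap all of them."""
    for proc in processes:
        if proc.poll() is None:
            proc.terminate()
    for proc in processes:
        proc.wait()
```

Assuming the script was saved under the hypothetical name reproducer.py, `workers = launch_parallel(["python", "reproducer.py"], copies=10)` starts the ten clients, and `stop_all(workers)` ends them once qdrouterd has crashed.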


This segfault is not usually expected to happen in a Sat6 environment, since it relies on a _missing_ pulp.agent.* queue that goferd tries to subscribe to. Usually, goferd should create its queue during startup.

Comment 4 Pavel Moravec 2016-08-11 11:18:33 UTC
*** Bug 1366231 has been marked as a duplicate of this bug. ***

Comment 8 Jeff Ortel 2016-08-16 15:09:25 UTC
May need to keep this assigned to tross.  The mitigation possible by goferd is to re-create the queue when getting LinkDetached with condition = amqp:not-found.  This means goferd could still try to create a receiver (Link) when the queue does not exist and crash the router.

Note: This can only happen in cases where the queue existed (or was created by goferd on startup) and then disappeared.
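
The mitigation described above (re-create the queue when the link is detached with amqp:not-found, then retry) could look roughly like the sketch below. It is not goferd code: `LinkDetachedError` is a hypothetical stand-in for the exception raised by the messaging library, and `create_receiver` / `create_queue` are caller-supplied callables assumed for illustration.

```python
NOT_FOUND = "amqp:not-found"


class LinkDetachedError(Exception):
    """Hypothetical stand-in for a link-detached error carrying a condition name."""

    def __init__(self, condition_name):
        super().__init__(condition_name)
        self.condition_name = condition_name


def open_receiver(create_receiver, create_queue, address, max_attempts=2):
    """Attach a receiver, re-creating the queue if the broker reports it missing.

    create_receiver(address) and create_queue(address) are assumed callables.
    Any other detach condition, or a failure on the last attempt, is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return create_receiver(address)
        except LinkDetachedError as err:
            last_attempt = attempt == max_attempts - 1
            if err.condition_name != NOT_FOUND or last_attempt:
                raise
            # The queue disappeared: re-create it before the next attach.
            create_queue(address)
```

Note that the first attach against the missing queue still happens, so this only limits how often the router sees failed attaches; it does not remove the router-side defect.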

Comment 9 Pavel Moravec 2016-08-16 17:54:30 UTC
(In reply to Jeff Ortel from comment #8)
> May need to keep this assigned to tross.  The mitigation possible by goferd
> is to re-create the queue when getting LinkDetached with condition =
> amqp:not-found.  This means goferd could still try to create a receiver
> (Link) when the queue does not exist and crash the router.
> 
> Note: This can only happen in cases where the queue existed (or was created
> by goferd on startup) and then disappeared.

+1.

The primary problem is that qdrouterd segfaults in some scenario. goferd can be improved as Jeff suggests, since repeated link failures from the same agent increase the probability of the segfault.

Comment 14 Jan Hutař 2016-08-25 07:54:16 UTC
Created attachment 1193891
(gdb) thread apply all bt

Comment 27 errata-xmlrpc 2016-11-10 08:13:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:2699

Comment 28 Andrew Kofink 2017-01-05 14:26:34 UTC
*** Bug 1385890 has been marked as a duplicate of this bug. ***

