Bug 1516285 - collectd ceph plugin crashes
Summary: collectd ceph plugin crashes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: collectd
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: Upstream M1
Target Release: 13.0 (Queens)
Assignee: Matthias Runge
QA Contact: Leonid Natapov
URL:
Whiteboard:
Depends On: 1558015
Blocks:
Reported: 2017-11-22 12:17 UTC by Matthias Runge
Modified: 2018-06-27 14:15 UTC (History)
CC List: 6 users

Fixed In Version: collectd-5.8.0-1.1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-27 13:08:58 UTC




Links
System                   ID                              Last Updated
Red Hat Product Errata   RHEA-2018:2084                  2018-06-27 13:10:40 UTC
Github                   collectd/collectd issue 2572    2017-11-23 15:39:54 UTC

Description Matthias Runge 2017-11-22 12:17:45 UTC
Description of problem:
Nov 22 12:53:34 euler systemd: Starting Collectd statistics daemon...
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "syslog" successfully loaded.
Nov 22 12:53:34 euler collectd: [2017-11-22 12:53:35] plugin_load: plugin "logfile" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "logfile" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: syslog: invalid loglevel [debug] defaulting to 'info'
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "cpu" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "interface" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "load" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "memory" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "write_graphite" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "ceph" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "df" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "disk" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: plugin_load: plugin "virt" successfully loaded.
Nov 22 12:53:34 euler collectd[5437]: Systemd detected, trying to signal readyness.
Nov 22 12:53:34 euler systemd: Started Collectd statistics daemon.
Nov 22 12:53:34 euler collectd[5437]: virt plugin: reader virt-0 initialized
Nov 22 12:53:34 euler collectd[5437]: Initialization complete, entering read-loop.
Nov 22 12:53:34 euler kernel: reader#3[5446]: segfault at 0 ip 00007f1bf12698dd sp 00007f1be5272e00 error 4 in ceph.so[7f1bf1268000+5000]
Nov 22 12:53:34 euler abrt-hook-ccpp: Process 5437 (collectd) of user 0 killed by SIGSEGV - ignoring (repeated crash)
Nov 22 12:53:34 euler libvirtd: 2017-11-22 11:53:34.570+0000: 1461: error : virNetSocketReadWire:1793 : Cannot recv data: Connection reset by peer
Nov 22 12:53:34 euler systemd: collectd.service: main process exited, code=killed, status=11/SEGV
Nov 22 12:53:34 euler systemd: Unit collectd.service entered failed state.
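
The kernel line already localizes the fault. "segfault at 0" together with page-fault error code 4 (user mode, read access, page not present) is a NULL-pointer read, and subtracting the mapping start 7f1bf1268000 from the instruction pointer 7f1bf12698dd gives offset 0x18dd inside ceph.so. The Python sketch below decodes such a line; `parse_segfault` is a hypothetical helper written for this note, not part of collectd. It assumes the kernel's `in <object>[<start>+<size>]` suffix describes the mapping that contains the instruction pointer. The offset only equals a file offset usable with e.g. `addr2line -f -e` (on a ceph.so with debuginfo) if that mapping begins at file offset 0, which is not guaranteed.

```python
import re

# Kernel segfault line from the log, with the wrapped "4 in ceph.so[...]" part rejoined.
LINE = ("reader#3[5446]: segfault at 0 ip 00007f1bf12698dd "
        "sp 00007f1be5272e00 error 4 in ceph.so[7f1bf1268000+5000]")

# Format: segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<start>+<size>]
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp [0-9a-f]+ error (?P<err>\d+) "
    r"in (?P<obj>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Decode a kernel segfault line (hypothetical helper)."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"])
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "offset": ip - base,
        "in_object": base <= ip < base + size,
        # x86 page-fault error code bits: 0 = page present, 1 = write, 2 = user mode.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    info = parse_segfault(LINE)
    print(f"{info['object']}+{info['offset']:#x}")
```

Run against the line from this report, it prints `ceph.so+0x18dd`.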


Version-Release number of selected component (if applicable):
collectd-5.8

Comment 3 Matthias Runge 2017-11-27 13:36:49 UTC
A commit that went into collectd 5.8 says this in its commit message: https://github.com/collectd/collectd/commit/647ac31bf9db60b1685d6d8d25be65375ba85891#diff-20b37368527caaa7f0318870e8cefd51

"""
This patch is not backward compatible with previous ceph versions.
"""

Comment 5 Matthias Runge 2017-11-29 09:12:29 UTC
It seems the crash only happens when the option ConvertSpecialMetricTypes is set to true. Explicitly setting it to false makes the plugin work even with older Ceph releases.

Comment 6 Matthias Runge 2017-12-05 07:27:28 UTC
Proposed fix upstream: https://github.com/collectd/collectd/commit/de05fb53fad6bc998f585b704ca0caeadc14a035

Comment 14 Leonid Natapov 2018-02-19 09:01:49 UTC
Please provide instructions on how to test and configure this.

Thank you,

Comment 15 Matthias Runge 2018-02-19 10:54:36 UTC
https://collectd.org/documentation/manpages/collectd.conf.5.shtml#plugin_ceph

In my case, the collectd configuration for the ceph plugin looks like this:

<LoadPlugin ceph>
  Globals false
</LoadPlugin>
<Plugin "ceph">
    LongRunAvgLatency false
    ConvertSpecialMetricTypes false
    <Daemon "osd.0">
      SocketPath "/var/run/ceph/ceph-osd.0.asok"
    </Daemon>
</Plugin>

You'll probably need to find out where your ceph*.asok admin socket files are stored.

With that in place, lots of Ceph-related metrics will show up in Grafana, all with names beginning with "ceph_".
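
Since the socket paths differ between hosts, the Daemon blocks can be generated from whatever admin sockets exist. This is a Python sketch, not part of collectd. It assumes the sockets live in /var/run/ceph and follow the `<cluster>-<daemon>.asok` naming seen in the example above (ceph-osd.0.asok), and `daemon_name` and `render_ceph_plugin` are hypothetical helper names. It keeps ConvertSpecialMetricTypes false, per the workaround in comment 5.

```python
import glob
import os

# Assumed location of the Ceph admin sockets (matches the SocketPath above).
SOCKET_DIR = "/var/run/ceph"


def daemon_name(socket_path):
    """Map '<cluster>-<name>.asok' to '<name>', e.g. ceph-osd.0.asok -> osd.0."""
    base = os.path.basename(socket_path)
    if not base.endswith(".asok"):
        raise ValueError(f"not an admin socket: {socket_path}")
    stem = base[: -len(".asok")]
    # Split on the first '-' only, so daemon names containing '-' stay intact.
    _cluster, sep, name = stem.partition("-")
    return name if sep else stem


def render_ceph_plugin(socket_paths):
    """Build a <Plugin "ceph"> block with one <Daemon> entry per socket."""
    lines = [
        '<Plugin "ceph">',
        "    LongRunAvgLatency false",
        "    ConvertSpecialMetricTypes false",
    ]
    # Sorted so the generated config is stable between runs.
    for path in sorted(socket_paths):
        lines.append(f'    <Daemon "{daemon_name(path)}">')
        lines.append(f'      SocketPath "{path}"')
        lines.append("    </Daemon>")
    lines.append("</Plugin>")
    return "\n".join(lines)


if __name__ == "__main__":
    sockets = glob.glob(os.path.join(SOCKET_DIR, "*.asok"))
    print(render_ceph_plugin(sockets))
```

The output is meant to sit after the `<LoadPlugin ceph>` block in the collectd configuration.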

Comment 17 Leonid Natapov 2018-05-30 07:31:33 UTC
[2018-05-30 07:00:39] plugin_load: plugin "ceph" successfully loaded.

Comment 19 errata-xmlrpc 2018-06-27 13:08:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2084

