Bug 453600 - cluster-snmp deadlocks snmpd
Summary: cluster-snmp deadlocks snmpd
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: clustermon
Version: 5.2
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Ryan McCabe
QA Contact: Cluster QE
Depends On: 441947
Reported: 2008-07-01 14:41 UTC by Bryn M. Reeves
Modified: 2018-10-20 03:10 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 484880 (view as bug list)
Last Closed: 2009-01-20 20:51:36 UTC
Target Upstream Version:

Attachments
Set sock.nonblocking(true) in ClusterMonitor::get_cluster() (deleted)
2008-07-01 14:41 UTC, Bryn M. Reeves

System | ID | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHBA-2009:0086 | normal | SHIPPED_LIVE | clustermon bug fix update | 2009-01-20 16:04:23 UTC

Description Bryn M. Reeves 2008-07-01 14:41:13 UTC
Description of problem:
The SNMPD plugin for clustersuite uses the
ClusterMonitoring::ClusterMonitor::get_cluster() method to retrieve the cluster

This in turn calls ClientSocket::recv() -> read_restart().

The read_restart function is designed to fill a buffer with all data currently
buffered on the socket and to return when the underlying read() returns with EAGAIN.

This will only work if the socket has O_NONBLOCK set. Using this method on a
blocking socket will cause the thread calling get_cluster() to block
indefinitely waiting for additional data to arrive on the socket.

Version-Release number of selected component (if applicable):
0.10.0-5.el5 contains the defect but it is masked by bug 441947; rebuilding the
package to avoid the dlopen problem or using a later package (e.g. 0.12.0-7.el5)
allows the bug to be triggered.

How reproducible:

Steps to Reproduce:
1. Configure a cluster with snmpd enabled on the nodes
2. Enable cluster-snmp
3. Try to access a REDHAT-CLUSTER-MIB MIB, e.g. REDHAT-CLUSTER-MIB::rhcMIBVersion.0

Actual results:
$ cat /etc/snmp/snmpd.conf
dlmod RedHatCluster     /usr/lib/cluster-snmp/
rocommunity public
$ snmpwalk -v2c -c public localhost
[tons of output, works fine but doesn't show REDHAT-CLUSTER-MIB::RedHatCluster]
$ snmpwalk -v2c -c public localhost REDHAT-CLUSTER-MIB::RedHatCluster
Timeout: No Response from localhost
$ snmpwalk -v2c -c public localhost
Timeout: No Response from localhost

After this snmpd can only be interrupted by SIGKILL.

Expected results:
MIB output correctly, no hang of snmpd.

Additional info:
Analysis & proposed patch from Adrien Kunysz

Comment 1 Bryn M. Reeves 2008-07-01 14:41:13 UTC
Created attachment 310677 [details]
Set sock.nonblocking(true) in ClusterMonitor::get_cluster()

Comment 2 RHEL Product and Program Management 2008-07-01 15:27:11 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Ryan McCabe 2008-07-03 14:56:29 UTC
Thanks for the patch. Applied to the current CVS trees.

Comment 5 Brian Brock 2008-12-17 00:03:22 UTC
verified, snmp-walk'ed without error or hang

Comment 7 errata-xmlrpc 2009-01-20 20:51:36 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
