Bug 1363595 - Node remains in stopped state in pcs status with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in logs.
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: common-ha
Version: 3.9
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Assignee: Kaleb KEITHLEY
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1372728 1373529
Reported: 2016-08-03 07:00 UTC by Shashank Raj
Modified: 2017-03-06 17:21 UTC (History)

Fixed In Version: glusterfs-3.10.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1372728 (view as bug list)
Environment:
Last Closed: 2017-03-06 17:21:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Shashank Raj 2016-08-03 07:00:08 UTC
Description of problem:

One of the nodes remains in the Stopped state in pcs status, with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in the logs.

Version-Release number of selected component (if applicable):

[root@dhcp41-253 ~]# rpm -qa|grep glusterfs
glusterfs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-cli-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-libs-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-client-xlators-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-fuse-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-server-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-geo-replication-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
glusterfs-api-3.8.1-0.4.git56fcf39.el7rhgs.x86_64

[root@dhcp41-253 ~]# rpm -qa|grep ganesha
glusterfs-ganesha-3.8.1-0.4.git56fcf39.el7rhgs.x86_64
nfs-ganesha-gluster-2.4.0-0.14dev26.el7.centos.x86_64
nfs-ganesha-2.4.0-0.14dev26.el7.centos.x86_64

How reproducible:

Observed twice

Steps to Reproduce:
1. Create an nfs-ganesha cluster on 4 nodes.
2. Observe that sometimes, after running gluster nfs-ganesha enable, one of the nodes remains in the Stopped state in pcs status, and the following messages appear in /var/log/messages:

Aug  3 12:22:10 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7257:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:25 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7271:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:40 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7285:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:22:55 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7326:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:10 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7340:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:25 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7354:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]
Aug  3 12:23:40 dhcp41-253 lrmd[645]:  notice: nfs-mon_monitor_10000:7368:stderr [ /usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]


pcs status output:

4 nodes and 16 resources configured

Online: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]

Full list of resources:

 Clone Set: nfs_setup-clone [nfs_setup]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-mon-clone [nfs-mon]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp41-253.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
 Clone Set: nfs-grace-clone [nfs-grace]
     Started: [ dhcp41-206.lab.eng.blr.redhat.com dhcp43-133.lab.eng.blr.redhat.com dhcp43-181.lab.eng.blr.redhat.com ]
     Stopped: [ dhcp41-253.lab.eng.blr.redhat.com ]
 dhcp43-133.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp43-133.lab.eng.blr.redhat.com
 dhcp41-206.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp41-253.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp41-206.lab.eng.blr.redhat.com
 dhcp43-181.lab.eng.blr.redhat.com-cluster_ip-1 (ocf::heartbeat:IPaddr):        Started dhcp43-181.lab.eng.blr.redhat.com

Failed Actions:
* nfs-grace_monitor_0 on dhcp41-253.lab.eng.blr.redhat.com 'unknown error' (1): call=17, status=complete, exitreason='none',
    last-rc-change='Tue Aug  2 17:37:52 2016', queued=0ms, exec=55ms


PCSD Status:
  dhcp43-133.lab.eng.blr.redhat.com: Online
  dhcp41-206.lab.eng.blr.redhat.com: Online
  dhcp41-253.lab.eng.blr.redhat.com: Online
  dhcp43-181.lab.eng.blr.redhat.com: Online

Daemon Status:
  corosync: active/disabled
  pacemaker: active/disabled
  pcsd: active/enabled


Actual results:

One of the nodes remains in the Stopped state in pcs status, with "/usr/lib/ocf/resource.d/heartbeat/ganesha_mon: line 137: [: too many arguments ]" messages in the logs.
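
For context, "[: too many arguments" is the message bash's [ builtin typically prints when an unquoted expansion splits into more words than the test expression expects. This report does not show line 137 of ganesha_mon, so the following is only a minimal sketch of that general failure mode. The function names and the state values are hypothetical.

```shell
# Hypothetical helpers for illustration; not taken from ganesha_mon.

# Unquoted comparison: word splitting turns a multi-word value into
# several arguments to '[', which then fails with a usage error.
is_running_unquoted() {
    local state="$1"
    [ $state = "running" ]
}

# Quoted comparison: '[' always receives exactly three arguments,
# whatever the value contains.
is_running_quoted() {
    local state="$1"
    [ "$state" = "running" ]
}
```

With the value "not running", the unquoted form expands to [ not running = running ], which is four arguments instead of three. The test then fails with an exit status greater than 1 rather than a plain true or false, while the quoted form simply returns false.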

Expected results:

There should be no errors in the logs, and all the nodes should be up.

Additional info:

sosreports and logs will be attached.

Comment 1 Shashank Raj 2016-08-03 07:10:38 UTC
The sosreports, ganesha logs, and the ganesha_mon script are available at http://rhsqe-repo.lab.eng.blr.redhat.com/sosreports/1363595

Comment 2 Worker Ant 2016-09-02 13:41:45 UTC
REVIEW: http://review.gluster.org/15390 (common-ha: ganesha_mon: line 137: [: too many arguments ]" messages) posted (#1) for review on master by Kaleb KEITHLEY (kkeithle@redhat.com)

Comment 3 Worker Ant 2016-09-05 11:19:04 UTC
COMMIT: http://review.gluster.org/15390 committed in master by Niels de Vos (ndevos@redhat.com) 
------
commit 9c057750310b7e296624746bfeb909690320a2b3
Author: Kaleb S. KEITHLEY <kkeithle@redhat.com>
Date:   Fri Sep 2 09:33:16 2016 -0400

    common-ha: ganesha_mon: line 137: [: too many arguments ]" messages
    
    ensure that there are always valid, non-null arguments to /bin/test
    
    Here there be dragons. Very racy, but if the races lose, they lose
    in a way that's consistent with what we're testing for anyway, namely
    that the ganesha.nfsd process is gone.
    
    Change-Id: I88b770dd874ffa8576711f8009f27122a4fb0130
    BUG: 1363595
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com>
    Reviewed-on: http://review.gluster.org/15390
    Smoke: Gluster Build System <jenkins@build.gluster.org>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.org>
    Reviewed-by: Niels de Vos <ndevos@redhat.com>
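
The commit message describes the fix as ensuring that /bin/test always receives valid, non-null arguments. The patch itself is not reproduced in this report, so the following is only a sketch of that defensive pattern under assumed names: pid_is_alive, its pid-file argument, and the optional proc_root parameter are all hypothetical.

```shell
# Hypothetical sketch; not the code from review 15390.

# pid_is_alive PIDFILE [PROC_ROOT]
# Succeeds only if PIDFILE holds a single numeric pid that has an
# entry under PROC_ROOT (default: /proc).
pid_is_alive() {
    local pid_file="$1"
    local proc_root="${2:-/proc}"
    local pid

    # The pid file can disappear between any earlier check and this read.
    # A failed read leaves pid empty, which is reported as "not alive".
    pid=$(cat "$pid_file" 2>/dev/null) || pid=""

    # Reject empty or non-numeric content (including several pids
    # separated by whitespace) so that '[' below always receives
    # exactly one well-formed path argument.
    case $pid in
        '' | *[!0-9]*) return 1 ;;
    esac

    [ -d "$proc_root/$pid" ]
}
```

If the pid file vanishes or is empty when it is read, the function reports the process as gone. That matches the commit message's remark that losing the race is consistent with what is being tested for.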

Comment 5 Shyamsundar 2017-03-06 17:21:32 UTC
This bug is being closed because a release is now available that should address the reported issue. If the problem is still not fixed with glusterfs-3.10.0, please open a new bug report.

glusterfs-3.10.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-February/030119.html
[2] https://www.gluster.org/pipermail/gluster-users/

