Bug 1511628 - Cassandra readiness probe can incorrectly fail in multi node setup
Summary: Cassandra readiness probe can incorrectly fail in multi node setup
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.5.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.5.z
Assignee: Ruben Vargas Palma
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On: 1494673 1511629 1511631
Blocks: 1496228 1511627
Reported: 2017-11-09 18:36 UTC by Matt Wringe
Modified: 2018-10-08 08:52 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1494673
Environment:
Last Closed: 2018-10-08 08:52:58 UTC



Description Matt Wringe 2017-11-09 18:36:11 UTC
+++ This bug was initially created as a clone of Bug #1494673 +++

Our Cassandra readiness probe will parse the output of 'nodetool status' to determine if the Cassandra instance is in the 'up' and 'normal' state.

Our string parsing of the output can fail in certain situations. If the string value of the current host's IP address is contained within the IP address of another node in the cluster, then we will try to parse two lines of the output instead of just one.

For instance, consider the case where we have two nodes in our Cassandra cluster whose IP addresses are '172.17.0.3' and '172.17.0.33' ('72.17.0.3' and '172.17.0.3' would also cause a problem).

Because of how we parse this output, our script would incorrectly try to handle both entries from 'nodetool status' instead of just the one.

This will cause the readiness probe to get unexpected information and fail.
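
The failure mode described above can be reproduced with a small sketch. This is not the actual probe script: it assumes the probe does a plain substring search, and sample_status and count_substring_matches are hypothetical helpers with made-up rows that only resemble 'nodetool status' output.

```shell
# Hypothetical sample resembling 'nodetool status' rows (all values are made up).
sample_status() {
  cat <<'EOF'
UN  172.17.0.3   103.5 KiB  256  48.9%  aaaaaaaa-0000-0000-0000-000000000001  rack1
UN  172.17.0.33  98.2 KiB   256  51.1%  aaaaaaaa-0000-0000-0000-000000000002  rack1
EOF
}

# Count the rows matched by a plain fixed-string (substring) search.
count_substring_matches() {
  sample_status | grep -cF -- "$1"
}

# The address of the first node also occurs inside the address of the second,
# so the substring search reports two rows instead of one.
count_substring_matches "172.17.0.3"
```

With two rows returned, any logic that expects exactly one state value for the current host receives unexpected input.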

If the pod is brought down and restarted, it should be granted a new ip address which should not conflict with the second ip address anymore and then be able to continue.

--- Additional comment from Matt Wringe on 2017-09-22 15:16:20 EDT ---

Simple PR which fixes this issue by checking for whitespace before and after the IP address, so the script no longer treats an address as matching another address that merely contains it: https://github.com/openshift/origin-metrics/pull/380
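
A minimal sketch of the whitespace-delimited check, under the same assumptions as the description: this is not the code from the PR, and sample_status and match_exact_ip are hypothetical helpers with made-up rows resembling 'nodetool status' output.

```shell
# Hypothetical sample resembling 'nodetool status' rows (all values are made up).
sample_status() {
  cat <<'EOF'
UN  172.17.0.3   103.5 KiB  256  48.9%  aaaaaaaa-0000-0000-0000-000000000001  rack1
UN  172.17.0.33  98.2 KiB   256  51.1%  aaaaaaaa-0000-0000-0000-000000000002  rack1
EOF
}

# Print only the rows where the address has whitespace on both sides.
match_exact_ip() {
  # Escape the dots so they match literally rather than any character.
  ip_regex=$(printf '%s' "$1" | sed 's/\./\\./g')
  sample_status | grep -E "[[:space:]]${ip_regex}[[:space:]]"
}

# Only the row for 172.17.0.3 is printed; 172.17.0.33 no longer matches.
match_exact_ip "172.17.0.3"
```

This relies on the address being an interior column, so it always has whitespace on both sides. An address at the very start or end of a line would need anchors as well.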

--- Additional comment from Junqi Zhao on 2017-09-30 05:29:33 EDT ---

@Matt
Which image version contains the fix?
Do we still need to verify it failed with previous versions?

--- Additional comment from Junqi Zhao on 2017-09-30 06:07:23 EDT ---

Tested with the current latest image, metrics-cassandra-v3.7.0-0.135.0.0; it returned "Cassandra is in the up and normal state"

--- Additional comment from Matt Wringe on 2017-10-06 13:24:36 EDT ---

(In reply to Junqi Zhao from comment #2)
> @Matt
> Which image version contains the fix?

The latest 3.7 release should have this fixed.

> Do we still need to verify it failed with previous versions?

It's the exact same change as https://bugzilla.redhat.com/show_bug.cgi?id=1496228 which I believe was verified there.

--- Additional comment from Junqi Zhao on 2017-10-08 20:38:48 EDT ---

Closed based on Comment 3 and Comment 4

Comment 2 Junqi Zhao 2018-01-10 06:33:03 UTC
Tested with metrics-cassandra-docker-3.5.0-52, following the test steps in https://bugzilla.redhat.com/show_bug.cgi?id=1496228#c5; it returned "Cassandra is in the up and normal state"

sh-4.2$ source /opt/apache-cassandra/bin/cassandra-docker-ready.sh
Cassandra is in the up and normal state. It is now ready.


Set this defect to VERIFIED

