Bug 1696057 - Hawkular metrics fails readiness and liveliness probes
Summary: Hawkular metrics fails readiness and liveliness probes
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Hawkular
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Jan Martiska
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-04-04 05:45 UTC by Aditya Deshpande
Modified: 2019-04-12 06:25 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Aditya Deshpande 2019-04-04 05:45:16 UTC
Description of problem:
Hawkular Metrics is failing its readiness and liveness probes.

The logs and events show the following errors.

hawkular-metrics logs

3 ERROR [org.jboss.as.controller.management-operation] (Controller Boot Thread) WFLYCTL0013: Operation ("deploy") failed - address: ([("deployment" => "hawkular-metrics.war")]) - failure description: {"WFLYCTL0080: Failed services" => {"jboss.undertow.deployment.default-server.default-host./hawkular/metrics" => "java.lang.RuntimeException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/x.x.x.x:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)))

yyyy-mm-dd xx:xx:xx,891 ERROR [org.jboss.as] (Controller Boot Thread) WFLYSRV0026: JBoss EAP 7.1.4.GA (WildFly Core 3.0.17.Final-redhat-1) started (with errors) in 10157ms - Started 445 of 692 services (2 services failed or missing dependencies, 372 services are lazy, passive or on-demand)

 FATAL [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle] (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred trying to connect to the Cassandra cluster: java.lang.RuntimeException: java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: hawkular-cassandra/x.x.x.x:9042 (com.datastax.driver.core.exceptions.UnavailableException: Not enough replicas available for query at consistency LOCAL_ONE (1 required but only 0 alive)))
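Both errors above carry the same detail: the query at consistency LOCAL_ONE needed 1 replica and found 0 alive. When scanning longer logs for this pattern, a small parser along the following lines may help. This is only a sketch; the function name `parse_unavailable` and the regular expression are illustrative assumptions and are not part of Hawkular Metrics or the DataStax driver.

```python
import re

# Hypothetical helper (not part of any shipped tool): matches the
# "Not enough replicas" detail as it appears in the log lines above.
UNAVAILABLE_RE = re.compile(
    r"consistency (?P<level>\w+) "
    r"\((?P<required>\d+) required but only (?P<alive>\d+) alive\)"
)


def parse_unavailable(message):
    """Return (consistency_level, required, alive), or None if no match."""
    match = UNAVAILABLE_RE.search(message)
    if match is None:
        return None
    return (
        match.group("level"),
        int(match.group("required")),
        int(match.group("alive")),
    )
```

A result where `alive` is 0 is consistent with the Cassandra warning that some nodes were not ready.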


cassandra logs:

WARN  [OptionalTasks:1] yyyy-mm-dd xx:xx:52,234 CassandraRoleManager.java:359 - CassandraRoleManager skipped default role setup: some nodes were not ready
INFO  [OptionalTasks:1] yyyy-mm-dd xx:xx:52,234 CassandraRoleManager.java:398 - Setup task failed with error, rescheduling


Events:

30m       30m       1         hawkular-metrics-XXX   Pod       spec.containers{hawkular-metrics}   Warning   Unhealthy   kubelet, node   Liveness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>.
Traceback (most recent call last):
  File "/opt/hawkular/scripts/hawkular-metrics-liveness.py", line 48, in <module>
    if int(uptime) < int(timeout):
ValueError: invalid literal for int() with base 10: ''

30m       30m       1         hawkular-metrics-XXX   Pod       spec.containers{hawkular-metrics}   Warning   Unhealthy   kubelet, node   Readiness probe failed: Failed to access the status endpoint : <urlopen error [Errno 111] Connection refused>. This may be due to Hawkular Metrics not being ready yet. Will try again.

54m       55m       4         heapster-XXX   Pod       spec.containers{heapster}   Warning   Unhealthy   kubelet, node   Readiness probe failed: The heapster process is not yet started, it is waiting for the Hawkular Metrics to start.

54m       54m       1         heapster-XXX   Pod                     spec.containers{heapster}   Normal    Killing            kubelet, node   Killing container with id docker://heapster:Need to kill Pod
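The traceback in the liveness probe event shows `int(uptime)` being called on an empty string, so the probe script crashes with a ValueError instead of reporting a clean failure. A defensive version of that comparison could look like the sketch below. The helper names and the choice to treat an unreadable uptime as 0 seconds are assumptions made for illustration; the actual contents of hawkular-metrics-liveness.py beyond the line shown in the traceback are not known here.

```python
def parse_seconds(raw, default=0):
    """Convert a possibly empty or non-numeric value to whole seconds."""
    try:
        return int(str(raw).strip())
    except ValueError:
        # Empty or malformed input: fall back instead of raising.
        return default


def within_startup_grace(uptime_raw, timeout_raw):
    """Return True while the reported uptime is still below the timeout.

    Assumption: an unreadable uptime counts as 0, i.e. "just started".
    """
    return parse_seconds(uptime_raw) < parse_seconds(timeout_raw)
```

With this guard, an empty uptime value would no longer raise; whether it should count as inside or outside the grace period is a policy decision for the script's maintainers.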


Version-Release number of selected component (if applicable):
OCP v3.10.119


Expected results:
Hawkular Metrics should be in the Running state.

