Bug 1509634 - [WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master] exception caught on transport layer [[id: 0x5b9d58e6]], closing connection java.net.NoRouteToHostException: No route to host
Summary: [WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport]...
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Logging
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Jan Wozniak
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-05 11:02 UTC by Vladislav Walek
Modified: 2018-03-22 02:25 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-03-22 02:25:36 UTC


Attachments

Description Vladislav Walek 2017-11-05 11:02:17 UTC
Description of problem:

The Elasticsearch pod is running, but the other modules are not available.
Possibly the same issue as in https://bugzilla.redhat.com/show_bug.cgi?id=1475119

The only error message shown:
[WARN ][com.floragunn.searchguard.ssl.transport.SearchGuardSSLNettyTransport] [logging-es-data-master] exception caught on transport layer [[id: 0x5b9d58e6]], closing connection
java.net.NoRouteToHostException: No route to host
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
...


Version-Release number of selected component (if applicable):
OpenShift Container Platform 3.6

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Jan Wozniak 2017-11-06 12:01:36 UTC
Could you attach the output of:
1) origin-aggregated-logging/hack/logging-dump.sh
2) oc describe pod --namespace=logging | grep 'Image:' -A 1
3) oc get pod -o wide --namespace=logging
4) oc get node -o wide

Also, please try the following to test whether this is a networking issue:
 # exec into the malfunctioning pod that reports the exception
 $ oc rsh [container_producing_the_exception]

 # try pinging the IPs from logs
 $ for ip in 10.130.2.30 10.131.2.37; do ping $ip -c 1; done

 # try 'makeshift' traceroute to the same IPs
 $ for ip in 10.130.2.30 10.131.2.37; do (echo >/dev/tcp/$ip/9300) &>/dev/null && echo "$ip open" || echo "$ip closed"; done
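The one-liners above can be wrapped into a small reusable helper. This is only a sketch: the script name and structure are illustrative, the IPs come from the logs above, and 9300 is the Elasticsearch transport port.

```shell
#!/bin/bash
# check-es.sh -- sketch of the port check from the one-liners above.
# Usage (IPs taken from the pod logs): bash check-es.sh 10.130.2.30 10.131.2.37

check_port() {
  local host=$1 port=$2
  # bash's /dev/tcp pseudo-device opens a TCP connection when redirected to
  if (echo > "/dev/tcp/$host/$port") 2>/dev/null; then
    echo "$host:$port open"
  else
    echo "$host:$port closed"
  fi
}

# Sweep the Elasticsearch transport port on every IP passed as an argument
for ip in "$@"; do
  check_port "$ip" 9300
done
```

Unlike ping, this checks TCP reachability of the actual port, so it also catches firewall/iptables drops on hosts where ICMP would still succeed.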

Comment 4 Jeff Cantrill 2017-11-06 15:19:20 UTC
Any chance of getting router logs or API logs from the same time frame?

Comment 5 Vladislav Walek 2017-11-09 10:52:32 UTC
Hello Jan, Jeff,

the customer was able to connect to the pod IPs. However, due to a lock on the filesystem, Elasticsearch was not able to start.
After removing the data and clearing the PV, the issue came back: Elasticsearch can't connect to the pods.
As this was tested pod-to-pod, the assumption is that the networking is not working correctly.
Asked for the OVS flows and the iptables rules from the node where the ES pod was tested.

Will update the case when I have more info.
Thx
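The node-side data requested above (OVS flows and iptables rules) could be gathered with something like the following sketch. Assumptions: OpenShift 3.x SDN, where br0 is the OVS bridge and flows use OpenFlow 1.3; the script name and output paths are illustrative, and it should run as root on the node hosting the ES pod.

```shell
#!/bin/bash
# collect-sdn.sh -- hypothetical collection script for the data asked for in comment 5.
# br0 is the OpenShift SDN bridge in OCP 3.x; output location is illustrative.
outdir=${1:-/tmp/sdn-dump}
mkdir -p "$outdir"

# Each entry is "output-file:command"
for entry in \
  "ovs-flows.txt:ovs-ofctl -O OpenFlow13 dump-flows br0" \
  "iptables.txt:iptables-save"
do
  file=${entry%%:*}   # part before the first ':' is the output file
  cmd=${entry#*:}     # the rest is the command to run
  echo "collecting: $cmd -> $outdir/$file"
  $cmd > "$outdir/$file" 2>/dev/null || echo "  (skipped: $cmd failed or not available)"
done
```

The resulting files can then be attached to the bug alongside the logging-dump.sh output from comment 3.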

Comment 13 Jeff Cantrill 2018-03-22 02:25:36 UTC
Please reopen if this is still an issue.

