Bug 1687292 - [OSP] Sometimes masters fail to get ignition from load balancer vm and got error "dial tcp <LB ip>:22623: i/o timeout"
Keywords:
Status: NEW
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Flavio Percoco
QA Contact: Tomas Sedovic
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-11 08:40 UTC by weiwei jiang
Modified: 2019-04-11 07:02 UTC (History)
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description weiwei jiang 2019-03-11 08:40:15 UTC
Description of problem:
When launching an OCP cluster on OpenStack with the installer, the bootstrap and api instances work well, but the master instances sometimes fail to fetch their Ignition config from the load balancer.

The temporary machine-config-server on the bootstrap instance responds correctly from outside OpenStack:
[openshift@dhcp-140-70 ~]$ curl -k  https://api.wjiang-ocp.shiftstack.com:22623/config/master -I
HTTP/2 200 
content-type: application/json
content-length: 46313
date: Mon, 11 Mar 2019 07:15:23 GMT

Boot log of one master instance:
[  801.234287] ignition[542]: GET https://api.wjiang-ocp.shiftstack.com:22623/config/master: attempt #27
[  831.235304] ignition[542]: GET error: Get https://api.wjiang-ocp.shiftstack.com:22623/config/master: dial tcp 10.0.76.127:22623: i/o timeout
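The boot log above reports "i/o timeout" rather than "connection refused". A timeout usually means packets are being dropped or not routed, not that nothing is listening on the port. As a minimal sketch (not part of the original report), the following Python helper classifies a TCP connection attempt so the same check can be repeated from different vantage points, for example from outside OpenStack and from a host on the cluster network. The function name `probe_tcp` and the default timeout are assumptions chosen for illustration.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Classify a TCP connection attempt as 'open', 'timeout', 'refused', or 'error: ...'.

    Hypothetical helper for illustration; it only tests TCP reachability
    and does not fetch or validate the Ignition config itself.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer within the timeout: packets are likely dropped or unrouted.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that port.
        return "refused"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"
```

Calling `probe_tcp("api.wjiang-ocp.shiftstack.com", 22623)` from a machine where the curl command above succeeds, and again from a host on the masters' network, would show whether the failure depends on where the request originates.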



Version-Release number of the following components:
[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-540-g12af0c9b8e6a090c041b19c2fb0c040188607bcb

How reproducible:
Sometimes

Steps to Reproduce:
1. Launch an OCP cluster with the installer
2. Check the boot logs of the bootstrap, api and master instances

Actual results:
The bootstrap and api instances work well for the Ignition service.
The masters fail to fetch their config from the temporary machine-config-server.

Expected results:
The masters should also fetch their Ignition config successfully.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 1 Flavio Percoco 2019-03-13 11:39:14 UTC
Are you still seeing this? I haven't seen issues related to masters not getting the ignition config in a while

Comment 2 weiwei jiang 2019-03-15 10:02:38 UTC
(In reply to Flavio Percoco from comment #1)
> Are you still seeing this? I haven't seen issues related to masters not
> getting the ignition config in a while

Checked with
[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5-dirty
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

Unfortunately, I then hit https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c2

Comment 3 weiwei jiang 2019-03-15 10:59:20 UTC
Checked again and hit https://bugzilla.redhat.com/show_bug.cgi?id=1687241#c3

[openshift@dhcp-140-70 installer]$ bin/openshift-install version 
bin/openshift-install unreleased-master-560-g974d9b0848866f03d4dd8c577d8b7ef28756a1d5
built from commit 974d9b0848866f03d4dd8c577d8b7ef28756a1d5

(openstack) server list --name wjiang
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| ID                                   | Name                       | Status | Networks                                              | Image | Flavor         |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+
| 78cdfb63-cd5e-4fc2-8f0c-e14be3e6d91f | wjiang-ocp-fvkd5-master-1  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.11               | rhcos | ci.m1.medlarge |
| 984783ba-5303-4d35-a26a-fe7e9b784e3d | wjiang-ocp-fvkd5-master-2  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.5                | rhcos | ci.m1.medlarge |
| 0f4812e9-9f18-4336-b1c8-5a356a90a8e1 | wjiang-ocp-fvkd5-master-0  | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.9                | rhcos | ci.m1.medlarge |
| e4347058-6889-4dc7-a5ad-d98115e468f4 | wjiang-ocp-fvkd5-api       | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.128.13, 10.0.77.71 | rhcos | ci.m1.medlarge |
| bfb430a2-a6b7-4899-9117-6bb3bca7a181 | wjiang-ocp-fvkd5-bootstrap | ACTIVE | wjiang-ocp-fvkd5-openshift=192.168.0.10               | rhcos | ci.m1.medlarge |
+--------------------------------------+----------------------------+--------+-------------------------------------------------------+-------+----------------+

Comment 4 weiwei jiang 2019-03-25 10:16:37 UTC
After I disabled trunk creation for the masters on upshift OpenStack, everything works well.

DEBUG OpenShift Installer unreleased-master-601-g1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d-dirty                                                                                                                                                                                 
DEBUG Built from commit 1c1b2bb6f64b25c3eccacd07f031a3ec5b2ab29d                                                                                                                                                                                                                
INFO Waiting up to 30m0s for the Kubernetes API at https://api.wjiang-ocp.shiftstack.com:6443...                                                                                                                                                                                
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: dial tcp 10.0.76.214:6443: connect: connection refused                                                                                                          
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF                                                                                                                                                             
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource                                                                                                                                                                                    
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: the server could not find the requested resource
DEBUG Still waiting for the Kubernetes API: Get https://api.wjiang-ocp.shiftstack.com:6443/version?timeout=32s: EOF
INFO API v1.12.4+8156b0c up
INFO Waiting up to 30m0s for the bootstrap-complete event...
DEBUG added kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader                                                                                                                  
DEBUG added kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader                                                                                                                           
DEBUG modified kube-controller-manager.158f2bc94596490e: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f9e39ec3-4ee5-11e9-ad40-fa163ef33bb6 became leader                                                                                                               
DEBUG modified kube-scheduler.158f2bc97583d42a: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_f5af79b9-4ee5-11e9-bb52-fa163ef33bb6 became leader                                                                                                                        
DEBUG added kube-controller-manager.158f2c018fddb679: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_55e4225b-4ee6-11e9-8463-fa163ef33bb6 became leader                                                                                                                  
DEBUG added kube-scheduler.158f2c01dbc01f68: wjiang-ocp-ctshd-bootstrap.wjiang-ocp.shiftstack.com_54a353be-4ee6-11e9-bcbf-fa163ef33bb6 became leader                                                                                                                           
DEBUG added openshift-master-controllers.158f2c027a6b2309: controller-manager-rbxq9 became leader
DEBUG added bootstrap-success: Required control plane pods have been created
DEBUG added openshift-master-controllers.158f2c11437d5e36: controller-manager-5lcpm became leader
DEBUG added bootstrap-complete: cluster bootstrapping has completed
INFO Destroying the bootstrap resources...

Comment 5 weiwei jiang 2019-04-01 10:12:18 UTC
One workaround is to use service_port_ip for the api record even when lb_floating_ip is defined, so that communication within the cluster goes through the same network.

diff --git a/data/data/openstack/service/main.tf b/data/data/openstack/service/main.tf
index 534762e18..41a494ee1 100644
--- a/data/data/openstack/service/main.tf
+++ b/data/data/openstack/service/main.tf
@@ -200,7 +200,7 @@ $ORIGIN ${var.cluster_domain}.
                                 3600       ; minimum (1 hour)
                                 )
 
-${length(var.lb_floating_ip) == 0 ? "api  IN  A  ${var.service_port_ip}" : "api  IN  A  ${var.lb_floating_ip}"}
+api  IN  A  ${var.service_port_ip}
 ${length(var.lb_floating_ip) == 0 ? "*.apps  IN  A  ${var.service_port_ip}" : "*.apps  IN  A  ${var.lb_floating_ip}"}
 
 bootstrap.${var.cluster_domain}  IN  A  ${var.bootstrap_ip}
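To make the effect of the diff above easier to see, here is a small Python sketch that mirrors the Terraform conditional for the `api` and `*.apps` records. It is an illustration only and not part of the installer. The function names and the `workaround` flag are hypothetical, and the IP addresses in the usage note are documentation-range placeholders, not values from this cluster.

```python
def api_record(service_port_ip: str, lb_floating_ip: str, workaround: bool = False) -> str:
    """Return the 'api' A record as the zone template would render it.

    Without the workaround, the floating IP is used whenever it is set.
    With the workaround, the record always points at service_port_ip.
    """
    if workaround or len(lb_floating_ip) == 0:
        return f"api  IN  A  {service_port_ip}"
    return f"api  IN  A  {lb_floating_ip}"


def apps_record(service_port_ip: str, lb_floating_ip: str) -> str:
    """Return the '*.apps' A record; the diff leaves this conditional unchanged."""
    target = service_port_ip if len(lb_floating_ip) == 0 else lb_floating_ip
    return f"*.apps  IN  A  {target}"
```

With placeholder values `service_port_ip="192.0.2.10"` and `lb_floating_ip="203.0.113.5"`, `api_record(..., workaround=True)` yields the internal address, while `apps_record(...)` still yields the floating IP. In other words, the workaround changes only the `api` record, and the `*.apps` wildcard continues to resolve to the external address whenever a floating IP is configured.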

Comment 6 weiwei jiang 2019-04-02 11:06:40 UTC
This also blocks all the routes.

All the routes target the external IP of the load balancer, so the web console does not work, since it requires the authentication routes.

[openshift@dhcp-140-70 installer]$ oc -n openshift-console logs console-d9d875c95-tww2b 
2019/04/2 10:59:58 cmd/main: cookies are secure!
2019/04/2 11:00:03 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:18 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:33 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2019/04/2 11:00:48 auth: error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com/oauth/token failed: Head https://openshift-authentication-openshift-authentication.apps.wjiang-ocp.shiftstack.com: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
[openshift@dhcp-140-70 installer]$ oc get pods -n openshift-console -o wide 
NAME                         READY   STATUS    RESTARTS   AGE    IP            NODE                                                  NOMINATED NODE
console-d9d875c95-chq7f      0/1     Running   22         105m   10.129.0.28   wjiang-ocp-5hrhk-master-1.wjiang-ocp.shiftstack.com   <none>
console-d9d875c95-tww2b      0/1     Running   22         105m   10.128.0.22   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-pjrkp   1/1     Running   0          105m   10.128.0.21   wjiang-ocp-5hrhk-master-0.wjiang-ocp.shiftstack.com   <none>
downloads-77f7688f6c-txp92   1/1     Running   0          105m   10.130.0.20   wjiang-ocp-5hrhk-master-2.wjiang-ocp.shiftstack.com   <none>

