Bug 1694223 - waiting for Kubernetes API: context deadline exceeded
Summary: waiting for Kubernetes API: context deadline exceeded
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Abhinav Dahiya
QA Contact: Johnny Liu
URL:
Whiteboard:
Depends On:
Blocks: 1694567
Reported: 2019-03-29 20:01 UTC by Ben Parees
Modified: 2019-04-11 18:06 UTC
CC List: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-11 18:06:49 UTC
Target Upstream Version:



Description Ben Parees 2019-03-29 20:01:34 UTC
Description of problem:
Installing from initial release registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-03-28-030453
```
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating infrastructure resources..."
level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.ci-op-gw21b8pn-0ffca.origin-ci-int-aws.dev.rhcloud.com:6443..."
level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
```


https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/729


Three other versions of this bug have been closed as "insufficient data":
https://bugzilla.redhat.com/show_bug.cgi?id=1669811
https://bugzilla.redhat.com/show_bug.cgi?id=1669812
https://bugzilla.redhat.com/show_bug.cgi?id=1674079

So before closing this one for the same reason: what needs to be fixed so we *do* get sufficient data?

1) Does our CI system need to retrieve additional artifacts?
2) Should the installer error give more instruction to the user about what should be gathered/investigated?

Comment 3 Abhinav Dahiya 2019-04-01 19:15:13 UTC
(In reply to Scott Dodson from comment #2)
> recurrence:
> https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/364/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-operator/206/

Looking at the bootkube.sh logs from https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_cluster-kube-apiserver-operator/364/pull-ci-openshift-cluster-kube-apiserver-operator-master-e2e-aws-operator/206/artifacts/e2e-aws-operator/bootstrap/bootkube.service:

```
Apr 01 17:26:58 ip-10-0-8-185 bootkube.sh[4582]: Rendering Kubernetes API server core manifests...
Apr 01 17:26:59 ip-10-0-8-185 bootkube.sh[4582]: W0401 17:26:59.262848       1 generic_config_merger.go:42] yaml: line 99: could not find expected ':'
Apr 01 17:26:59 ip-10-0-8-185 bootkube.sh[4582]: F0401 17:26:59.262952       1 render.go:58] failed to generate bootstrap config (phase 1): failed to merge configs: invalid character 'a' looking for beginning of value
```

This looks like an incorrect code change in the PR.
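The second log line is worded like an error from Go's `encoding/json` decoder. The sketch below reproduces the same message. It assumes, as an inference from the log and not something this bug confirms, that text which failed to parse as YAML was then handed to a JSON decoder. In that case the leading `a` of a typical `apiVersion:` key is not a valid start of a JSON value. `decodeConfig` and the sample document are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeConfig is a hypothetical stand-in for a loader that expects JSON.
func decodeConfig(data []byte) (map[string]interface{}, error) {
	var out map[string]interface{}
	if err := json.Unmarshal(data, &out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	// Hypothetical YAML input: valid YAML, but not valid JSON.
	yamlDoc := []byte("apiVersion: kubecontrolplane.config.openshift.io/v1\nkind: KubeAPIServerConfig\n")
	_, err := decodeConfig(yamlDoc)
	fmt.Println(err) // invalid character 'a' looking for beginning of value
}
```

This fits the reading that the bad YAML produced by the PR (the `line 99` warning) is the root cause. The JSON error is only a downstream symptom.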

Comment 4 Abhinav Dahiya 2019-04-04 01:04:44 UTC
(In reply to Ben Parees from comment #0)
> So before closing this one for the same reason:  What needs to be fixed so
> we *do* get sufficient data?
> 
> 1) Does our CI system need to retrieve additional artifacts?

CI already collects logs from the bootstrap node (for example, https://storage.googleapis.com/origin-ci-test/pr-logs/pull/openshift_installer/1515/pull-ci-openshift-installer-master-e2e-aws/4925/artifacts/e2e-aws/bootstrap/bootkube.service) when bootstrapping fails, as in this run: https://openshift-gce-devel.appspot.com/build/origin-ci-test/pr-logs/pull/openshift_installer/1515/pull-ci-openshift-installer-master-e2e-aws/4925

> 2) Should the installer error give more instruction to the user about what
> should be gathered/investigated?

The installer already provides troubleshooting docs for this: https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md#kubernetes-api-is-unavailable

Comment 5 Ben Parees 2019-04-04 05:55:15 UTC
> the installer already provides trouble shooting docs for this https://github.com/openshift/installer/blob/master/docs/user/troubleshooting.md#kubernetes-api-is-unavailable

Nothing in the output directs me to that doc. How, as a customer, am I expected to discover or know about these instructions? When the installation has completely failed, I think we can give the user more immediate, helpful guidance than expecting them to know about a troubleshooting guide hosted in a git repo. Is there a reason not to dump those exact troubleshooting instructions in the output when this happens?

Comment 6 Scott Dodson 2019-04-04 13:46:35 UTC
Once we deliver https://jira.coreos.com/browse/CORS-1050, we should link to updated troubleshooting documentation that walks through the standard log bundle we gather.

Marking this as 4.1 so it's tracked as a blocker. I think the only thing that would bump it from 4.1 would be overwhelming evidence that the customer success rate significantly outpaces that of our CI, where we have 60+ clusters being deployed into the same account at any given moment.

Comment 7 Brenton Leanhardt 2019-04-11 18:06:49 UTC
We're tracking this in Jira now.

