Bug 1690043 - APIServer should return a structured error and retry-after for graceful shutdown errors
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.1.0
Assignee: Michal Fojtik
QA Contact: zhou ying
Duplicates: 1690167
Depends On:
Reported: 2019-03-18 16:10 UTC by Clayton Coleman
Modified: 2019-04-11 07:59 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Target Upstream Version:


Description Clayton Coleman 2019-03-18 16:10:28 UTC
CMO reported a hard error (failing=true) on its cluster operator; this should be an error it handles and ignores or retries:

Mar 18 14:54:32.602 E clusteroperator/monitoring changed Failing to True: Failed to rollout the stack. Error: running task Updating Prometheus-k8s failed: reconciling Prometheus ClusterRoleBinding failed: updating ClusterRoleBinding object failed: an error on the server ("apiserver is shutting down.") has prevented the request from succeeding (put prometheus-k8s)

Depends on which should make it automatic.

For 4.1 we want the server to return a structured error and client stacks to retry it gracefully, to minimize the churn caused by API restarts. Blocks GA.
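
As a rough illustration of the client-side behavior requested here, the following is a minimal sketch and not the actual fix. It assumes the server signals shutdown with HTTP 429 or 503 and a Retry-After header in seconds; the status codes, the `(status, headers, body)` response shape, and the helper names `retry_delay` and `call_with_retry` are all hypothetical and not taken from this bug.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status code, headers, body).
Response = Tuple[int, Mapping[str, str], str]

# Assumption: a server that is shutting down answers with one of these codes.
RETRYABLE_STATUSES = {429, 503}


def retry_delay(status: int, headers: Mapping[str, str]) -> Optional[float]:
    """Return the seconds to wait before retrying, or None if not retryable."""
    if status not in RETRYABLE_STATUSES:
        return None
    value = headers.get("Retry-After", "").strip()
    # Only the delay-seconds form of Retry-After is handled; a missing header
    # or the HTTP-date form falls back to a 1 second default.
    return float(value) if value.isdecimal() else 1.0


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-retryable response or attempts run out."""
    response = send()
    for _ in range(max_attempts - 1):
        delay = retry_delay(response[0], response[1])
        if delay is None:
            return response
        sleep(delay)  # honor the server's hint instead of failing hard
        response = send()
    return response  # still retryable after max_attempts; the caller decides
```

With a wrapper like this, an operator would report a failure only after the retries are exhausted, rather than on the first "apiserver is shutting down" response.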

Comment 1 Clayton Coleman 2019-03-18 16:10:51 UTC
Related to

Need to ensure all components are protected.

Comment 2 Michal Fojtik 2019-03-19 09:06:23 UTC
*** Bug 1690167 has been marked as a duplicate of this bug. ***

Comment 3 Michal Fojtik 2019-03-19 10:12:07 UTC
To mitigate:

Stefan believes we have a bug in the shutdown order, so we still need to look at that. The pick above should make the error less disturbing.

Comment 7 Michal Fojtik 2019-04-09 11:29:34 UTC
To match with upstream:

Comment 8 zhou ying 2019-04-10 02:38:10 UTC
No 'apiserver is shutting down' error, but there is a related error: ClusterOperatorNotAvailable: Cluster operator openshift-apiserver has not yet reported success. Not sure whether it is the same issue.

Comment 9 Michal Fojtik 2019-04-10 18:02:44 UTC
That is a different error, and it has been fixed today.

Comment 10 zhou ying 2019-04-11 07:59:35 UTC
Checked the latest e2e test logs: no 'apiserver is shutting down' error. Will verify this.
