Bug 1510174 - occasional restart of atomic-openshift-master-controllers.service due to scheduler cache corruption
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 3.9.0
Assignee: Jordan Liggitt
QA Contact: Vikas Laad
Whiteboard: aos-scalability-37
Depends On:
Reported: 2017-11-06 20:44 UTC by Vikas Laad
Modified: 2018-03-28 14:11 UTC
CC List: 6 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2018-03-28 14:11:22 UTC
Target Upstream Version:

Attachments

System                  ID              Priority  Status  Summary  Last Updated
Red Hat Product Errata  RHBA-2018:0489  None      None    None     2018-03-28 14:11:45 UTC

Description Vikas Laad 2017-11-06 20:44:39 UTC
Description of problem:
I am running reliability tests on a 3.7 cluster and see occasional restarts of atomic-openshift-master-controllers.service with the following pattern:

Nov 02 04:15:33 atomic-openshift-master-controllers[93589]: F1102 04:15:33.881293   93600 cache.go:264] Schedulercache is corrupted and can badly affect scheduling decisions
Nov 02 04:15:33 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a
Nov 02 04:15:33 atomic-openshift-master-controllers[16013]: container "atomic-openshift-master-controllers" does not exist
Nov 02 04:15:33 systemd[1]: atomic-openshift-master-controllers.service: control process exited, code=exited status=1
Nov 02 04:15:34 systemd[1]: Unit atomic-openshift-master-controllers.service entered failed state.
Nov 02 04:15:34 systemd[1]: atomic-openshift-master-controllers.service failed.
Nov 02 04:15:39 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.
Nov 02 04:15:39 systemd[1]: Starting atomic-openshift-master-controllers.service...
Nov 02 04:15:39 systemd[1]: Started atomic-openshift-master-controllers.service.
Nov 02 04:15:39 atomic-openshift-master-controllers[16049]: I1102 04:15:39.427994   16060 plugins.go:77] Registered admission plugin "NamespaceLifecycle"
Nov 02 04:15:39 atomic-openshift-master-controllers[16049]: W1102 04:15:39.429207   16060 start_master.go:290] Warning: assetConfig.loggingPublicURL: Invalid value: "": required to view aggregated container logs in the console, master start will continue.
Nov 02 04:15:39 atomic-openshift-master-controllers[16049]: W1102 04:15:39.429234   16060 start_master.go:290] Warning: assetConfig.metricsPublicURL: Invalid value: "": required to view cluster metrics in the console, master start will continue.

Version-Release number of selected component (if applicable):
openshift v3.7.0-0.178.0
kubernetes v1.7.6+a08f5eeb62
etcd 3.2.8

How reproducible:

Steps to Reproduce:
1. Keep creating, updating, building, and scaling quickstart apps on the cluster.
2. Watch the master logs.
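Step 2 amounts to watching for the fatal scheduler cache message and the systemd restart that follows it. A minimal Python sketch of that check, assuming the controller logs have been exported as plain journal text (for example with `journalctl -u atomic-openshift-master-controllers`). `count_restart_events` is a hypothetical helper, and the two marker strings are copied from the log excerpt in the description, not from any official log format specification:

```python
from collections import Counter
from typing import Iterable

# Marker substrings copied from the journal excerpt in this report
# (an assumption: other releases may word these messages differently).
FATAL_MARKER = "Schedulercache is corrupted"
RESTART_MARKER = "holdoff time over, scheduling restart"


def count_restart_events(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count scheduler cache fatals and systemd restarts.

    Each item is expected to be one line of journal text for the
    atomic-openshift-master-controllers unit.
    """
    counts = Counter(fatal=0, restart=0)
    for line in lines:
        if FATAL_MARKER in line:
            counts["fatal"] += 1
        elif RESTART_MARKER in line:
            counts["restart"] += 1
    return counts


# Three lines taken from the excerpt in the description.
sample = [
    "Nov 02 04:15:33 atomic-openshift-master-controllers[93589]: F1102 04:15:33.881293   93600 cache.go:264] Schedulercache is corrupted and can badly affect scheduling decisions",
    "Nov 02 04:15:33 systemd[1]: atomic-openshift-master-controllers.service: main process exited, code=exited, status=255/n/a",
    "Nov 02 04:15:39 systemd[1]: atomic-openshift-master-controllers.service holdoff time over, scheduling restart.",
]
print(count_restart_events(sample))
```

To scan a saved log instead, pass an open file, e.g. `count_restart_events(open("controllers.log"))`. Counting both markers separately makes it easy to tell whether every restart was preceded by the scheduler cache fatal.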

Actual results:
The master controllers service restarts occasionally.

Expected results:
The master controllers service should not restart.

Additional info:
See the attached master logs.

Comment 2 Mike Fiedler 2017-11-07 03:11:40 UTC
In the referenced logs in comment 1, master-controllers restarted 15 times in 5 days due to the scheduler cache corruption fatal.

Comment 3 Jordan Liggitt 2017-11-07 14:10:08 UTC

Comment 4 Jordan Liggitt 2017-11-07 20:15:49 UTC
Fix in

Comment 5 Michal Fojtik 2017-12-07 09:16:45 UTC

Comment 6 Jordan Liggitt 2018-01-09 20:04:48 UTC
Will be fixed by 1.9.1 rebase in

Comment 8 Mike Fiedler 2018-01-15 20:43:40 UTC
Assigning QA to @vlaad. This will be verified in the 3.9 reliability runs.

Comment 9 Vikas Laad 2018-01-22 16:08:25 UTC
Verified in the following version:

openshift v3.9.0-0.20.0
kubernetes v1.9.1+a0ce1bc657
etcd 3.2.8

I no longer see restarts of the master-controllers process.

Comment 12 errata-xmlrpc 2018-03-28 14:11:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.
