Bug 1690658 - kube-scheduler crashlooping in extended conformance
Summary: kube-scheduler crashlooping in extended conformance
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.1.0
Assignee: ravig
QA Contact: Jianwei Hou
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-20 00:03 UTC by Chance Zibolski
Modified: 2019-04-04 19:22 UTC
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-04-04 18:50:59 UTC
Target Upstream Version:


Attachments
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T20:06 UTC (deleted)
2019-03-22 05:30 UTC, W. Trevor King

Description Chance Zibolski 2019-03-20 00:03:23 UTC
Description of problem:

The following test is failing:

openshift-tests [Feature:Platform] Managed cluster should have no crashlooping pods in core namespaces over two minutes 


The error in the test is:

fail [github.com/openshift/origin/test/extended/operators/cluster.go:109]: Expected
    <[]string | len:1, cap:1>: [
        "Pod openshift-kube-scheduler/installer-1-ip-10-0-136-121.ec2.internal is not healthy: 


From the following release stream:

https://openshift-release.svc.ci.openshift.org/releasestream/4.0.0-0.ci/release/4.0.0-0.ci-2019-03-19-213803

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-4.0/5895
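The failing test walks pods in core namespaces and reports any that look crashlooping or stuck in a terminal error state. A minimal Python sketch of that kind of check, modeled on the `containerStatuses` structure quoted in comment 1 (the helper name and exact message format are assumptions, not the origin test's actual logic at cluster.go:109):

```python
# Hypothetical sketch of a "no unhealthy pods" check over the
# containerStatuses structure shown later in this bug.
def unhealthy_pods(pods):
    """Return 'Pod ns/name is not healthy: ...' strings for suspect pods."""
    failures = []
    for pod in pods:
        for cs in pod.get("containerStatuses", []):
            term = cs.get("state", {}).get("terminated")
            if term and term.get("exitCode", 0) != 0:
                failures.append(
                    f"Pod {pod['namespace']}/{pod['name']} is not healthy: "
                    f"container {cs['name']} exited {term['exitCode']} ({term.get('reason')})"
                )
            elif cs.get("restartCount", 0) > 1:
                failures.append(
                    f"Pod {pod['namespace']}/{pod['name']} is not healthy: "
                    f"container {cs['name']} restarted {cs['restartCount']} times"
                )
    return failures

# The installer pod from this bug, reduced to the relevant fields:
pod = {
    "namespace": "openshift-kube-scheduler",
    "name": "installer-1-ip-10-0-136-121.ec2.internal",
    "containerStatuses": [
        {"name": "installer", "restartCount": 0,
         "state": {"terminated": {"exitCode": 255, "reason": "Error"}}},
    ],
}
print(unhealthy_pods([pod])[0])
```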

Comment 1 Seth Jennings 2019-03-20 16:40:23 UTC
installer container exiting with code 255 after 1m28s of running time.  No error message from the installer container.  The last message was "cmd.go:269] Writing static pod manifest..." which occurs 1m5s before the finishedAt time, indicating the installer is stuck there.

  "containerStatuses": [
    {
      "name": "installer",
      "state": {
        "terminated": {
          "exitCode": 255,
          "reason": "Error",
          "message": "-pod/version\" ...\nI0319 22:31:37.099702       1 cmd.go:236] Creating directory \"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/configmaps/scheduler-kubeconfig\" ...\nI0319 22:31:37.099877       1 cmd.go:241] Writing config file \"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/configmaps/scheduler-kubeconfig/kubeconfig\" ...\nI0319 22:31:37.100060       1 cmd.go:236] Creating directory \"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/configmaps/serviceaccount-ca\" ...\nI0319 22:31:37.100234       1 cmd.go:241] Writing config file \"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/configmaps/serviceaccount-ca/ca-bundle.crt\" ...\nI0319 22:31:37.100383       1 cmd.go:249] Writing pod manifest \"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1/kube-scheduler-pod.yaml\" ...\nI0319 22:31:37.100548       1 cmd.go:255] Creating directory for static pod manifest \"/etc/kubernetes/manifests\" ...\nI0319 22:31:37.100626       1 cmd.go:269] Writing static pod manifest \"/etc/kubernetes/manifests/kube-scheduler-pod.yaml\" 
...\n{\"kind\":\"Pod\",\"apiVersion\":\"v1\",\"metadata\":{\"name\":\"openshift-kube-scheduler\",\"namespace\":\"openshift-kube-scheduler\",\"creationTimestamp\":null,\"labels\":{\"app\":\"openshift-kube-scheduler\",\"revision\":\"1\",\"scheduler\":\"true\"}},\"spec\":{\"volumes\":[{\"name\":\"resource-dir\",\"hostPath\":{\"path\":\"/etc/kubernetes/static-pod-resources/kube-scheduler-pod-1\"}}],\"containers\":[{\"name\":\"scheduler\",\"image\":\"registry.svc.ci.openshift.org/ocp/4.0-2019-03-19-213803@sha256:192ee24c8ac5321cc995fb35966efbb9d59a5440bb6400c416c2d031c27cde73\",\"command\":[\"hyperkube\",\"kube-scheduler\"],\"args\":[\"--config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml\",\"-v=2\"],\"resources\":{\"requests\":{\"memory\":\"50Mi\"}},\"volumeMounts\":[{\"name\":\"resource-dir\",\"mountPath\":\"/etc/kubernetes/static-pod-resources\"}],\"terminationMessagePolicy\":\"FallbackToLogsOnError\",\"imagePullPolicy\":\"IfNotPresent\"}],\"hostNetwork\":true,\"tolerations\":[{\"operator\":\"Exists\"}],\"priorityClassName\":\"system-node-critical\"},\"status\":{}}\n",
          "startedAt": "2019-03-19T22:31:14Z",
          "finishedAt": "2019-03-19T22:32:42Z",
          "containerID": "cri-o://162ef8f47b4bf4b526df93d94ae1290fde65e72bba92079c5f7a2c99f5304232"
        }
      },
      "lastState": {},
      "ready": false,
      "restartCount": 0,
      "image": "registry.svc.ci.openshift.org/ocp/4.0-2019-03-19-213803@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af",
      "imageID": "registry.svc.ci.openshift.org/ocp/4.0-2019-03-19-213803@sha256:9a2160a24860b80bf580999398fc4661eed4100b38e786b7c6e0391149d843af",
      "containerID": "cri-o://162ef8f47b4bf4b526df93d94ae1290fde65e72bba92079c5f7a2c99f5304232"
    }
  ],
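The 1m28s runtime and the 1m5s silent tail can be checked directly from the timestamps above (a quick sketch; the last-log timestamp 22:31:37 is read from the truncated termination message):

```python
from datetime import datetime, timezone

def ts(s):
    # Parse the RFC 3339 timestamps used in containerStatuses.
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

started = ts("2019-03-19T22:31:14Z")
finished = ts("2019-03-19T22:32:42Z")
last_log = ts("2019-03-19T22:31:37Z")  # final "Writing static pod manifest" line

print(finished - started)   # total runtime: 0:01:28
print(finished - last_log)  # silence before exit: 0:01:05
```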

Comment 2 W. Trevor King 2019-03-22 05:30:57 UTC
Created attachment 1546768 [details]
Occurrences of this error in CI from 2019-03-19T12:28 to 2019-03-21T20:06 UTC


This has caused 2 of our 861 failures in *-e2e-aws* jobs across the whole CI system over the past 55 hours.  Generated with [1]:

  $ deck-build-log-plot 'Pod openshift-kube-scheduler/installer.* is not healthy.*Writing static pod manifest'

[1]: https://github.com/wking/openshift-release/tree/debug-scripts/deck-build-log
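The quoted pattern is a plain regular expression over the job build logs; applied with multi-line matching it picks out failures of exactly this shape. A minimal sketch (the deck-build-log script's actual matching machinery may differ; the sample log text is condensed from the description above):

```python
import re

# The same pattern passed to deck-build-log-plot in comment 2.
PATTERN = re.compile(
    r"Pod openshift-kube-scheduler/installer.* is not healthy"
    r".*Writing static pod manifest",
    re.DOTALL,  # the failure message spans multiple log lines
)

build_log = (
    "fail [github.com/openshift/origin/test/extended/operators/cluster.go:109]: Expected\n"
    '    "Pod openshift-kube-scheduler/installer-1-ip-10-0-136-121.ec2.internal is not healthy:\n'
    '    ... cmd.go:269] Writing static pod manifest ..."\n'
)

print(bool(PATTERN.search(build_log)))
```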

Comment 3 Seth Jennings 2019-04-04 18:50:59 UTC
Reran `deck-build-log-plot 'Pod openshift-kube-scheduler/installer.* is not healthy.*Writing static pod manifest'` and there have been no occurrences in the past 48 hours.

Not sure what fixed it, but there has been a lot of bug-fixing activity in library-go.

Closing for now.

