Bug 1596876 - On a 3.10 system, some pods are in terminating for multiple hours even though the nodes are up [NEEDINFO]
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.10.z
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-06-29 21:47 UTC by Clayton Coleman
Modified: 2018-08-13 21:20 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-08-13 21:20:27 UTC
Target Upstream Version:
Flags: sjenning: needinfo? (ccoleman)



Description Clayton Coleman 2018-06-29 21:47:55 UTC
On a 3.10 rc.0 system, I have a number of pods spread across three nodes that should have been terminated hours ago, but haven't been:

○ oc get pods --all-namespaces -o wide | grep Terminating
ci-op-5pqxws2w                      e2e-gcp                                           0/4       Terminating   0          6h        <none>          origin-ci-ig-n-nbnl
ci-op-5pqxws2w                      rpm-repo-1-rdgjl                                  0/1       Terminating   0          6h        172.16.16.45    origin-ci-ig-n-7kb3
ci-op-7ybp9v4f                      e2e-gcp                                           0/4       Terminating   0          6h        172.16.8.222    origin-ci-ig-n-nbnl
ci-op-7ybp9v4f                      rpm-repo-1-5j6fx                                  0/1       Terminating   0          6h        <none>          origin-ci-ig-n-7kb3
ci-op-b41n92cl                      integration                                       0/2       Terminating   0          5h        172.16.4.210    origin-ci-ig-n-r7r2
ci-op-ztg5b6rc                      rpm-repo-1-9kc55                                  0/1       Terminating   0          6h        <none>          origin-ci-ig-n-7kb3
ci                                  10cf0577-7ba3-11e8-bb72-0a58ac100bda              0/2       Terminating   0          7h        172.16.4.132    origin-ci-ig-n-r7r2
telemeter                           telemeter-1                                       0/1       Terminating   0          7h        172.16.8.120    origin-ci-ig-n-nbnl
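
For reference, the same stuck pods can be listed by deletion timestamp rather than by grepping the STATUS column (a hypothetical one-liner, assuming jq is installed on the workstation):

○ oc get pods --all-namespaces -o json | jq -r '.items[] | select(.metadata.deletionTimestamp != null) | [.metadata.namespace, .metadata.name, .metadata.deletionTimestamp] | @tsv'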

The nodes are healthy and are checking in:

○ oc get nodes
NAME                  STATUS    ROLES          AGE       VERSION
origin-ci-ig-m-vqm9   Ready     infra,master   2d        v1.10.0+b81c8f8
origin-ci-ig-n-7kb3   Ready     compute        7h        v1.10.0+b81c8f8
origin-ci-ig-n-7vg6   Ready     compute        2h        v1.10.0+b81c8f8
origin-ci-ig-n-fxsc   Ready     compute        15h       v1.10.0+b81c8f8
origin-ci-ig-n-hs69   Ready     compute        9h        v1.10.0+b81c8f8
origin-ci-ig-n-nbnl   Ready     compute        10h       v1.10.0+b81c8f8
origin-ci-ig-n-pz4j   Ready     compute        2h        v1.10.0+b81c8f8
origin-ci-ig-n-qk2n   Ready     compute        12h       v1.10.0+b81c8f8
origin-ci-ig-n-r7r2   Ready     compute        13h       v1.10.0+b81c8f8
origin-ci-ig-n-xv7m   Ready     compute        7h        v1.10.0+b81c8f8

YAML of one of the pods:

apiVersion: v1
items:
- apiVersion: v1
  kind: Pod
  metadata:
    annotations:
      openshift.io/scc: restricted
    creationTimestamp: 2018-06-29T16:19:54Z
    deletionGracePeriodSeconds: 30
    deletionTimestamp: 2018-06-29T16:57:34Z
    name: integration
    namespace: ci-op-b41n92cl
    ownerReferences:
    - apiVersion: image.openshift.io/v1
      controller: true
      kind: ImageStream
      name: pipeline
      uid: 717ee7bb-7bb6-11e8-9efa-42010a8e0004
    resourceVersion: "24788407"
    selfLink: /api/v1/namespaces/ci-op-b41n92cl/pods/integration
    uid: 421f59cf-7bb8-11e8-9efa-42010a8e0004
  spec:
    containers:
    - command:
      - /bin/sh
      - -c
      - |-
        #!/bin/sh
        set -eu
        ARTIFACT_DIR=/tmp/artifacts JUNIT_REPORT=1 KUBERNETES_SERVICE_HOST= make test-integration
      image: docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline@sha256:c21345abe15937f95b8216168087ea9439764beb51a6a0842a482741d0c299ff
      imagePullPolicy: IfNotPresent
      name: test
      resources:
        limits:
          cpu: "7"
          memory: 11Gi
        requests:
          cpu: "3"
          memory: 5Gi
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
        runAsUser: 1001540000
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: FallbackToLogsOnError
      volumeMounts:
      - mountPath: /tmp/artifacts
        name: artifacts
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-xtz79
        readOnly: true
    - command:
      - /bin/sh
      - -c
      - "#!/bin/sh\nset -euo pipefail\ntrap 'kill $(jobs -p); exit 0' TERM\n\ntouch
        /tmp/done\necho \"Waiting for artifacts to be extracted\"\nwhile true; do\n\tif
        [[ ! -f /tmp/done ]]; then\n\t\techo \"Artifacts extracted\"\n\t\texit 0\n\tfi\n\tsleep
        5 & wait\ndone\n"
      image: busybox
      imagePullPolicy: Always
      name: artifacts
      resources: {}
      securityContext:
        capabilities:
          drop:
          - KILL
          - MKNOD
          - SETGID
          - SETUID
        runAsUser: 1001540000
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /tmp/artifacts
        name: artifacts
      - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
        name: default-token-xtz79
        readOnly: true
    dnsPolicy: ClusterFirst
    imagePullSecrets:
    - name: default-dockercfg-c6t7p
    nodeName: origin-ci-ig-n-r7r2
    nodeSelector:
      role: app
    restartPolicy: Never
    schedulerName: default-scheduler
    securityContext:
      fsGroup: 1001540000
      seLinuxOptions:
        level: s0:c39,c29
    serviceAccount: default
    serviceAccountName: default
    terminationGracePeriodSeconds: 30
    tolerations:
    - effect: NoSchedule
      key: node.kubernetes.io/memory-pressure
      operator: Exists
    volumes:
    - emptyDir: {}
      name: artifacts
    - name: default-token-xtz79
      secret:
        defaultMode: 420
        secretName: default-token-xtz79
  status:
    conditions:
    - lastProbeTime: null
      lastTransitionTime: 2018-06-29T16:26:05Z
      reason: PodCompleted
      status: "True"
      type: Initialized
    - lastProbeTime: null
      lastTransitionTime: 2018-06-29T16:26:09Z
      reason: PodCompleted
      status: "False"
      type: Ready
    - lastProbeTime: null
      lastTransitionTime: 2018-06-29T16:26:05Z
      status: "True"
      type: PodScheduled
    containerStatuses:
    - containerID: docker://4f70850d12293e1c83e3bb63d7a84d3f7c3f9a8894ba1f76929a333b6090c14c
      image: docker.io/busybox:latest
      imageID: docker-pullable://docker.io/busybox@sha256:141c253bc4c3fd0a201d32dc1f493bcf3fff003b6df416dea4f41046e0f37d47
      lastState: {}
      name: artifacts
      ready: false
      restartCount: 0
      state:
        terminated:
          exitCode: 0
          finishedAt: null
          startedAt: null
    - containerID: docker://c03189f32d4c4e2b41dab23e31bc464e1857110565a2f21a10e96942d154cb59
      image: docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline:bin
      imageID: docker-pullable://docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline@sha256:c21345abe15937f95b8216168087ea9439764beb51a6a0842a482741d0c299ff
      lastState: {}
      name: test
      ready: false
      restartCount: 0
      state:
        terminated:
          exitCode: 0
          finishedAt: null
          startedAt: null
    hostIP: 10.142.0.2
    phase: Succeeded
    podIP: 172.16.4.210
    qosClass: Burstable
    startTime: 2018-06-29T16:26:05Z
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
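
The key fields above are deletionTimestamp: 2018-06-29T16:57:34Z with a 30-second grace period and no finalizers, so the grace period expired hours before the listing at the top. The same fields can be pulled out directly (a sketch, assuming jq):

○ oc get pod integration -n ci-op-b41n92cl -o json | jq '.metadata | {deletionTimestamp, deletionGracePeriodSeconds, finalizers}'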

Logs from the node that had the pod:

# journalctl -u origin-node | grep ci-op-b41n92cl
Jun 29 16:13:29 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:29.871655   10569 kubelet.go:1878] SyncLoop (ADD, "api"): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)"
Jun 29 16:13:30 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:30.199944   10569 kuberuntime_manager.go:385] No sandbox for pod "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)" can be found. Need to start a new one
Jun 29 16:13:30 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:30.557039   10569 kubelet.go:1923] SyncLoop (PLEG): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"5c9856df-7bb7-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"db03014c7bd0ce63ea8b441f4778434205ae54825c84f207bb8b68efa872ef14"}
Jun 29 16:13:32 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:32.747466   10569 kubelet.go:1923] SyncLoop (PLEG): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"5c9856df-7bb7-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"91dda1690c6a53338bd97e23bde2414c44dd5e7e5d6e6a4d11b4a359f8264d19"}
Jun 29 16:13:33 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:33.048507   10569 kuberuntime_manager.go:513] Container {Name:docker-build Image:docker.io/openshift/origin-docker-builder:v3.10.0 Command:[openshift-docker-build] Args:[--loglevel=0] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:BUILD Value:{"kind":"Build","apiVersion":"v1","metadata":{"name":"bin","namespace":"ci-op-b41n92cl","selfLink":"/apis/build.openshift.io/v1/namespaces/ci-op-b41n92cl/builds/bin","uid":"5c4727f7-7bb7-11e8-9efa-42010a8e0004","resourceVersion":"24749991","creationTimestamp":"2018-06-29T16:13:29Z","labels":{"build-id":"224","created-by-ci":"true","creates":"bin","job":"pull-ci-origin-e2e-gcp","persists-between-builds":"false"},"annotations":{"ci.openshift.io/job-spec":"{\"type\":\"presubmit\",\"job\":\"pull-ci-origin-e2e-gcp\",\"buildid\":\"224\",\"prowjobid\":\"5e16c021-7bb6-11e8-a3ff-0a58ac100037\",\"refs\":{\"org\":\"openshift\",\"repo\":\"origin\",\"base_ref\":\"master\",\"base_sha\":\"83ac5ae6a7d635ae67b1be438d85c339500fd65b\",\"pulls\":[{\"number\":20139,\"author\":\"soltysh\",\"sha\":\"df4770c1b4fd0426349b8ed0741488e97caf1107\"}]}}"},"ownerReferences":[{"apiVersion":"image.openshift.io/v1","kind":"ImageStream","name":"pipeline","uid":"717ee7bb-7bb6-11e8-9efa-42010a8e0004","controller":true}]},"spec":{"serviceAccount":"builder","source":{"type":"Dockerfile","dockerfile":"FROM pipeline:src\nRUN [\"/bin/bash\", \"-c\", \"set -o errexit; umask 0002; make build\"]"},"strategy":{"type":"Docker","dockerStrategy":{"from":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline@sha256:5a71c79ac28245684d9df79201ca5177ff6fdb1da00aee66887fa94939d79217"},"pullSecret":{"name":"builder-dockercfg-5cp2q"},"noCache":true,"forcePull":true,"imageOptimizationPolicy":"SkipLayers"}},"output":{"to":{"kind":"DockerImage","name":"docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline:bin"},"pushSecret":{"name":"builder-dockercfg-5cp2q"}},"resources":{"limits":{"cpu":"7","memory":"9Gi"},"requests":{"cpu":"3","memory":"7Gi"}},"postCommit":{},"nodeSelector":null,"triggeredBy":null},"status":{"phase":"New","outputDockerImageReference":"docker-registry.default.svc:5000/ci-op-b41n92cl/pipeline:bin","output":{}}}
Jun 29 16:13:33 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:13:33.871698   10569 kubelet.go:1923] SyncLoop (PLEG): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"5c9856df-7bb7-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"ba75f860902034b19059a662b81c4d7f38e4b40982cb78a4bda6604be8d10ddc"}
Jun 29 16:19:54 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:54.762863   10569 kubelet.go:1923] SyncLoop (PLEG): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"5c9856df-7bb7-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"ba75f860902034b19059a662b81c4d7f38e4b40982cb78a4bda6604be8d10ddc"}
Jun 29 16:19:54 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:54.889900   10569 kubelet.go:1878] SyncLoop (ADD, "api"): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)"
Jun 29 16:19:55 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:55.226745   10569 kuberuntime_manager.go:385] No sandbox for pod "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)" can be found. Need to start a new one
Jun 29 16:19:55 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:55.867451   10569 kubelet.go:1923] SyncLoop (PLEG): "bin-build_ci-op-b41n92cl(5c9856df-7bb7-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"5c9856df-7bb7-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"db03014c7bd0ce63ea8b441f4778434205ae54825c84f207bb8b68efa872ef14"}
Jun 29 16:19:55 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:55.874763   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"}
Jun 29 16:19:56 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:19:56.993439   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"24043afb8b68283c289936af96199a1c3bd12778ee9b24080a7b7f2efb244065"}
Jun 29 16:20:00 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:20:00.240572   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"91cc004d6d7643254bde35608ab31d31416a0af84c2d625ca82bac75b4994f97"}
Jun 29 16:26:01 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:01.654007   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"24043afb8b68283c289936af96199a1c3bd12778ee9b24080a7b7f2efb244065"}
Jun 29 16:26:04 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:04.915176   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"91cc004d6d7643254bde35608ab31d31416a0af84c2d625ca82bac75b4994f97"}
Jun 29 16:26:05 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:05.350748   10569 kubelet.go:1878] SyncLoop (ADD, "api"): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)"
Jun 29 16:26:05 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:05.683719   10569 kuberuntime_manager.go:385] No sandbox for pod "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)" can be found. Need to start a new one
Jun 29 16:26:06 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:06.038150   10569 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"6dc441ed6c2e3d28551af46067607c092f118b0d6ddb37178193d46e5bf44336"}
Jun 29 16:26:06 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:06.055936   10569 kubelet.go:1923] SyncLoop (PLEG): "verify_ci-op-b41n92cl(421f561a-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f561a-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"}
Jun 29 16:26:08 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:08.251717   10569 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"c03189f32d4c4e2b41dab23e31bc464e1857110565a2f21a10e96942d154cb59"}
Jun 29 16:26:09 origin-ci-ig-n-r7r2 origin-node[10569]: I0629 16:26:09.361171   10569 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"4f70850d12293e1c83e3bb63d7a84d3f7c3f9a8894ba1f76929a333b6090c14c"}
Jun 29 17:32:56 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:32:56.629070    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "bin-build_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "db03014c7bd0ce63ea8b441f4778434205ae54825c84f207bb8b68efa872ef14"
Jun 29 17:32:59 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:32:59.421260    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "verify_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"
Jun 29 17:33:02 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:02.099052    9894 kubelet.go:1878] SyncLoop (ADD, "api"): "e2e-gcp_ci-op-86xd3j4s(b5228529-7bb8-11e8-9efa-42010a8e0004), sdn-2bs58_openshift-sdn(948783f3-7b71-11e8-9efa-42010a8e0004), console-jenkins-operator-10-6b5vd_ci(c38ad17a-7b8f-11e8-9efa-42010a8e0004), refresh-14-njv2m_ci(c3e816e5-7b8f-11e8-9efa-42010a8e0004), tracer-14-7dq99_ci(c4036ac6-7b8f-11e8-9efa-42010a8e0004), rpm-repo-1-dmf45_ci-op-tkz7016x(c31095ca-7b8f-11e8-9efa-42010a8e0004), cleanup-when-idle_ci-op-r78dwji1(47f6e017-7ba4-11e8-9efa-42010a8e0004), aws-machine-controller-46-l5w8c_openshift-cluster-operator(c4259ff7-7b8f-11e8-9efa-42010a8e0004), deck-internal-23-g8k4b_ci(c39cb5da-7b8f-11e8-9efa-42010a8e0004), gcsweb-20-v9r7k_ci(c3c1ce86-7b8f-11e8-9efa-42010a8e0004), ovs-rt5t8_openshift-sdn(94875855-7b71-11e8-9efa-42010a8e0004), ansible-build_ci-op-8bvfx44r(d5e8bf93-7bbb-11e8-9efa-42010a8e0004), 10cf0577-7ba3-11e8-bb72-0a58ac100bda_ci(14712ac7-7ba3-11e8-9efa-42010a8e0004), service-cert-sync-gf2qm_openshift-node(947b92c8-7b71-11e8-9efa-42010a8e0004), docker-registry-20-kmftd_default(c4085004-7b8f-11e8-9efa-42010a8e0004), tide-25-8xlsh_ci(c3f128ec-7b8f-11e8-9efa-42010a8e0004), sync-gbt4f_openshift-node(94690d8f-7b71-11e8-9efa-42010a8e0004), prometheus-node-exporter-gvvc4_openshift-monitoring(946f4ba5-7b71-11e8-9efa-42010a8e0004), artifact-uploader-22-qkfwr_ci(c37b8bea-7b8f-11e8-9efa-42010a8e0004), openshift-origin-bot-55c7bb9d87-rqvnf_origin-publisher-bot(c445274f-7b8f-11e8-9efa-42010a8e0004), integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)"
Jun 29 17:33:02 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:02.244303    9894 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"4f70850d12293e1c83e3bb63d7a84d3f7c3f9a8894ba1f76929a333b6090c14c"}
Jun 29 17:33:02 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:02.244324    9894 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"c03189f32d4c4e2b41dab23e31bc464e1857110565a2f21a10e96942d154cb59"}
Jun 29 17:33:02 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:02.244383    9894 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerStarted", Data:"6dc441ed6c2e3d28551af46067607c092f118b0d6ddb37178193d46e5bf44336"}
Jun 29 17:33:02 origin-ci-ig-n-r7r2 origin-node[9894]: E0629 17:33:02.673153    9894 event.go:200] Server rejected event '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"integration.153cb1e7d2ad6194", GenerateName:"", Namespace:"ci-op-b41n92cl", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:v1.Time{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*v1.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Initializers:(*v1.Initializers)(nil), Finalizers:[]string(nil), ClusterName:""}, InvolvedObject:v1.ObjectReference{Kind:"Pod", Namespace:"ci-op-b41n92cl", Name:"integration", UID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", APIVersion:"v1", ResourceVersion:"24770907", FieldPath:"spec.containers{artifacts}"}, Reason:"Killing", Message:"Killing container with id docker://artifacts:Need to kill Pod", Source:v1.EventSource{Component:"kubelet", Host:"origin-ci-ig-n-r7r2"}, FirstTimestamp:v1.Time{Time:time.Time{wall:0xbec5b933a581d594, ext:7672073287, loc:(*time.Location)(0x8fb9320)}}, LastTimestamp:v1.Time{Time:time.Time{wall:0xbec5b933a581d594, ext:7672073287, loc:(*time.Location)(0x8fb9320)}}, Count:1, Type:"Normal", EventTime:v1.MicroTime{Time:time.Time{wall:0x0, ext:0, loc:(*time.Location)(nil)}}, Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"", ReportingInstance:""}': 'events "integration.153cb1e7d2ad6194" is forbidden: unable to create new content in namespace ci-op-b41n92cl because it is being terminated' (will not retry!)
Jun 29 17:33:04 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:04.766561    9894 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"4f70850d12293e1c83e3bb63d7a84d3f7c3f9a8894ba1f76929a333b6090c14c"}
Jun 29 17:33:05 origin-ci-ig-n-r7r2 origin-node[9894]: I0629 17:33:05.706508    9894 kubelet.go:1923] SyncLoop (PLEG): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", event: &pleg.PodLifecycleEvent{ID:"421f59cf-7bb8-11e8-9efa-42010a8e0004", Type:"ContainerDied", Data:"6dc441ed6c2e3d28551af46067607c092f118b0d6ddb37178193d46e5bf44336"}
Jun 29 17:33:58 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:33:58.620398    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "verify_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"
Jun 29 17:34:00 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:34:00.111056    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "bin-build_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "db03014c7bd0ce63ea8b441f4778434205ae54825c84f207bb8b68efa872ef14"
Jun 29 17:34:04 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:34:04.933773    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "bin-build_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "db03014c7bd0ce63ea8b441f4778434205ae54825c84f207bb8b68efa872ef14"
Jun 29 17:34:06 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:34:06.350963    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "verify_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"
[root@origin-ci-ig-n-r7r2 clayton]# date
Fri Jun 29 21:47:19 UTC 2018

Comment 1 Clayton Coleman 2018-06-29 21:49:32 UTC
After restarting the node service, the pod gets cleaned up:

Jun 29 17:34:06 origin-ci-ig-n-r7r2 origin-node[9894]: W0629 17:34:06.350963    9894 docker_sandbox.go:365] failed to read pod IP from plugin/docker: NetworkPlugin cni failed on the status hook for pod "verify_ci-op-b41n92cl": CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "6b4604081a990d3d111c7591ca72663803e38e381997c721968b0ca2ef1b197d"
Jun 29 21:48:12 origin-ci-ig-n-r7r2 origin-node[26433]: I0629 21:48:12.945799   26433 kubelet.go:1878] SyncLoop (ADD, "api"): "sdn-2bs58_openshift-sdn(948783f3-7b71-11e8-9efa-42010a8e0004), 0cc052d7-7bc1-11e8-9221-0a58ac1010b6_ci(8d65e8f6-7bc2-11e8-9efa-42010a8e0004), 185414df-7bcd-11e8-9221-0a58ac1010b6_ci(2a14667e-7bcd-11e8-9efa-42010a8e0004), 4bc19590-7bd3-11e8-9221-0a58ac1010b6_ci(4f184134-7bd3-11e8-9efa-42010a8e0004), logging-elasticsearch-build_ci-op-w8v4vhdy(b7ae1050-7be4-11e8-9efa-42010a8e0004), node-build_ci-op-n6ppr6fr(532f0dd6-7bda-11e8-9efa-42010a8e0004), 47a22364-7bde-11e8-9221-0a58ac1010b6_ci(585a7d49-7bde-11e8-9efa-42010a8e0004), 181f1ae6-7bc3-11e8-9221-0a58ac1010b6_ci(1a958295-7bc3-11e8-9efa-42010a8e0004), ovs-rt5t8_openshift-sdn(94875855-7b71-11e8-9efa-42010a8e0004), rpms-build_ci-op-fzwj3xx9(4382398a-7be0-11e8-9efa-42010a8e0004), template-service-broker-build_ci-op-fzwj3xx9(3ebf51c0-7be1-11e8-9efa-42010a8e0004), haproxy-router-build_ci-op-fzwj3xx9(55c17992-7be1-11e8-9efa-42010a8e0004), rpm-repo-1-fclrg_ci-op-fzwj3xx9(1d53c65a-7be1-11e8-9efa-42010a8e0004), 389dac88-7bce-11e8-8b37-0a58ac1010a8_ci(478f3172-7bce-11e8-9efa-42010a8e0004), cleanup-when-idle_ci-op-qvmd1xdx(4e65f2f8-7bda-11e8-9efa-42010a8e0004), 10cf0577-7ba3-11e8-bb72-0a58ac100bda_ci(14712ac7-7ba3-11e8-9efa-42010a8e0004), base-build_ci-op-fzwj3xx9(2a084c86-7be1-11e8-9efa-42010a8e0004), 4bc36399-7bd3-11e8-9221-0a58ac1010b6_ci(4f122b85-7bd3-11e8-9efa-42010a8e0004), bin-build_ci-op-fzwj3xx9(5e4d3618-7bdf-11e8-9efa-42010a8e0004), keepalived-ipfailover-build_ci-op-g3r9fc5t(c1b135de-7be4-11e8-9efa-42010a8e0004), 3bc3469e-7bda-11e8-9221-0a58ac1010b6_ci(4b1fd707-7bda-11e8-9efa-42010a8e0004), cli-build_ci-op-13kmrit5(7bae9fee-7be0-11e8-9efa-42010a8e0004), service-cert-sync-gf2qm_openshift-node(947b92c8-7b71-11e8-9efa-42010a8e0004), cli-build_ci-op-fzwj3xx9(3e78577a-7be1-11e8-9efa-42010a8e0004), 5940121b-7bda-11e8-9221-0a58ac1010b6_ci(5d283538-7bda-11e8-9efa-42010a8e0004), verify_ci-op-pxrzgxt5(07d8e36d-7be5-11e8-9efa-42010a8e0004), template-service-broker-build_ci-op-c9p0w501(5094e51b-7be1-11e8-9efa-42010a8e0004), docker-builder-build_ci-op-fzwj3xx9(55c735f3-7be1-11e8-9efa-42010a8e0004), hyperkube-build_ci-op-fzwj3xx9(3e77e09e-7be1-11e8-9efa-42010a8e0004), f5-router-build_ci-op-fzwj3xx9(55cc36f1-7be1-11e8-9efa-42010a8e0004), b8eb2240-7bde-11e8-9221-0a58ac1010b6_ci(c39cd0b2-7bde-11e8-9efa-42010a8e0004), rpms-build_ci-op-g3r9fc5t(2d167a0c-7be3-11e8-9efa-42010a8e0004), 181d4cd0-7bc3-11e8-9221-0a58ac1010b6_ci(1a95ad57-7bc3-11e8-9efa-42010a8e0004), 3bc61b0a-7bda-11e8-9221-0a58ac1010b6_ci(4b28a1ee-7bda-11e8-9efa-42010a8e0004), deployer-build_ci-op-fzwj3xx9(56302faa-7be1-11e8-9efa-42010a8e0004), cleanup-when-idle_ci-op-hycs68gr(a7d8e906-7bda-11e8-9efa-42010a8e0004), egress-dns-proxy-build_ci-op-fzwj3xx9(3ea3f894-7be1-11e8-9efa-42010a8e0004), prometheus-node-exporter-gvvc4_openshift-monitoring(946f4ba5-7b71-11e8-9efa-42010a8e0004), 73db5708-7bd7-11e8-9221-0a58ac1010b6_ci(800e120e-7bd7-11e8-9efa-42010a8e0004), integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004), src-build_ci-op-qvmd1xdx(4e7d5e16-7bda-11e8-9efa-42010a8e0004), d370d36f-7bcd-11e8-9221-0a58ac1010b6_ci(dcf49351-7bcd-11e8-9efa-42010a8e0004), 3bc83097-7bda-11e8-9221-0a58ac1010b6_ci(4b27e9f9-7bda-11e8-9efa-42010a8e0004), 2303279a-7bba-11e8-a3ff-0a58ac100037_ci(8d66c4c4-7bc2-11e8-9efa-42010a8e0004), src-build_ci-op-hycs68gr(a80b7a3e-7bda-11e8-9efa-42010a8e0004), unit_ci-op-hycs68gr(4f859761-7bdb-11e8-9efa-42010a8e0004), 
recycler-build_ci-op-13kmrit5(b6372b93-7be0-11e8-9efa-42010a8e0004), sync-gbt4f_openshift-node(94690d8f-7b71-11e8-9efa-42010a8e0004)"
Jun 29 21:48:21 origin-ci-ig-n-r7r2 origin-node[26433]: I0629 21:48:21.998630   26433 kubelet.go:1894] SyncLoop (DELETE, "api"): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)"
Jun 29 21:48:22 origin-ci-ig-n-r7r2 origin-node[26433]: I0629 21:48:22.030187   26433 kubelet.go:1888] SyncLoop (REMOVE, "api"): "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)"
Jun 29 21:48:22 origin-ci-ig-n-r7r2 origin-node[26433]: I0629 21:48:22.030274   26433 kubelet.go:2082] Failed to delete pod "integration_ci-op-b41n92cl(421f59cf-7bb8-11e8-9efa-42010a8e0004)", err: pod not found
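
For anyone reproducing the workaround: the node service is origin-node (per the journalctl unit above), so the restart amounts to something like

# systemctl restart origin-node

after which the kubelet picks up the DELETE on its next sync loop and removes the pod.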

Comment 2 Seth Jennings 2018-07-26 20:50:01 UTC
Clayton, is this still happening?

Comment 3 Seth Jennings 2018-08-13 21:20:27 UTC
I tried to follow the integration pod through the logs, but there isn't enough information to figure out what's happening here.

Here are the breadcrumbs I could put together, in case this is hit again:

PLEG sees the sandbox and two containers come up by 16:26:09.

Before 17:32:56, the node process is restarted (the PID changes from 10569 to 9894).

At 17:33:02, one of the containers (c03189f32d4c4e2b41dab23e31bc464e1857110565a2f21a10e96942d154cb59) is dead.

By 17:33:05, the other container and sandbox are dead too.

A DELETE from the API server isn't seen until after the second restart, at 21:48, when the pod finally cleans up.
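
The restarts themselves can be confirmed from the journal, since the PID in brackets changes on each one (10569 -> 9894 -> 26433 above); a quick check, assuming the same journalctl access:

# journalctl -u origin-node | grep -o 'origin-node\[[0-9]*\]' | uniq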

