Bug 1510556 - [free-int] Pod takes about 5 minutes to terminate on small / 9 node cluster
Summary: [free-int] Pod takes about 5 minutes to terminate on small / 9 node cluster
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
Target Release: 3.9.0
Assignee: Andrew McDermott
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-07 16:34 UTC by Justin Pierce
Modified: 2018-10-02 13:11 UTC
7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-01-23 15:54:43 UTC



Description Justin Pierce 2017-11-07 16:34:33 UTC
Description of problem:
The OpenShiftIO team is reporting a workflow in which a Jenkins slave pod takes about 5 minutes to terminate. This appears to be a regression from 3.6, where termination took only about 30 seconds.

[root@free-int-master-3c664 ~]# oc get pods -w
NAME                         READY     STATUS        RESTARTS   AGE
content-repository-1-x5c2n   1/1       Running       0          54m
jenkins-1-kgvvl              1/1       Running       0          54m
jenkins-slave-zfk2k-11qjj    0/2       Terminating   0          4m
testapp4-s2i-2-build         0/1       Completed     0          3m
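The `Terminating` status only shows that a deletion was requested. It does not show how far past its deadline the pod is. The following is a minimal Python sketch for measuring that. It assumes the pod object was fetched as JSON, for example with `oc get pod <name> -o json`. In the Kubernetes API, `metadata.deletionTimestamp` is the time after which the object is expected to be gone, which is the deletion request time plus the grace period. Any time elapsed past it is therefore time spent stuck. The function name `seconds_past_deletion` and the timestamps in the example are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional

def seconds_past_deletion(pod: dict, now: datetime) -> Optional[float]:
    """Seconds elapsed since the pod's deletionTimestamp, or None if the pod
    has no pending deletion. A positive value means the pod has outlived its
    grace period and is still present (e.g. stuck in Terminating)."""
    ts = pod.get("metadata", {}).get("deletionTimestamp")
    if ts is None:
        return None
    # Timestamps are serialized as RFC 3339 in UTC, e.g. "2017-11-08T21:12:22Z".
    deadline = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return (now - deadline).total_seconds()

# Hypothetical example values, not taken from the report.
pod = {"metadata": {"name": "jenkins-slave-zfk2k-11qjj",
                    "deletionTimestamp": "2017-11-08T21:12:22Z"}}
now = datetime(2017, 11, 8, 21, 15, 22, tzinfo=timezone.utc)
print(seconds_past_deletion(pod, now))  # 180.0
```

Polling this value while `oc get pods -w` runs would separate the normal grace period from the extra minutes reported here.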



Version-Release number of selected component (if applicable):
oc v3.7.0-0.191.0


How reproducible:
Very

Comment 10 Seth Jennings 2017-11-08 21:33:00 UTC
Created attachment 1349584
jenkins-slave-1jr75-mnx78.log

Attached is the pod YAML, captured after the containers had died while the pod remained in the Terminating state for ~3m.

Doesn't look good.  Might be another status reset bug.  The maven container is in state Waiting with reason ContainerCreating, even though it has already run, the deletion timestamp is set, and the restartPolicy is Never.

  containerStatuses:
  - containerID: docker://4a1a05a769daf26c27eb7679ec3cab3e54a66bfa9751c838ff933a3edd130010
    image: docker.io/fabric8/jenkins-slave-base-centos7:0.0.1
    imageID: docker-pullable://docker.io/fabric8/jenkins-slave-base-centos7@sha256:ea43b1792c0bdaa70def8e177a74a51d675e313244d672b26c753d46d44c03dd
    lastState: {}
    name: jnlp
    ready: false
    restartCount: 0
    state:
      terminated:
        containerID: docker://4a1a05a769daf26c27eb7679ec3cab3e54a66bfa9751c838ff933a3edd130010
        exitCode: 143
        finishedAt: 2017-11-08T21:11:52Z
        reason: Error
        startedAt: 2017-11-08T21:10:15Z
  - image: fabric8/maven-builder:v7973e33
    imageID: ""
    lastState: {}
    name: maven
    ready: false
    restartCount: 0
    state:
      waiting:
        reason: ContainerCreating

The maven container status has only the image field set. Everything else looks like it was just initialized.
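The inconsistency described above can be checked mechanically. The following is a minimal Python sketch. It assumes the pod is available as a dict parsed from `oc get pod <name> -o json`. It flags containers that report waiting/ContainerCreating with no containerID or imageID, even though the pod has a deletionTimestamp and restartPolicy Never. This is a heuristic of my own, not a kubelet check. On its own it cannot distinguish a reset status from a container that genuinely never started. That is why the report cross-checks the runtime's container IDs listed below. The function name and the placeholder deletionTimestamp are hypothetical.

```python
from typing import List

def suspicious_container_statuses(pod: dict) -> List[str]:
    """Names of containers whose status looks freshly initialized although
    the pod is being deleted and will never (re)start its containers."""
    meta = pod.get("metadata", {})
    spec = pod.get("spec", {})
    # Only pods with a pending deletion and restartPolicy Never qualify.
    if "deletionTimestamp" not in meta or spec.get("restartPolicy") != "Never":
        return []
    flagged = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        waiting = cs.get("state", {}).get("waiting")
        # Reset-looking status: waiting/ContainerCreating, no IDs recorded.
        if (waiting is not None
                and waiting.get("reason") == "ContainerCreating"
                and not cs.get("containerID")
                and not cs.get("imageID")):
            flagged.append(cs["name"])
    return flagged

# Mirrors the status excerpt above; the deletionTimestamp is a placeholder.
pod = {
    "metadata": {"deletionTimestamp": "2017-11-08T21:12:22Z"},
    "spec": {"restartPolicy": "Never"},
    "status": {"containerStatuses": [
        {"name": "jnlp",
         "containerID": "docker://4a1a05a769daf26c27eb7679ec3cab3e54a66bfa9751c838ff933a3edd130010",
         "state": {"terminated": {"exitCode": 143, "reason": "Error"}}},
        {"name": "maven", "image": "fabric8/maven-builder:v7973e33",
         "imageID": "", "state": {"waiting": {"reason": "ContainerCreating"}}},
    ]},
}
print(suspicious_container_statuses(pod))  # ['maven']
```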

4a1a05a769daf26c27eb7679ec3cab3e54a66bfa9751c838ff933a3edd130010 (jnlp)
42b882ae907dc60c02a1e0093d5b94056e13084ddc22d43d091d8f5f6238e61e (maven)
3ad0702118ee9833f7964e24e3bc173df58d59de8b9e2b36874ce8ce46d28767 (sandbox)

Worth noting that the maven container was killed when the termination grace period expired, while jnlp exited cleanly.
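The `exitCode: 143` recorded for jnlp in the status excerpt fits that reading. Under the common convention, a process terminated by signal N reports exit status 128 + N. So 143 corresponds to SIGTERM (15), the signal sent at the start of graceful termination. A container killed with SIGKILL after the grace period typically reports 137 (128 + 9). The following is a minimal Python sketch of that decoding. The helper name is hypothetical, and the signal numbers are those of the host platform (Linux here).

```python
import signal
from typing import Optional

def signal_from_exit_code(code: int) -> Optional[str]:
    """Return the name of the signal implied by an exit code above 128,
    using the 128 + N convention, or None if no signal is implied."""
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # 128 + N where N is not a known signal number on this platform.
        return None

print(signal_from_exit_code(143))  # SIGTERM
print(signal_from_exit_code(137))  # SIGKILL
```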

The pod is cleaned up after ~3m because the kubelet receives a second

SyncLoop (DELETE, "api"): "jenkins-slave-1jr75-mnx78_jfchevrette-jenkins(26e5398a-c4c9-11e7-a24c-0ac586c2eb16)"

then 

kubelet.go:1864] SyncLoop (REMOVE, "api"): "jenkins-slave-1jr75-mnx78_jfchevrette-jenkins(26e5398a-c4c9-11e7-a24c-0ac586c2eb16)"

This causes the kubelet to clean up the pod: the maven container is not running and is not allowed to run again, so the pod is deleted.

Comment 11 Seth Jennings 2017-11-08 21:33:41 UTC
Created attachment 1349585
node.log

Comment 13 Seth Jennings 2017-11-09 15:52:22 UTC
Justin, I don't think we should hold anything up on this. There doesn't seem to be any impact on the end user here. Is there some process that is waiting for the jenkins-slave pod to terminate completely?

Comment 14 Seth Jennings 2017-11-09 17:02:03 UTC
Joel is going to follow up here.  I'll work on it too when I can.

Comment 17 Andrew McDermott 2018-01-22 14:45:58 UTC
Is this still occurring? If so, is it still 5-6 minutes or closer to 2 minutes as mentioned in comment #16?

Comment 18 Seth Jennings 2018-01-23 15:54:43 UTC
Closing as free-int is no longer running 3.7.  Please reopen if the situation persists on 3.9.

