Bug 1364243 - Terminating Pod does not get rescheduled to another node when node is NotReady
Summary: Terminating Pod does not get rescheduled to another node when node is NotReady
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Derek Carr
QA Contact: Vikas Laad
Duplicates: 1343157 1365657 (view as bug list)
Depends On:
Reported: 2016-08-04 19:19 UTC by Vikas Laad
Modified: 2017-03-08 18:43 UTC
CC List: 9 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Last Closed: 2017-01-18 12:51:59 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2017:0066 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.4 RPM Release Advisory 2017-01-18 17:23:26 UTC

Description Vikas Laad 2016-08-04 19:19:59 UTC
Description of problem:
A pod got stuck in the Terminating state; see the output below.

Docker was not responding on that node, so I rebooted it. Now the node is not coming back up.

Since the node is now showing NotReady, the pod should be rescheduled to another node, but that is not happening.

root@300node-support-2: ~/svt/openshift_scalability # oc get pods --all-namespaces -o wide
NAMESPACE           NAME                          READY     STATUS        RESTARTS   AGE       IP            NODE 
clusterproject266   deploymentconfig2v0-1-9us8s   1/1       Terminating   0          1d 

root@300node-support-2: ~/svt/openshift_scalability # oc get nodes | grep    NotReady                   6d      

Version-Release number of selected component (if applicable):
openshift v3.3.0.10 
kubernetes v1.3.0+57fb9ac  
etcd 2.3.0+git 

How reproducible:

Steps to Reproduce:
1. A pod is terminating when its node becomes NotReady

Actual results:
The pod is stuck in the Terminating state and the project does not get deleted.

Expected results:
The pod should be rescheduled to another Ready node.

Additional info:

Comment 1 Andy Goldstein 2016-08-04 19:29:25 UTC
If you wait > 5 minutes, does the DeploymentConfig create a new pod on another node?

Comment 2 Vikas Laad 2016-08-04 19:55:52 UTC
No, this Terminating pod has been stuck for a day. The node has been NotReady for a few hours now, and a replacement pod was still not created on another node.

Comment 3 Andy Goldstein 2016-08-04 20:02:11 UTC
Derek, would you mind looking at this? I think this may reproduce on a multi-node cluster by just stopping Docker on one node and waiting more than 5 minutes to see if the NodeController evicts the pods on the NotReady node.

Comment 4 Andy Goldstein 2016-08-04 20:03:44 UTC
I do want to clarify that pods never get rescheduled. If you have a scalable resource (replication controller, deployment config), that will attempt to create new pods to replace failed ones, but a pod by itself is never moved or rescheduled. Just wanted to make sure that's clear :-)
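The distinction in the comment above can be sketched in a few lines of Go. This is a minimal simulation, not real Kubernetes code: the `SimPod` type and `reconcile` function are illustrative stand-ins for what a replication controller's reconcile loop effectively does, namely drop pods on a failed node and create brand-new replacements rather than moving the old pod objects.

```go
package main

import "fmt"

// SimPod models only the fields relevant here; it is not the real API type.
type SimPod struct {
	Name string
	Node string
}

// reconcile discards pods on the failed node and creates fresh pods on a
// healthy node to restore the replica count. Note that the original pod
// object is never "moved" -- a new pod with a new name appears instead.
func reconcile(pods []SimPod, failedNode, spareNode string, replicas int) []SimPod {
	var live []SimPod
	for _, p := range pods {
		if p.Node != failedNode {
			live = append(live, p)
		}
	}
	for i := len(live); i < replicas; i++ {
		live = append(live, SimPod{Name: fmt.Sprintf("pod-new-%d", i), Node: spareNode})
	}
	return live
}

func main() {
	pods := []SimPod{{Name: "pod-old", Node: "node-1"}}
	after := reconcile(pods, "node-1", "node-2", 1)
	// A new pod appears on node-2; pod-old is gone rather than relocated.
	fmt.Println(after[0].Name, "on", after[0].Node)
}
```

A bare pod with no controller behind it gets no such replacement, which is why "rescheduling" is the wrong mental model.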

Comment 5 Derek Carr 2016-08-12 15:07:12 UTC
*** Bug 1365657 has been marked as a duplicate of this bug. ***

Comment 6 Derek Carr 2016-08-12 20:25:01 UTC
To summarize the full set of discussion topics in this thread:

1. The kubelet will wait 5 minutes before transitioning a node from Ready to NotReady if the container runtime goes down.  I think this time is too long, and it's not tunable by operators since it's hard-coded.

See upstream issue to try and come to a consensus:

2. The node controller does not evict a Pod if it's in the Terminating state and it's the ONLY pod scheduled to that node that requires eviction.  This is because the node controller identifies that the pods on the node should be evicted, but since it's the only pod on the node and it has a TerminationGracePeriodSeconds, the current logic skips the delete on it, and it never goes into the terminating evictor queue.

See upstream issue to try and determine how to refactor:

The operator can forcefully delete the pod in question by doing:
$ oc delete pods <pod> --grace-period=0

Given this is an edge case, and its fix requires a larger refactor, I am marking this UPCOMING_RELEASE and hope to get fixes into Kubernetes 1.4 to be picked up by OpenShift upon that rebase.
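The skip described in point 2 can be sketched as follows. This is an illustrative Go simulation, not the actual node controller source: the `Pod` type and both functions are hypothetical, and only the decision shape is taken from the comment above.

```go
package main

import "fmt"

// Pod models just the field that drives the eviction decision.
type Pod struct {
	Name        string
	Terminating bool // DeletionTimestamp already set
}

// evictPodsBuggy mirrors the reported behavior: a pod that is already
// terminating is skipped, so when it is the ONLY pod on the NotReady
// node, nothing is ever queued and the pod stays stuck.
func evictPodsBuggy(pods []Pod) (queued []string) {
	for _, p := range pods {
		if p.Terminating {
			continue // the bug: skipped, never enters the evictor queue
		}
		queued = append(queued, p.Name)
	}
	return queued
}

// evictPodsFixed sketches the intent of the upstream fix: terminating
// pods on a NotReady node are also handed to the terminating evictor.
func evictPodsFixed(pods []Pod) (queued []string) {
	for _, p := range pods {
		queued = append(queued, p.Name)
	}
	return queued
}

func main() {
	pods := []Pod{{Name: "deploymentconfig2v0-1-9us8s", Terminating: true}}
	fmt.Println("buggy queue:", len(evictPodsBuggy(pods))) // 0: the pod stays stuck
	fmt.Println("fixed queue:", len(evictPodsFixed(pods))) // 1: the pod is cleaned up
}
```

The `oc delete pods <pod> --grace-period=0` workaround above bypasses this entirely by deleting the pod immediately, without waiting for the evictor.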

Comment 7 Derek Carr 2016-08-15 17:23:23 UTC
*** Bug 1343157 has been marked as a duplicate of this bug. ***

Comment 8 Derek Carr 2016-08-18 15:44:20 UTC
Upstream PR for node controller not removing terminating pods from a node if it was the only pod on the node:

Comment 9 Derek Carr 2016-08-18 18:01:48 UTC
Origin PR

Comment 10 Derek Carr 2016-09-30 14:35:11 UTC
This should be fixed, as the requisite Origin PR noted above has merged.

Comment 11 Vikas Laad 2016-10-28 16:16:58 UTC
Tested with the following scenario:
- Created a 2-node cluster
- Created projects which have pods on both nodes
- Stopped docker on one of the nodes
- Deleted the projects immediately
- The node becomes NotReady and the pods stay in Terminating state (this is where they previously got stuck)
- After a few minutes the pods are gone; the node is still in NotReady state
- Started docker back on that node; the node is Ready and everything is good.

Comment 12 Vikas Laad 2016-10-28 16:18:10 UTC
Verified in the following version:

openshift v3.4.0.16+cc70b72
kubernetes v1.4.0+776c994
etcd 3.1.0-rc.0

Comment 14 errata-xmlrpc 2017-01-18 12:51:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.
