Bug 1512370 - [free-stg] Long period ContainerCreating / Init:0/2
Summary: [free-stg] Long period ContainerCreating / Init:0/2
Keywords:
Status: CLOSED DUPLICATE of bug 1509799
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.7.0
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: 3.7.0
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-13 03:06 UTC by Justin Pierce
Modified: 2017-11-13 20:48 UTC
11 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-13 19:49:19 UTC



Description Justin Pierce 2017-11-13 03:06:29 UTC
Description of problem:

NAME                                 READY     STATUS              RESTARTS   AGE
po/dancer-mysql-persistent-1-build   0/1       Init:0/2            0          9m
po/database-1-deploy                 0/1       ContainerCreating   0          9m



Version-Release number of selected component (if applicable):
[root@free-stg-master-03fb6 ~]# oc version
oc v3.7.4
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://172.31.78.254:443
openshift v3.7.4
kubernetes v1.7.6+a08f5eeb62

Comment 4 Seth Jennings 2017-11-13 15:48:56 UTC
The CNI plugin is jammed (54k occurrences in the node log):

1304 cni.go:304] Error deleting network when building cni runtime conf: could not retrieve port mappings: checkpoint is corrupted.

While the checkpoint file should not be corrupt in the first place, the docker shim is supposed to remove any corrupt checkpoint file. Due to a bug, it does not.

buildCNIRuntimeConf() wraps the error returned by plugin.host.GetPodPortMappings() in a new error (the "could not retrieve port mappings: ..." text in the log line above) before returning it to the caller. The caller, however, checks the returned error against errors.CorruptCheckpointError to decide whether the checkpoint file should be removed. Because buildCNIRuntimeConf() has replaced the original error, that check is never true.
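The failure mode can be reproduced in a few lines of Go. This is a minimal sketch, not the kubelet code: errCorruptCheckpoint, getPodPortMappings, buildConfBuggy, buildConfFixed and the two shouldRemove helpers are hypothetical stand-ins for errors.CorruptCheckpointError, plugin.host.GetPodPortMappings(), buildCNIRuntimeConf() and the caller's check. The "fixed" variant is only one possible fix and assumes Go 1.13 or later for `%w` and `errors.Is`; on an older toolchain, returning the sentinel unchanged (or checking for it before wrapping) would have the same effect.

```go
package main

import (
	"errors"
	"fmt"
)

// errCorruptCheckpoint is a hypothetical local stand-in for the dockershim
// sentinel errors.CorruptCheckpointError.
var errCorruptCheckpoint = errors.New("checkpoint is corrupted.")

// getPodPortMappings is a hypothetical stand-in for
// plugin.host.GetPodPortMappings() when the checkpoint is corrupt.
func getPodPortMappings() error {
	return errCorruptCheckpoint
}

// buildConfBuggy mirrors the reported behavior: %v formats the error into a
// brand-new error value, so only the text survives, not the identity.
func buildConfBuggy() error {
	if err := getPodPortMappings(); err != nil {
		return fmt.Errorf("could not retrieve port mappings: %v", err)
	}
	return nil
}

// buildConfFixed wraps with %w (Go 1.13+), which keeps the original error
// reachable through the wrap chain.
func buildConfFixed() error {
	if err := getPodPortMappings(); err != nil {
		return fmt.Errorf("could not retrieve port mappings: %w", err)
	}
	return nil
}

// shouldRemoveByEquality models a caller that compares with ==.
func shouldRemoveByEquality(err error) bool {
	return err == errCorruptCheckpoint
}

// shouldRemoveByIs models a caller that walks the wrap chain.
func shouldRemoveByIs(err error) bool {
	return errors.Is(err, errCorruptCheckpoint)
}

func main() {
	buggy := buildConfBuggy()
	fmt.Println(buggy)                         // could not retrieve port mappings: checkpoint is corrupted.
	fmt.Println(shouldRemoveByEquality(buggy)) // false
	fmt.Println(shouldRemoveByIs(buggy))       // false: %v drops the chain

	fixed := buildConfFixed()
	fmt.Println(shouldRemoveByEquality(fixed)) // false: still a different value
	fmt.Println(shouldRemoveByIs(fixed))       // true
}
```

Wrapping with `%w` alone is not sufficient while the caller still uses `==`; the caller has to switch to `errors.Is` as well, which is why the sketch keeps both checks side by side.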

Comment 5 Seth Jennings 2017-11-13 16:02:17 UTC
Sorry, meant to keep this one. Working on a fix.

Comment 6 Seth Jennings 2017-11-13 19:49:19 UTC
Sorry for the delay. The corrupt checkpoint messages, while nasty, are not the cause of the delay in sandbox start.  It is the vnid issue again.

*** This bug has been marked as a duplicate of bug 1509799 ***

Comment 7 Dan Winship 2017-11-13 20:48:29 UTC
(In reply to Seth Jennings from comment #4)
> While the checkpoint file should not be corrupt, the docker shim should
> remove any corrupt checkpoint file, but it is not due to a bug.
> 
> buildCNIRuntimeConf() is modifying the err from
> plugin.host.GetPodPortMappings() as it propagates to the caller.  However,
> the caller checks the error against errors.CorruptCheckpointError to
> determine if the checkpoint file should be removed.  This will never be true
> as buildCNIRuntimeConf() is modifying the error.

The newly-added check in docker_sandbox.go is also too late: the CorruptCheckpointError we're getting isn't coming from StopContainer(), it's coming from TearDownPod() a few lines earlier (via buildCNIRuntimeConf() -> GetPodPortMappings() -> GetCheckpoint()).
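The ordering problem can be illustrated with a similar sketch. Everything in it is hypothetical (sandbox, tearDownPod, stopContainer, removeCheckpoint and the two stopSandbox variants are not the docker_sandbox.go code), and it assumes the teardown path already returns the corrupt-checkpoint sentinel unchanged. It shows why checking only the StopContainer() error leaves every retry failing, while checking the TearDownPod() error lets the next attempt succeed.

```go
package main

import (
	"errors"
	"fmt"
)

// errCorruptCheckpoint is a hypothetical stand-in for errors.CorruptCheckpointError.
var errCorruptCheckpoint = errors.New("checkpoint is corrupted.")

// sandbox models just enough state for the sketch (hypothetical).
type sandbox struct {
	checkpointCorrupt bool
	checkpointRemoved bool
}

// tearDownPod stands in for the network teardown path
// (TearDownPod -> buildCNIRuntimeConf -> GetPodPortMappings -> GetCheckpoint).
func (s *sandbox) tearDownPod() error {
	if s.checkpointCorrupt {
		return errCorruptCheckpoint
	}
	return nil
}

// stopContainer stands in for StopContainer; in this scenario it succeeds.
func (s *sandbox) stopContainer() error { return nil }

// removeCheckpoint deletes the corrupt checkpoint so later calls stop failing.
func (s *sandbox) removeCheckpoint() {
	s.checkpointCorrupt = false
	s.checkpointRemoved = true
}

// stopSandboxLateCheck only inspects the stopContainer error, so a corrupt
// checkpoint reported by tearDownPod is never cleaned up.
func (s *sandbox) stopSandboxLateCheck() []error {
	var errs []error
	if err := s.tearDownPod(); err != nil {
		errs = append(errs, err)
	}
	if err := s.stopContainer(); err != nil {
		if err == errCorruptCheckpoint {
			s.removeCheckpoint()
		}
		errs = append(errs, err)
	}
	return errs
}

// stopSandboxEarlyCheck handles the sentinel where it actually originates,
// still reporting the error for this attempt.
func (s *sandbox) stopSandboxEarlyCheck() []error {
	var errs []error
	if err := s.tearDownPod(); err != nil {
		if err == errCorruptCheckpoint {
			s.removeCheckpoint()
		}
		errs = append(errs, err)
	}
	if err := s.stopContainer(); err != nil {
		errs = append(errs, err)
	}
	return errs
}

func main() {
	late := &sandbox{checkpointCorrupt: true}
	late.stopSandboxLateCheck()
	fmt.Println(len(late.stopSandboxLateCheck()), late.checkpointRemoved) // 1 false: retry still fails

	early := &sandbox{checkpointCorrupt: true}
	early.stopSandboxEarlyCheck()
	fmt.Println(len(early.stopSandboxEarlyCheck()), early.checkpointRemoved) // 0 true: retry succeeds
}
```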

