Bug 1695305 - Docker 1.13.1-94 upgrade leads to write parent: broken pipe with OCS 3.11.1 container
Summary: Docker 1.13.1-94 upgrade leads to write parent: broken pipe with OCS 3.11.1 container
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: docker
Version: 7.6
Hardware: All
OS: All
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Lokesh Mandvekar
QA Contact: atomic-bugs@redhat.com
URL:
Whiteboard:
Depends On:
Blocks: 1668273
 
Reported: 2019-04-02 19:31 UTC by Matthew Robson
Modified: 2019-04-16 15:11 UTC
CC List: 9 users

Fixed In Version: docker-1.13.1-96.gitb2f74b2.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:



Description Matthew Robson 2019-04-02 19:31:06 UTC
Description of problem:

Upgrading docker from 1.13.1-75 to 1.13.1-94 prevents the Red Hat Gluster pods from OCS from starting correctly.

The issue is very similar to: https://bugzilla.redhat.com/show_bug.cgi?id=1655214

# oc -n cns rsh glusterfs-btf9v
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:100: writing config to pipe caused \"write parent: broken pipe\""

When the container first starts, it does not exhibit the problem right away; the failure seems to manifest around the time the gluster startup script runs.

This eventually causes all of the probes (readiness and liveness) to fail.
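
A rough way to pin down when exec starts failing is a loop like the following (a sketch; pod name and namespace as above, polling interval arbitrary):

# Poll exec against the glusterfs pod and note when the broken pipe error starts
while true; do
    date
    oc -n cns exec glusterfs-btf9v -- true || echo "exec failed"
    sleep 60
done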

Mar 27 12:49:51 node atomic-openshift-node[30814]: I0327 12:49:51.306365   30814 kubelet.go:1865] SyncLoop (PLEG): "glusterfs-btf9v_cns(7b0ce9ef-50c9-11e9-b9b0-005056832285)", event: &pleg.PodLifecycleEvent{ID:"7b0ce9ef-50c9-11e9-b9b0-005056832285", Type:"ContainerStarted", Data:"e5c74c17cc6c260d7b1b1793c8a4217a484f74092431ec0ae306712cbf632713"}
Mar 27 12:49:52 node atomic-openshift-node[30814]: I0327 12:49:52.327877   30814 kubelet.go:1865] SyncLoop (PLEG): "glusterfs-btf9v_cns(7b0ce9ef-50c9-11e9-b9b0-005056832285)", event: &pleg.PodLifecycleEvent{ID:"7b0ce9ef-50c9-11e9-b9b0-005056832285", Type:"ContainerStarted", Data:"814fe287c745b5249384ccce9d42733e992bbbb43e93477c0691a6e995a6320b"}

Mar 27 12:54:37 node atomic-openshift-node[30814]: I0327 12:54:37.226983   30814 prober.go:111] Readiness probe for "glusterfs-btf9v_cns(7b0ce9ef-50c9-11e9-b9b0-005056832285):glusterfs" failed (failure): rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:100: writing config to pipe caused \"write parent: broken pipe\""
Mar 27 12:54:41 node atomic-openshift-node[30814]: I0327 12:54:41.107285   30814 prober.go:111] Liveness probe for "glusterfs-btf9v_cns(7b0ce9ef-50c9-11e9-b9b0-005056832285):glusterfs" failed (failure): rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "process_linux.go:100: writing config to pipe caused \"write parent: broken pipe\""
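
The probe failures above can be pulled from the node journal with something like (a sketch; unit name taken from the log lines above):

# Grep the atomic-openshift-node journal for the broken pipe probe failures
journalctl -u atomic-openshift-node --since "2019-03-27" | grep 'write parent: broken pipe'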


Other containers on the node, such as the OpenShift DaemonSets (sdn, fluentd) and other OCS components like Heketi, did not hit the same issue.

Running a standard RHEL 7.6 container did not reproduce the issue:

[root@node ~]# docker run --name test -d --rm registry.access.redhat.com/rhel7:7.6 sleep 100
Unable to find image 'registry.access.redhat.com/rhel7:7.6' locally
Trying to pull repository registry.access.redhat.com/rhel7 ...
7.6: Pulling from registry.access.redhat.com/rhel7
da59b306fcf5: Already exists
e23b0afac3fa: Already exists
Digest: sha256:93a7dcd8b5f2eeb4e37066478f6d3d579e55e09ce3e10ec3fd7fb788b9f92da6
Status: Downloaded newer image for registry.access.redhat.com/rhel7:7.6
71ba854cc956443a28454d8da7795555604dc80f8fcc5d0bcff3ce80d4b7d140
[root@node ~]# docker exec test date
Wed Mar 27 22:03:17 UTC 2019
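
To rule out the kubelet/CRI layer, the same exec can be tried directly against the glusterfs container through docker (a sketch; the k8s_glusterfs name prefix follows the usual kubelet container naming convention and is an assumption here):

# Find the kubelet-managed glusterfs container and exec into it directly
docker ps --filter name=k8s_glusterfs --format '{{.ID}} {{.Names}}'
docker exec <container-id> date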


Side note:

The issue does not occur when upgrading from -74 to docker 1.13.1-91.
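
If a rollback is needed to confirm the regression, the docker upgrade transaction can be undone with yum history (a sketch, not verified here; the node should be drained of storage pods first):

# Identify and undo the yum transaction that upgraded docker
yum history list docker
yum history undo <transaction-id>
systemctl restart docker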


Version-Release number of selected component (if applicable):
docker-client-1.13.1-94.gitb2f74b2.el7
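
To confirm the full package set on an affected node (a sketch; the standard RHEL 7 Extras docker packages are assumed):

# Verify the installed docker package versions
rpm -q docker docker-client docker-common docker-rhel-push-plugin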

How reproducible:
Always after upgrade

Steps to Reproduce:
The issue was reproduced as follows (see the command sketch after this list):
- Unlabel the node so the glusterfs pod is removed.
- Apply RHEL patching, including the docker upgrade.
- Reboot the node.
- Re-label the node after the server reboots and core services are back online.
- After a period of time (around 10 minutes or less), the pod fails to completely start all of its services, and exec against the target docker container shows the same lack of responsiveness.
- Other pods running on the same node are still accessible; only the glusterfs pod has issues.
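
Rough command-level version of the steps above (a sketch; glusterfs=storage-host is the openshift-ansible default storage node selector and is an assumption here):

# 1. Remove the storage label so the glusterfs DaemonSet pod is removed from the node
oc label node <node> glusterfs-

# 2. Patch the node, including the docker upgrade, then reboot
yum update -y
systemctl reboot

# 3. Once the node is Ready again, re-apply the label so the glusterfs pod is rescheduled
oc label node <node> glusterfs=storage-host

# 4. Watch the pod; within roughly 10 minutes exec starts failing with "write parent: broken pipe"
oc -n cns get pod -o wide -w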


Actual results:
The glusterfs pod fails to start all of its services after the docker upgrade; exec into the container returns "write parent: broken pipe".

Expected results:
The glusterfs pod should start and run without error after the docker upgrade.

Additional info:

