Bug 1601572 - oc commands hang - Too many requests, please try again later (OCP v3.9.27) [NEEDINFO]
Summary: oc commands hang - Too many requests, please try again later (OCP v3.9.27)
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Pod
Version: 3.9.0
Hardware: x86_64
OS: All
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Seth Jennings
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-07-16 17:47 UTC by Vítor Corrêa
Modified: 2018-07-30 21:42 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-07-30 21:42:05 UTC
Target Upstream Version:
Flags: sjenning: needinfo? (vcorrea)



Description Vítor Corrêa 2018-07-16 17:47:10 UTC
Hello Engineering. I am reopening, for 3.9.27, what I found to be an already-solved bug.


Description: After trying to delete a pod, the oc CLI hangs.
( https://access.redhat.com/solutions/3462751 )
We suggested disabling the service catalog, but that did not solve the issue.


The following messages are displayed in the journal:

ERROR LOGS: Jul 11 16:00:03 scwxd0059cld dockerd-current: time="2018-07-11T16:00:03.146933789-03:00" level=error msg="Handler for DELETE /v1.26/containers/fcb0fc1efcb4 returned error: Driver devicemapper failed to remove root filesystem fcb0fc1efcb4f708eeacd4d59ff1081f7e891cbd62163d63d31fed335e575e27: error while removing /var/lib/docker/devicemapper/mnt/ad0d26669b47e4ee2db47eb197098e5767122b2fa08fc63eef0c62e8c97d0a3a: invalid argument"

dockerd-current: time="2018-07-11T16:00:01.131661961-03:00" level=error msg="Error removing mounted layer ad9afa94330eef6fc7cf17be8f6ab31ae1cc1dc17f124b2fea59324d1bd2386b: error while removing /var/lib/docker/devicemapper/mnt/b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11: invalid argument"
Jul 11 16:00:01 scwxd0059cld kernel: device-mapper: thin: Deletion of thin device 1297 failed.
Jul 11 16:00:01 scwxd0059cld atomic-openshift-node: E0711 16:00:01.134161    2478 remote_runtime.go:132] RemovePodSandbox "ad9afa94330eef6fc7cf17be8f6ab31ae1cc1dc17f124b2fea59324d1bd2386b" from runtime service failed: rpc error: code = Unknown desc = Error response from daemon: Driver devicemapper failed to remove root filesystem ad9afa94330eef6fc7cf17be8f6ab31ae1cc1dc17f124b2fea59324d1bd2386b: error while removing /var/lib/docker/devicemapper/mnt/b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11: invalid argument
Jul 11 16:00:01 scwxd0059cld atomic-openshift-node: E0711 16:00:01.134197    2478 kuberuntime_gc.go:157] Failed to remove sandbox "ad9afa94330eef6fc7cf17be8f6ab31ae1cc1dc17f124b2fea59324d1bd2386b": rpc error: code = Unknown desc = Error response from daemon: Driver devicemapper failed to remove root filesystem ad9afa94330eef6fc7cf17be8f6ab31ae1cc1dc17f124b2fea59324d1bd2386b: error while removing /var/lib/docker/devicemapper/mnt/b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11: invalid argument

and

Jul 11 12:18:25 scwxd0059cld atomic-openshift-master-controllers[47932]: E0711 12:18:25.886538   47932 reflector.go:205] github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: the server has received too many requests and has asked us to try again later (get serviceinstances.servicecatalog.k8s.io)
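
Since the garbage collector is failing to list serviceinstances.servicecatalog.k8s.io, a quick sanity check is whether the aggregated service-catalog API is actually reachable. A minimal sketch (the kube-service-catalog namespace is the OCP 3.9 default and is assumed here):

# Is the aggregated API registered and reporting Available=True?
oc get apiservice v1beta1.servicecatalog.k8s.io -o yaml

# Are the catalog apiserver/controller-manager pods healthy?
oc get pods -n kube-service-catalog -o wide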


The issue actually seems to be a mix of the following bugs:
1. https://access.redhat.com/solutions/3150891 (fixed by upgrading the kernel to kernel-3.10.0-693.el7.x86_64 and the docker package to docker-1.12.6-48.git0fdc778.el7.x86_64; see the version-check sketch after this list)

2. https://bugzilla.redhat.com/show_bug.cgi?id=1403027
3. https://bugzilla.redhat.com/show_bug.cgi?id=1450554
4. https://bugzilla.redhat.com/show_bug.cgi?id=1573460
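
For reference, a quick way to compare the node against the fix versions from item 1 (a sketch only, using standard RHEL 7 package names):

# Running kernel vs. kernel-3.10.0-693.el7.x86_64 or later
uname -r

# Installed packages vs. docker-1.12.6-48.git0fdc778.el7.x86_64 or later
rpm -q kernel docker

Note that the sosreport data below already shows kernel 3.10.0-862.3.2.el7, which is newer than the kernel named in that solution, so the kernel side of that fix should already be in place.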



Customer component versions:
[root@lnx sosreport-scwxd0059cld-20180711160011]# cat uname 
Linux scwxd0059cld 3.10.0-862.3.2.el7.x86_64 #1 SMP Tue May 15 18:22:15 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux


[root@lnx sosreport-scwxd0059cld-20180711160011]# cat etc/redhat-release 
Red Hat Enterprise Linux Server release 7.5 (Maipo)






Additional info:


grep b60c1dac74d0421f /proc/*/mountinfo

[root@lnx sosreport-scwxd0059cld-20180711160011]# grep b60c1dac74d0421f proc/*/mountinfo
215 89 253:15 / /var/lib/docker/devicemapper/mnt/b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11 rw,relatime shared:136 - xfs /dev/mapper/docker-253:8-6475908-b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11 rw,context="system_u:object_r:container_file_t:s0:c5,c6",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota

grep ad0d26669b47e4 /proc/*/mountinfo

[root@lnx sosreport-scwxd0059cld-20180711160011]# grep ad0d26669b47e4 proc/*/mountinfo
181 89 253:13 / /var/lib/docker/devicemapper/mnt/ad0d26669b47e4ee2db47eb197098e5767122b2fa08fc63eef0c62e8c97d0a3a rw,relatime shared:125 - xfs /dev/mapper/docker-253:8-6475908-ad0d26669b47e4ee2db47eb197098e5767122b2fa08fc63eef0c62e8c97d0a3a rw,context="system_u:object_r:container_file_t:s0:c0,c10",nouuid,attr2,inode64,sunit=1024,swidth=1024,noquota
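
The mountinfo hits above indicate the devicemapper mount is still visible in some process's mount namespace, which is the usual cause of the "failed to remove root filesystem ... invalid argument" errors. A rough sketch for identifying the holder and releasing the mount (the PID shown is a placeholder, and whether unmounting is safe has to be judged per process):

# Which PIDs still see the leaked mount in their mount namespace?
grep -l b60c1dac74d0421f /proc/*/mountinfo

# Inspect the offending process (1234 is a placeholder PID from the step above)
ps -fp 1234

# If appropriate, unmount the leaked path inside that process's mount namespace
nsenter -t 1234 -m umount /var/lib/docker/devicemapper/mnt/b60c1dac74d0421f8f2f68c66b9ab6e1bcb535be3654f4188bb851841fc03e11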


Regards

Comment 2 Juan Vallejo 2018-07-16 18:52:59 UTC
Adding Pod team

Comment 3 Juan Vallejo 2018-07-18 14:16:43 UTC
> The issue actually seems to be a mix of the following bugs:
> 1. https://access.redhat.com/solutions/3150891 ( fixed on: Upgrade the kernel 
> to kernel-3.10.0-693.el7.x86_64 and docker package to docker-1.12.6-
> 48.git0fdc778.el7.x86_64.)

Just to make sure I'm understanding correctly, upgrading the kernel solved the first error message, but you are still seeing `oc delete` hang when deleting pods?

> github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: the server has received too many requests and has asked us to try again later (get 
> serviceinstances.servicecatalog.k8s.io)

David, I don't think this is related to the first error message. It looks like the node is under high load?

Comment 4 Juan Vallejo 2018-07-18 14:28:30 UTC
> github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/controller/garbagecollector/graph_builder.go:124: Failed to list <nil>: the server has received too many requests and has asked us to try again later (get 
> serviceinstances.servicecatalog.k8s.io)

Clayton, could you provide a prometheus query that would be useful in identifying what is spamming the server here?
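
For reference, a query along these lines should surface the noisiest clients. This is only a sketch: prometheus.example.com is a placeholder for the cluster's actual Prometheus route, and it assumes the apiserver_request_count metric exposed by the 3.9 apiserver is being scraped.

# Ask Prometheus which clients are generating the most apiserver requests
curl -ks -H "Authorization: Bearer $(oc whoami -t)" \
  --data-urlencode 'query=topk(10, sum by (client, resource, verb) (rate(apiserver_request_count[5m])))' \
  https://prometheus.example.com/api/v1/query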

