Bug 1685074 - Rollouts continuously get cancelled when using oc replace
Keywords:
Status: ASSIGNED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Command Line Interface
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.9.z
Assignee: Maciej Szulik
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-03-04 10:12 UTC by rsandu
Modified: 2019-04-01 10:03 UTC
CC List: 6 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-07 12:11:47 UTC
Target Upstream Version:



Description rsandu 2019-03-04 10:12:32 UTC
Description of problem: using "oc replace --force" with the v3.9.{60,68} client results in rollouts being continuously cancelled:

The "oc describe dc/rhel7-atomic" output:

Events:
  Type		Reason				Age			From				Message
  ----		------				----			----				-------
  Normal	DeploymentAwaitingCancellation	5s (x2 over 5s)		deploymentconfig-controller	Deployment of version 1 awaiting cancellation of older running deployments
  Normal	DeploymentCancelled		5s (x2 over 5s)		deploymentconfig-controller	Cancelled deployment "rhel7-atomic-1" superceded by version 1
  Normal	DeploymentCreated		4s (x21 over 5s)	deploymentconfig-controller	Created new replication controller "rhel7-atomic-1" for version 1

# oc get pods -o wide -w
[...]
rhel7-atomic-1-deploy   0/1       ContainerCreating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         10s       <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         10s       <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Pending   0         0s        <none>    <none>
rhel7-atomic-1-deploy   0/1       Pending   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         10s       <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         10s       <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Pending   0         0s        <none>    <none>
rhel7-atomic-1-deploy   0/1       Pending   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       ContainerCreating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
rhel7-atomic-1-deploy   0/1       Terminating   0         0s        <none>    node-1.local.lab
[...]

Version-Release number of selected component (if applicable):
atomic-openshift-clients-3.9.68-1.git.0.76fd86e.el7.x86_64

How reproducible: always


Steps to Reproduce:
1. Create a project called "test-force-replace"
2. Run the attached break_dc.sh script (a hypothetical sketch of such a script follows these steps)
3. See "oc get pods -o wide -w" output

Actual results: rollout pods are continuously terminated in the background.


Expected results: successful deployments.


Additional info: this seems to be a similar issue to the one described in [1].

---

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1632654

Comment 2 rsandu 2019-03-04 10:21:40 UTC
The issue does not happen when using a newer oc client version, such as atomic-openshift-clients-3.11.82-1.git.0.08bc31b.el7.x86_64
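
For reference, the client version in use can be confirmed with standard commands (nothing bug-specific here):

# Show the oc client version and the installed client RPM.
oc version
rpm -q atomic-openshift-clients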

Comment 3 Maciej Szulik 2019-03-07 12:09:09 UTC
This is related to the GC changes that were introduced after 3.9. In other words, previously we needed to manually
remove all dependent objects, and it looks like we didn't do a great job of that in the case of replace and delete.
Newer versions have that fixed with proper deletion strategies.
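
To illustrate the point, a hedged sketch of the manual cleanup a 3.9-era client would otherwise need, reusing the dc.yaml from the reproducer sketch above (whether this fully avoids the cancellation loop is an assumption, not something verified in this bug):

# Hypothetical workaround for 3.9 clients: delete the DC with cascading
# deletion so its replication controllers and deployer pods are removed too,
# then recreate it, instead of relying on `oc replace --force`.
oc delete dc/rhel7-atomic --cascade=true
oc create -f dc.yaml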

Comment 4 Maciej Szulik 2019-03-07 12:11:47 UTC
This was fixed in newer versions and, per my previous comment, we're not going to fix it in 3.9.

Comment 5 rsandu 2019-03-07 14:50:14 UTC
Hi Maciej.

Following up on our earlier conversation, I'm reopening this, as it seems the issue affects the Ansible Service Broker role in openshift-ansible and 3.9 z-stream upgrades:

- https://github.com/openshift/openshift-ansible/blob/e88b6afadd622cf2e9f6f3a3ac5e85a22c2c425d/roles/ansible_service_broker/tasks/install.yml#L174-L180
- https://github.com/openshift/openshift-ansible/blob/5f79e1cb1a6c697e17749a169cd9fcccecd0ee09/roles/lib_openshift/library/oc_obj.py#L950-L962

Can we either reassess a backport fix for 3.9, or include the "--cascade=true" flag where "oc replace" is used in the openshift-ansible service broker role?
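
For illustration, the second option would amount to something like the following invocation (a sketch only; the linked oc_obj.py builds the command in Python, and the file name here is a stand-in):

# Proposed adjustment (sketch): pass --cascade=true on the force replace so
# the 3.9 client also deletes dependent objects before recreating the resource.
oc replace --force --cascade=true -f <object-definition>.yaml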

Comment 6 Maciej Szulik 2019-03-08 12:55:21 UTC
I'll check what's possible.

Comment 7 rsandu 2019-04-01 10:03:58 UTC
Hi.

Any update regarding this bug?

Thank you.

