Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 1690162 - Cluster did not acknowledge request to upgrade: [NEEDINFO]
Summary: Cluster did not acknowledge request to upgrade:
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.1
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
: 4.1.0
Assignee: Sergiusz Urbaniak
QA Contact: Junqi Zhao
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-03-18 23:05 UTC by Ben Parees
Modified: 2019-04-04 02:55 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
bparees: needinfo? (adahiya)


Attachments (Terms of Use)

Description Ben Parees 2019-03-18 23:05:57 UTC
Description of problem:
fail [github.com/openshift/origin/test/e2e/upgrade/upgrade.go:81]: during upgrade
Expected error:
    <*errors.errorString | 0xc421f86580>: {
        s: "Cluster did not acknowledge request to upgrade: timed out waiting for the condition",
    }
    Cluster did not acknowledge request to upgrade: timed out waiting for the condition
not to have occurred


https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/234/

Version-Release number of the following components:
4.1

How reproducible:
Appear to be a flake.

Comment 2 Abhinav Dahiya 2019-03-19 00:15:50 UTC
From

`clsuterversion.yaml`

```
{
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterVersion",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:27:04Z",
                "generation": 2,
                "name": "version",
                "resourceVersion": "65775",
                "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
                "uid": "ea967f21-486c-11e9-bd56-120db6c725be"
            },
            "spec": {
                "channel": "stable-4.0",
                "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07",
                "desiredUpdate": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": ""
                },
                "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
            },
            "status": {
                "availableUpdates": null,
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T04:40:43Z",
                        "message": "Done applying 4.0.0-0.ci-2019-03-17-010422",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:31:14Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:43:16Z",
                        "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:27:18Z",
                        "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405",
                        "reason": "RemoteFailed",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    }
                ],
                "desired": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": "4.0.0-0.ci-2019-03-17-041405"
                },
                "history": [
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ],
                "observedGeneration": 2,
                "versionHash": "vjzTSO3pwl0="
            }
        }
```

The desired version is `4.0.0-0.ci-2019-03-17-041405`

The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring

```
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:33:34Z",
                "generation": 1,
                "name": "monitoring",
                "resourceVersion": "56407",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring",
                "uid": "d37049c6-486d-11e9-8028-0a28398410f8"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "message": "Successfully rolled out the stack.",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Failing"
                    }
                ],
                "extension": null,
                "relatedObjects": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ]
            }
        },
```

Comment 3 Abhinav Dahiya 2019-03-19 00:16:24 UTC
From https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/

`clsuterversion.yaml`

```
{
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterVersion",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:27:04Z",
                "generation": 2,
                "name": "version",
                "resourceVersion": "65775",
                "selfLink": "/apis/config.openshift.io/v1/clusterversions/version",
                "uid": "ea967f21-486c-11e9-bd56-120db6c725be"
            },
            "spec": {
                "channel": "stable-4.0",
                "clusterID": "1323fa57-4285-43eb-a338-c36bfb4b0a07",
                "desiredUpdate": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": ""
                },
                "upstream": "https://api.openshift.com/api/upgrades_info/v1/graph"
            },
            "status": {
                "availableUpdates": null,
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T04:40:43Z",
                        "message": "Done applying 4.0.0-0.ci-2019-03-17-010422",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:31:14Z",
                        "status": "False",
                        "type": "Failing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:43:16Z",
                        "message": "Working towards 4.0.0-0.ci-2019-03-17-041405: 90% complete",
                        "status": "True",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T04:27:18Z",
                        "message": "Unable to retrieve available updates: unknown version 4.0.0-0.ci-2019-03-17-041405",
                        "reason": "RemoteFailed",
                        "status": "False",
                        "type": "RetrievedUpdates"
                    }
                ],
                "desired": {
                    "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                    "version": "4.0.0-0.ci-2019-03-17-041405"
                },
                "history": [
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ],
                "observedGeneration": 2,
                "versionHash": "vjzTSO3pwl0="
            }
        }
```

The desired version is `4.0.0-0.ci-2019-03-17-041405`

The only clusteroperator that is not at `4.0.0-0.ci-2019-03-17-041405` is monitoring

```
        {
            "apiVersion": "config.openshift.io/v1",
            "kind": "ClusterOperator",
            "metadata": {
                "creationTimestamp": "2019-03-17T04:33:34Z",
                "generation": 1,
                "name": "monitoring",
                "resourceVersion": "56407",
                "selfLink": "/apis/config.openshift.io/v1/clusteroperators/monitoring",
                "uid": "d37049c6-486d-11e9-8028-0a28398410f8"
            },
            "spec": {},
            "status": {
                "conditions": [
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "message": "Successfully rolled out the stack.",
                        "status": "True",
                        "type": "Available"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Progressing"
                    },
                    {
                        "lastTransitionTime": "2019-03-17T05:22:26Z",
                        "status": "False",
                        "type": "Failing"
                    }
                ],
                "extension": null,
                "relatedObjects": null,
                "versions": [
                    {
                        "name": "operator",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
                ]
            }
        },
```

Comment 4 Abhinav Dahiya 2019-03-19 00:30:18 UTC
Also from CVO logs at https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz

```
I0317 05:37:29.940297       1 sync_worker.go:462] Running sync for clusteroperator "monitoring" (197 of 303)
...
E0317 05:38:29.940627       1 task.go:58] error running apply for clusteroperator "monitoring" (197 of 303): Cluster operator monitoring is still updating
```

CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` looking at https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/#operator-run-template-e2e-aws-upgrade---e2e-aws-upgrade-container-test

Also looking at the `clusterversion.yaml` `.status.history` the upgrade was started at 04:43 and still after around 50 mins upgrade had not finished.
```
                    {
                        "completionTime": null,
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T05:34:01Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T05:34:01Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:44:17Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:44:17Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-17-041405",
                        "startedTime": "2019-03-17T04:43:16Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-17-041405"
                    },
                    {
                        "completionTime": "2019-03-17T04:43:16Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:9972758232e864ad3f4756127ff4401ed1e313f4ae956f42e44354c83fa4d23b",
                        "startedTime": "2019-03-17T04:27:18Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-17-010422"
                    }
```

from an upgrade test that was green the history there shows upgrade finished in around similar time 35 mins for 4.0.0-0.ci-2019-03-18-223512

```
                    {
                        "completionTime": "2019-03-18T23:40:40Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:25:10Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:25:10Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:07:10Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:07:10Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release:4.0.0-0.ci-2019-03-18-223512",
                        "startedTime": "2019-03-18T23:06:24Z",
                        "state": "Partial",
                        "version": "4.0.0-0.ci-2019-03-18-223512"
                    },
                    {
                        "completionTime": "2019-03-18T23:06:24Z",
                        "image": "registry.svc.ci.openshift.org/ocp/release@sha256:0dfafc0de33a116643c8f89552ed3b82110ef56bf5671f12a16df72bede9e973",
                        "startedTime": "2019-03-18T22:52:31Z",
                        "state": "Completed",
                        "version": "4.0.0-0.ci-2019-03-18-202114"
                    }
```

Comment 5 Ben Parees 2019-03-19 00:54:01 UTC
> CVO was ONLY waiting for monitoring towards the end around `05:38` But the upgrade test had defined test failed at `05:37` 

Abhinav it sounds like you're saying that basically this wasn't a monitoring upgrade failure, it was a case where the entire upgrade took longer than the test was willing to wait, right?  i.e. something else caused the upgrade to take a long time (50+ mins instead of 35), earlier in the process.

Comment 6 W. Trevor King 2019-03-19 08:06:32 UTC
$ curl -s https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-
aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator.log.gz | gunzip | head -n1
I0317 05:34:00.593130       1 start.go:23] ClusterVersionOperator v3.11.0-226-ga573b856-dirty

But https://storage.googleapis.com/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/242/artifacts/e2e-aws-upgrade/pods/openshift-cluster-version_cluster-version-operator-565d946558-9ztzw_cluster-version-operator_previous.log.gz is empty?  Are there logs from the partial deploys started at 2019-03-17T04:43:16Z and 2019-03-17T04:44:17Z?  What causes a partial entry in the history?  Why is the current CVO pod so young?

Comment 10 W. Trevor King 2019-03-22 05:58:30 UTC
This error has not occurred in the 861 *-e2e-aws* CI runs since 2019-03-19T12:28Z (and possibly earlier).  Maybe some subsequent pull fixed it?  Or maybe it's just low-probability.

Comment 16 Junqi Zhao 2019-04-04 02:55:19 UTC
no such issue with our upgrade 
eg: upgrade OCP from 4.0.0-0.9 -> 4.0.0-0.10


Note You need to log in before you can comment on or make changes to this bug.