Bug 1687881 - UPGRADE Automated upgrade tests have never passed
Summary: UPGRADE Automated upgrade tests have never passed
Keywords:
Status: VERIFIED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Upgrade
Version: 4.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.1.0
Assignee: Clayton Coleman
QA Contact: liujia
URL:
Whiteboard:
Duplicates: 1683648
Depends On:
Blocks: 1664187
Reported: 2019-03-12 14:41 UTC by Clayton Coleman
Modified: 2019-03-15 23:48 UTC
CC List: 12 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:


Description Clayton Coleman 2019-03-12 14:41:31 UTC
The new automated upgrade tests are failing due to what appears to be a certificate rotation / network connectivity issue.

https://openshift-gce-devel.appspot.com/build/origin-ci-test/logs/release-openshift-origin-installer-e2e-aws-upgrade-4.0/12

During upgrade a number of issues crop up, but one of the root issues is that etcd appears to be unreachable after upgrade.

2019-03-11 12:22:59.676142 I | embed: rejected connection from "127.0.0.1:52746" (error "tls: failed to verify client's certificate: x509: certificate specifies an incompatible key usage", ServerName "")
WARNING: 2019/03/11 12:22:59 Failed to dial 0.0.0.0:2379: connection error: desc = "transport: authentication handshake failed: remote error: tls: bad certificate"; please retry.
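For triage across many CI logs, a line like the etcd rejection above can be picked apart mechanically. The following is a minimal sketch, not part of the upgrade tooling: the function name `parse_rejection` and the regular expression are hypothetical and assume the exact line layout quoted above. The message "certificate specifies an incompatible key usage" comes from Go's crypto/x509 and generally means the presented certificate's extended key usage does not permit the use being verified, which for a certificate presented to a TLS server is client authentication.

```python
import re

# Hypothetical pattern, written against the etcd log line quoted above.
REJECTED = re.compile(
    r'rejected connection from "(?P<peer>[^"]+)" '
    r'\(error "(?P<error>.*?)", ServerName "(?P<server_name>[^"]*)"\)'
)


def parse_rejection(line):
    """Return peer, error, and server name from a rejection line, or None."""
    match = REJECTED.search(line)
    if match is None:
        return None
    info = match.groupdict()
    # Flag the specific x509 failure seen in this bug.
    info["key_usage_mismatch"] = "incompatible key usage" in info["error"]
    return info


sample = (
    '2019-03-11 12:22:59.676142 I | embed: rejected connection from '
    '"127.0.0.1:52746" (error "tls: failed to verify client\'s certificate: '
    'x509: certificate specifies an incompatible key usage", ServerName "")'
)
result = parse_rejection(sample)
print(result["peer"], result["key_usage_mismatch"])
```

A match with `key_usage_mismatch` set suggests checking the extended key usage of the client certificate that the local caller presents to etcd, rather than the network path.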

Until e2e upgrade jobs have passed more than once this issue will remain open and top priority.

Comment 1 Dan Winship 2019-03-13 00:09:06 UTC
I'm looking at a missing OVS flow problem right now. Once I figure out whether that's the entire problem, I'll either reassign this bug to myself or file a new bug blocking this one.

Comment 2 Dan Winship 2019-03-13 02:28:48 UTC
Clayton, would it be possible to make e2e-aws-upgrade grab a set of logs from the cluster immediately before kicking off the upgrade? For example, in https://gcsweb-ci.svc.ci.openshift.org/gcs/origin-ci-test/pr-logs/pull/22302/pull-ci-openshift-origin-master-e2e-aws-upgrade/2/, the cluster-network-operator log starts at 01:28:59, but the cluster was clearly running before that (CNO's first update to its operator status marks it as "Available: True").

Also, in this upgrade, it appears that none of the SDN pods were restarted (and, possibly as a result of that, the test passed). What exactly does the upgrade test do? It seems like it ought to fake an update of every image...

Comment 4 Clayton Coleman 2019-03-14 18:04:39 UTC
The bug you hit with the e2e-aws-upgrade PR job was fixed. I will follow up with other bugs.

Comment 5 Anurag saxena 2019-03-14 20:28:39 UTC
Hi Clayton, 

Is it expected that, during the upgrade, the VERSION column of "oc get clusterversion" reports the version being upgraded to? I believe it should show the old version until the upgrade to the new version has completed successfully.

Existing version on the cluster:
$ oc get clusteroperators.config.openshift.io | grep "NAME\|network"
NAME                                  VERSION                             AVAILABLE   PROGRESSING   FAILING   SINCE
network                               4.0.0-0.nightly-2019-03-13-233958   True        False         False     15m

After oc adm upgrade:

# oc get clusterversion
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.ci-2019-03-14-150906   True        True          6m6s    Working towards 4.0.0-0.ci-2019-03-14-150906: 9% complete


//Anurag
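The distinction raised in this comment can be checked from the command output itself. Below is a minimal sketch, assuming the six-column layout shown above (NAME, VERSION, AVAILABLE, PROGRESSING, SINCE, STATUS) and the "Working towards ...: N% complete" status wording; the helper name `parse_clusterversion` is hypothetical, and the sketch makes no claim about which version the VERSION column is supposed to show.

```python
import re


def parse_clusterversion(output):
    """Parse the single data row of `oc get clusterversion` as quoted above."""
    lines = [ln for ln in output.splitlines() if ln.strip()]
    row = lines[1]  # lines[0] is the header row
    # The first five columns contain no spaces; STATUS is free text.
    fields = row.split(None, 5)
    fields += [""] * (6 - len(fields))  # tolerate an empty STATUS column
    name, version, available, progressing, since, status = fields
    target = re.search(r"Working towards (\S+): (\d+)% complete", status)
    return {
        "name": name,
        "version": version,
        "available": available == "True",
        "progressing": progressing == "True",
        "since": since,
        "target": target.group(1) if target else None,
        "percent": int(target.group(2)) if target else None,
    }


sample = """\
NAME      VERSION                        AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.0.0-0.ci-2019-03-14-150906   True        True          6m6s    Working towards 4.0.0-0.ci-2019-03-14-150906: 9% complete
"""
info = parse_clusterversion(sample)
# True when VERSION already names the target while the upgrade is still running.
reports_target_early = info["progressing"] and info["version"] == info["target"]
print(info["version"], info["percent"], reports_target_early)
```

For the output pasted above, `reports_target_early` is true: the VERSION column already names the target release while PROGRESSING is still True, which is the behavior being asked about.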

Comment 6 W. Trevor King 2019-03-15 23:48:29 UTC
*** Bug 1683648 has been marked as a duplicate of this bug. ***

