Bug 1688310 - etcd broken since failed upgrade from 3.7.72 to 3.9.65
Summary: etcd broken since failed upgrade from 3.7.72 to 3.9.65
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Master
Version: 3.7.1
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: 3.7.z
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-03-13 14:09 UTC by Pekka Wallendahl
Modified: 2019-03-26 16:13 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-03-26 16:13:50 UTC
Target Upstream Version:


Attachments
etcd container log (deleted), 2019-03-13 16:42 UTC, Pekka Wallendahl
etcd configuration (deleted), 2019-03-13 16:43 UTC, Pekka Wallendahl
master config (deleted), 2019-03-13 16:43 UTC, Pekka Wallendahl

Description Pekka Wallendahl 2019-03-13 14:09:19 UTC
Description of problem:

etcd broken since failed upgrade from 3.7.72 to 3.9.65


Version-Release number of selected component (if applicable):

3.7.72


How reproducible:

The entire cluster is down, so reproducibility cannot be determined.


Steps to Reproduce:

1. This cluster was running on OCP 3.7.72 and etcd 3.2.7

2. Attempted to upgrade the cluster to OCP 3.9.65

3. The upgrade steps (which we follow on all our other clusters) failed on the 3 masters. Only one master was upgraded (and it did not come back up); the other 2 masters were left on the old version and broken, which took the whole cluster down.

4. In an attempt to get the cluster up and running again, we stopped 2 of the master nodes and started working through our steps to bring the cluster up on a single master.

5. At first, we brought etcd up on that master by adding the --force-new-cluster flag to etcd_container.service. etcd is now up and running on the single master.

6. After Red Hat support told us that this was not a good idea, we brought the other master nodes back and rolled back to the multi-master configuration.
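
For context on steps 4 to 6: etcd needs a majority of its members to be up before it can serve requests, so a 3-member cluster cannot run with 2 members stopped. That is why the single remaining master only came up with --force-new-cluster, which rewrites the membership to that one member. A minimal sketch of the arithmetic follows; the function names are hypothetical and are not part of etcd or OpenShift.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority (hypothetical helper)."""
    return members // 2 + 1


def has_quorum(total_members: int, running_members: int) -> bool:
    """True if the running members form a majority of the configured cluster."""
    return running_members >= quorum(total_members)


if __name__ == "__main__":
    total = 3  # the three masters from this report
    for running in (3, 2, 1):
        state = "quorum" if has_quorum(total, running) else "no quorum"
        print(f"{running} of {total} etcd members running: {state}")
```

With 3 configured members the quorum is 2, so 1 running member has no quorum. After --force-new-cluster the configured membership is 1, so that single member is a majority by itself, but the other two members are no longer part of that cluster.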


Actual results:

etcd is still reporting errors; see the comments and attachments for details.


Expected results:

The upgrade playbook completes the upgrade process.


Additional info:

Comment 3 Pekka Wallendahl 2019-03-13 16:42:33 UTC
Created attachment 1543702
etcd container log

Comment 4 Pekka Wallendahl 2019-03-13 16:43:05 UTC
Created attachment 1543704
etcd configuration

Comment 5 Pekka Wallendahl 2019-03-13 16:43:31 UTC
Created attachment 1543705
master config

