Bug 1682925 - Gluster volumes never heal during oVirt 4.2->4.3 upgrade [NEEDINFO]
Summary: Gluster volumes never heal during oVirt 4.2->4.3 upgrade
Status: NEW
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 5
Hardware: x86_64
OS: Linux
Target Milestone: ---
QA Contact:
Depends On:
Blocks: Gluster_5_Affecting_oVirt_4.3
Reported: 2019-02-25 20:41 UTC by Jason
Modified: 2019-03-27 14:31 UTC
CC: 17 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed:
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
ravishankar: needinfo? (jthomasp)

Attachments
Glusterd log (deleted), 2019-02-25 20:41 UTC, Jason
Excerpt from glusterfsd.log; whole log is >100MB (deleted), 2019-02-25 20:47 UTC, Jason
data brick log (deleted), 2019-02-25 21:00 UTC, Jason
data1ssd brick log (deleted), 2019-02-25 21:00 UTC, Jason
data2 brick log (deleted), 2019-02-25 21:01 UTC, Jason
engine brick log (deleted), 2019-02-25 21:01 UTC, Jason

Description Jason 2019-02-25 20:41:34 UTC
Created attachment 1538575 [details]
Glusterd log

Description of problem: Upgraded the hosted engine, then 2 of my 4 oVirt nodes. After that, Gluster never fully healed. During troubleshooting with Telsin on IRC, we noticed that multiple glusterfsd processes had been launched for each brick on the upgraded 4.3 nodes.
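For anyone hitting the same symptom, the duplicate brick daemons can be spotted from the shell. A minimal sketch, assuming a gluster 5.x setup where each glusterfsd is started with the brick path in a `--brick-name` argument (the brick paths shown are placeholders):

```shell
# List glusterfsd command lines, pull out each brick path, and report
# any brick that is being served by more than one daemon.
ps -C glusterfsd -o args= \
  | grep -o -- '--brick-name [^ ]*' \
  | sort | uniq -c \
  | awk '$1 > 1 {print "brick with duplicate daemons:", $3}'

# Cross-check against the PIDs glusterd believes are serving each brick:
gluster volume status
```

If the `ps` listing shows two PIDs for a brick but `gluster volume status` reports only one, glusterd has lost track of the stale daemon, which matches the never-healing behavior described above.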

Version-Release number of selected component (if applicable):
oVirt Node 4.3
GlusterFS 5.3

How reproducible:
I have not tried to reproduce it.

Steps to Reproduce:

Actual results:

Expected results:

Additional info:
I am attaching logs as directed by Sahina Bose on the oVirt users mailing list. The upgrade happened on February 20th and continued into February 21st, until I rolled the two nodes back to oVirt Node 4.2.

Comment 1 Jason 2019-02-25 20:47:07 UTC
Created attachment 1538577 [details]
Excerpt from glusterfsd.log.  Whole log is >100MB

Comment 2 Jason 2019-02-25 21:00:00 UTC
Created attachment 1538579 [details]
data brick log

Comment 3 Jason 2019-02-25 21:00:27 UTC
Created attachment 1538580 [details]
data1ssd brick log

Comment 4 Jason 2019-02-25 21:01:06 UTC
Created attachment 1538582 [details]
data2 brick log

Comment 5 Jason 2019-02-25 21:01:27 UTC
Created attachment 1538583 [details]
engine brick log

Comment 6 Sahina Bose 2019-03-27 06:27:37 UTC
Ravi, can you or someone on the team take a look?

Comment 7 Ravishankar N 2019-03-27 08:36:24 UTC
Hi Jason, this is probably a little late, but what is the state now? For debugging incomplete heals, we would need the list of files (`gluster vol heal $volname info`) and the `getfattr -d -m . -e hex /path/to/brick/file-in-question` output for those files from all 3 bricks of the replica, along with glustershd.log from all 3 nodes. Please also provide the output of `gluster volume info $volname`.
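The collection steps above can be sketched as a single pass per node. This is a sketch only; `volname` and the brick path are placeholders for the reporter's setup, and it assumes heal-info entries appear as lines starting with `/` (gfid-only entries would need separate handling):

```shell
volname=engine                       # placeholder volume name
brick=/gluster_bricks/engine/engine  # placeholder brick path on this node

gluster volume info "$volname"
gluster volume heal "$volname" info

# For every file heal-info lists as pending, dump its xattrs on this brick;
# repeat on each of the 3 replica nodes so the copies can be compared.
gluster volume heal "$volname" info | grep '^/' | while read -r f; do
    getfattr -d -m . -e hex "$brick$f"
done

# Also collect /var/log/glusterfs/glustershd.log from all 3 nodes.
```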

Comment 8 Jason 2019-03-27 14:31:00 UTC
I reverted my nodes to oVirt Node 4.2 and they healed up just fine. I do not have the results of the commands you've requested. I plan to spin up a test cluster, install 4.2 on it, then upgrade to 4.3 to see whether the problem recurs. We have a lot of new hardware coming in soon, so I'll have little time to experiment with oVirt for a few weeks.
