Bug 1056276 - Self-Heal Daemon is consuming excessive CPU
Summary: Self-Heal Daemon is consuming excessive CPU
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: GlusterFS
Classification: Community
Component: replicate
Version: 3.3.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2014-01-21 21:02 UTC by Michael Webb
Modified: 2014-12-14 19:40 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-12-14 19:40:32 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:



Description Michael Webb 2014-01-21 21:02:17 UTC
We’re currently seeing extremely high CPU utilization and a breakdown in communication between our two Gluster peers during self-heal operations on GlusterFS 3.3.0. The self-heal daemon is crawling the file system.

The results appear very similar to these two bugs:

http://dev.gluster.com/pipermail/glusterfs/2011-September/006149.html

https://bugzilla.redhat.com/show_bug.cgi?id=812515

Have these been addressed in 3.3.0?

Are there other issues that could be causing this behavior?

We performed routine maintenance on Saturday the 18th and have been seeing issues since the 20th. The maintenance involved:

1.)	Shut down one server
2.)	Add a physical drive array
3.)	Start the server
4.)	Allow it to replicate
5.)	Perform the same steps for the other server

The new drive arrays are neither online nor initialized by the OS.

The replica has 2 servers, both at 100% CPU utilization, and 1 replica volume with 1 brick on each server. gluster peer status shows both machines online, but touching a zero-byte file takes more than 60 seconds to complete; this normally takes much less than a second. We verified that network connectivity is up during this time using ping and ssh.
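
For anyone trying to reproduce the touch latency described above, the following is a minimal sketch of how the zero-byte file creation could be timed repeatably. It assumes Python 3 is available on a client that has the volume mounted. The helper name time_touch and the mount path in the comment are hypothetical and not part of GlusterFS.

```python
import os
import tempfile
import time


def time_touch(directory):
    """Create a zero-byte file in `directory`; return (elapsed_seconds, size)."""
    # Hypothetical helper: the "touch-probe-" prefix is an arbitrary choice.
    start = time.monotonic()
    fd, path = tempfile.mkstemp(prefix="touch-probe-", dir=directory)
    os.close(fd)
    elapsed = time.monotonic() - start

    size = os.path.getsize(path)  # should be 0, nothing was written
    os.remove(path)               # clean up the probe file
    return elapsed, size


# Example call (assumed mount point, adjust to the real one):
#   elapsed, _ = time_touch("/mnt/glustervol")
#   print(f"zero-byte create took {elapsed:.3f}s")
```

Running this a few times while the self-heal daemon is active, and again once it is idle, would give comparable numbers for the "more than 60 seconds" versus "much less than a second" observation.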

Comment 1 Niels de Vos 2014-11-27 14:54:37 UTC
The version that this bug has been reported against no longer receives updates from the Gluster Community. Please verify whether this report is still valid against a current (3.4, 3.5 or 3.6) release and update the version, or close this bug.

If there has been no update before 9 December 2014, this bug will be closed automatically.

