Bug 985388 - running dbench results in leaked fds leading to OOM killer killing glusterfsd.
Summary: running dbench results in leaked fds leading to OOM killer killing glusterfsd.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Pranith Kumar K
QA Contact: spandura
URL:
Whiteboard:
Depends On: 976800
Blocks: 977250
 
Reported: 2013-07-17 11:23 UTC by Pranith Kumar K
Modified: 2013-09-23 22:35 UTC
CC List: 8 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta6-1
Doc Type: Bug Fix
Doc Text:
Clone Of: 976800
Environment:
Last Closed: 2013-09-23 22:35:54 UTC
Target Upstream Version:


Attachments

Description Pranith Kumar K 2013-07-17 11:23:58 UTC
+++ This bug was initially created as a clone of Bug #976800 +++

Description of problem:
Running dbench on a distributed-replicate volume leaks fds on the server bricks, eventually causing the OOM killer to kill the brick process (glusterfsd).

Output of dmesg on server:
========================================================================
<snip>
VFS: file-max limit 188568 reached
.
.
.
Out of memory: Kill process 12235 (glusterfsd) score 215 or sacrifice child
Killed process 12235, UID 0, (glusterfsd) total-vm:3138856kB, anon-rss:466728kB, file-rss:1028kB
glusterfsd invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
glusterfsd cpuset=/ mems_allowed=0
Pid: 12333, comm: glusterfsd Not tainted 2.6.32-358.6.2.el6.x86_64 #1

</snip>
========================================================================


How reproducible:
Always

Steps to Reproduce:
1. Create a 2x2 distributed-replicate volume and FUSE mount it (see the command sketch after these steps).
2. On the mount point, run "dbench -s -F -S -x --one-byte-write-fix --stat-check 10".
3. Kill dbench after it has run for about 3 minutes.
4. On the server, run:
 ls -l /proc/<pid_of_brick>/fd | grep deleted
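A minimal sketch of the commands behind steps 1 and 2, assuming a volume named "testvol", two servers, and the brick/mount paths shown (all of these names are placeholders for the actual test setup):

 # 2x2 distributed-replicate: 4 bricks, replica pairs of 2
 gluster volume create testvol replica 2 \
     server1:/bricks/b1 server2:/bricks/b1 \
     server1:/bricks/b2 server2:/bricks/b2
 gluster volume start testvol
 # FUSE mount on the client
 mount -t glusterfs server1:/testvol /mnt/testvol
 # run the workload from the mount point
 cd /mnt/testvol
 dbench -s -F -S -x --one-byte-write-fix --stat-check 10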


Actual results:
Open fds on unlinked files can still be seen on the bricks even though dbench was killed. Also, if dbench is run to completion, some of the bricks can be observed getting killed by the OOM killer (check with "ps aux | grep glusterfsd").
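A hedged sketch of these checks (the volume name "testvol" is a placeholder; gluster volume status reports each brick's PID):

 gluster volume status testvol                  # note the PID of each brick process (glusterfsd)
 ls -l /proc/<brick_pid>/fd | grep -c deleted   # count fds held open on unlinked files
 ps aux | grep '[g]lusterfsd'                   # confirm whether any brick process was killed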

Expected results:
Once dbench is killed, the brick processes must not hold any open fds on unlinked files.

Additional info:
Bisected the leak to the following commit ID on upstream:
* 8909c28 - cluster/afr: fsync() guarantees POST-OP completion
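For context, a rough sketch of an upstream bisect that points to such a commit (the good/bad revisions below are placeholders, not the ones actually used):

 git bisect start
 git bisect bad HEAD                 # a build known to leak fds
 git bisect good <last-good-commit>  # a build known not to leak
 # at each step: build, run the dbench reproducer, check /proc/<brick_pid>/fd, then mark it:
 git bisect good    # or: git bisect bad
 git bisect reset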

--- Additional comment from Anand Avati on 2013-06-24 03:03:34 EDT ---

REVIEW: http://review.gluster.org/5248 (cluster/afr: Fix fd/memory leak on fsync) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu@redhat.com)

--- Additional comment from Anand Avati on 2013-06-24 12:45:57 EDT ---

COMMIT: http://review.gluster.org/5248 committed in master by Anand Avati (avati@redhat.com) 
------
commit 03f5172dd50b50988c65dd66e87a0d43e78a3810
Author: Pranith Kumar K <pkarampu@redhat.com>
Date:   Mon Jun 24 08:15:09 2013 +0530

    cluster/afr: Fix fd/memory leak on fsync
    
    Change-Id: I764883811e30ca9d9c249ad00b6762101083a2fe
    BUG: 976800
    Signed-off-by: Pranith Kumar K <pkarampu@redhat.com>
    Reviewed-on: http://review.gluster.org/5248
    Tested-by: Gluster Build System <jenkins@build.gluster.com>
    Reviewed-by: Jeff Darcy <jdarcy@redhat.com>

Comment 2 Amar Tumballi 2013-07-23 09:57:37 UTC
https://code.engineering.redhat.com/gerrit/10491

Comment 3 Pranith Kumar K 2013-07-26 09:18:43 UTC
Sac,
     There still seems to be one leaked-fd bug when open-behind is not disabled (is there a bug filed for it?). So, to verify this bug, please disable open-behind.
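A sketch of how that can be done (the volume name is a placeholder):

 gluster volume set <volname> performance.open-behind off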

Pranith.

Comment 4 spandura 2013-07-31 10:43:37 UTC
Verified this bug on the following build:
================================
root@king [Jul-31-2013-15:49:42] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.13rhs-1.el6rhs.x86_64

root@king [Jul-31-2013-15:49:49] >gluster --version
glusterfs 3.4.0.13rhs built on Jul 28 2013 15:22:56

Steps to verify:
================

Case 1:
~~~~~~~~~~~
1. Create a 1x2 replicate volume (see the command sketch after these steps).

2. Create 2 FUSE mounts.

3. On both the mount points, run "dbench -s -F -S -x --one-byte-write-fix --stat-check 10".

4. Kill dbench after it has run for about 3 minutes.

5. On both the storage nodes, run:
 ls -l /proc/<pid_of_brick>/fd | grep deleted
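A minimal sketch of the Case 1 setup (volume name, brick paths, and mount points are placeholders):

 gluster volume create repvol replica 2 server1:/bricks/b1 server2:/bricks/b1
 gluster volume start repvol
 mount -t glusterfs server1:/repvol /mnt/client1    # FUSE mount on client 1
 mount -t glusterfs server1:/repvol /mnt/client2    # FUSE mount on client 2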

Actual Result:
================
No fd leaks observed. 

Case 2:
~~~~~~~~~~~~~
1. Create a 1x2 replicate volume. Set the volume option "open-behind" to "off" (see the sketch after these steps).

2. Create 2 FUSE mounts.

3. On both the mount points, run "dbench -s -F -S -x --one-byte-write-fix --stat-check 10".

4. Kill dbench after it has run for about 3 minutes.

5. On both the storage nodes, run:
 ls -l /proc/<pid_of_brick>/fd | grep deleted
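Case 2 differs from Case 1 only in the open-behind setting; a sketch of applying and confirming it (volume name is a placeholder):

 gluster volume set repvol performance.open-behind off
 gluster volume info repvol    # "Options Reconfigured:" should list performance.open-behind: off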

Actual Result:
================
No fd leaks observed.

The bug is fixed. Moving it to the VERIFIED state.

However, fd leaks are still seen when the same case is run with the "open-behind" volume option set to "on". Please refer to bug 990510.

Comment 5 Scott Haines 2013-09-23 22:35:54 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

