Bug 1686399

Summary: listing a file while writing to it causes deadlock
Product: [Community] GlusterFS
Component: core
Version: 6
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: unspecified
Reporter: Raghavendra G <rgowdapp>
Assignee: bugs <bugs>
CC: bugs
Fixed In Version: glusterfs-6.0
Clone Of: 1674412
Bug Depends On: 1674412
Last Closed: 2019-03-08 14:08:58 UTC

Description Raghavendra G 2019-03-07 11:34:23 UTC
+++ This bug was initially created as a clone of Bug #1674412 +++

Description of problem:

The following test case was given by Nithya.
Create a pure replicate volume and enable the following options (one possible command sequence is sketched after the volume info below):
Volume Name: xvol
Type: Replicate
Volume ID: 095d6083-ea82-4ec9-a3a9-498fbd5f8dbe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.122.7:/bricks/brick1/xvol-1
Brick2: 192.168.122.7:/bricks/brick1/xvol-2
Brick3: 192.168.122.7:/bricks/brick1/xvol-3
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.parallel-readdir: on
performance.readdir-ahead: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
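
For reference, a volume like the one above can be created and configured with commands along these lines. This is a sketch, not taken from the original report; the volume name and brick paths are copied from the volume info above, and "force" is assumed to be needed because all three bricks sit on a single host:

gluster volume create xvol replica 3 \
    192.168.122.7:/bricks/brick1/xvol-1 \
    192.168.122.7:/bricks/brick1/xvol-2 \
    192.168.122.7:/bricks/brick1/xvol-3 force
gluster volume start xvol

# readdir-ahead options relevant to this bug
gluster volume set xvol performance.readdir-ahead on
gluster volume set xvol performance.parallel-readdir on
gluster volume set xvol server.event-threads 4
gluster volume set xvol client.event-threads 4
gluster volume set xvol performance.client-io-threads off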


Mount the volume via FUSE:
mount -t glusterfs -o lru-limit=500 -s 192.168.122.7:/xvol /mnt/g1
mkdir /mnt/g1/dirdd

From terminal 1:
cd /mnt/g1
while (true); do ls -lR dirdd; done

From terminal 2:
while true; do dd if=/dev/urandom of=/mnt/g1/dirdd/1G.file bs=1M count=1; rm -f /mnt/g1/dirdd/1G.file; done

On running this test, both dd and ls hang after some time.



--- Additional comment from Raghavendra G on 2019-02-11 10:01:41 UTC ---

(gdb) thr 8
[Switching to thread 8 (Thread 0x7f28072d1700 (LWP 26397))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e3122f in rda_inode_ctx_get_iatt (inode=0x7f27ec0010b8, this=0x7f2800012560, attr=0x7f28072d0700) at readdir-ahead.c:286
#4  0x00007f2805e3134d in __rda_fill_readdirp (ctx=0x7f27f800f290, request_size=<optimized out>, entries=0x7f28072d0890, this=0x7f2800012560) at readdir-ahead.c:326
#5  __rda_serve_readdirp (this=this@entry=0x7f2800012560, ctx=ctx@entry=0x7f27f800f290, size=size@entry=4096, entries=entries@entry=0x7f28072d0890, op_errno=op_errno@entry=0x7f28072d085c) at readdir-ahead.c:353
#6  0x00007f2805e32732 in rda_fill_fd_cbk (frame=0x7f27f801c1e8, cookie=<optimized out>, this=0x7f2800012560, op_ret=3, op_errno=2, entries=<optimized out>, xdata=0x0) at readdir-ahead.c:581
#7  0x00007f2806097447 in client4_0_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f800f498) at client-rpc-fops_v2.c:2339
#8  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f280006a180) at rpc-clnt.c:755
#9  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f280006a180) at rpc-clnt.c:922
#10 0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f280006a180) at rpc-transport.c:542
#11 0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#13 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f28072d0e80, event_pool=0x90d560) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x96f1e0) at event-epoll.c:762
#15 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f2813302b3d in clone () from /lib64/libc.so.6
[Switching to thread 7 (Thread 0x7f2806ad0700 (LWP 26398))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e2cd85 in rda_mark_inode_dirty (this=this@entry=0x7f2800012560, inode=0x7f27ec009da8) at readdir-ahead.c:234
#4  0x00007f2805e2f3cc in rda_writev_cbk (frame=0x7f27f800ef48, cookie=<optimized out>, this=0x7f2800012560, op_ret=131072, op_errno=0, prebuf=0x7f2806acf870, postbuf=0x7f2806acf910, xdata=0x0)
    at readdir-ahead.c:769
#5  0x00007f2806094064 in client4_0_writev_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f801a7f8) at client-rpc-fops_v2.c:685
#6  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f27f8008320) at rpc-clnt.c:755
#7  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f27f8008320) at rpc-clnt.c:922
#8  0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f27f8008320) at rpc-transport.c:542
#9  0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#10 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#11 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f2806acfe80, event_pool=0x90d560) at event-epoll.c:648
#12 event_dispatch_epoll_worker (data=0x96f4b0) at event-epoll.c:762
#13 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f2813302b3d in clone () from /lib64/libc.so.6


In the writev and readdirp codepaths, the inode-ctx and fd-ctx locks are acquired in opposite orders, causing a classic ABBA deadlock: thread 8 blocks in rda_inode_ctx_get_iatt() on the readdirp path while thread 7 blocks in rda_mark_inode_dirty() on the writev path, each waiting on the mutex the other holds.
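
For illustration only, here is a minimal, self-contained C sketch of the same lock-ordering pattern. The lock and function names are hypothetical stand-ins, not the actual readdir-ahead code: each thread takes the two mutexes in the opposite order, and once each holds its first lock, neither can acquire its second.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical stand-ins for the inode-ctx and fd-ctx locks. */
static pthread_mutex_t inode_ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t fd_ctx_lock = PTHREAD_MUTEX_INITIALIZER;

/* readdirp-like path: takes the fd-ctx lock first, then the inode-ctx lock. */
static void *readdirp_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&fd_ctx_lock);
    sleep(1);                             /* widen the race window */
    pthread_mutex_lock(&inode_ctx_lock);  /* blocks: other thread holds it */
    pthread_mutex_unlock(&inode_ctx_lock);
    pthread_mutex_unlock(&fd_ctx_lock);
    return NULL;
}

/* writev-like path: takes the inode-ctx lock first, then the fd-ctx lock. */
static void *writev_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&inode_ctx_lock);
    sleep(1);
    pthread_mutex_lock(&fd_ctx_lock);     /* blocks: other thread holds it */
    pthread_mutex_unlock(&fd_ctx_lock);
    pthread_mutex_unlock(&inode_ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, readdirp_path, NULL);
    pthread_create(&t2, NULL, writev_path, NULL);
    pthread_join(t1, NULL);  /* never returns: the two threads deadlock */
    pthread_join(t2, NULL);
    puts("not reached");
    return 0;
}

The conventional remedies are to make both paths take the locks in one agreed order, or to copy out the needed state and release the first lock before acquiring the second; the patch referenced in the next comment resolves this in readdir-ahead.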

--- Additional comment from Worker Ant on 2019-03-07 11:24:16 UTC ---

REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) posted (#1) for review on master by Raghavendra G

Comment 1 Worker Ant 2019-03-07 15:14:07 UTC
REVIEW: https://review.gluster.org/22322 (performance/readdir-ahead: fix deadlock) posted (#2) for review on release-6 by Raghavendra G

Comment 2 Worker Ant 2019-03-08 14:08:58 UTC
REVIEW: https://review.gluster.org/22322 (performance/readdir-ahead: fix deadlock) merged (#3) on release-6 by Shyamsundar Ranganathan

Comment 3 Shyamsundar 2019-03-25 16:33:31 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem persists with glusterfs-6.0, please open a new bug report.

glusterfs-6.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://lists.gluster.org/pipermail/announce/2019-March/000120.html
[2] https://www.gluster.org/pipermail/gluster-users/