Bug 1674412 - listing a file while writing to it causes deadlock
Summary: listing a file while writing to it causes deadlock
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: core
Version: mainline
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks: 1686399
 
Reported: 2019-02-11 09:59 UTC by Raghavendra G
Modified: 2019-03-07 15:11 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1686399
Environment:
Last Closed: 2019-03-07 15:11:29 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:


Attachments: None


Links:
Gluster.org Gerrit 22321 | Priority: None | Status: Merged | Summary: performance/readdir-ahead: fix deadlock | Last Updated: 2019-03-07 15:11:27 UTC

Description Raghavendra G 2019-02-11 09:59:32 UTC
Description of problem:

The following test case was given by Nithya.
Create a pure replicate volume and enable the following options:
Volume Name: xvol
Type: Replicate
Volume ID: 095d6083-ea82-4ec9-a3a9-498fbd5f8dbe
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: 192.168.122.7:/bricks/brick1/xvol-1
Brick2: 192.168.122.7:/bricks/brick1/xvol-2
Brick3: 192.168.122.7:/bricks/brick1/xvol-3
Options Reconfigured:
server.event-threads: 4
client.event-threads: 4
performance.parallel-readdir: on
performance.readdir-ahead: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off


Fuse mount using:
mount -t glusterfs -o lru-limit=500 -s 192.168.122.7:/xvol /mnt/g1
mkdir /mnt/g1/dirdd

From terminal 1:
cd /mnt/g1
while true; do ls -lR dirdd; done

From terminal 2:
while true; do dd if=/dev/urandom of=/mnt/g1/dirdd/1G.file bs=1M count=1; rm -f /mnt/g1/dirdd/1G.file; done

On running this test, both dd and ls hang after some time.


Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Raghavendra G 2019-02-11 10:01:41 UTC
(gdb) thr 8
[Switching to thread 8 (Thread 0x7f28072d1700 (LWP 26397))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e3122f in rda_inode_ctx_get_iatt (inode=0x7f27ec0010b8, this=0x7f2800012560, attr=0x7f28072d0700) at readdir-ahead.c:286
#4  0x00007f2805e3134d in __rda_fill_readdirp (ctx=0x7f27f800f290, request_size=<optimized out>, entries=0x7f28072d0890, this=0x7f2800012560) at readdir-ahead.c:326
#5  __rda_serve_readdirp (this=this@entry=0x7f2800012560, ctx=ctx@entry=0x7f27f800f290, size=size@entry=4096, entries=entries@entry=0x7f28072d0890, op_errno=op_errno@entry=0x7f28072d085c) at readdir-ahead.c:353
#6  0x00007f2805e32732 in rda_fill_fd_cbk (frame=0x7f27f801c1e8, cookie=<optimized out>, this=0x7f2800012560, op_ret=3, op_errno=2, entries=<optimized out>, xdata=0x0) at readdir-ahead.c:581
#7  0x00007f2806097447 in client4_0_readdirp_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f800f498) at client-rpc-fops_v2.c:2339
#8  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f280006a180) at rpc-clnt.c:755
#9  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f280006a180) at rpc-clnt.c:922
#10 0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f280006a180) at rpc-transport.c:542
#11 0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#12 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#13 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f28072d0e80, event_pool=0x90d560) at event-epoll.c:648
#14 event_dispatch_epoll_worker (data=0x96f1e0) at event-epoll.c:762
#15 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f2813302b3d in clone () from /lib64/libc.so.6
[Switching to thread 7 (Thread 0x7f2806ad0700 (LWP 26398))]
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007f2813a404cd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f2813a3bdcb in _L_lock_812 () from /lib64/libpthread.so.0
#2  0x00007f2813a3bc98 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f2805e2cd85 in rda_mark_inode_dirty (this=this@entry=0x7f2800012560, inode=0x7f27ec009da8) at readdir-ahead.c:234
#4  0x00007f2805e2f3cc in rda_writev_cbk (frame=0x7f27f800ef48, cookie=<optimized out>, this=0x7f2800012560, op_ret=131072, op_errno=0, prebuf=0x7f2806acf870, postbuf=0x7f2806acf910, xdata=0x0)
    at readdir-ahead.c:769
#5  0x00007f2806094064 in client4_0_writev_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f27f801a7f8) at client-rpc-fops_v2.c:685
#6  0x00007f28149a29d1 in rpc_clnt_handle_reply (clnt=clnt@entry=0x7f2800051120, pollin=pollin@entry=0x7f27f8008320) at rpc-clnt.c:755
#7  0x00007f28149a2d37 in rpc_clnt_notify (trans=0x7f28000513e0, mydata=0x7f2800051150, event=<optimized out>, data=0x7f27f8008320) at rpc-clnt.c:922
#8  0x00007f281499f5e3 in rpc_transport_notify (this=this@entry=0x7f28000513e0, event=event@entry=RPC_TRANSPORT_MSG_RECEIVED, data=data@entry=0x7f27f8008320) at rpc-transport.c:542
#9  0x00007f2808d88f77 in socket_event_poll_in (notify_handled=true, this=0x7f28000513e0) at socket.c:2522
#10 socket_event_handler (fd=<optimized out>, idx=<optimized out>, gen=<optimized out>, data=0x7f28000513e0, poll_in=<optimized out>, poll_out=<optimized out>, poll_err=0, event_thread_died=0 '\000')
    at socket.c:2924
#11 0x00007f2814c5a926 in event_dispatch_epoll_handler (event=0x7f2806acfe80, event_pool=0x90d560) at event-epoll.c:648
#12 event_dispatch_epoll_worker (data=0x96f4b0) at event-epoll.c:762
#13 0x00007f2813a39dd5 in start_thread () from /lib64/libpthread.so.0
#14 0x00007f2813302b3d in clone () from /lib64/libc.so.6


In the writev and readdirp code paths, the inode and fd-ctx locks are acquired in opposite order, causing a deadlock.
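
The two backtraces above show the inversion: the readdirp path (thread 8) blocks in rda_inode_ctx_get_iatt while already inside __rda_serve_readdirp, and the writev path (thread 7) blocks in rda_mark_inode_dirty. Below is a minimal, self-contained C sketch of this kind of lock-order inversion; the mutex names and which path takes which lock first are simplified assumptions for illustration only, not the actual readdir-ahead code.

/* Hypothetical illustration of the lock-order inversion described above.
 * Two threads take the same pair of mutexes in opposite order, so each
 * ends up waiting on the lock the other already holds.
 * Compile with: cc -pthread deadlock-sketch.c */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t inode_ctx_lock = PTHREAD_MUTEX_INITIALIZER; /* stands in for the inode ctx lock */
static pthread_mutex_t fd_ctx_lock    = PTHREAD_MUTEX_INITIALIZER; /* stands in for the fd ctx lock    */

/* readdirp-like path: fd ctx lock first, then inode ctx lock */
static void *readdirp_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&fd_ctx_lock);
        usleep(1000);                        /* widen the race window */
        pthread_mutex_lock(&inode_ctx_lock);
        /* ... serve cached dirents ... */
        pthread_mutex_unlock(&inode_ctx_lock);
        pthread_mutex_unlock(&fd_ctx_lock);
        return NULL;
}

/* writev-like path: inode ctx lock first, then fd ctx lock */
static void *writev_path(void *arg)
{
        (void)arg;
        pthread_mutex_lock(&inode_ctx_lock);
        usleep(1000);
        pthread_mutex_lock(&fd_ctx_lock);
        /* ... mark the inode dirty in the directory fd's cache ... */
        pthread_mutex_unlock(&fd_ctx_lock);
        pthread_mutex_unlock(&inode_ctx_lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, readdirp_path, NULL);
        pthread_create(&t2, NULL, writev_path, NULL);
        pthread_join(t1, NULL);              /* with the inverted order this typically never returns */
        pthread_join(t2, NULL);
        printf("no deadlock\n");             /* reached only if both paths use one consistent lock order */
        return 0;
}

The general remedies are to pick a single acquisition order for both paths, or to release the first lock before taking the second; the Gerrit change linked above (performance/readdir-ahead: fix deadlock) resolves the inversion in readdir-ahead.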

Comment 2 Worker Ant 2019-03-07 11:24:16 UTC
REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) posted (#1) for review on master by Raghavendra G

Comment 3 Worker Ant 2019-03-07 15:11:29 UTC
REVIEW: https://review.gluster.org/22321 (performance/readdir-ahead: fix deadlock) merged (#2) on master by Raghavendra G

