Bug 1690082 - gssproxy thread may block indefinitely inside epoll_wait() due to race with a second thread closing gpmctx->epollfd
Keywords:
Status: ON_QA
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: gssproxy
Version: 8.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: 8.0
Assignee: Robbie Harwood
QA Contact: ipa-qe
URL:
Whiteboard:
Depends On: 1687899
Blocks: 1594286
Reported: 2019-03-18 18:09 UTC by Robbie Harwood
Modified: 2019-03-25 16:46 UTC (History)

Fixed In Version: gssproxy-0.8.0-11.el8
Doc Type: Bug Fix
Doc Text:
(see rhel-7.7)
Clone Of: 1687899
Environment:
Last Closed:
Type: Bug
Target Upstream Version:



Description Robbie Harwood 2019-03-18 18:09:35 UTC
+++ This bug was initially created as a clone of Bug #1687899 +++

Description of problem:
gssproxy has a possible race condition in which one thread calls epoll_wait(gpmctx->epollfd, ...) and, while it is blocked there, a second thread calls close() on gpmctx->epollfd.  The close() does not wake the first thread, which may then hang inside epoll_wait() indefinitely.

We discovered this issue while studying a possible kernel bug in which processes could hang indefinitely waiting for an rpc.gssd upcall:
https://bugzilla.redhat.com/show_bug.cgi?id=1511706#c41

gpm_make_call() appears to release the mutex too early: the lock is dropped before the timer and epoll descriptors are closed, so another thread can grab it and race with that cleanup.

int gpm_make_call(int proc, union gp_rpc_arg *arg, union gp_rpc_res *res)
...
    /* grab the lock for the whole conversation */
    ret = gpm_grab_sock(gpmctx);
...    
    /* Send request, receive response with timeout */
    ret = gpm_send_recv_loop(gpmctx, send_buffer, send_length, &recv_buffer,
                             &recv_length);
    /* release the lock */
    gpm_release_sock(gpmctx);
    sockgrab = false;         // at this point some other thread could grab the lock and race with this one
...
done:
    gpm_timer_close(gpmctx);  // sets timerfd = -1 as seen in corefile
    gpm_epoll_close(gpmctx);  // sets epollfd = -1 as seen in corefile
...




Version-Release number of selected component (if applicable):
gssproxy-0.7.0-21.el7

How reproducible:
TBD - should be reproducible with a delay inserted in the gssproxy code

Steps to Reproduce:
See the attached program, which demonstrates the possibility of a hang inside epoll_wait()
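
For readers without the attachment, the behaviour can also be sketched independently. The following is not the attached program: it is a minimal illustration with hypothetical names (struct waiter_result, close_during_epoll_wait()). It assumes Linux, and it uses a finite epoll_wait() timeout so that the demonstration terminates, whereas the scenario in this bug has nothing left to wake the waiter.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical result record; none of this is gssproxy code. */
struct waiter_result {
    int epfd;         /* epoll descriptor shared with the closing thread */
    int timeout_ms;   /* finite timeout so the demo terminates */
    int rc;           /* return value of epoll_wait() */
    long waited_ms;   /* how long epoll_wait() blocked */
};

static long elapsed_ms(const struct timespec *a, const struct timespec *b)
{
    return (b->tv_sec - a->tv_sec) * 1000L +
           (b->tv_nsec - a->tv_nsec) / 1000000L;
}

/* Blocks in epoll_wait() on a pipe that never becomes readable. */
static void *waiter(void *arg)
{
    struct waiter_result *w = arg;
    struct epoll_event ev;
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    w->rc = epoll_wait(w->epfd, &ev, 1, w->timeout_ms);
    clock_gettime(CLOCK_MONOTONIC, &end);
    w->waited_ms = elapsed_ms(&start, &end);
    return NULL;
}

/* Starts a waiter, closes the epoll fd close_after_ms later, and reports
 * what the waiter saw. Returns 0 on success, -1 on setup failure. */
int close_during_epoll_wait(int close_after_ms, int timeout_ms,
                            struct waiter_result *out)
{
    pthread_t tid;
    int pipefd[2];
    struct epoll_event ev = { .events = EPOLLIN };

    assert(out != NULL);
    if (pipe(pipefd) != 0) return -1;
    out->epfd = epoll_create1(0);
    if (out->epfd < 0) goto fail_pipe;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(out->epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0) goto fail_ep;
    out->timeout_ms = timeout_ms;
    if (pthread_create(&tid, NULL, waiter, out) != 0) goto fail_ep;

    usleep((useconds_t)close_after_ms * 1000);  /* let the waiter block */
    close(out->epfd);                           /* the racing close() */
    pthread_join(tid, NULL);

    close(pipefd[0]);
    close(pipefd[1]);
    return 0;

fail_ep:
    close(out->epfd);
fail_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return -1;
}
```

Build with -pthread and call, for example, close_during_epoll_wait(200, 1000, &r). On Linux the expectation is that the racing close() does not interrupt the waiter: r.rc should come back as 0 (timeout) with r.waited_ms close to the full timeout rather than an early return with an error. With a timeout of -1 and no other wakeup source, the same waiter would block indefinitely, which matches the hang described in this report.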

Actual results:
gssproxy thread may hang inside epoll_wait() indefinitely

Expected results:
no indefinite hang inside epoll_wait() in gssproxy


Additional info:

Since this problem is so well defined and has a patch under testing, it makes sense to file this as a specific bug.

