Bug 818329 - NFS mount hanging
Summary: NFS mount hanging
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.2
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: nfs-maint
QA Contact: Filesystem QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-05-02 19:17 UTC by Mark Nipper
Modified: 2012-05-22 12:28 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-05-22 12:28:03 UTC
Target Upstream Version:


Attachments (Terms of Use)
rpc_debug output from NFS client hang (deleted)
2012-05-02 19:17 UTC, Mark Nipper
no flags Details
sysrq-trigger echo t output from NFS hang (deleted)
2012-05-02 19:22 UTC, Mark Nipper
no flags Details
rpc_debug output from NFS hang (deleted)
2012-05-02 19:26 UTC, Mark Nipper
no flags Details

Description Mark Nipper 2012-05-02 19:17:36 UTC
Created attachment 581699 [details]
rpc_debug output from NFS client hang

Description of problem:
Randomly, we have two RHEL 6.2 clients with NFS mounts that end up hanging / blocking / freezing.  The server is a RHEL 5.8 server, and the mounts are all NFSv3.

Version-Release number of selected component (if applicable):
The kernel on the clients is 2.6.32-220.13.1.el6.x86_64 and nfs-utils is nfs-utils-1.2.3-15.el6.x86_64.

How reproducible:
It takes anywhere from a day to a few weeks.  It seems to be rather random.

Steps to Reproduce:
1. occurs randomly
  
Actual results:
NFS mount stops working.

Expected results:
NFS mount shouldn't stop working.

Additional info:
I'm attaching the output from:
---
echo 0 > /proc/sys/sunrpc/rpc_debug
echo t > /proc/sysrq-trigger
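
For anyone gathering the same data, a minimal sketch of the capture step follows. It assumes root on the hung client and that both writes report into the kernel log, which `dmesg` then reads. The function name, output directory and file names are hypothetical, and the two path variables exist only so the function can be dry-run against ordinary files.

```shell
# Sketch only; run as root on the hung client.
# RPC_DEBUG and SYSRQ default to the paths used in this report and can be
# overridden for a dry run. The output directory is a hypothetical choice.
RPC_DEBUG=${RPC_DEBUG:-/proc/sys/sunrpc/rpc_debug}
SYSRQ=${SYSRQ:-/proc/sysrq-trigger}

collect_hang_info() {
    out_dir=${1:-/tmp/nfs-hang}
    mkdir -p "$out_dir" || return 1

    # Writing to rpc_debug makes the kernel log its pending RPC tasks.
    echo 0 > "$RPC_DEBUG" || return 1
    dmesg > "$out_dir/rpc_debug.txt"

    # 't' asks the kernel to log the state and stack of every task.
    echo t > "$SYSRQ" || return 1
    dmesg > "$out_dir/sysrq-t.txt"
}
```

Calling `collect_hang_info` with no argument writes to the default directory. A full task dump can overflow the kernel ring buffer, so the system log may hold a more complete copy than `dmesg` does.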

Comment 1 Mark Nipper 2012-05-02 19:22:39 UTC
Created attachment 581700 [details]
sysrq-trigger echo t output from NFS hang

Comment 2 Mark Nipper 2012-05-02 19:26:13 UTC
Created attachment 581701 [details]
rpc_debug output from NFS hang

Comment 4 RHEL Product and Program Management 2012-05-06 04:06:16 UTC
Since RHEL 6.3 External Beta has begun, and this bug remains
unresolved, it has been rejected as it is not proposed as
exception or blocker.

Red Hat invites you to ask your support representative to
propose this request, if appropriate and relevant, in the
next release of Red Hat Enterprise Linux.

Comment 6 Jeff Layton 2012-05-08 10:55:08 UTC
Looks like the clients are just waiting on the server to respond. Have you sniffed traffic between the two? You might want to do so to see whether the server is ignoring calls from the client, or whether calls are not going out at all for some reason.

If you need help tracking down the cause, then I'd suggest opening an RH support bug so that our support folks can help you with debugging.

Comment 7 Mark Nipper 2012-05-08 15:39:13 UTC
Well, we're an academic license, so we don't actually get any support (that I'm aware of anyway).

Having said that, this had been working okay previously.  It seems like one of the more recent kernel updates (within the last three or four released) was around the time we started having issues with this.  We have two identical machines, both acting as load balanced web servers with an older NetApp filer and a Linux server backing everything via NFS.  When this happens, the other web machine is working fine and the NFS mounts to the NetApp filer are still working without any problems.  There is still a perfectly operable network connection between the affected client and the Linux server on which the NFS mounts hang.

We had been using NFSv3, but we just switched to NFSv4 yesterday to see if the problem goes away when exercising a different code path in the kernel.  I agree that it looks like the client is simply sending and waiting for a response.  But nothing has really changed in this setup except for newer kernel packages, so nothing else accounts for why it was working previously and now, suddenly, it's not.  Both web front ends experience the problem, just at different times.  But usually within a few days of the last reboot, one of the two will have gotten into this state.

If it's still happening with NFSv4, I'll try to grab everything happening between the client and server via tcpdump or wireshark.
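
A capture of that kind could be started roughly as sketched below. This is an illustration under assumptions, not the command anyone in this report actually ran: the helper name, interface, server address and output file are hypothetical placeholders. The filter matches on the server host only, so that everything between client and server is kept.

```shell
# Sketch only. Interface, server address and output file are hypothetical.
nfs_capture_cmd() {
    iface=$1 server=$2 outfile=$3
    # -s 0 keeps whole packets; -w writes a pcap file that wireshark can open.
    printf 'tcpdump -i %s -s 0 -w %s host %s\n' "$iface" "$outfile" "$server"
}

# Print the command; run it as root on the client once it has been reviewed.
nfs_capture_cmd eth0 192.0.2.10 /tmp/nfs-hang.pcap
```

Leaving the capture running until the next hang means the file can grow large, so a location with plenty of free space is worth choosing.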

Comment 8 J. Bruce Fields 2012-05-08 15:54:46 UTC
In addition to the network traffic, it might be worth trying the sysrq-t dump on the Linux server, just to see if the server threads are stuck.

Comment 9 Mark Nipper 2012-05-21 15:29:12 UTC
It's worth mentioning that since we moved both clients to NFSv4, we haven't had the problem again.  Something definitely seems wrong in the NFSv3 client.  But we're not especially keen on going back to debug it at this point.
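
One way to confirm which protocol version each mount is actually using is to read the client's mount table. The sketch below assumes the usual `/proc/mounts` layout (device, mount point, filesystem type, options); the function name is hypothetical.

```shell
# Sketch only: list NFS mounts with their filesystem type and options.
# The optional argument lets the function read a saved copy of the table.
nfs_mounts() {
    mounts_file=${1:-/proc/mounts}
    # Field 2 is the mount point, 3 the filesystem type, 4 the options.
    awk '$3 == "nfs" || $3 == "nfs4" { print $2, $3, $4 }' "$mounts_file"
}
```

Running `nfs_mounts` on each client prints one line per NFS mount, which makes it easy to spot a mount that is still on the older protocol version.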

Comment 10 Steve Dickson 2012-05-22 12:28:03 UTC
(In reply to comment #9)
> It's worth mentioning that since we moved both clients to NFSv4, we haven't
> had the problem again.  Something definitely seems wrong in the NFSv3
> client.  But we're not especially keen on going back to debug it at this
> point.
Fair enough... Since we can't reproduce this and moving forward fixes the issue, let's close this bz. If the problem reappears, please feel free to reopen this bz...

