Note: This is a beta release of Red Hat Bugzilla 5.0. The data contained within is a snapshot of the live data so any changes you make will not be reflected in the production Bugzilla. Also email is disabled so feel free to test any aspect of the site that you want. File any problems you find or give feedback here.
Bug 236577 - [Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL3 NFSv3 server
Summary: [Realtime][Kernel][NFS] Connectathon NFSv3, UDP, Lock test fails against RHEL...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: All
OS: Linux
medium
medium
Target Milestone: ---
: ---
Assignee: Steve Dickson
QA Contact:
URL: http://rhts.lab.boston.redhat.com/cgi...
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2007-04-16 15:18 UTC by Jeff Burke
Modified: 2012-01-09 22:07 UTC (History)
5 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-09 22:07:29 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Jeff Burke 2007-04-16 15:18:00 UTC
Description of problem:
While running the connectathon test suite the following test failed.
/kernel/filesystems/nfs/connectathon/rhel3-nfs/nfsvers=3_udp/lock

Version-Release number of selected component (if applicable):
 2.6.20-14.el5rt

How reproducible:
 Intermittant 10% of the time it fails.

Steps to Reproduce:
1. Run the RHTS test /kernel/filesystems/nfs/connectathon while running the RT
kernel
  
Actual results:
Start tests on path /mnt/rhel3-nfs/dhcp78-129.test [y/n]? 
sh ./runtests  -l -t /mnt/rhel3-nfs/dhcp78-129.test

Starting LOCKING tests: test directory /mnt/rhel3-nfs/dhcp78-129.test (arg: -t)

Testing native post-LFS locking

Creating parent/child synchronization pipes.

Test #1 - Test regions of an unlocked file.
	Parent: 1.1  - F_TEST  [               0,               1] PASSED.
	Parent: 1.2  - F_TEST  [               0,          ENDING] PASSED.
	Parent: 1.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Parent: 1.4  - F_TEST  [               1,               1] PASSED.
	Parent: 1.5  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 1.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Parent: 1.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Parent: 1.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Parent: 1.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.

Test #2 - Try to lock the whole file.
	Parent: 2.0  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  2.1  - F_TEST  [               0,               1] PASSED.
	Child:  2.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  2.3  - F_TEST  [               0,7fffffffffffffff] PASSED.
	Child:  2.4  - F_TEST  [               1,               1] PASSED.
	Child:  2.5  - F_TEST  [               1,          ENDING] PASSED.
	Child:  2.6  - F_TEST  [               1,7fffffffffffffff] PASSED.
	Child:  2.7  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  2.8  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  2.9  - F_TEST  [7fffffffffffffff,7fffffffffffffff] PASSED.
	Parent: 2.10 - F_ULOCK [               0,          ENDING] PASSED.

Test #3 - Try to lock just the 1st byte.
	Parent: 3.0  - F_TLOCK [               0,               1] PASSED.
	Child:  3.1  - F_TEST  [               0,               1] PASSED.
	Child:  3.2  - F_TEST  [               0,          ENDING] PASSED.
	Child:  3.3  - F_TEST  [               1,               1] PASSED.
	Child:  3.4  - F_TEST  [               1,          ENDING] PASSED.
	Parent: 3.5  - F_ULOCK [               0,               1] PASSED.

Test #4 - Try to lock the 2nd byte, test around it.
	Parent: 4.0  - F_TLOCK [               1,               1] PASSED.
	Child:  4.1  - F_TEST  [               0,               1] PASSED.
	Child:  4.2  - F_TEST  [               0,               2] PASSED.
	Child:  4.3  - F_TEST  [               0,          ENDING] PASSED.
	Child:  4.4  - F_TEST  [               1,               1] PASSED.
	Child:  4.5  - F_TEST  [               1,               2] PASSED.
	Child:  4.6  - F_TEST  [               1,          ENDING] PASSED.
	Child:  4.7  - F_TEST  [               2,               1] PASSED.
	Child:  4.8  - F_TEST  [               2,               2] PASSED.
	Child:  4.9  - F_TEST  [               2,          ENDING] PASSED.
	Parent: 4.10 - F_ULOCK [               1,               1] PASSED.

Test #5 - Try to lock 1st and 2nd bytes, test around them.
	Parent: 5.0  - F_TLOCK [               0,               1] PASSED.
	Parent: 5.1  - F_TLOCK [               2,               1] PASSED.
	Child:  5.2  - F_TEST  [               0,               1] PASSED.
	Child:  5.3  - F_TEST  [               0,               2] PASSED.
	Child:  5.4  - F_TEST  [               0,          ENDING] PASSED.
	Child:  5.5  - F_TEST  [               1,               1] PASSED.
	Child:  5.6  - F_TEST  [               1,               2] PASSED.
	Child:  5.7  - F_TEST  [               1,          ENDING] PASSED.
	Child:  5.8  - F_TEST  [               2,               1] PASSED.
	Child:  5.9  - F_TEST  [               2,               2] PASSED.
	Child:  5.10 - F_TEST  [               2,          ENDING] PASSED.
	Child:  5.11 - F_TEST  [               3,               1] PASSED.
	Child:  5.12 - F_TEST  [               3,               2] PASSED.
	Child:  5.13 - F_TEST  [               3,          ENDING] PASSED.
	Parent: 5.14 - F_ULOCK [               0,               1] PASSED.
	Parent: 5.15 - F_ULOCK [               2,               1] PASSED.

Test #6 - Try to lock the MAXEOF byte.
	Parent: 6.0  - F_TLOCK [7fffffffffffffff,               1] PASSED.
	Child:  6.1  - F_TEST  [7ffffffffffffffe,               1] PASSED.
	Child:  6.2  - F_TEST  [7ffffffffffffffe,               2] PASSED.
	Child:  6.3  - F_TEST  [7ffffffffffffffe,          ENDING] PASSED.
	Child:  6.4  - F_TEST  [7fffffffffffffff,               1] PASSED.
	Child:  6.5  - F_TEST  [7fffffffffffffff,               2] PASSED.
	Child:  6.6  - F_TEST  [7fffffffffffffff,          ENDING] PASSED.
	Child:  6.7  - F_TEST  [8000000000000000,          ENDING] PASSED.
	Child:  6.8  - F_TEST  [8000000000000000,               1] PASSED.
	Child:  6.9  - F_TEST  [8000000000000000,7fffffffffffffff] PASSED.
	Child:  6.10 - F_TEST  [8000000000000000,8000000000000000] PASSED.
	Parent: 6.11 - F_ULOCK [7fffffffffffffff,               1] PASSED.

Test #7 - Test parent/child mutual exclusion.
	Parent: 7.0  - F_TLOCK [             ffc,               9] PASSED.
	Parent: Wrote 'aaaa eh' to testfile [ 4092, 7 ].
	Parent: Now free child to run, should block on lock.
	Parent: Check data in file to insure child blocked.
	Parent: Read 'aaaa eh' from testfile [ 4092, 7 ].
	Parent: 7.1  - COMPARE [             ffc,               7] PASSED.
	Parent: Now unlock region so child will unblock.
	Parent: 7.2  - F_ULOCK [             ffc,               9] PASSED.
	Child:  7.3  - F_LOCK  [             ffc,               9] PASSED.
	Child:  Write child's version of the data and release lock.
	Parent: Now try to regain lock, parent should block.
	Child:  Wrote 'bebebebeb' to testfile [ 4092, 9 ].
	Child:  7.4  - F_ULOCK [             ffc,               9] PASSED.
	Parent: 7.5  - F_LOCK  [             ffc,               9] PASSED.
	Parent: Check data in file to insure child unblocked.
	Parent: Read 'bebebebeb' from testfile [ 4092, 9 ].
	Parent: 7.6  - COMPARE [             ffc,               9] PASSED.
	Parent: 7.7  - F_ULOCK [             ffc,               9] PASSED.

Test #8 - Rate test performing lock/unlock cycles.
	Parent: Performed 1000 lock/unlock cycles in 560 msecs. [214285 lpm].

Test #10 - Make sure a locked region is split properly.
	Parent: 10.0  - F_TLOCK [               0,               3] PASSED.
	Parent: 10.1  - F_ULOCK [               1,               1] PASSED.
	Child:  10.2  - F_TEST  [               0,               1] PASSED.
	Child:  10.3  - F_TEST  [               2,               1] PASSED.
	Child:  10.4  - F_TEST  [               3,          ENDING] PASSED.
	Child:  10.5  - F_TEST  [               1,               1] PASSED.
	Parent: 10.6  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.7  - F_ULOCK [               2,               1] PASSED.
	Child:  10.8  - F_TEST  [               0,               3] PASSED.
	Parent: 10.9  - F_ULOCK [               0,               1] PASSED.
	Parent: 10.10 - F_TLOCK [               1,               3] PASSED.
	Parent: 10.11 - F_ULOCK [               2,               1] PASSED.
	Child:  10.12 - F_TEST  [               1,               1] PASSED.
	Child:  10.13 - F_TEST  [               3,               1] PASSED.
	Child:  10.14 - F_TEST  [               4,          ENDING] PASSED.
	Child:  10.15 - F_TEST  [               2,               1] PASSED.
	Child:  10.16 - F_TEST  [               0,               1] PASSED.

Test #11 - Make sure close() releases the process's locks.
	Parent: 11.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Closed testfile.
	Child:  11.1  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.2  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: 11.3  - F_TLOCK [              1d,             5b7] PASSED.
	Parent: 11.4  - F_TLOCK [            2000,              57] PASSED.
	Parent: Closed testfile.
	Child:  11.5  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.6  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.7  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 13, 16 ].
	Parent: Closed testfile.
	Child:  11.8  - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.9  - F_ULOCK [               0,          ENDING] PASSED.
	Parent: Wrote '123456789abcdef' to testfile [ 0, 16 ].
	Parent: 11.10 - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Truncated testfile.
	Parent: Closed testfile.
	Child:  11.11 - F_TLOCK [               0,          ENDING] PASSED.
	Child:  11.12 - F_ULOCK [               0,          ENDING] PASSED.

Test #12 - Signalled process should release locks.
	Child:  12.0  - F_TLOCK [               0,          ENDING] PASSED.
	Parent: Killed child process.
	Parent: 12.1  - F_TLOCK [               0,          ENDING] FAILED!
	Parent: **** Expected success, returned EAGAIN...
	Parent: **** Probably implementation error.

** PARENT pass 1 results: 39/39 pass, 0/0 warn, 1/1 fail (pass/total).

**  CHILD pass 1 results: 64/64 pass, 0/0 warn, 0/0 fail (pass/total).
lock tests failed

Expected results:
 This test should pass 100% of the time.

Additional info:
 Here is a link to the packet capture while the test was running.
http://rhts.lab.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2

Comment 1 Steve Dickson 2007-05-01 14:40:40 UTC
Have you every seen this on TCP mounts? 

Comment 2 Jeff Burke 2007-05-01 14:42:18 UTC
Nope, if i did I would have open another BZ

Comment 3 Steve Dickson 2007-05-01 15:35:20 UTC
Well putting the server under load, I am able to
get the lock tests to hang, but in test 8 (the rate
test) instead of test 7 (the blocking test). 

One other notable, is using a RHEL5 xen guest I
do see the same hang, but it appears the xen
kernel does recover.. 

Comment 4 Steve Dickson 2007-05-01 19:03:56 UTC
A couple of things... 

1) It appears this is a UDP only thing since I
am not able to reproduce the hang using TCP mounts... 

2) the ethereal trace is no longer avaiable... 

Comment 5 Jeff Burke 2007-05-01 19:54:36 UTC
Steve,
   RHTS was upgraded yesterday, The link was broken. You can find the
information here.
http://armstrong.rhts.boston.redhat.com/testlogs/18994/73670/344426/rhel3-nfs_nfsvers=3_udp_lock.cap.bz2

Thanks Jeff

Comment 6 Steve Dickson 2007-05-03 21:40:47 UTC
Ok... after further review... it appears this problem only
appears on RT kernels... Using a non-rt kernel, I was not
able to get the cthon test to hang... 

Comment 7 Tim Burke 2007-06-14 19:36:38 UTC
Steve - have you differentiated whether this issue is RT specific vs a more
recent upstream issue?  Clark does a generic kernel build which is just the
upstream stuff without the RT patches.  Does the flaw occur there?

Comment 8 Steve Dickson 2007-06-15 09:45:52 UTC
I generally use Rawhide kernels for my "generic" kernels. when I ran
using those I was not able to reproduce the problem. But its been
a while since I tried... let me set up another test to see how 
things have changed... if they have... 

Comment 9 Peter Staubach 2008-04-28 19:45:10 UTC
This issue has been seen on vanilla RHEL-5.  It was also observed upstream
and should be fixed there.

There is also a testsuite timing dependency which can lead to this
behavior.  The testsuite in RHTS has been patched.

Comment 10 Clark Williams 2008-05-27 22:11:41 UTC
I haven't seen this on recent  RHTS runs. closing for now

Comment 11 Peter Staubach 2008-05-28 20:51:36 UTC
I see this, or at least, I did before I constructed the testsuite
changes and the kernel changes to address it.

Comment 12 Clark Williams 2009-02-10 23:30:46 UTC
do we still see this on recent (2.6.24.7-101.el5rt) RT kernels?

Comment 13 Jeff Burke 2009-02-11 20:36:43 UTC
Clark,
  Yes we still see this with the current version 2.6.24.7-101.el5rt

Here are the links to the results:
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6270210

http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=6278249

Thanks,
Jeff

Comment 14 Luis Claudio R. Goncalves 2009-02-25 21:05:12 UTC
Isn't this bug related to BZ486645? Jon Masters added a patch to kernel -104 that may help fixing this issue (though I haven't been seeing these failures recently - at least not the ones that used to freeze the machine).

Comment 15 Jeff Burke 2009-02-25 21:40:44 UTC
I do not believe4 so. This is a different type of failure. It is a race condition. The latest RHEL5.4 kernel fixed the issue.
https://bugzilla.redhat.com/show_bug.cgi?id=448929

Comment 16 Beth Uptagrafft 2009-09-22 19:40:17 UTC
Do we still see this on recent (2.6.24.7-132.el5rt) RT kernels?

Comment 17 Jeff Burke 2009-09-23 13:00:39 UTC
Looking at the latest results. I do not see the failure. But it is a timing bug.

Comment 19 Clark Williams 2012-01-09 22:07:29 UTC
Haven't seen this issue on our 2.6.33  or 3.0.9 kernels


Note You need to log in before you can comment on or make changes to this bug.