Bug 235940 - [JAVA_BLOCKER] timekeeping starvation (sched_football hangs on RHEL5 RT)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise MRG
Classification: Red Hat
Component: realtime-kernel
Version: 1.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Clark Williams
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-04-10 21:45 UTC by IBM Bug Proxy
Modified: 2008-02-27 19:58 UTC (History)
0 users

Fixed In Version: 2.6.21-2.el5rt
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-05-15 02:19:47 UTC


Attachments
tested cycles-accumulated patch sent to Ingo (deleted), 2007-04-10 21:45 UTC, IBM Bug Proxy
sysctrl-fix.patch (deleted), 2007-04-25 01:40 UTC, IBM Bug Proxy


Links
System ID Priority Status Summary Last Updated
IBM Linux Technology Center 33424 None None None Never

Description IBM Bug Proxy 2007-04-10 21:45:36 UTC
LTC Owner is: johnstul@us.ibm.com
LTC Originator is: ankita@in.ibm.com


Problem description:
System hangs when running sched_football on RHEL5 RT.


Ok, I reproduced it, and it does look like the box is hung and it's not just the
network. I need to dig and make sure it's not the old
timekeeping-is-being-preempted issue of way back when.

Yep. update_wall_time() is currently called from a soft-irq, which can be
preempted. I'll need to re-implement the accumulate-cycles code I did back for
2.6.16-rt.
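
To illustrate the idea behind accumulating cycles, here is a minimal user-space sketch in C. It is not the patch attached to this bug. It assumes a hypothetical 32-bit hardware counter and made-up names (sim_clocksource, sim_accumulate, sim_update_wall_time). The assumption is that a short, non-preemptible path samples the counter often enough that it never wraps unnoticed, while the preemptible update path only consumes what has been accumulated, so a delayed update loses no time.

```c
#include <stdint.h>

/* Hypothetical model, not kernel code: every name here is illustrative. */
struct sim_clocksource {
    uint32_t counter;           /* narrow free-running hardware counter (wraps) */
    uint32_t cycle_last;        /* counter value at the last accumulation */
    uint64_t cycle_accumulated; /* cycles seen but not yet folded into wall time */
};

/*
 * Stand-in for work done in the non-preemptible timer hard-irq:
 * record how many cycles passed since the last sample.
 */
static void sim_accumulate(struct sim_clocksource *cs)
{
    /* Unsigned 32-bit subtraction stays correct across a single wrap. */
    uint32_t delta = cs->counter - cs->cycle_last;

    cs->cycle_accumulated += delta;
    cs->cycle_last = cs->counter;
}

/*
 * Stand-in for the preemptible update_wall_time() path: however late it
 * runs, it receives every cycle that was accumulated in the meantime.
 */
static uint64_t sim_update_wall_time(struct sim_clocksource *cs)
{
    uint64_t pending;

    sim_accumulate(cs);
    pending = cs->cycle_accumulated;
    cs->cycle_accumulated = 0;
    return pending;
}
```

In this model, calling sim_accumulate() on every simulated tick and sim_update_wall_time() only occasionally still returns the full cycle count, even when the total exceeds what the 32-bit counter can represent.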

Ingo has picked up the patch upstream. Now we just need to get it pushed back
into 2.6.20-rt8.

Comment 1 IBM Bug Proxy 2007-04-10 21:45:36 UTC
Created attachment 152183
tested cycles-accumulated patch sent to Ingo

Comment 2 IBM Bug Proxy 2007-04-10 23:35:47 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-04-10 19:32 EDT -------
Any objections to the patch from Red Hat's side?

Comment 3 Tim Burke 2007-04-13 19:10:43 UTC
Assigning to Clark to consider backporting Ingo's patch from 21 to 20.


Comment 4 john stultz 2007-04-23 20:13:40 UTC
Just a small update: Keith Mannthey has seemingly triggered this issue w/
2.6.21-rt (which includes my fix), so we're digging to figure out if additional
changes may be needed.



Comment 5 IBM Bug Proxy 2007-04-24 22:05:33 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-04-24 18:01 EDT -------
I've confirmed that an issue still exists with regard to timekeeping starvation in
2.6.21-rc6-rt0. I'm still working to narrow it down.

Comment 6 IBM Bug Proxy 2007-04-25 01:26:12 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-04-24 21:22 EDT -------
Ok! I found it! Ingo has added a vsyscall hack (.sysctl_enabled = 2) on x86_64,
where the vsyscall just quickly returns the last calculated value of xtime
instead of reading the clocksource hardware (to improve speed).

This means gettimeofday has only tick resolution, and it also loses robustness
under heavy -RT load (it allows for starvation, as xtime won't be incremented until
the timer_softirq runs).

I'm testing a patch to set the .sysctl_enabled back to 1 as the safe default. 
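
The difference between the two vsyscall behaviors described above can be modeled with a small user-space sketch in C. It is an illustration only, not the kernel's vsyscall code. The struct and function names, the 1 ms tick, and the simplification that the clocksource already counts in nanoseconds are all assumptions. sim_gtod_cached() stands in for the .sysctl_enabled = 2 behavior and sim_gtod_hw() for the .sysctl_enabled = 1 behavior.

```c
#include <stdint.h>

/* Hypothetical model, not kernel code: every name here is illustrative. */
#define SIM_NSEC_PER_TICK 1000000ULL /* assumed 1 ms tick */

struct sim_clock {
    uint64_t hw_ns;          /* free-running "clocksource", pre-scaled to ns */
    uint64_t xtime_ns;       /* wall time as of the last completed update */
    uint64_t last_update_hw; /* hw_ns value that xtime_ns corresponds to */
};

/* Hardware time advances no matter what the scheduler is doing. */
static void sim_advance_hw(struct sim_clock *c, uint64_t ns)
{
    c->hw_ns += ns;
}

/* Stand-in for the timer softirq: fold whole elapsed ticks into xtime. */
static void sim_timer_softirq(struct sim_clock *c)
{
    uint64_t ticks = (c->hw_ns - c->last_update_hw) / SIM_NSEC_PER_TICK;

    c->xtime_ns += ticks * SIM_NSEC_PER_TICK;
    c->last_update_hw += ticks * SIM_NSEC_PER_TICK;
}

/* Models the "return the last calculated xtime" shortcut. */
static uint64_t sim_gtod_cached(const struct sim_clock *c)
{
    return c->xtime_ns;
}

/* Models reading the clocksource and adding the time since the last update. */
static uint64_t sim_gtod_hw(const struct sim_clock *c)
{
    return c->xtime_ns + (c->hw_ns - c->last_update_hw);
}
```

If the simulated hardware advances by 50 ms while sim_timer_softirq() never runs, sim_gtod_cached() keeps returning the same value, whereas sim_gtod_hw() continues to move forward. That frozen value is the starvation symptom in this model.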

Comment 7 IBM Bug Proxy 2007-04-25 01:40:45 UTC
Created attachment 153395
sysctrl-fix.patch

Comment 8 IBM Bug Proxy 2007-04-25 01:41:01 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-04-24 21:36 EDT -------
 
don't enable the vsyscall_gtod returns walltime hack by default

Simply sets the vsyscall .sysctl_enabled to 1 instead of 2 to avoid the
vsyscall-returns-xtime hack, which is prone to starvation.

Comment 9 IBM Bug Proxy 2007-04-25 01:45:41 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-04-24 21:42 EDT -------
Patch sent to Ingo. 

Comment 10 IBM Bug Proxy 2007-05-03 00:05:51 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-02 20:02 EDT -------
I've confirmed Ingo included the patch in 2.6.21-rt1.
This can be closed once RH rebases on 2.6.21. 

Comment 11 IBM Bug Proxy 2007-05-10 22:25:46 UTC
----- Additional Comments From jstultz@us.ibm.com (prefers email at johnstul@us.ibm.com)  2007-05-10 18:23 EDT -------
Verified fixed in 2.6.21-2.el5rt kernel from Clark's repo. 

