Bug 455694 - Linux Kernel hang on __delay() function
Summary: Linux Kernel hang on __delay() function
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.6
Hardware: x86_64
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Prarit Bhargava
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-07-17 07:17 UTC by Cheng Ho Lin
Modified: 2009-12-09 14:31 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-09 14:31:44 UTC


Attachments
Intel In-Target Probe snapshot (deleted)
2008-07-17 07:17 UTC, Cheng Ho Lin
no flags

Description Cheng Ho Lin 2008-07-17 07:17:54 UTC
Description of problem:
-----------------------

On our system running Red Hat Enterprise Linux AS 4.6, the warm-boot test hangs
from time to time. By probing the system with an Intel In-Target Probe (ITP) and
checking against "System.map-2.6.9-67.ELsmp", we found that the CPU was stuck in
an infinite loop in __delay() (linux-2.6.9-final/arch/x86_64/lib/delay.c). The
kernel source code is as follows:

void __delay(unsigned long loops)
{
	unsigned long bclock, now;
	
	rdtscl(bclock);
	do
	{
		rep_nop(); 
		rdtscl(now);
	}
	while((now-bclock) < loops);
} 

The corresponding assembly code is:

	rdtsc
	mov rcx, rax
loop:
	pause
	rdtsc
	sub rax, rcx
	cmp rax, rdi
	jb	loop
	ret

This code may misbehave when the TSC value wraps around. For example, if rcx
(bclock) is 0xfffffffffffffffe at the start and the subsequent values of rax
(now) are 3, 15, 27, and so on, the system may hang in __delay().

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
Linux kernel version : 2.6.9-67


How reproducible:
-----------------
Repeatedly warm-boot the system via a cron job.

Steps to Reproduce:
1. Add "*/5 * * * * date > reboot.log; /sbin/reboot" to the crontab.

Comment 1 Cheng Ho Lin 2008-07-17 07:17:54 UTC
Created attachment 312012
Intel In-Target Probe snapshot

Comment 2 Prarit Bhargava 2008-07-18 12:12:59 UTC
Cheng, please attach a sysreport from the system.

Thanks,

P.

Comment 4 Prarit Bhargava 2008-07-22 14:48:23 UTC
I came up with a proposed patch, and while testing it I came across a similar
issue that appears to have been resolved upstream: __delay() can be restarted on
another processor. When this happens, the values of bclock and now are bogus
(they are read from different CPUs' TSCs), and this causes wackiness within
__delay().

I'll submit a patch for both issues.

P.

Comment 5 Prarit Bhargava 2008-07-22 15:01:51 UTC
The more I look at this issue, the more I think that, while this is a bug, it
may not be the issue the reporter is hitting.

The TSC is a 64-bit counter tied to the frequency of the CPU. For simplicity,
let's assume that the CPU frequency is 2.0 GHz (roughly 2^31 Hz).

That means the TSC will wrap every 2^64 / 2^31 = 2^33 seconds, i.e. about
4G x 2 seconds.

AFAICT, that is roughly 2.4 million hours, or ~100,000 days, or about 270 years.
(If I have my math right.)

I suppose that quantatw could have run a system this long ;).

IMO, it is much more likely that quantatw ran into the same strange issue I ran
into: __delay() was suspended and restarted on another CPU.


P.
 

Comment 6 Prarit Bhargava 2008-07-22 15:08:41 UTC
Marking as NOTABUG.

P.

Comment 7 Cheng Ho Lin 2008-07-23 00:41:34 UTC
1. After modifying __delay() based on linux-2.6.26/arch/x86/lib/delay_64.c, the
system has passed warm-boot testing for more than 5 days. Before the change, it
hung every 2 to 3 days of warm-boot testing.

The code is listed below for convenience:

void __delay(unsigned long loops)
{
	unsigned bclock, now;
	int cpu;

	preempt_disable();
	cpu = smp_processor_id();
	rdtscl(bclock);
	for (;;) {
		rdtscl(now);
		if ((now - bclock) >= loops)
			break;

		/* Allow RT tasks to run */
		preempt_enable();
		rep_nop();
		preempt_disable();

		/*
		 * It is possible that we moved to another CPU, and
		 * since TSC's are per-cpu we need to calculate
		 * that. The delay must guarantee that we wait "at
		 * least" the amount of time. Being moved to another
		 * CPU could make the wait longer but we just need to
		 * make sure we waited long enough. Rebalance the
		 * counter for this CPU.
		 */
		if (unlikely(cpu != smp_processor_id())) {
			loops -= (now - bclock);
			cpu = smp_processor_id();
			rdtscl(bclock);
		}
	}
	preempt_enable();
}

2. All of the server machines in this series that are under development are
scheduled for other tests, so I am sorry that I could not gather a sysreport.

Comment 8 Prarit Bhargava 2008-07-23 10:11:08 UTC
Fred, are you saying that you are hitting the issue described in comment #4? 
That switching between CPUs is causing your problem?

I'm confused, because your initial bug report implies that you thought you had
a TSC overflow issue.

P.

Comment 9 Cheng Ho Lin 2008-07-23 10:55:56 UTC
At first, we guessed that the problem was due to the TSC value wrapping around.
But after reproducing the bug and investigating further, we switched to the
direction described in
http://www.chineselinuxuniversity.net/articles/12762.shtml. Therefore, we
modified __delay() and verified the change.

PS. Probing with the ITP shows that the BSP is in __delay() and the other three
APs are all in smp_really_stop_cpu(). In principle, the other processors will
not restart __delay().

void smp_stop_cpu(void)
{
	/*
	 * Remove this CPU:
	 */
	cpu_clear(smp_processor_id(), cpu_online_map);
	local_irq_disable();
	disable_local_APIC();
	local_irq_enable(); 
}

static void smp_really_stop_cpu(void *dummy)
{
	smp_stop_cpu(); 
	for (;;) 
		asm("hlt"); 
}

Comment 10 Prarit Bhargava 2008-07-23 11:14:23 UTC
Fred,

AFAICT, in order for this to happen, CONFIG_PREEMPT must be enabled in the
.config, and it isn't in RHEL5. So I suspect that something else is going on.

Could you attach your test program to this BZ?  I'll run the test to see if I
can hit the issue.

P.

Comment 11 Cheng Ho Lin 2008-07-24 00:42:29 UTC
Hi Prarit,

The OS version in question is Red Hat AS 4 Update 6, not RHEL5.
I checked the system files, and CONFIG_PREEMPT is off in the .config.

Our test procedure uses this crontab entry:

*/5 * * * * echo "reboot test"; date > reboot.log; /sbin/reboot

BTW, in another of our projects (a different hardware architecture), SLES 10
also hung in __delay() after about 9 days of warm-boot testing.

Comment 12 Brian Maly 2008-07-24 03:57:24 UTC
This seems like a BIOS issue. The handoff back to the firmware (when leaving the
OS during a reboot) seems incomplete or broken, and as a result the hardware may
not be reinitialized properly for the next boot. Can we try some different
reboot flags to see whether they trigger a proper hardware reset during reboot?


Can you try the following boot arguments and see if the issue goes away? I'm
guessing you want to use the 'cold' flag, since the warm reboot hangs.

Boot with each of these (one at a time), then try a reboot and see if it hangs:
reboot=hard,cold
reboot=triple,cold
reboot=bios,cold
reboot=kbd,cold

For reference, here are all the possible flags for RHEL4 (for experimentation
purposes):

/* reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old] | [a]cpi
   bios	  Use the CPU reboot vector for warm reset
   warm   Don't set the cold reboot flag
   cold   Set the cold reboot flag
   triple Force a triple fault (init)
   kbd    Use the keyboard controller. cold reset (default)
   acpi   Use the ACPI reset mechanism defined in the FADT
 */ 

