Bug 159799 - A lot of stopped processes on a SMP system with HTT enabled
Summary: A lot of stopped processes on a SMP system with HTT enabled
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Don Howard
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-06-08 08:26 UTC by Vlady
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-11-07 21:31:33 UTC



Description Vlady 2005-06-08 08:26:45 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; FreeBSD i386; en-US; rv:1.7.8) Gecko/20050512 Firefox/1.0.4

Description of problem:
We have a number of 2-CPU servers with HTT enabled on the processors.

They run various Red Hat kernels, versions 2.4.21 and above.
On all of these servers we encounter a problem with a lot of "stopped processes", i.e. processes shown with status ``T'' in the output of the ps(1) command.

If the servers run unattended for a long time, their total number of processes grows very large. This affects the servers' performance and in some cases causes errors of the type "cannot fork".

Version-Release number of selected component (if applicable):
kernel-2.4.21 and above

How reproducible:
Always

Steps to Reproduce:
1. Run a multiprocessor machine with HTT enabled on the processors
2. Leave it running and executing tasks for several days
3. Execute ps -ax | grep T
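
Note that `grep T` in step 3 matches any line containing a capital T, including the ps header line (TTY, STAT, TIME) and any command with a T in its name. A minimal sketch of a stricter filter follows, assuming the `ps ax` column layout seen in this report (PID, TTY, STAT, TIME, COMMAND). The helper name `stopped_only` is hypothetical, not something from the report.

```shell
# Hypothetical helper: keep only lines whose third field (STAT) starts with T.
# Assumes `ps ax`-style input on stdin: PID TTY STAT TIME COMMAND.
stopped_only() {
    awk '$3 ~ /^T/'
}

# Usage (not executed here):
#   ps ax | stopped_only            # list stopped processes
#   ps ax | stopped_only | wc -l    # count them
```

Matching on the STAT field rather than the whole line avoids counting the header and unrelated commands, which makes day-to-day counts easier to compare.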
  

Actual Results:  From day to day the number of "stopped processes" (with status ``T'' in the ps(1) output) grows steadily.

Expected Results:  Normally, there should be only a few processes with such a status.

Additional info:

I suppose the bug/problem is caused by the process scheduler working on an SMP machine with HTT enabled.

Comment 1 Ernie Petrides 2005-06-09 00:18:19 UTC
Thanks for your report, Vlady.  Could you please specify the exact
(most recent) kernel version that exhibits this problem?  Also, can
you give us an idea of what sort of processes are stopped?  And do
they go away if they are killed?
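
One way to answer the last question is sketched below, under stated assumptions: it reads the process state from the `State:` line of `/proc/<pid>/status`, sends SIGCONT (the signal that normally resumes a stopped process), and falls back to SIGKILL only if the process is still stopped a second later. The helper names `proc_state` and `resume_or_kill` are hypothetical, and nothing here is confirmed to work on the processes described in this report.

```shell
# Hypothetical helper: print the one-letter state (R, S, T, ...) of a PID,
# taken from the "State:" line of /proc/<pid>/status (Linux only).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Hypothetical helper: try to resume a stopped process with SIGCONT;
# if it is still in state T after one second, send SIGKILL instead.
resume_or_kill() {
    pid=$1
    kill -CONT "$pid" 2>/dev/null || return 1
    sleep 1
    case "$(proc_state "$pid")" in
        T*) kill -KILL "$pid" ;;
    esac
}

# Usage (not executed here):
#   resume_or_kill 26337
```

If a process stays in state T after both signals, that in itself is useful information for the report.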

Comment 2 Don Howard 2005-06-09 00:34:05 UTC
Vlady -

Could you also capture 'ps -ax', sysrq-t, and sysrq-m output from a system that
is experiencing this problem?

Comment 3 Vlady 2005-06-09 08:08:46 UTC
Below is a small part of the ``ps -ax | grep T'' command results on a server with
the 2.4.21-27.0.4.ELsmp kernel and HTT enabled.

26337 ?        T      0:00 cut -b 7-
24464 ?        T      0:00 sed s/$/<NL>/g
23302 ?        T      0:00 /bin/sed s|/|.|g
 9958 ?        T      0:00 id -u
 8551 ?        T      0:00 netstat -tln
22400 ?        T      0:00 mkdir -p /some/dir
 1697 ?        T      0:00 netstat -tln
13750 ?        T      0:00 /bin/sed s|/dev/||
20597 ?        T      0:00 netstat -tln
25570 ?        T      0:00 /bin/sed s|/dev/||
 7115 ?        T      0:00 /bin/sh /bin/egrep -q (^|:)/usr/X11R6/bin($|:)
22075 ?        T      0:00 id -un
 1386 ?        T      0:00 /usr/bin/tty
 2272 ?        T      0:00 sh -c date +%Z 2> /dev/null
21237 ?        T      0:00 sh -c sysctl fs.file-max
28206 ?        T      0:00 /bin/sh -c /usr/lib/sa/sa2 -A
32587 ?        T      0:00 sh -c date +%Z 2> /dev/null
30494 ?        T      0:00 sh -c date +%Z 2> /dev/null
 4591 ?        T      0:00 netstat -tln
 9427 ?        T      0:00 /bin/sed s|/dev/||


Sorry, but I can't supply you with the results of SysRq*. All our servers which
experience the "stopped processes" problem are in production and we don't want to
experiment with their kernels! :(

Comment 4 Vlady 2005-06-09 08:24:55 UTC
Also, I don't have console access to our servers, so I can't even execute the sysrq
+ t or sysrq + m keyboard sequences.

Comment 5 Don Howard 2005-06-09 17:40:27 UTC
Vlady -

You can use sysrq-trigger remotely:

# enable sysrq-trigger
$ echo 1 > /proc/sys/kernel/sysrq 

# sysrq-t
$ echo t > /proc/sysrq-trigger

# sysrq-m
$ echo m > /proc/sysrq-trigger


The sysrq info is really important - I can't make any suggestions unless I can
see where these processes are blocking in the kernel.
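
The commands above only trigger the dumps; the output goes to the kernel log, so it still has to be saved to a file before it can be attached here. A minimal sketch of doing both in one step, assuming it is run as root and that `dmesg` still holds the whole dump (on a busy system the kernel ring buffer may be too small, in which case the syslog file is the safer source). The helper name `capture_sysrq` and the three override variables are hypothetical; the variables exist only so the sketch can be dry-run against ordinary files.

```shell
# Defaults are the real kernel interfaces; override them for a dry run.
SYSRQ_ENABLE=${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}
LOG_CMD=${LOG_CMD:-dmesg}

# Hypothetical helper: enable sysrq, fire one key (t or m), and save the
# kernel log to the given file. Must run as root on a real system.
capture_sysrq() {
    key=$1
    out=$2
    echo 1 > "$SYSRQ_ENABLE" || return 1
    echo "$key" > "$SYSRQ_TRIGGER" || return 1
    $LOG_CMD > "$out"
}

# Usage (not executed here):
#   capture_sysrq t /tmp/sysrq-t.txt
#   capture_sysrq m /tmp/sysrq-m.txt
```

Writing each dump to its own file keeps the task list (sysrq-t) and the memory summary (sysrq-m) separate when attaching them.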

Comment 6 Don Howard 2005-11-07 21:31:33 UTC
Hi Vlady

Were you able to use the sysrq-trigger mechanism I mentioned above? I'll need
the sysrq-t & sysrq-m info in order to see what's happening with the stopped
processes.

For now, I'm going to close this issue.  Please re-open it if you are able to
collect the info (and are still experiencing the problem).

