Bug 159663 - kill (SIGSTOP) does not stop all child threads
Summary: kill (SIGSTOP) does not stop all child threads
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
: ---
Assignee: Jan Kratochvil
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-06-06 18:33 UTC by Jeff Johnston
Modified: 2007-11-30 22:07 UTC
CC List: 3 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-19 18:10:23 UTC


Attachments (Terms of Use)
TestFailure.c test case (deleted)
2005-06-06 19:34 UTC, Jeff Johnston
no flags
Comment 1 attachment fixed up. (deleted)
2007-11-19 18:09 UTC, Jan Kratochvil
no flags

Description Jeff Johnston 2005-06-06 18:33:19 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040805 Netscape/7.2

Description of problem:
I have an application that forks a new process, which creates 2 child threads that run in an infinite loop. The application attaches to the new process and to each of its threads with ptrace(2). The main thread of the new process waits forever in pthread_join().

From the parent process of the fork, I send kill(pid, SIGSTOP) to the child process (that is, the main thread that created the child threads), and the parent process then waits in a poll() call for a SIGCHLD to interrupt it. A SIGCHLD arrives, and a loop of waitpid() calls with WNOHANG gets SIGSTOP events for the main thread and sometimes for one of the child threads. Not all of the child threads are stopped, as verified by looking at /proc. It is not a timing issue: the stop had still not been delivered after an hour.
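A minimal sketch of the non-blocking waitpid() loop described above, assuming the tracer simply collects every pending event after SIGCHLD interrupts poll(). The name drain_stop_events() and its output format (copied from the log in Comment 4) are illustrative and not taken from TestFailure.c. On kernels of this era, waitpid() reports ptrace-attached threads only when __WALL is passed, because threads send no SIGCHLD on exit and therefore count as "clone" children. WUNTRACED additionally reports stops of children that are not traced.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: after SIGCHLD, collect every pending state change
   without blocking and return how many of them were stop events.
   __WALL:    also report clone children such as ptrace-attached threads.
   WUNTRACED: also report stops of children that are not traced. */
int drain_stop_events(void)
{
    int stops = 0;
    int status;
    pid_t tid;

    while ((tid = waitpid(-1, &status, WNOHANG | WUNTRACED | __WALL)) > 0) {
        if (WIFSTOPPED(status)) {
            printf("stop event %d on <%d>\n", WSTOPSIG(status), (int)tid);
            stops++;
        }
    }
    /* The loop ends when waitpid() returns 0 (children exist but nothing
       is pending) or -1 with ECHILD (no children left to wait for). */
    return stops;
}
```

A stop that has been reported once is not reported again, so calling the helper a second time without new events returns 0.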

Version-Release number of selected component (if applicable):
kernel-2.6.9-10.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Compile the attached test case: gcc -g TestFailure.c -lpthread
2. Run the test case: ./a.out
3. Wait for it to hang, then look at /proc/xxxx/task/yyyy/status for all 3 threads listed.
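Step 3 can also be checked programmatically. A minimal sketch, assuming only the State: line format visible in the log in Comment 4; task_state() is a hypothetical helper, not part of the test case. The 2.6.9 kernel prints a tracing stop as `T (tracing stop)`, while newer kernels print a lowercase `t`.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: return the State letter of task TID in process PID,
   as shown by /proc/PID/task/TID/status (e.g. 'R', 'S', 'T', 't'),
   or '?' if the file cannot be read or has no State: line. */
char task_state(pid_t pid, pid_t tid)
{
    char path[64];
    char line[256];
    char state = '?';

    snprintf(path, sizeof path, "/proc/%d/task/%d/status", (int)pid, (int)tid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return '?';

    while (fgets(line, sizeof line, f) != NULL) {
        /* The line looks like "State:\tT (tracing stop)". */
        if (strncmp(line, "State:", 6) == 0) {
            if (sscanf(line + 6, " %c", &state) != 1)
                state = '?';
            break;
        }
    }
    fclose(f);
    return state;
}
```

After the hang, calling task_state(pid, tid) for each of the three TIDs shows which threads are stopped ('T' or 't') and which are still running.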

Actual Results:  Only one, maybe two, stop events occur after kill(pid, SIGSTOP). Not all three threads are stopped.

Expected Results:  The SIGSTOP should have been broadcast to the main thread and all child threads.  Using tkill to individually stop the threads does work.

Additional info:

Comment 1 Jeff Johnston 2005-06-06 19:34:14 UTC
Created attachment 115173 [details]
TestFailure.c test case

Comment 2 Jeff Johnston 2005-06-06 19:37:10 UTC
Test also fails on FC4: 2.6.11-1.1238_FC4smp

Comment 4 Jan Kratochvil 2007-11-19 18:09:51 UTC
Created attachment 263751 [details]
Comment 1 attachment fixed up.

Resolved as NOTABUG based on knowledge from Roland: a process under ptrace(2) is
no longer stopped completely by a single SIGSTOP; under ptrace(2), SIGSTOP
applies only to a single task of the thread group.

The F8 kernel (kernel-2.6.23.1-42.fc8.x86_64) also behaves for me according to
the Comment 0 field `Actual Results' and not according to its `Expected Results'.

The F8 kernel behaves correctly in this respect: it behaves the same as the
RHEL-4 kernels, and these behave the same as the kernel.org/upstream kernels.
According to Roland, the kernel.org/upstream kernels define the ptrace(2)
behavior. :-)  Therefore the F8 kernels are correct.

The ptrace(2) behavior may be scary, but userland can still be coded to reach
any desired goal.

I fixed up the testcase to avoid any races in it and to behave according to
the Comment 0 field `Expected Results'.  The testcase modifications were kept
diff-friendly.

My testcase uses tkill(2), as it is the only safe way to target a specific
task of a thread group with a signal.  Even the upstream GDB kill_lwp()
implementation uses tkill(2).
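A minimal sketch of per-task signalling, assuming a Linux system where glibc offers no tkill() wrapper, so the raw system call is used through syscall(2). The helper names are hypothetical. The second helper swaps in tgkill(2), which the tkill(2) man page recommends instead, because it also checks that the TID still belongs to the given thread group and so guards against a recycled TID.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send SIG to the single task TID only, unlike
   kill(2), which sends a process-wide signal.
   Returns 0 on success, -1 with errno set on error. */
int tkill_task(pid_t tid, int sig)
{
    return (int)syscall(SYS_tkill, tid, sig);
}

/* Hypothetical helper: same, but tgkill(2) fails with ESRCH unless TID
   is a member of thread group TGID. */
int tgkill_task(pid_t tgid, pid_t tid, int sig)
{
    return (int)syscall(SYS_tgkill, tgid, tid, sig);
}
```

Stopping every thread of the traced child is then a loop that calls tgkill_task(pid, tid, SIGSTOP) once per TID. Each resulting stop is reported separately to the tracer by waitpid().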


log:
Attaching 21358
stop event 19 on <21358>
continuing first stop
lwpid is 21358
lwpid is 21359
Attaching 21359
stop event 19 on <21359>
continuing first stop
lwpid is 21360
Attaching 21360
stop event 19 on <21360>
continuing first stop
About to kill main thread
stop event 19 on <21358>
stop event 19 on <21359>
stop event 19 on <21360>
Manually check to see what threads are stopped.
[hang]
and now
$ grep Stat /proc/{21358,21359,21360}/status
/proc/21358/status:State:	T (tracing stop)
/proc/21359/status:State:	T (tracing stop)
/proc/21360/status:State:	T (tracing stop)

