Bug 453438 - [ia64] clone2 (pthread_create) crashes with -f
Summary: [ia64] clone2 (pthread_create) crashes with -f
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: strace
Version: 5.2
Hardware: ia64
OS: Linux
low
medium
Target Milestone: rc
: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2008-06-30 16:50 UTC by Jan Kratochvil
Modified: 2009-01-20 22:10 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 22:10:04 UTC


Attachments (Terms of Use)
Fix. (deleted)
2008-06-30 16:50 UTC, Jan Kratochvil


Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2009:0233 normal SHIPPED_LIVE strace bug-fix update 2009-01-20 16:06:35 UTC

Description Jan Kratochvil 2008-06-30 16:50:54 UTC
Description of problem:
Currently one cannot `strace -f' multithreaded processes.

Version-Release number of selected component (if applicable):
strace-4.5.16-1.el5.1.ia64
kernel-2.6.18-94.el5.ia64

How reproducible:
Always.

Steps to Reproduce:
cat >thread.c <<EOH; gcc -o thread thread.c -pthread; strace -f ./thread
#include <pthread.h>
#include <unistd.h>
void *start (void *arg) { return arg; }
pthread_t thread1;
int main () { pthread_create (&thread1, NULL, start, NULL); sleep (1); return 0; }
EOH

Actual results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 8979 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 8979
...
[pid  8978] nanosleep({1, 0},  <unfinished ...>
[pid  8979] --- SIGSEGV (Segmentation fault) @ 2000000000236d20 (3d0f00) ---
Process 8979 detached
+++ killed by SIGSEGV +++


Expected results:
execve("./thread", ["./thread"], [/* 41 vars */]) = 1
...
clone2(Process 9008 attached
child_stack=0x200000000031c000, stack_size=0x9feb80,
flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID,
parent_tidptr=0x2000000000d1b2d0, tls=0x2000000000d1b910,
child_tidptr=0x2000000000d1b2d0) = 9008
...
[pid  9007] nanosleep({1, 0},  <unfinished ...>
[pid  9008] get_robust_list(0x2000000000d1b2e0, 0x18, 0) = 0
[pid  9008] exit(0)                     = ?
Process 9008 detached
<... nanosleep resumed> {1, 0})         = 0
exit_group(0)                           = ?

Additional info:
Patch posted to upstream <strace-devel@lists.sourceforge.net>:


In the case of `child_stack=0' (as with the FORK glibc call), and for the
parent in the `child_stack!=0' sample above, RESTORE_ARG0 still writes to
memory that does not contain the syscall argument being modified; it just
happens that nothing crashes in those cases.  In the case of a new stack
(a child of PTHREAD_CREATE), RESTORE_ARG0 corrupts the IN0 stacked register
and glibc crashes at glibc/sysdeps/unix/sysv/linux/ia64/clone2.S:
1:      ld8 out1=[in0],8        /* Retrieve code pointer.       */

IMO, under the ia64 RSE (Register Stack Engine) the caller has no access to
the passed registers after the callee returns, therefore RESTORE_ARG* should
be a nop there.  A review from someone more proficient with the RSE and the
kernel syscalls would still be useful.

Fix tested on RHEL-5 kernel-2.6.18-94.el5.ia64.  Older kernels (such as
kernel-2.6.18-53.el5.ia64) do not crash because they have a bug that prevents
strace from tracing the children (strace is unable to force CLONE_PTRACE there).
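
For reference, forcing CLONE_PTRACE is visible in the trace below at the argument level: arg0 is read as 0x3d0f00, poked as 0x3d2f00 on syscall entry, and poked back as 0x3d0f00 on syscall exit.  A minimal sketch of that bit manipulation, assuming the CLONE_PTRACE value 0x2000 from the Linux headers; the helper names are hypothetical and are not taken from the strace sources:

```c
#include <stdint.h>

/* CLONE_PTRACE as defined by the Linux headers; given a local name so the
 * sketch does not depend on feature-test macros (assumption: 0x2000). */
#define SKETCH_CLONE_PTRACE 0x2000UL

/* Hypothetical helper: the flags value a tracer would poke into arg0 on
 * syscall entry so that the new child starts out traced. */
uint64_t sketch_force_ptrace(uint64_t flags)
{
    return flags | SKETCH_CLONE_PTRACE;
}

/* Hypothetical helper: nonzero if the tracee itself already requested
 * CLONE_PTRACE, in which case there is nothing to restore on exit. */
int sketch_had_ptrace(uint64_t flags)
{
    return (flags & SKETCH_CLONE_PTRACE) != 0;
}
```

The restore on exit is the step at issue in this bug: it is harmless for the parent, but the same write aimed at the child lands in the child's fresh register backing store.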


Trace of the former/buggy strace:
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r15, NULL) = 1213
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 1
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c0, NULL) = 0x3d0f00
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1c8, NULL) = 0x200000000031c000
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d0, NULL) = 0x9feb80
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1d8, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e0, NULL) = 0x2000000000d1b2d0
ptrace(PTRACE_PEEKDATA, 3932, 0x600007ffffe7c1e8, NULL) = 0x2000000000d1b910
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d2f00) = 0
write(2, "clone2(", 7)                  = 7
ptrace(PTRACE_SYSCALL, 3932, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5c) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], __WALL, NULL) = 3933
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
write(2, "Process 3933 attached (waiting for parent)\n", 43) = 43
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
wait4(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGTRAP}], __WALL, NULL) = 3932
rt_sigprocmask(SIG_BLOCK, [HUP INT QUIT PIPE TERM], NULL, 8) = 0
ptrace(PTRACE_PEEKUSER, 3932, psr, NULL) = 16
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r10, NULL) = 0
ptrace(PTRACE_PEEKUSER, 3932, r8, NULL) = 3933
ptrace(PTRACE_PEEKUSER, 3932, ar.bsp, NULL) = 0x600007ffffe7c1f0
ptrace(PTRACE_PEEKUSER, 3932, cfm, NULL) = 1167
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c0, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3932, 0x600007ffffe7c1c8, 0x200000000031c000) = 0
### New BSP is set for the new thread: vvv
ptrace(PTRACE_PEEKUSER, 3933, ar.bsp, NULL) = 0x200000000031c078
ptrace(PTRACE_PEEKUSER, 3933, cfm, NULL) = 1167
### These two lines corrupt it: vvv
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c048, 0x3d0f00) = 0
ptrace(PTRACE_POKEDATA, 3933, 0x200000000031c050, 0x200000000031c000) = 0
### These two lines corrupt it: ^^^
ptrace(PTRACE_SYSCALL, 3933, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ a000000000010621 (1f400000f5d) ---
write(2, "Process 3933 resumed (parent 3932 ready)\n", 41) = 41
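
The POKEDATA addresses in the trace above can be derived from the ar.bsp and cfm values read just before them.  The following is a rough sketch of that arithmetic, assuming the usual ia64 layout (sof in cfm bits 0-6, sol in bits 7-13, every 64th backing-store slot reserved for the RNaT collection).  It is modeled on the kernel's RSE helper of similar name as I understand it; the sketch_* names are hypothetical and this is not the strace code itself:

```c
#include <stdint.h>

/* Slot index (0..63) of a backing-store address within its 64-slot group;
 * slot 63 holds the RNaT collection rather than a stacked register. */
uint64_t sketch_rse_slot_num(uint64_t addr)
{
    return (addr >> 3) & 0x3f;
}

/* Address that lies num_regs stacked registers away from addr, stepping
 * over any RNaT collection slots in between. */
uint64_t sketch_rse_skip_regs(uint64_t addr, long num_regs)
{
    long delta = (long)sketch_rse_slot_num(addr) + num_regs;

    if (num_regs < 0)
        delta -= 0x3e;
    /* Unsigned wrap-around makes the negative offset a subtraction. */
    return addr + (uint64_t)((num_regs + delta / 0x3f) * 8);
}

/* Backing-store address of syscall argument argnum (out0 + argnum),
 * computed from the tracee's ar.bsp and cfm. */
uint64_t sketch_arg_addr(uint64_t bsp, uint64_t cfm, int argnum)
{
    long sof = (long)(cfm & 0x7f);        /* size of frame  */
    long sol = (long)((cfm >> 7) & 0x7f); /* size of locals */

    return sketch_rse_skip_regs(bsp, -sof + sol + argnum);
}
```

With cfm = 1167 this gives sof = 15 and sol = 9, so arg0 sits six slots below ar.bsp.  For the parent (ar.bsp = 0x600007ffffe7c1f0) that is 0x600007ffffe7c1c0.  For the new thread (ar.bsp = 0x200000000031c078) it is 0x200000000031c048, with arg1 at 0x200000000031c050; these are the two addresses written by the corrupting POKEDATA calls.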

Comment 1 Jan Kratochvil 2008-06-30 16:50:54 UTC
Created attachment 310600 [details]
Fix.

Comment 2 RHEL Product and Program Management 2008-06-30 17:00:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 3 Jan Kratochvil 2008-06-30 17:05:57 UTC
RHEL-4 is not affected by this bug:
kernel-2.6.9-67.EL.ia64
strace-4.5.16-1.el4.2.ia64
although the corruption of unknown data still occurs there for the child with a new stack.

Therefore this bug is a regression against RHEL-4.
It is not a regression relative to RHEL-5.1, as `-f' did not work there at all.


Comment 4 Eric Bachalo 2008-07-18 15:15:48 UTC
This problem will be fixed in the RHEL 5.3 rebase of strace to version 4.5.17.

http://bugzilla.redhat.com/show_bug.cgi?id=455874

Comment 6 Roland McGrath 2008-08-29 00:26:50 UTC
built 4.5.18-1.el5

Comment 11 errata-xmlrpc 2009-01-20 22:10:04 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-0233.html

