Bug 85559 - waitpid produces strange results
Summary: waitpid produces strange results
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Linux
Classification: Retired
Component: kernel
Version: 9
Hardware: i386
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Arjan van de Ven
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2003-03-04 12:29 UTC by Michael Young
Modified: 2007-04-18 16:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2003-04-24 18:22:47 UTC



Description Michael Young 2003-03-04 12:29:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
If you try to run a UML kernel on phoebe (either a generic build (2.4.20 plus
uml patch 1) or vmlinuz-2.4.18-19.8.0uml from Red Hat 8.0), it quickly exits with
Kernel panic: outer trampoline didn't exit with SIGKILL
having failed an internal safety check.
The relevant bit of code generating this error is in arch/um/kernel/process.c:

        /* Start the process and wait for it to kill itself */
        new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
        if(new_pid < 0) return(-errno);
        while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
        if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
                          errno);
        if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
                panic("outer trampoline didn't exit with SIGKILL");

If you hack the code a bit, you find that
WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make a lot of sense
to me. There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified)
and booting the main system kernel with the nosysinfo flag makes no difference.
If you disable the test altogether the uml system boots normally.

I have observed the bug with several kernel/glibc versions, up to
kernel-2.4.20-2.49 and glibc-2.3.1-51.

Steps to Reproduce:
1. Try to boot a uml kernel (no uml file system needed as it doesn't get that far!)

Comment 1 Arjan van de Ven 2003-03-04 12:32:34 UTC
wouldn't be surprised if uml has a signal bug here; if it has SIGCHLD set to
SIG_IGN then waitpid is a nop....

Comment 2 Michael Young 2003-03-07 10:26:47 UTC
Yes. It looks like status is unchanged by waitpid (eg. if you set it explicitly
beforehand, the numbers change), and that SIGCHLD is set to SIG_IGN at least
some of the time - explicitly setting it to SIG_DFL removes the warnings.
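
For anyone wanting to check the "status is unchanged" observation outside UML, here is a minimal sketch. It is not taken from the UML source: the helper name wait_under_disposition and the STATUS_SENTINEL value are made up for illustration, and it uses fork() rather than clone() to stay short. It assumes a current Linux kernel, where waiting with SIGCHLD ignored is expected to fail with ECHILD; the 2003 kernels discussed here may not have behaved the same way every time.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary marker, used only to detect whether waitpid() wrote status. */
#define STATUS_SENTINEL 0x5a5a

/*
 * Hypothetical helper: fork a child that exits at once, then wait for it
 * while SIGCHLD has the given disposition. Returns:
 *    1  waitpid() succeeded and reported a normal exit with code 0
 *    0  waitpid() failed with ECHILD and left status untouched
 *   -1  anything else (fork failure, unexpected errno, ...)
 */
static int wait_under_disposition(void (*disposition)(int))
{
    int status = STATUS_SENTINEL;
    int result = -1;
    pid_t pid, ret;
    void (*previous)(int) = signal(SIGCHLD, disposition);

    if (previous == SIG_ERR)
        return -1;

    pid = fork();
    if (pid == 0)
        _exit(0);                       /* child: terminate immediately */

    if (pid > 0) {
        /* Retry only when the wait was interrupted by a signal handler. */
        do {
            ret = waitpid(pid, &status, 0);
        } while (ret < 0 && errno == EINTR);

        if (ret == pid && WIFEXITED(status) && WEXITSTATUS(status) == 0)
            result = 1;
        else if (ret < 0 && errno == ECHILD && status == STATUS_SENTINEL)
            result = 0;
    }

    signal(SIGCHLD, previous);          /* restore the caller's disposition */
    return result;
}
```

Calling wait_under_disposition(SIG_DFL) should report the child's exit, while wait_under_disposition(SIG_IGN) should come back with the sentinel still in status, which would match the garbage WTERMSIG value seen above if status was never initialised.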

Comment 3 Arjan van de Ven 2003-03-07 11:08:13 UTC
that's an application bug; the kernel will even printk a warning for it ;)
basically, waitpid() while SIGCHLD is SIG_IGN is undefined behavior: you can
either get your child or, if timing is unlucky, the child is reaped by init
(which is the POSIX-specified behavior of SIG_IGN SIGCHLD) before you hit
waitpid(). NPTL changed the timing of this, so the latter happens more
frequently.
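
A rough sketch of what an application-side workaround could look like, under stated assumptions: it is not the actual change made in UML, the function name child_killed_by_sigkill is hypothetical, and fork() stands in for the clone() call in the quoted code. It resets SIGCHLD to SIG_DFL before creating the child, and it parenthesises the waitpid() assignment. In the loop as quoted in the description, `<` binds tighter than `=`, so err receives the result of the comparison (0 or 1) and the later `err < 0` test cannot fire; that may be why a failed wait showed up as a bogus status rather than as the "Waiting for outer trampoline failed" panic.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for the trampoline check: returns 1 if the child
 * died from SIGKILL, 0 if it ended some other way, -errno on failure.
 */
static int child_killed_by_sigkill(void)
{
    struct sigaction dfl, old;
    int status = 0, saved_errno = 0;
    pid_t pid, err;

    /* Make sure SIGCHLD is not ignored, so the child stays waitable. */
    memset(&dfl, 0, sizeof(dfl));
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGCHLD, &dfl, &old) < 0)
        return -errno;

    pid = fork();
    if (pid < 0) {
        saved_errno = errno;
        sigaction(SIGCHLD, &old, NULL);
        return -saved_errno;
    }
    if (pid == 0) {
        kill(getpid(), SIGKILL);        /* child: kill itself */
        _exit(1);                       /* not reached */
    }

    /* Parenthesised so err holds waitpid()'s return value itself. */
    while ((err = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
        ;
    if (err < 0)
        saved_errno = errno;

    sigaction(SIGCHLD, &old, NULL);     /* restore the previous disposition */
    if (saved_errno)
        return -saved_errno;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL;
}
```

With this arrangement the check should pass even when the caller had SIGCHLD set to SIG_IGN beforehand, since the disposition is only restored after the child has been collected.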

Comment 4 Michael Young 2003-04-24 18:22:47 UTC
Fixed in uml-patch-2.4.20-4.

