Bug 85559

Summary: waitpid produces strange results
Product: [Retired] Red Hat Linux
Reporter: Michael Young <m.a.young>
Component: kernel
Assignee: Arjan van de Ven <arjanv>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 9
Hardware: i386
OS: Linux
Doc Type: Bug Fix
Last Closed: 2003-04-24 18:22:47 UTC

Description Michael Young 2003-03-04 12:29:37 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030221

Description of problem:
If you try to run a UML kernel on phoebe (either a generic build, 2.4.20 plus
UML patch 1, or vmlinuz-2.4.18-19.8.0uml from Red Hat 8.0), it quickly exits with
Kernel panic: outer trampoline didn't exit with SIGKILL
having failed an internal safety check.
The relevant bit of code generating this error is in arch/um/kernel/process.c:

        /* Start the process and wait for it to kill itself */
        new_pid = clone(outer_tramp, (void *) sp, clone_flags, &arg);
        if(new_pid < 0) return(-errno);
        while((err = waitpid(new_pid, &status, 0) < 0) && (errno == EINTR)) ;
        if(err < 0) panic("Waiting for outer trampoline failed - errno = %d",
                          errno);
        if(!WIFSIGNALED(status) || (WTERMSIG(status) != SIGKILL))
                panic("outer trampoline didn't exit with SIGKILL");

If you hack the code a bit, you find that
WIFSIGNALED(status)=1 and WTERMSIG(status)=82, which doesn't make a lot of sense
to me. There was no problem on 8.0 (vmlinuz-2.4.18-19.8.0uml worked unmodified),
and booting the main system kernel with the nosysinfo flag makes no difference.
If you disable the test altogether, the UML system boots normally.
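One detail in the quoted excerpt helps explain these numbers. Because `<` binds more tightly than `=`, `err = waitpid(new_pid, &status, 0) < 0` stores the 0-or-1 result of the comparison rather than waitpid()'s return value, so the `if(err < 0)` check can never fire. If waitpid() fails with anything other than EINTR (for example ECHILD, once the child has already been reaped), the code falls through to the WIFSIGNALED test with `status` still holding whatever it contained before the call, which is consistent with Comment 2. The sketch below shows the loop with the assignment parenthesized; it assumes fork() as a stand-in for UML's clone() call, and `spawn_self_killer` and `wait_for_sigkill` are hypothetical helper names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that kills itself with SIGKILL,
 * standing in for UML's clone(outer_tramp, ...).
 * Returns the child's pid, or -1 if fork() fails. */
static pid_t spawn_self_killer(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        kill(getpid(), SIGKILL);
        _exit(1); /* not reached */
    }
    return pid;
}

/* Hypothetical helper: returns 0 if pid was terminated by SIGKILL,
 * 1 if it ended some other way, and -1 (errno set) if waitpid() failed. */
static int wait_for_sigkill(pid_t pid)
{
    int status;
    pid_t err;

    /* The assignment is parenthesized, so err holds waitpid()'s return
     * value rather than the 0/1 result of the comparison. */
    while ((err = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
        ;
    if (err < 0)
        return -1; /* e.g. ECHILD: status was never written */
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
        return 1;
    return 0;
}
```

With this form, the situation in this report would likely have produced the "Waiting for outer trampoline failed" panic with errno set to ECHILD, which points more directly at the ignored SIGCHLD.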

I have observed the bug with several kernel/glibc versions, up to
kernel-2.4.20-2.49 and glibc-2.3.1-51.

Steps to Reproduce:
1. Try to boot a UML kernel (no UML file system is needed, as it doesn't get that far!)

Comment 1 Arjan van de Ven 2003-03-04 12:32:34 UTC
I wouldn't be surprised if UML has a signal bug here; if it has SIGCHLD set to
SIG_IGN, then waitpid is a no-op.
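The effect described here can be reproduced in isolation. Below is a minimal sketch, assuming a Linux 2.6 or later kernel: there, as POSIX specifies, children of a process that ignores SIGCHLD are reaped automatically, so waitpid() fails with ECHILD and never writes the status word. The wait(2) manual page notes that Linux 2.4 did not follow this rule, which fits the timing-dependent behavior described in Comment 3. The function `probe_ignored_sigchld` and the `STATUS_SENTINEL` marker are illustrative names, not part of UML.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Arbitrary marker used to detect "status was never written". */
#define STATUS_SENTINEL 12345

/* Hypothetical probe: with SIGCHLD ignored, fork a child that exits at
 * once and wait for it. Stores the failing call's errno (or 0) in
 * *err_out and returns the final value of status. The caller's previous
 * SIGCHLD disposition is restored before returning. */
static int probe_ignored_sigchld(int *err_out)
{
    struct sigaction ign, old;
    int status = STATUS_SENTINEL;
    pid_t pid, r;

    memset(&ign, 0, sizeof ign);
    ign.sa_handler = SIG_IGN;
    sigemptyset(&ign.sa_mask);
    sigaction(SIGCHLD, &ign, &old);

    pid = fork();
    if (pid == 0)
        _exit(0); /* child: terminate immediately */

    *err_out = 0;
    if (pid > 0) {
        while ((r = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
            ;
        if (r < 0)
            *err_out = errno; /* expected: ECHILD, child was auto-reaped */
    } else {
        *err_out = errno; /* fork() itself failed */
    }

    sigaction(SIGCHLD, &old, NULL);
    return status;
}
```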

Comment 2 Michael Young 2003-03-07 10:26:47 UTC
Yes. It looks like status is left unchanged by waitpid (e.g. if you set it explicitly
beforehand, the numbers change), and SIGCHLD is set to SIG_IGN at least
some of the time; explicitly setting it to SIG_DFL removes the warnings.
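The workaround of explicitly setting SIGCHLD to SIG_DFL can be confined to the spawn-and-wait sequence, so the rest of the program keeps whatever disposition it had. This is a minimal sketch, assuming fork() as a stand-in for UML's clone() call; `run_with_default_sigchld` is a hypothetical name, and this is not necessarily how uml-patch-2.4.20-4 resolved the problem.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: with SIGCHLD temporarily reset to SIG_DFL,
 * fork a child that SIGKILLs itself and collect its status.
 * Returns 0 and fills *status on success, -1 (errno set) on failure.
 * The previous SIGCHLD disposition is restored either way. */
static int run_with_default_sigchld(int *status)
{
    struct sigaction dfl, old;
    pid_t pid, r = -1;
    int saved_errno = 0;

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGCHLD, &dfl, &old) < 0)
        return -1;

    pid = fork();
    if (pid == 0) {
        kill(getpid(), SIGKILL); /* child: behave like the outer trampoline */
        _exit(1);                /* not reached */
    }
    if (pid > 0) {
        while ((r = waitpid(pid, status, 0)) < 0 && errno == EINTR)
            ;
    }
    if (pid < 0 || r < 0)
        saved_errno = errno;

    sigaction(SIGCHLD, &old, NULL); /* restore the caller's disposition */
    if (saved_errno) {
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

Because signal dispositions are process-wide, this save-and-restore approach assumes nothing else in the process relies on SIGCHLD being ignored during that window; a long-lived program would more likely settle the disposition once at startup.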

Comment 3 Arjan van de Ven 2003-03-07 11:08:13 UTC
That's an application bug; the kernel will even printk a warning for it ;)
Basically, waitpid() while SIGCHLD is SIG_IGN is undefined behavior: you either
get your child or, if timing is unlucky, the child is reaped by init
(which is the POSIX-specified behavior of an ignored SIGCHLD) before you hit
waitpid(). NPTL changed the timing of this, so the latter happens more
frequently.

Comment 4 Michael Young 2003-04-24 18:22:47 UTC
Fixed in uml-patch-2.4.20-4.