|Summary:||X server crashes randomly with signal 11 on epia M10000 board|
|Product:||[Fedora] Fedora||Reporter:||Lucas Maneos <redhat>|
|Component:||xorg-x11||Assignee:||X/OpenGL Maintenance List <xgl-maint>|
|Status:||CLOSED WONTFIX||QA Contact:||David Lawrence <dkl>|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2006-03-07 11:59:46 UTC||Type:||---|
Description Lucas Maneos 2005-05-16 16:11:23 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux ppc; en-US; rv:1.7.7) Gecko/20050421 Firefox/1.0.3 (Debian package 1.0.3-2)

Description of problem:
The X server is crashing randomly on this machine (epia M10000 motherboard, VIA CLE266 chip). Not entirely sure if hardware is relevant, the problem occurs with both the 'via' and 'vesa' drivers. Previously this box was running RH9 for years with no such problem. An identical machine with FC3 also works fine. I haven't managed to identify any particular circumstances that will always trigger the crash, although firefox seems to trigger it often. Also, running twm seems much more stable than metacity - perhaps it's something to do with pixmaps?

Version-Release number of selected component (if applicable):
xorg-x11-6.8.2-31

How reproducible:
Sometimes

Steps to Reproduce:
1. Start an X session (runlevel 5 or startx from the console, doesn't make a difference)
2. Run some programs and wait for it to crash. Firefox seems to trigger it often.

Actual Results:
X server crashes, console is stuck in graphics mode and unusable.

Expected Results:
Stable X operation.

Additional info:
Xorg.0.log shows the following:
Fatal server error: Caught signal 11. Server aborting
(and then suggests checking itself for additional information)
The kernel doesn't log anything, so the problem appears to be entirely in user space.
Comment 1 Mike A. Harris 2005-05-16 16:32:20 UTC
Please try to narrow the problem down using a single video driver (via) to specific steps to reproduce. If there is a particular web page that triggers this, or a specific action in firefox, e.g. moving the scroll bars, this would be useful to know. Please attach sysreport, X server log, X config file, /var/log/messages, and the output of lsmod from after the X server is started, as individual file attachments using the link below. Setting status to "NEEDINFO"
Comment 2 Mike A. Harris 2005-05-16 16:34:00 UTC
Also, I notice this is reported for "i386", but bugzilla says your web browser is PPC. Is it just a different machine you're reporting it from, or should it be marked as a "ppc" problem?
Comment 3 Lucas Maneos 2005-05-16 17:36:20 UTC
Created attachment 114430 [details] xorg.conf I've commented out modules fbdevhw, glx, record and dri but it hasn't made a difference.
Comment 4 Lucas Maneos 2005-05-16 17:37:07 UTC
Created attachment 114431 [details] Xorg.0.log.old - X server log after crash
Comment 5 Lucas Maneos 2005-05-16 17:40:13 UTC
Created attachment 114432 [details] Xorg.0.log - X server log while server is still running Not sure if this is significant, but note that 'Frame buffer start' is different every time the server starts.
Comment 6 Lucas Maneos 2005-05-16 17:41:39 UTC
Created attachment 114433 [details] /var/log/messages contents All syslog output (*.*) between X server startup and crash.
Comment 8 Lucas Maneos 2005-05-16 17:47:54 UTC
Sysreport output to follow later (it seems to be running an rpm -Va, which seems to be running prelink which takes ages). I am indeed reporting from another machine, the problem is on i386. I still haven't managed to isolate a specific thing that will trigger the crash, but using firefox/mozilla for a few minutes seems to do it every time.
Comment 10 Lucas Maneos 2005-05-17 14:09:33 UTC
The problem appears to be, at least in part, windowmanager-related. I had been running twm since yesterday afternoon without incident, whereas after switching to metacity or WindowMaker the X server crashes again after a few minutes (the crash frequency is actually much higher with wmaker). Now running a kde session (for the last half hour or so) and it seems stable so far.
Comment 11 Lucas Maneos 2005-05-17 19:52:37 UTC
Managed to get a core file, but a stack backtrace isn't very illuminating:

(gdb) bt
#0  0x007cab04 in malloc_consolidate () from /lib/libc.so.6
#1  0x007cbe4d in _int_malloc () from /lib/libc.so.6
#2  0x007cd552 in malloc () from /lib/libc.so.6
#3  0x080e523b in Xalloc ()
#4  0x080e5e3d in Xcalloc ()
#5  0xb7f98541 in ?? ()
#6  0x00001000 in ?? ()
#7  0x00000000 in ?? ()

Is there a xorg-x11-debuginfo RPM somewhere?
Comment 12 Lucas Maneos 2005-05-22 11:05:24 UTC
The driver might not be relevant actually. Just stuck an old s3 virge dx card in the box, and it still segfaulted after a few minutes of firefox use. May or may not be relevant: display was corrupted in all modes above 640x480 no matter what driver options I tried, and kudzu probing the card produced a kernel oops.
Comment 13 Olivier Baudron 2005-09-01 12:06:06 UTC
Can you try with the latest FC4 update (6.8.2-37.FC4.45) ? Also, the backtrace would look better if you had linked with libefence. Can you try it?
Comment 14 Lucas Maneos 2005-09-04 09:34:27 UTC
Ran for ~ 1 hour before crashing with signal 11, so definitely an improvement. How would I go about linking with libefence? If someone could provide an appropriate RPM it would be a great help as building xorg on this box takes quite a while.
Comment 15 Lucas Maneos 2005-09-04 09:37:26 UTC
Created attachment 118436 [details] xorg log from 6.8.2-37.FC4.45
Comment 16 Olivier Baudron 2005-09-04 18:41:09 UTC
(In reply to comment #14)
You don't need to recompile. Here are the steps to follow:
1. Install the ElectricFence package.
2. Boot in runlevel 3
3. $ export LD_PRELOAD=/usr/lib/libefence.so
   $ startx
Then try to crash xorg and backtrace in the coredump. Thanks for testing and posting the results.
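The steps in comment 16 can be wrapped in a small helper. This is only a sketch: the function name `run_with_efence` and the `EFENCE_LIB` variable are made up for illustration, and the library path is the one quoted in comment 16. Note that the dynamic linker generally ignores `LD_PRELOAD` paths for set-user-ID binaries, so if the X server is installed SUID this probably has to be run as root to have any effect.

```shell
#!/bin/sh
# Sketch: run a command with Electric Fence preloaded and core dumps allowed.
# EFENCE_LIB is a hypothetical variable; the default path comes from comment 16.
EFENCE_LIB="${EFENCE_LIB:-/usr/lib/libefence.so}"

run_with_efence() {
    # Refuse to continue if the library is not where we expect it.
    if [ ! -r "$EFENCE_LIB" ]; then
        echo "libefence not readable at $EFENCE_LIB" >&2
        return 1
    fi
    (
        # The subshell keeps the ulimit change local to this one run.
        ulimit -c unlimited 2>/dev/null ||
            echo "warning: could not raise core file limit" >&2
        # Preload Electric Fence only for the command being started.
        LD_PRELOAD="$EFENCE_LIB" "$@"
    )
}
```

From a text console in runlevel 3 this would be invoked as `run_with_efence startx`. Setting `LD_PRELOAD` only for the one command, rather than exporting it, avoids preloading Electric Fence into every other program started from the same shell.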
Comment 17 Mike A. Harris 2005-09-14 09:31:01 UTC
Reviewing the log file unfortunately doesn't show any clues as to what the problem might be. Also, unfortunately... creating a useful backtrace from the X server is a bit more complicated than comment #16. You have to:

1) Rebuild the src.rpm and enable DebugBuild so that symbols are not stripped from the server during rpm packaging. Add ".debug" to the Release field so you know it is a debug build, and also not an official Red Hat build. X does not have a debuginfo package for reasons that I won't go into in the bug report other than to say it is not easily possible due to the X ELF loader, and other ugly factors.
2) Install the newly built debuggable X packages.
3) Edit the config file and add option NoTrapSignals to the ServerFlags section (see Xorg/xorg.conf manpages for details)
4) Run the server as root, because a SUID process will not produce core files.
5) Make sure ulimit is set to allow corefiles.
6) Trigger a crash.

At this point things get fun and exciting. Since the X server dynamically loads its modules using its own custom ELF loader, and gdb doesn't have any clue about the X server's custom ELF loader, normal GNU gdb does not have the ability to debug a running X server or make much useful sense out of most core dumps in practice, although it is sometimes worthwhile trying. If nothing useful can be obtained that way, then there are 2 options:

1) Compile a statically linked X server with debugging enabled, and try debugging that, or backtracing a corefile generated by the static server.
or
2) ftp://people.redhat.com/mharris/hacks has a customized version of gdb which I no longer maintain or support, which may or may not be useful in trying to debug the modular X server. It used to work in the RHL 8.0 days or thereabouts, but I stopped using it ages ago. xf86Msg() and friends is what I use mostly nowadays.
At this point, it seems like this is possibly a driver specific issue, or that it at least requires having the hardware in order to do further diagnosis. Unfortunately we do not have this VIA hardware, and are thus unable to troubleshoot or debug the problem directly any further. If this issue turns out to still be reproducible in the latest updates for this Fedora Core release, please file a bug report in the X.Org bugzilla located at http://bugs.freedesktop.org in the "xorg" component. Once you've filed your bug report to X.Org, if you paste the new bug URL here, Red Hat will continue to track the issue in the centralized X.Org bug tracker, and will review any bug fixes that become available for consideration in future updates. Setting status to "NEEDINFO_REPORTER", and awaiting upstream bug report URL for tracking. Thanks in advance.
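Steps 3 to 5 of the checklist in comment 17 can be verified before trying to trigger a crash. The following is a rough sketch under stated assumptions: the function name `preflight` and the `XORG_CONF` variable are hypothetical, and the default config path is a guess at the usual Fedora Core 4 location. It only checks that `NoTrapSignals` appears on a non-comment line; it does not parse the config or confirm the option sits in the ServerFlags section.

```shell
#!/bin/sh
# Sketch: check the preconditions for getting a core file out of the X server.
# XORG_CONF is a hypothetical variable; the default path is an assumption.
XORG_CONF="${XORG_CONF:-/etc/X11/xorg.conf}"

preflight() {
    rc=0
    # Step 3: the ServerFlags section is expected to contain a line such as
    #     Option "NoTrapSignals" "true"
    # Comment lines are filtered out before searching.
    if ! grep -v '^[[:space:]]*#' "$XORG_CONF" 2>/dev/null |
            grep -qi 'NoTrapSignals'; then
        echo "NoTrapSignals not found in $XORG_CONF"
        rc=1
    fi
    # Step 4: a SUID server started by a normal user will not dump core.
    if [ "$(id -u)" -ne 0 ]; then
        echo "not running as root"
        rc=1
    fi
    # Step 5: a core file size limit of 0 disables core dumps.
    if [ "$(ulimit -c)" = "0" ]; then
        echo "core file size limit is 0; run: ulimit -c unlimited"
        rc=1
    fi
    return $rc
}
```

Running `preflight` in the shell that will start the server prints one line per unmet precondition and returns non-zero if any check fails.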
Comment 18 Mike A. Harris 2005-09-26 20:51:38 UTC
This problem sounds hardware specific, and we do not have this hardware to attempt to reproduce. Please file a bug in X.Org bugzilla for this issue, and attach all relevant details to the X.org bug report. http://bugs.freedesktop.org in the "xorg" component. Once you've filed your bug report to X.Org, if you paste the new bug URL here, Red Hat will continue to track the issue in the centralized X.Org bug tracker, and will review any bug fixes that become available for consideration in future updates.
Comment 19 Mike A. Harris 2006-01-18 02:01:30 UTC
It's been over 4 months since we've had any feedback on this issue. Unfortunately, we do not have this hardware available in our lab for direct diagnosis, so we require a 2 way communication link with the reporter, or someone else who has the hardware directly available to them to diagnose, who is willing to spend some time troubleshooting in order for any progress to be made. Has the problem vanished in a more recent Fedora xorg update? If the problem is no longer present, or if there is no longer any interest in tracking this issue, please update the report to indicate the current state of the issue, so we can proceed. If the issue is still present in the latest Fedora Core 4 updates, I would strongly encourage testing of the latest rawhide X, which is most easily done by installing Fedora Core 5 test2. If the problem exists still in the latest X.Org X11 builds in Fedora development, it is probably going to require direct investigation by the upstream via driver maintainers, as they have access directly to the hardware in question. In this case, please file a bug report in X.Org bugzilla, at http://bugs.freedesktop.org in the "xorg" component, detailing the issue, and attaching your X server log and config file as individual file attachments. If there is an X.Org bug for this issue already, or if you file one, please paste the URL here so we can track the issue. If a fix is available from X.org, we will consider including it in future updates. Thanks in advance.
Comment 20 Lucas Maneos 2006-01-22 17:45:03 UTC
Sorry for the lack of communication, things are pretty hectic here at the moment :-( The problem is still present with xorg-x11-6.8.2-37.FC4.49.2 RPMs, but I don't think it's an Xorg issue to be honest - up-to-date FC3 on identical hardware works fine. Maybe a compiler issue?