|Summary:||(busted vdso) i686 SMP kernel stuck during boot, UP works|
|Product:||[Fedora] Fedora||Reporter:||Warren Togami <wtogami>|
|Component:||kernel||Assignee:||Dave Jones <davej>|
|Status:||CLOSED RAWHIDE||QA Contact:||Brian Brock <bbrock>|
|Version:||rawhide||CC:||gjunk, jspaleta, pfrields, rbh00, roland, sbruno, tech-fedora-bugzilla, tsukahara.ken, wtogami|
|Fixed In Version:||Doc Type:||Bug Fix|
|Doc Text:||Story Points:||---|
|Last Closed:||2005-05-28 01:23:49 UTC||Type:||---|
Description Warren Togami 2005-05-22 03:42:33 UTC
Description of problem:
Dual CPU Opteron with FC4 32bit installed. Bootup gets stuck at:

EXT3-fs: mounted filesystem with ordered data mode.
Switching to new root
unmounting old /proc
unmounting old /sys
cfq: depth 4 reached, tagging now on

CFQ is not at fault, because elevator=deadline gets stuck there too, without the CFQ message of course. This point seems to be after initrd's "init" script, where it seems to load SELinux. Booting the UP kernel gets past this point with these kind of messages:

security: 3 users, 6 roles, 760 types, 87 bools
security: 55 classes, 170468 rules
SELinux: Completing initialization.

Disabling selinux in /etc/sysconfig/selinux or booting with maxcpus=1 makes no difference.

Version-Release number of selected component (if applicable):
WORKING kernel-2.6.11-1.1253_FC4smp
WORKING kernel-2.6.11-1.1267_FC4smp
WORKING kernel-2.6.11-1.1268_FC4smp
(broke somewhere here, no builds available in between)
BROKEN kernel-2.6.11-1.1275_FC4smp
BROKEN kernel-2.6.11-1.1276_FC4smp
BROKEN kernel-2.6.11-1.1286_FC4smp
BROKEN kernel-2.6.11-1.1323_FC4smp
BROKEN kernel-2.6.11-1.1337_FC4smp

Hardware
========
Tyan motherboard
2x1.4GHz Opteron
Adaptec I2O with i2o_block driver

Bug #158410 mentions similar behavior with a SATA controller on dual Opteron.
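The WORKING/BROKEN list above narrows the regression to a window between two builds. As a rough sketch only (the function name `regression_window` and the "STATUS BUILD" input format are illustrative assumptions, not part of any Fedora tooling), the window can be computed like this:

```shell
#!/bin/sh
# Hypothetical helper: reads lines of the form "WORKING 1268" or "BROKEN 1275"
# on stdin and prints "<highest working build> <lowest broken build>".
regression_window() {
    awk '
        $1 == "WORKING" && $2 + 0 > good + 0          { good = $2 + 0 }
        $1 == "BROKEN"  && (bad == "" || $2 + 0 < bad) { bad = $2 + 0 }
        END { print good, bad }
    '
}

# Build numbers taken from the list in the description.
printf '%s\n' \
    "WORKING 1253" "WORKING 1267" "WORKING 1268" \
    "BROKEN 1275" "BROKEN 1276" "BROKEN 1286" "BROKEN 1323" "BROKEN 1337" |
    regression_window
# prints: 1268 1275
```

With the builds reported here, the change that broke SMP boot went in somewhere after 1268 and no later than 1275.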
Comment 2 Warren Togami 2005-05-22 04:42:52 UTC
x86_64 FC4 on the same hardware does not have this SMP kernel problem. What we don't know, however, is whether plain i686 SMP hardware is affected by this problem. If it is, we should fix this before FC4. If it is unaffected, then this shouldn't be a blocker.
Comment 3 Sean Bruno 2005-05-22 16:31:45 UTC
I was able to boot the kernel from http://people.redhat.com/wtogami/temp/kernel-smp-2.6.11-1.1267_FC4.i686.rpm on a dual Opteron system built on the ASUS K8N-DL with model 246 Opterons. However, the Broadcom NetXtreme Ethernet controller (tg3) seems to have an issue, as I am unable to get on the network with this kernel. If I boot off of the uniprocessor kernel from FC4T3 or any rawhide update, the Ethernet controller works just fine.
Comment 4 Warren Togami 2005-05-23 05:03:13 UTC
*** Bug 157691 has been marked as a duplicate of this bug. ***
Comment 5 Warren Togami 2005-05-23 05:06:05 UTC
Bug 157691 confirms that this is a general i686 SMP problem that affects both 32bit AMD64 and Pentium 4/Xeon. We should try to avoid releasing FC4 with this problem.
Comment 6 Warren Togami 2005-05-23 05:13:46 UTC
*** Bug 156664 has been marked as a duplicate of this bug. ***
Comment 7 Warren Togami 2005-05-23 05:19:09 UTC
Gene, in Bug 156664 #c3 you mention that the SMP kernel successfully boots on a dual Pentium4 without HT. Could you please attach a text file containing /proc/cpuinfo from that machine?
Comment 8 Warren Togami 2005-05-23 05:33:28 UTC
It would help if somebody with a serial console could do the following procedure:

1) Apply the below patch to /sbin/mkinitrd script.

--- mkinitrd.orig 2005-05-22 19:28:32.000000000 -1000
+++ mkinitrd 2005-05-22 19:29:22.000000000 -1000
@@ -749,6 +749,8 @@
 echo "echo Mounting root filesystem" >> $RCFILE
 echo "mount -o $rootopts --ro -t $rootfs $rootdev /sysroot" >> $RCFILE
 
+echo "echo Enabling Magic SysRQ" >> $RCFILE
+echo "echo echo 1 > /proc/sys/kernel/sysrq" >> $RCFILE
 echo "echo Switching to new root" >> $RCFILE
 if [ -n "$UDEV_KEEP_DEV" ]; then
 echo "switchroot --movedev /sysroot" >> $RCFILE

2) Create a new initrd image for the latest SMP kernel. Make a backup of the existing initrd just in case you somehow screw it up. Doing this would be something like:

mv /boot/initrd-2.6.11-1.XXXX_FC4smp.img /boot/initrd-2.6.11-1.XXXX_FC4smp.img.backup
/sbin/mkinitrd /boot/initrd-2.6.11-1.XXXX_FC4smp.img 2.6.11-1.XXXX_FC4smp

3) Reboot using that new initrd. When it gets stuck, hit ALT-SysRQ-T. Save the entire dump into a text file and attach it in this bug.
Comment 9 Warren Togami 2005-05-23 05:36:50 UTC
Oops... one too many echos.

--- mkinitrd.orig 2005-05-22 19:28:32.000000000 -1000
+++ mkinitrd 2005-05-22 19:37:04.000000000 -1000
@@ -749,6 +749,8 @@
 echo "echo Mounting root filesystem" >> $RCFILE
 echo "mount -o $rootopts --ro -t $rootfs $rootdev /sysroot" >> $RCFILE
 
+echo "echo Enabling Magic SysRQ" >> $RCFILE
+echo "echo 1 > /proc/sys/kernel/sysrq" >> $RCFILE
 echo "echo Switching to new root" >> $RCFILE
 if [ -n "$UDEV_KEEP_DEV" ]; then
 echo "switchroot --movedev /sysroot" >> $RCFILE
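To see why the extra echo matters, the sketch below runs the line each version of the patch writes into the init script, with the sysctl path swapped for a scratch file so it needs neither root nor a real /proc. The function names are illustrative, and it assumes the initrd interpreter treats `echo` and `>` the way `sh` does.

```shell
#!/bin/sh
# The line that each version of the patch appends to $RCFILE.
sysrq_line_comment8() { echo "echo echo 1 > /proc/sys/kernel/sysrq"; }
sysrq_line_comment9() { echo "echo 1 > /proc/sys/kernel/sysrq"; }

# Run a generated line with the sysctl path redirected to a scratch file,
# then print what would have been written to /proc/sys/kernel/sysrq.
written_value() {
    target=$(mktemp)
    line=$(printf '%s\n' "$1" | sed "s|/proc/sys/kernel/sysrq|$target|")
    sh -c "$line"
    cat "$target"
    rm -f "$target"
}

written_value "$(sysrq_line_comment8)"   # prints: echo 1
written_value "$(sysrq_line_comment9)"   # prints: 1
```

The first version would hand the string "echo 1" to the sysctl instead of the value 1, so Magic SysRq would not be enabled as intended.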
Comment 10 Warren Togami 2005-05-23 20:33:34 UTC
If your i686 SMP boots with the FC4 smp kernel, please submit your /proc/cpuinfo in an attachment. If you lock up during boot, please attach alt-sysrq-T as indicated in Comment #8 and #9 and /proc/cpuinfo.
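For anyone gathering the requested data, here is a minimal sketch that puts the kernel version and CPU details into one text file for attaching. The function name `collect_cpu_report` and the output path are assumptions made for illustration.

```shell
#!/bin/sh
# Write kernel version and CPU details to a single text file.
# Usage: collect_cpu_report OUTPUT_FILE [CPUINFO_FILE]
collect_cpu_report() {
    out=$1
    cpuinfo=${2:-/proc/cpuinfo}
    {
        echo "== uname -a =="
        uname -a
        echo "== $cpuinfo =="
        cat "$cpuinfo"
    } > "$out"
}

# Example run; skipped on systems without /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    collect_cpu_report "${TMPDIR:-/tmp}/cpuinfo-report.txt"
fi
```

Including `uname -a` alongside /proc/cpuinfo records which kernel build the CPU data was captured under.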
Comment 11 Jef Spaleta 2005-05-23 20:49:20 UTC
Created attachment 114745 [details]
contents of /proc/cpuinfo for i686 smp system using 1315 kernel

I have an smp i686 machine booting with 1315 rawhide smp kernel. I'll try booting into 1340 as soon as i'm physically at the machine again.

uname -a
Linux local.localdomain 2.6.11-1.1315_FC4smp #1 SMP Mon May 16 17:14:20 EDT 2005 i686 athlon i386 GNU/Linux

uptime
16:47:30 up 2 days, 21:08, 3 users, load average: 0.04, 0.05, 0.07

attached is the output of /proc/cpuinfo
Comment 12 Jef Spaleta 2005-05-23 20:49:46 UTC
Created attachment 114746 [details]
contents of /proc/cpuinfo for i686 smp system using 1315 kernel

I have an smp i686 machine booting with 1315 rawhide smp kernel. I'll try booting into 1340 as soon as i'm physically at the machine again.

uname -a
Linux local.localdomain 2.6.11-1.1315_FC4smp #1 SMP Mon May 16 17:14:20 EDT 2005 i686 athlon i386 GNU/Linux

uptime
16:47:30 up 2 days, 21:08, 3 users, load average: 0.04, 0.05, 0.07

attached is the output of /proc/cpuinfo
Comment 13 gene c 2005-05-24 01:05:50 UTC
Created attachment 114757 [details] /proc/cpuinfo - 2.6.11-1.1319_FC4smp - machine boots fine
Comment 14 Jef Spaleta 2005-05-24 01:09:19 UTC
(In reply to comment #11)
> I have an smp i686 machine booting with 1315 rawhide smp kernel.
> I'll try booting into 1340 as soon as i'm physically at the machine again.

Sorry about the double comment earlier. Booted the i686 smp machine into 1340 smp kernel. I have selinux in permissive mode, but from other comments in this report so far, I don't think that should matter.

-jef
Comment 15 David Sklar 2005-05-24 02:38:02 UTC
Created attachment 114760 [details]
/proc/cpuinfo for P4 w/HT -- can't boot 1340

My i686 SMP (Dell GX280 with 1 P4 and HT turned on) hangs on booting with 1340 (and has since 1276). The last SMP kernel I successfully booted with was 1261 (but I haven't tried anything between 1261 and 1276). The UP kernels boot fine.

When the boot hangs (after the LVM message) I can't reboot with Ctrl-Alt-Del (no serial console; USB keyboard is completely unresponsive, hitting caps lock/num lock doesn't change keyboard lights).

Upgraded to most recent Dell BIOS (A05, from A04) with no change.

/proc/cpuinfo is attached.
Comment 16 gene c 2005-05-24 02:51:07 UTC
Created attachment 114761 [details]
Picture of end of Alt-Sysrq-T when hung

Same system as my earlier report - HT single CPU - sata disk

Sorry, no serial console - I know it's not enough, but this is what was left on screen when I did Alt-Sysrq-T when it was hung.

gene/
Comment 17 Warren Togami 2005-05-24 21:05:30 UTC
My current theory is that it is failing to boot only on "newer" i686 SMP machines. We need to find a common theme here. Can you folks try rebuilding upstream vanilla 2.6.12-rc4-gitX using the SMP config file from /boot/config-*? We need to know if it is an upstream problem, or something we added.
Comment 18 gene c 2005-05-25 02:17:10 UTC
I built 2.6.12-rc4-git8 using the config-2.6.11-1.1340_FC4smp config from /boot. I had to comment out IPMI stuff as it gave compile errors. Sweet - this kernel boots no problem at all.

Best regards,
gene
Comment 19 Warren Togami 2005-05-25 02:23:14 UTC
Created attachment 114810 [details] SysRQ Show State when it gets stuck
Comment 20 Richard Hitt 2005-05-25 09:31:47 UTC
Hi again Warren. I built and tested successfully. Working backwards from git8, I found the same problem Gene did in git8, git7, git6, git5. git4 built okay. I booted git4 and verified with gkrellm that there appeared to be two CPUs. I'd also built with plain 2.6.12-rc4 so I tried booting that, and it too came up fine with two CPUs showing. Between Gene and me we've tested rc4, rc4-git4, and rc4-git8.
Comment 21 Warren Togami 2005-05-25 10:42:19 UTC
Bingo! I rebuilt the 1355 after commenting out patch 810 (exec-shield) and 813 (vdso), and the smp kernel successfully booted. Arjan suggested this may indicate a busted vdso, so I tried 1355smp with "vdso=0" and it too successfully booted. Busted vdso?
Comment 22 David Sklar 2005-05-25 13:39:01 UTC
I was locking up (see Comment #15), but if I give vdso=0 to 1341smp (the latest kernel yum finds right now), it boots just fine and /proc/cpuinfo shows me two CPUs (which are really 1 P4 with HT on).
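A quick way to confirm both halves of this report on a running system is to check that the kernel was booted with vdso=0 and count the logical CPUs it sees. This is only a sketch: the helper names are made up for illustration, and the only system interfaces it relies on are /proc/cmdline and /proc/cpuinfo.

```shell
#!/bin/sh
# Succeed if the given kernel command line contains the vdso=0 parameter.
has_vdso_disabled() {
    case " $1 " in
        *" vdso=0 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Count logical CPUs in a cpuinfo-format file (default: /proc/cpuinfo).
count_cpus() {
    grep -c '^processor' "${1:-/proc/cpuinfo}"
}

# Example run; skipped on systems without /proc.
if [ -r /proc/cmdline ] && [ -r /proc/cpuinfo ]; then
    if has_vdso_disabled "$(cat /proc/cmdline)"; then
        echo "vdso=0 is set"
    else
        echo "vdso=0 is not set"
    fi
    echo "logical CPUs: $(count_cpus)"
fi
```

On a single P4 with HT enabled, as described above, the count would be 2 even though there is one physical processor.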
Comment 23 Roland McGrath 2005-05-25 23:10:04 UTC
I committed the one-liner change to execshield.patch, which needed the update because of upstream changes. Dave's next build hopefully wins.

@@ -21,9 +21,9 @@ diff -urNp --exclude-from=/home/davej/.e
 +	/*
 +	 * Push current_thread_info()->sysenter_return to the stack.
 +	 * A tiny bit of offset fixup is necessary - 4*4 means the 4 words
-+	 * pushed above, and the word being pushed now:
++	 * pushed above; +8 corresponds to copy_thread's esp0 setting.
 +	 */
-+	pushl (TI_sysenter_return-THREAD_SIZE+4*4)(%esp)
++	pushl (TI_sysenter_return-THREAD_SIZE+8+4*4)(%esp)
  /*
   * Load the potential sixth argument from user stack.
   * Careful about security.
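The offset arithmetic in that one-liner can be checked with a small worked example. All numbers below are hypothetical placeholders (the thread_info base address, a THREAD_SIZE of 4096, and a field offset of 40), and the sketch assumes, following the patch comment, that esp0 sits 8 bytes below the top of the thread stack and that four 4-byte words have been pushed before this instruction.

```shell
#!/bin/sh
# Hypothetical values, for illustration only.
TI_BASE=1048576          # assumed address of the thread_info structure
THREAD_SIZE=4096         # assumed kernel stack size
TI_SYSENTER_RETURN=40    # assumed offset of sysenter_return in thread_info

# Address the pushl is supposed to read from.
field_addr() { echo $(( TI_BASE + TI_SYSENTER_RETURN )); }

# %esp at the pushl: esp0 is 8 bytes below the stack top (assumption taken
# from the patch comment), minus the 4 words already pushed.
esp_at_push() { echo $(( TI_BASE + THREAD_SIZE - 8 - 4*4 )); }

# Effective address computed by the old and the fixed displacement.
old_target() { echo $(( $(esp_at_push) + TI_SYSENTER_RETURN - THREAD_SIZE + 4*4 )); }
new_target() { echo $(( $(esp_at_push) + TI_SYSENTER_RETURN - THREAD_SIZE + 8 + 4*4 )); }

echo "old displacement misses by $(( $(field_addr) - $(old_target) )) bytes"   # 8
echo "new displacement misses by $(( $(field_addr) - $(new_target) )) bytes"   # 0
```

Under these assumptions the old displacement reads 8 bytes below sysenter_return, so a wrong value is pushed as the return address; adding 8 lands on the field itself.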
Comment 24 Jeremy Katz 2005-05-26 01:51:31 UTC
1363 works on my box that didn't work before. Placing in MODIFIED. If anyone continues to have problems as of 1363FC4, please reopen.
Comment 25 gene c 2005-05-27 03:55:15 UTC
Confirmed fixed for me too using 1363_FC4smp. Thanks!