Bug 592806 - rhel4 PV guest installations busted on rhel 5.5 i386 intel boxboro dom0
Summary: rhel4 PV guest installations busted on rhel 5.5 i386 intel boxboro dom0
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel-xen
Version: 4.7.z
Hardware: i386
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Xen Maintainance List
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2010-05-17 02:48 UTC by yanfu,wang
Modified: 2010-05-19 20:46 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-05-17 10:11:17 UTC
Target Upstream Version:


Attachments
virt workflow xml to reproduce (deleted)
2010-05-17 02:48 UTC, yanfu,wang

Description yanfu,wang 2010-05-17 02:48:57 UTC
Created attachment 414444 [details]
virt workflow xml to reproduce

Description of problem:
When trying to install RHEL 4 paravirt guests on a RHEL 5.5 dom0 on Intel Boxboro, the installation crashes with the following kernel backtrace:

SMP
Modules linked in: dm_snapshot dm_mirror dm_zero dm_mod ext3 jbd msdos raid6 raid5 xor raid1 raid0 xenblk xennet sr_mod sd_mod scsi_mod cdrom loop nfs nfs_acl lockd sunrpc vfat fat cramfs
CPU:    0
EIP:    0061:[<c0112510>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-78.ELxenU)
EIP is at pgd_free+0x11b/0x158
eax: 00000000   ebx: cb812000   ecx: 00000400   edx: 80000000
esi: 00000000   edi: cb812000   ebp: 00000003   esp: c327ef6c
ds: 007b   es: 007b   ss: 0068
Process 05-pam_console. (pid: 765, threadinfo=c327e000 task=dfac6070)
Stack: de4ab300 de4ab300 de4ab300 dfac6070 00000000 c011ad70 cb80e000 dfac65c0
       c011ecb4 de4ab300 00000001 c163f4c0 00000000 c327e000 c327e000 c011efc6
       00000000 00000000 00000000 4014d6f8 c327e000 c010740f 00000000 00000000
Call Trace:
 [<c011ad70>] __mmdrop+0x21/0x3a
 [<c011ecb4>] do_exit+0x1f4/0x412
 [<c011efc6>] sys_exit_group+0x0/0x11
 [<c010740f>] syscall_call+0x7/0xb
Code: 8b 04 98 89 f1 c1 e0 0c 81 e1 ff 0f 00 00 89 c6 09 ce 6a 00 8d 9e ff ff ff bf 89 df 53 e8 55 01 00 00 59 31 c0 b9 00 04 00 00 5e <f3> ab 53 ff 35 44 31 36 c0 e8 b6 26 03 00 80 3d 04 77 2f c0 00
 <0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
 KERNEL PANIC!


Version-Release number of selected component (if applicable):
kernel-xen-2.6.18-194.3.1.el5 

How reproducible:
always

Steps to Reproduce:
1. Install a RHEL 5.5 GA tree and upgrade the kernel to 2.6.18-194.3.1.el5.

2. virt-install --name rhel4u7_i386_pv --mac 00:16:3E:50:83:E7 --location nfs:bigpapi.bos.redhat.com:/vol/engarchive2/redhat/released/RHEL-4/U7/AS/i386/tree --paravirt --file /var/lib/xen/images/rhel4u7_i386_pv.img -s 10 --debug --extra-args ks=http://lab2.rhts.eng.bos.redhat.com/cblr/svc/op/ks/system/guest-80-131.rhts.eng.bos.redhat.com --prompt --nographics --noreboot

3. Continue with the installation process.
  
Actual results:
kernel panic


Expected results:
should complete


Additional info:
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=156033
https://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=156034
Please refer to the attached XML to submit the job.

Comment 1 yanfu,wang 2010-05-17 05:43:19 UTC
With a RHEL 5.5 GA tree installed and not upgraded to the latest 5.5.z kernel, the same problem still occurs. Please refer to the job below:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=157517

Comment 2 yanfu,wang 2010-05-17 10:11:17 UTC
4.7.z does not support Boxboro (Nehalem-EX), pls refer to bz491338.

Comment 3 Andrew Jones 2010-05-17 14:22:44 UTC
(In reply to comment #2)
> 4.7.z does not support Boxboro (Nehalem-EX), pls refer to bz491338.    

The bz referenced here is for a deadlock, but the backtrace in this bug is in paging code, and looks unrelated. Does this machine have >= 64G of memory? If so, then this bug is likely a dup of bug 504988.

Comment 4 yanfu,wang 2010-05-18 03:22:18 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > 4.7.z does not support Boxboro (Nehalem-EX), pls refer to bz491338.    
> 
> The bz referenced here is for a deadlock, but the backtrace in this bug is in
> paging code, and looks unrelated. Does this machine have >= 64G of memory? If
> so, then this bug is likely a dup of bug 504988.    

I checked the machine my job ran on, intel-s3e36-01.lab.bos.redhat.com, and it seems not:
[root@intel-s3e36-01 ~]# cat /proc/meminfo 
MemTotal:     14042112 kB
MemFree:      13552432 kB
Buffers:         31072 kB
Cached:         144136 kB
SwapCached:          0 kB
Active:          76292 kB
Inactive:       140100 kB
HighTotal:    13303304 kB
HighFree:     13096432 kB
LowTotal:       738808 kB
LowFree:        456000 kB
SwapTotal:     3899384 kB
SwapFree:      3899384 kB
Dirty:              28 kB
Writeback:          84 kB
AnonPages:       41164 kB
Mapped:          12196 kB
Slab:            39600 kB
PageTables:       2076 kB
NFS_Unstable:        0 kB
Bounce:              0 kB
CommitLimit:  10920440 kB
Committed_AS:   276988 kB
VmallocTotal:   114680 kB
VmallocUsed:      9796 kB
VmallocChunk:   104784 kB
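
For reference, the check against the 64G threshold asked about in comment 3 can be sketched as below. This is only an illustration: the helper names `meminfo_kb` and `kb_to_gib` are made up for this example, and the sample text is copied from the output above. Note that on a Xen dom0 this figure is the memory visible to dom0, not necessarily the physical memory of the machine.

```python
import re

def meminfo_kb(text, field="MemTotal"):
    """Return one /proc/meminfo field in kB, or None if the field is absent."""
    m = re.search(rf"^{re.escape(field)}:\s+(\d+)\s*kB", text, re.MULTILINE)
    return int(m.group(1)) if m else None

def kb_to_gib(kb):
    """Convert kB (as reported by /proc/meminfo) to GiB."""
    return kb / (1024 * 1024)

# Two lines taken from the /proc/meminfo output quoted in this comment.
SAMPLE = """MemTotal:     14042112 kB
MemFree:      13552432 kB
"""

if __name__ == "__main__":
    total_kb = meminfo_kb(SAMPLE)
    gib = kb_to_gib(total_kb)
    # Compare against the 64G threshold mentioned in comment 3.
    print(f"MemTotal = {gib:.2f} GiB, >= 64 GiB: {gib >= 64}")
```

On a live system, `open("/proc/meminfo").read()` could be passed in place of `SAMPLE`.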

Comment 5 Andrew Jones 2010-05-18 14:16:41 UTC
[root@intel-s3e36-01 ~]# uname -a
Linux intel-s3e36-01.lab.bos.redhat.com 2.6.32-19.el6.x86_64 #1 SMP Tue Mar 9 17:48:46 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@intel-s3e36-01 ~]# free -m
             total       used       free     shared    buffers     cached
Mem:         64439       2429      62009          0         75        752
-/+ buffers/cache:       1602      62837
Swap:        66495          0      66495

Looks like it has 64G.

Comment 6 yanfu,wang 2010-05-19 09:13:33 UTC
(In reply to comment #5)
> [root@intel-s3e36-01 ~]# uname -a
> Linux intel-s3e36-01.lab.bos.redhat.com 2.6.32-19.el6.x86_64 #1 SMP Tue Mar 9
> 17:48:46 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
> [root@intel-s3e36-01 ~]# free -m
>              total       used       free     shared    buffers     cached
> Mem:         64439       2429      62009          0         75        752
> -/+ buffers/cache:       1602      62837
> Swap:        66495          0      66495
> 
> Looks like it has 64G.    


hi Andrew,
Thanks for the reminder. I checked the memory size in my failed jobs again; it seems there is a limit on the memory available to kernel-xen on the host. The info is below:
 
********** System Information **********
Hostname                = intel-s3e36-01.lab.bos.redhat.com
Kernel Version          = 2.6.18-194.el5xen
Machine Hardware Name   = i686
Processor Type          = i686
uname -a output         = Linux intel-s3e36-01.lab.bos.redhat.com 2.6.18-194.el5xen #1 SMP Tue Mar 16 22:08:06 EDT 2010 i686 i686 i386 GNU/Linux
Swap Size               = 3807 MB
Mem Size                = 13713 MB

Please refer to the links below. Sorry, I can't reserve the machine to double-check because there is a problem with inventory today.
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14084523
http://rhts.redhat.com/cgi-bin/rhts/test_log.cgi?id=14058503
http://rhts.redhat.com/testlogs/2010/05/156034/400833/3242361/sys.log

Comment 7 Andrew Jones 2010-05-19 20:46:18 UTC
(In reply to comment #6)
> hi Andrew,
> Thanks your reminder, I checked my failed jobs about the mem size again, seems
> there is limit with the available memory in kernel-xen on host, the info is

Right, there are limits. 32G is the limit for a 64-bit dom0, and 16G for a 32-bit one. There's a warning in the third log file you linked to.

<4>RAM exceeds maximum supported memory for x86, Truncating to 64GB
<4>Warning only 4GB will be used.
<4>Use a PAE enabled kernel.

When I hopped on the machine it was booted to a bare-metal kernel, so I just did 'free -m'. If you want to check system memory on a dom0 (kernel-xen) machine then you should do 'xm info | grep total_mem'.

Since I saw 64G on the system, and also the warning I've copied above in the log, then I'm pretty sure this bug is a dup of bug 504988. I guess leaving it closed as NOTABUG is fine as well though.
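
The 'xm info | grep total_mem' check mentioned above can also be scripted. The sketch below is a rough illustration, not a tested tool: it assumes 'xm info' prints "key : value" lines with a total_memory field in MB, the function names are invented for this example, and the sample text is hypothetical rather than output from the machine in this bug. The 16G figure is the 32-bit limit quoted in this comment.

```python
def parse_xm_info(text):
    """Parse 'key : value' lines (assumed `xm info` layout) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info

def host_memory_mb(text):
    """Return total_memory as an int, or None if the field is missing."""
    value = parse_xm_info(text).get("total_memory")
    return int(value) if value is not None else None

def exceeds_limit(total_mb, limit_gb):
    """True if the host has more memory than the given limit."""
    return total_mb > limit_gb * 1024

# Hypothetical sample, NOT captured from the machine discussed here.
SAMPLE = """host                   : example-host
total_memory           : 65535
free_memory            : 1024
"""

if __name__ == "__main__":
    total = host_memory_mb(SAMPLE)
    # 16 is the limit quoted above for a 32-bit dom0.
    print(total, exceeds_limit(total, 16))
```

On a real dom0 the text would come from running 'xm info', for example through `subprocess.run(["xm", "info"], capture_output=True, text=True).stdout`.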

