Bug 229850 - EIP in blktab
Summary: EIP in blktab
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel-xen
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Stephen Tweedie
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-02-23 19:39 UTC by Karl MacMillan
Modified: 2007-11-30 22:11 UTC
CC List: 4 users

Fixed In Version: kernel-xen-2.6-2.6.20-2925.4.3.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-04-27 07:34:56 UTC


Description Karl MacMillan 2007-02-23 19:39:31 UTC
Description of problem:

Installing paravirt FC6 with xen kernel 2.6.19-1.2898.2.3.fc7xen reliably
results in the following kernel error:

Feb 22 12:18:38 localhost kernel: CPU:    1
Feb 22 12:18:38 localhost kernel: EIP:    0061:[<ee4ecfab>]    Not tainted VLI
Feb 22 12:18:38 localhost kernel: EFLAGS: 00010246   (2.6.19-1.2898.2.3.fc7xen #1)
Feb 22 12:18:38 localhost kernel: EIP is at dispatch_rw_block_io+0x96/0x853 [blktap]
Feb 22 12:18:38 localhost kernel: eax: ebcd3940   ebx: e77e89a4   ecx: ea81a801   edx: 00000000
Feb 22 12:18:38 localhost kernel: esi: ee27e550   edi: d40b7fbc   ebp: e77e89b4   esp: d40b7b9c
Feb 22 12:18:38 localhost kernel: ds: 007b   es: 007b   ss: 0069
Feb 22 12:18:39 localhost kernel: Process xvd 1 (pid: 4302, ti=d40b7000 task=c1bfacf0 task.ti=d40b7000)
Feb 22 12:18:39 localhost kernel: Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Feb 22 12:18:39 localhost kernel:        d40b7f1c 0000000a 00000000 ea81a800 d40b7f50 e77e89a4 00000001 00000000
Feb 22 12:18:39 localhost kernel:        00000000 00000000 0b000000 000000d1 00041d25 00000000 00000016 00000016
Feb 22 12:18:39 localhost kernel: Call Trace:
Feb 22 12:18:39 localhost kernel:  [<ee4ee0aa>] tap_blkif_schedule+0x29f/0x3df [blktap]
Feb 22 12:18:39 localhost kernel:  [<c0431928>] kthread+0xc0/0xec
Feb 22 12:18:39 localhost kernel:  [<c040580f>] kernel_thread_helper+0x7/0x10
Feb 22 12:18:39 localhost kernel:  =======================
Feb 22 12:18:39 localhost kernel: Code: 50 c7 44 24 70 00 00 00 00 81 38 00 00 ad de 74 10 ff 44 24 70 83 c0 04 83 7c 24 70 20 74 0c eb e8 81 7c 24 70 00 00 ad de 75 0d <0f> 0b b3 04 19 e5 4e ee e9 80 07 00 00 8b 44 24 30 8a 40 01 0f
Feb 22 12:18:39 localhost kernel: EIP: [<ee4ecfab>] dispatch_rw_block_io+0x96/0x853 [blktap] SS:ESP 0069:d40b7b9c
Feb 22 12:31:34 localhost kernel:  <6>xenbr0: port 3(vif1.0) entering disabled state
Feb 22 12:31:34 localhost kernel: device vif1.0 left promiscuous mode
Feb 22 12:31:34 localhost kernel: xenbr0: port 3(vif1.0) entering disabled state
Feb 22 12:31:34 localhost kernel: BUG: unable to handle kernel paging request at virtual address d3151008
Feb 22 12:31:34 localhost kernel:  printing eip:
Feb 22 12:31:34 localhost kernel: c0459a05
Feb 22 12:31:34 localhost kernel: 10e1b000 -> *pde = 00000000:0cf55001
Feb 22 12:31:34 localhost kernel: 11955000 -> *pme = 00000000:0208f067
Feb 22 12:31:34 localhost kernel: 0008f000 -> *pte = 00000000:0b751061
Feb 22 12:31:34 localhost kernel: Oops: 0003 [#2]
Feb 22 12:31:34 localhost kernel: SMP 
Feb 22 12:31:34 localhost kernel: last sysfs file: /class/net/eth0/carrier
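The first trace above has no `kernel BUG at` line, but its `Code:` line still identifies the BUG site. The sketch below assumes the i386 `BUG()` encoding of 2.6.19-era kernels built with verbose BUG reporting: `ud2` (bytes `0f 0b`), then a 16-bit line number, then a 32-bit pointer to the file name. Under that assumption, the bytes `b3 04` after `<0f> 0b` decode to line 1203, the same `blktapmain.c:1203` that Comments 3 and 6 report. `decode_i386_bug` is a hypothetical helper, not part of any kernel tool.

```python
def decode_i386_bug(code_line: str) -> tuple[int, int]:
    """Return (line_number, filename_pointer) for a ud2-based BUG() site.

    ASSUMPTION: i386 BUG() with verbose reporting emits ud2 (0f 0b),
    then a 16-bit line number, then a 32-bit pointer to __FILE__.
    """
    # The byte at EIP is wrapped in <..>; decode from there onward.
    if "<" not in code_line:
        raise ValueError("no <..> marker for the faulting byte")
    tail = code_line.split("<", 1)[1].replace(">", " ")
    data = bytes(int(tok, 16) for tok in tail.split())
    if len(data) < 8 or data[:2] != b"\x0f\x0b":
        raise ValueError("faulting instruction is not ud2 (0f 0b)")
    line = int.from_bytes(data[2:4], "little")          # .word __LINE__
    filename_ptr = int.from_bytes(data[4:8], "little")  # .long __FILE__
    return line, filename_ptr


# Code: line from the first trace above, rejoined onto one line.
code = (
    "Code: 50 c7 44 24 70 00 00 00 00 81 38 00 00 ad de 74 10 ff 44 24 70 "
    "83 c0 04 83 7c 24 70 20 74 0c eb e8 81 7c 24 70 00 00 ad de 75 0d "
    "<0f> 0b b3 04 19 e5 4e ee e9 80 07 00 00 8b 44 24 30 8a 40 01 0f"
)
line, ptr = decode_i386_bug(code)
print(line, hex(ptr))  # prints: 1203 0xee4ee519
```

The decoded pointer lies in the same address range as the blktap EIP (`ee4ecfab`), which is consistent with it pointing at a string inside the module, though the sketch does not verify that.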

This is on a dual-Xeon Dell Precision Workstation 470 running 32-bit, using
virt-manager to do the install over ftp. The virtual disk is a regular file.
This usually happens while anaconda is formatting the filesystem; once, it
happened while installing packages.


Version-Release number of selected component (if applicable):

kernel-xen-2.6.18-1.2849.fc6
xen-devel-3.0.4-6.fc7
xen-libs-3.0.4-6.fc7
xen-3.0.4-6.fc7
kernel-xen-2.6.19-1.2898.2.3.fc7

libvirt-0.2.0-3.fc7
virt-manager-0.3.1-2.fc7
libvirt-python-0.2.0-3.fc7
python-virtinst-0.101.0-2.fc7

This also happened with a previous version of xen and libvirt/virt-manager.

How reproducible:

Always

Steps to Reproduce:

Create a paravirt domain and install FC6 over ftp / http with virt-manager.

Comment 1 Karl MacMillan 2007-02-23 19:42:18 UTC
Forgot to add:

The system doesn't lock up at this point, but the guest domain is mostly
unresponsive. The guest can't be stopped or destroyed via either virt-manager
or xm. Eventually the system locks up without producing further errors.

Comment 2 Daniel Berrange 2007-02-23 19:47:14 UTC
I've seen exactly the same crashes on x86_64 rawhide kernel-xen in Dom0, so I
don't think this is arch-specific. It is a little non-deterministic: I can run
existing VMs under light load without hitting it too often, but if I do a fresh
VM install it crashes nearly every time. So I reckon there's some weird race in
the blktap code.

Comment 3 Mark McLoughlin 2007-02-28 14:05:48 UTC
I'm seeing this on x86_64 rawhide, reliably when anaconda starts installing
glibc-common. This call trace looks a bit more useful:

Kernel BUG at drivers/xen/blktap/blktapmain.c:1203
invalid opcode: 0000 [1] SMP 
last sysfs file: /devices/pci0000:00/0000:00:00.0/irq
CPU 2 
Modules linked in: xt_physdev bridge netloop netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns xt_state ip_conntrack nfnetlink ipt_REJECT iptable_filter ip_tables xt_tcpudp ip6t_REJECT ip6table_filter ip6_tables x_tables ipv6 dm_multipath video sbs i2c_ec button battery asus_acpi ac lp e1000 snd_intel8x0 snd_ac97_codec snd_ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss parport_pc snd_mixer_oss i2c_i801 i2c_core parport floppy serial_core shpchp ide_cd snd_pcm snd_timer snd soundcore snd_page_alloc cdrom pcspkr sg dm_snapshot dm_zero dm_mirror dm_mod ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 3787, comm: xvd 1 Not tainted 2.6.19-1.2898.2.3.fc7xen #1
RIP: e030:[<ffffffff88381118>]  [<ffffffff88381118>] :blktap:dispatch_rw_block_io+0x98/0x966
RSP: e02b:ffff8800c4121aa0  EFLAGS: 00010246
RAX: 00000000dead0000 RBX: ffff8800dcd79e80 RCX: 0000000000000001
RDX: ffff8800e3349ac0 RSI: ffff8800c4121e70 RDI: ffff8800dcd79e80
RBP: ffff8800e33496c0 R08: 0000070000000221 R09: 00000700000002ba
R10: 0000070000000321 R11: 0000070000000243 R12: ffff8800dcd79e90
R13: ffff8800e473e000 R14: 0000000000002e6c R15: ffff8800e33496c0
FS:  00002aaaaaac9fa0(0000) GS:ffffffff805b9100(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 00000000de45e000 CR4: 0000000000002620
Process xvd 1 (pid: 3787, threadinfo ffff8800c4120000, task ffff8800e25340c0)
Stack:  0000000000000000 ffff8800e473e000 ffff8800c4121e70 ffff8800dcd79e80
 0000000000000001 0000000000000000 0000000000000000 0100000000000000
 000000000000001e 000000000000014a 0000000200000000 0000000100000001
Call Trace:
 [<ffffffff80283327>] find_busiest_group+0x1db/0x447
 [<ffffffff802629a6>] _spin_unlock_irq+0x9/0x10
 [<ffffffff80260d84>] thread_return+0x64/0xfe
 [<ffffffff8022f469>] __wake_up+0x38/0x4f
 [<ffffffff883824a0>] :blktap:tap_blkif_schedule+0x2ef/0x42f
 [<ffffffff883821b1>] :blktap:tap_blkif_schedule+0x0/0x42f
 [<ffffffff80298770>] keventd_create_kthread+0x0/0x66
 [<ffffffff80233789>] kthread+0xd0/0x100
 [<ffffffff8025ea98>] child_rip+0xa/0x12
 [<ffffffff80298770>] keventd_create_kthread+0x0/0x66
 [<ffffffff802336b9>] kthread+0x0/0x100
 [<ffffffff8025ea8e>] child_rip+0x0/0x12

Code: 0f 0b 68 31 29 38 88 c2 b3 04 e9 8a 08 00 00 48 8b 54 24 10 
RIP  [<ffffffff88381118>] :blktap:dispatch_rw_block_io+0x98/0x966
 RSP <ffff8800c4121aa0>
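In this x86_64 trace the `Code:` line has no `<..>` marker and appears to start at RIP, and the `BUG()` sequence differs from i386. Assuming the 2.6.19-era x86_64 form `ud2; pushq $file; ret $line` (bytes `0f 0b`, then `68` plus a 32-bit immediate, then `c2` plus a 16-bit immediate), the bytes `c2 b3 04` carry line 0x04b3 = 1203, matching the `blktapmain.c:1203` header, and the sign-extended push operand lands in the same range as the blktap RIP. `decode_x86_64_bug` is a hypothetical helper.

```python
def decode_x86_64_bug(code_line: str) -> tuple[int, int]:
    """Return (line_number, filename_address) for a ud2-based BUG() site.

    ASSUMPTION: the BUG() sequence is ud2 (0f 0b); pushq $imm32 (68 ..);
    ret $imm16 (c2 ..), as in 2.6.19-era x86_64 kernels with verbose BUG.
    """
    text = code_line.split("Code:", 1)[-1]
    # Start at the marked byte if there is one; otherwise assume the dump
    # already begins at the faulting instruction (as in this trace).
    if "<" in text:
        text = text.split("<", 1)[1]
    data = bytes(int(tok, 16) for tok in text.replace(">", " ").split())
    if data[:2] != b"\x0f\x0b":
        raise ValueError("dump does not start with ud2 (0f 0b)")
    if len(data) < 10 or data[2] != 0x68 or data[7] != 0xC2:
        raise ValueError("bytes after ud2 do not match pushq imm32; ret imm16")
    # pushq sign-extends its 32-bit immediate to 64 bits.
    pushed = int.from_bytes(data[3:7], "little", signed=True)
    filename_addr = pushed & 0xFFFF_FFFF_FFFF_FFFF
    line = int.from_bytes(data[8:10], "little")  # ret's imm16 operand
    return line, filename_addr


code = "Code: 0f 0b 68 31 29 38 88 c2 b3 04 e9 8a 08 00 00 48 8b 54 24 10"
line, addr = decode_x86_64_bug(code)
print(line, hex(addr))  # prints: 1203 0xffffffff88382931
```

Both architectures therefore point at the same assertion, which fits Comment 2's observation that the crash is not arch-specific.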


Comment 4 Mark McLoughlin 2007-02-28 14:08:03 UTC
Worryingly, if I then destroy the guest, Dom0 oopses and dies too.

Comment 5 Mark McLoughlin 2007-02-28 15:06:48 UTC
Okay, found it: the problem seems to be that some changesets are being merged
into blktap.c but not into blktapmain.c.

In this case, we're missing:

  http://lists.xensource.com/archives/html/xen-changelog/2006-11/msg00464.html

I've tested kernel-xen-2.6.19-1.2898.2.3.fc7 with the missing patch applied, and
a paravirt install completes successfully.

So, a couple of other things we should do:

  - re-submit the blktap modular build fix upstream to help prevent this
    kind of merge error:

      http://lists.xensource.com/archives/html/xen-devel/2006-09/msg00859.html

  - review the other differences between blktap.c and blktapmain.c; currently
    it looks like they might just be devfs removals we made ourselves
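The review in the second item is easy to script. A minimal sketch using Python's standard `difflib.unified_diff`, assuming local copies of both files: it diffs `blktap.c` against `blktapmain.c` and, as a crude keyword heuristic, sets aside any hunk that mentions `devfs`, so what remains are the differences that are not obviously the devfs removals. The script name `diff_blktap.py`, the helper names, and the keyword filter are all hypothetical.

```python
import difflib
import sys
from pathlib import Path


def diff_sources(a_text: str, b_text: str, a_name: str, b_name: str) -> list[str]:
    """Return a unified diff of two source files as a list of lines."""
    return list(
        difflib.unified_diff(
            a_text.splitlines(keepends=True),
            b_text.splitlines(keepends=True),
            fromfile=a_name,
            tofile=b_name,
        )
    )


def hunks(diff_lines: list[str]) -> list[list[str]]:
    """Group unified-diff lines into hunks, each starting at an '@@' header."""
    groups: list[list[str]] = []
    for line in diff_lines:
        if line.startswith("@@"):
            groups.append([line])
        elif groups:  # skip the ---/+++ file headers before the first hunk
            groups[-1].append(line)
    return groups


def non_devfs_hunks(diff_lines: list[str]) -> list[list[str]]:
    """Drop hunks that mention devfs anywhere (a crude keyword heuristic)."""
    return [h for h in hunks(diff_lines) if "devfs" not in "".join(h)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical invocation: python diff_blktap.py <blktap.c> <blktapmain.c>
    old, new = Path(sys.argv[1]), Path(sys.argv[2])
    diff = diff_sources(old.read_text(), new.read_text(), str(old), str(new))
    for hunk in non_devfs_hunks(diff):
        sys.stdout.writelines(hunk)
```

Hunks that survive the filter are the candidates for missed upstream changesets. A keyword match is only a hint, so the devfs hunks still deserve a quick look.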

Comment 6 Tatsuro Enokura 2007-03-20 10:00:18 UTC
Description of problem:

  Installing paravirt Fedora7 test2 with xen kernel 2.6.19-1.2898.2.3.fc7xen
  reliably results in the following kernel error:

Mar  8 14:57:56 coolmint kernel: kernel BUG at drivers/xen/blktap/blktapmain.c:1203!
Mar  8 14:57:56 coolmint kernel: invalid opcode: 0000 [#1]
Mar  8 14:57:56 coolmint kernel: SMP 
Mar  8 14:57:56 coolmint kernel: last sysfs file: /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/irq
Mar  8 14:57:56 coolmint kernel: Modules linked in: xt_physdev iptable_filter ip_tables x_tables i915 drm bridge netloop netbk blktap blkbk autofs4 hidp rfcomm l2cap bluetooth sunrpc dm_mirror dm_mod video sbs i2c_ec button battery asus_acpi ac ipv6 lp snd_hda_intel snd_hda_codec snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device parport_pc snd_pcm_oss snd_mixer_oss ide_cd parport i2c_i801 irda cdrom snd_pcm serio_raw sky2 sg crc_ccitt i2c_core snd_timer pcspkr snd soundcore iTCO_wdt snd_page_alloc serial_core joydev ata_piix libata sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Mar  8 14:57:56 coolmint kernel: CPU:    1
Mar  8 14:57:56 coolmint kernel: EIP:    0061:[<ee510fab>]    Not tainted VLI
Mar  8 14:57:56 coolmint kernel: EFLAGS: 00010246   (2.6.19-1.2898.2.3.fc7xen #1)
Mar  8 14:57:56 coolmint kernel: EIP is at dispatch_rw_block_io+0x96/0x853 [blktap]
Mar  8 14:57:56 coolmint kernel: eax: e34d1e40   ebx: ebde79a4   ecx: e8f00801   edx: 00000000
Mar  8 14:57:56 coolmint kernel: esi: ee554b38   edi: ec3ddfbc   ebp: ebde79b4   esp: ec3ddb9c
Mar  8 14:57:56 coolmint kernel: ds: 007b   es: 007b   ss: 0069
Mar  8 14:57:56 coolmint kernel: Process xvd 1 (pid: 3234, ti=ec3dd000 task=e855a590 task.ti=ec3dd000)
Mar  8 14:57:56 coolmint kernel: Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Mar  8 14:57:56 coolmint kernel:        ec3ddc9c 00000000 00000000 e8f00800 ec3ddf50 ebde79a4 00000001 00000000
Mar  8 14:57:56 coolmint kernel:        00000000 00000000 01000000 000000b0 004f2755 00000000 00000002 00000002
Mar  8 14:57:56 coolmint kernel: Call Trace:
Mar  8 14:57:56 coolmint kernel:  [<ee5120aa>] tap_blkif_schedule+0x29f/0x3df [blktap]
Mar  8 14:57:56 coolmint kernel:  [<c0431928>] kthread+0xc0/0xec
Mar  8 14:57:56 coolmint kernel:  [<c040580f>] kernel_thread_helper+0x7/0x10
Mar  8 14:57:56 coolmint kernel:  =======================
Mar  8 14:57:56 coolmint kernel: Code: 50 c7 44 24 70 00 00 00 00 81 38 00 00 ad de 74 10 ff 44 24 70 83 c0 04 83 7c 24 70 20 74 0c eb e8 81 7c 24 70 00 00 ad de 75 0d <0f> 0b b3 04 19 25 51 ee e9 80 07 00 00 8b 44 24 30 8a 40 01 0f
Mar  8 14:57:56 coolmint kernel: EIP: [<ee510fab>] dispatch_rw_block_io+0x96/0x853 [blktap] SS:ESP 0069:ec3ddb9c


This is on a Core Duo running 32-bit (Fujitsu FMV-S8225), using virt-install
to do the install over http. The virtual disk is a regular file. This happens
while anaconda is formatting the filesystem or installing packages.


Version-Release number of selected component (if applicable):
  xen-3.0.4-7.fc7
  xen-libs-3.0.7-9.fc7
  xen-devel-3.0.7-9.fc7
  kernel-xen-2.6.19-1.2898.2.3.fc7

  libvirt: 0.2.0(revision: 1.445)
  virt-install: 0.3.1(changeset 117: 2e5b60ecbd93)


How reproducible:
  Always

Steps to Reproduce:
  Create a paravirt domain and install Fedora7 test2 over ftp / http with
  virt-install:

  virt-install --name=F7test2_PV --file=/root/F7test2_PV.img --file-size=5 \
    --ram=512 --paravirt --location=http://10.131.236.20/f7test2_x86 --nographics


Comment 8 Mark McLoughlin 2007-04-27 07:34:56 UTC
Should be fixed in rawhide.

