Bug 233653 - race condition in nbd driver triggers BUG in kunmap and kernel panic
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Neil Horman
QA Contact: Martin Jenner
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2007-03-23 16:27 UTC by Paul Clements
Modified: 2018-10-19 22:42 UTC (History)

Fixed In Version: RHBA-2007-0791
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-11-15 16:23:17 UTC


Attachments
patch #1 (deleted), 2007-03-23 16:27 UTC, Paul Clements
patch #2 (deleted), 2007-03-23 16:28 UTC, Paul Clements


Links
System: Red Hat Product Errata
ID: RHBA-2007:0791
Priority: normal
Status: SHIPPED_LIVE
Summary: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 6
Last Updated: 2007-11-14 18:25:55 UTC

Description Paul Clements 2007-03-23 16:27:00 UTC
Description of problem:
There is a race condition in the nbd driver that triggers the following BUG in
kunmap:


Mar 18 10:54:28 dm1 kernel: ------------[ cut here ]------------
Mar 18 10:54:28 dm1 kernel: kernel BUG at mm/highmem.c:193!
Mar 18 10:54:28 dm1 kernel: invalid operand: 0000 [#1]
Mar 18 10:54:28 dm1 kernel: SMP
Mar 18 10:54:28 dm1 kernel: Modules linked in: nbd dgrp(U) parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc button battery ac md5 ipv6 joydev uhci_hcd ehci_hcd hw_random e1000 floppy sg st dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata megaraid_mbox megaraid_mm mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod
Mar 18 10:54:28 dm1 kernel: CPU:    3
Mar 18 10:54:28 dm1 kernel: EIP:    0060:[<c014b6c2>]    Not tainted VLI
Mar 18 10:54:28 dm1 kernel: EFLAGS: 00010246   (2.6.9-42.0.3.ELsmp)
Mar 18 10:54:28 dm1 kernel: EIP is at kunmap_high+0x42/0x80
Mar 18 10:54:28 dm1 kernel: eax: 000000c7   ebx: 00000000   ecx: c043d008   edx: 00000000
Mar 18 10:54:28 dm1 kernel: esi: d67f2a00   edi: 00001000   ebp: 00000000   esp: cb3d1c38
Mar 18 10:54:28 dm1 kernel: ds: 007b   es: 007b   ss: 0068
Mar 18 10:54:28 dm1 kernel: Process pdflush (pid: 27445, threadinfo=cb3d1000 task=de4dd6b0)
Mar 18 10:54:28 dm1 kernel: Stack: d3ea6610 f8a385ac d516a180 f8a3b920 e4010e0c 13956025 01000000 e4010e0c
Mar 18 10:54:28 dm1 kernel:        e4010e0c 17000000 00903afc 00100000 f7abd028 f8a3b920 f8a3b934 f7abd028
Mar 18 10:54:28 dm1 kernel:        00000008 f8a38a2d e4010e0c f7abd028 f7abd028 00000008 c0224448 dea7772c
Mar 18 10:54:28 dm1 kernel: Call Trace:
Mar 18 10:54:28 dm1 kernel:  [<f8a385ac>] nbd_send_req+0x283/0x2ec [nbd]
Mar 18 10:54:28 dm1 kernel:  [<f8a38a2d>] do_nbd_request+0x142/0x1cd [nbd]
Mar 18 10:54:28 dm1 kernel:  [<c0224448>] __generic_unplug_device+0x2b/0x2d
Mar 18 10:54:28 dm1 kernel:  [<c02255eb>] __make_request+0x421/0x46c
Mar 18 10:54:28 dm1 kernel:  [<c02257c4>] generic_make_request+0x18e/0x19e
Mar 18 10:54:28 dm1 kernel:  [<c015f484>] bio_clone+0x84/0x9c
Mar 18 10:54:28 dm1 kernel:  [<f8884bce>] make_request+0x2a0/0x2cd [raid1]
Mar 18 10:54:28 dm1 kernel:  [<f8884bce>] make_request+0x2a0/0x2cd [raid1]
Mar 18 10:54:28 dm1 kernel:  [<c02257c4>] generic_make_request+0x18e/0x19e
Mar 18 10:54:28 dm1 kernel:  [<c01204f5>] autoremove_wake_function+0x0/0x2d
Mar 18 10:54:28 dm1 kernel:  [<c022589e>] submit_bio+0xca/0xd2
Mar 18 10:54:28 dm1 kernel:  [<c0129e39>] __mod_timer+0x101/0x10b
Mar 18 10:54:28 dm1 kernel:  [<c015f2bd>] bio_alloc+0x100/0x168
Mar 18 10:54:28 dm1 kernel:  [<c015ec74>] submit_bh+0x141/0x166
Mar 18 10:54:28 dm1 kernel:  [<c015d74d>] __block_write_full_page+0x1f0/0x2ea
Mar 18 10:54:28 dm1 kernel:  [<f89038e4>] ext3_get_block+0x0/0x6c [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c015eabc>] block_write_full_page+0xc5/0xce
Mar 18 10:54:28 dm1 kernel:  [<f89038e4>] ext3_get_block+0x0/0x6c [ext3]
Mar 18 10:54:28 dm1 kernel:  [<f890425a>] ext3_ordered_writepage+0xce/0x13a [ext3]
Mar 18 10:54:28 dm1 kernel:  [<f890416c>] bget_one+0x0/0x7 [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c0178962>] mpage_writepages+0x1c2/0x314
Mar 18 10:54:28 dm1 kernel:  [<f890418c>] ext3_ordered_writepage+0x0/0x13a [ext3]
Mar 18 10:54:28 dm1 kernel:  [<c014597c>] mapping_tagged+0x2b/0x33
Mar 18 10:54:28 dm1 kernel:  [<c01772cc>] __sync_single_inode+0x5f/0x1c1
Mar 18 10:54:28 dm1 kernel:  [<c0177660>] sync_sb_inodes+0x1a7/0x274
Mar 18 10:54:28 dm1 kernel:  [<c0145b04>] pdflush+0x0/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c01777be>] writeback_inodes+0x91/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145286>] wb_kupdate+0x7b/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145a70>] __pdflush+0xec/0x180
Mar 18 10:54:28 dm1 kernel:  [<c0145b1e>] pdflush+0x1a/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c014520b>] wb_kupdate+0x0/0xde
Mar 18 10:54:28 dm1 kernel:  [<c0145b04>] pdflush+0x0/0x1e
Mar 18 10:54:28 dm1 kernel:  [<c01341ed>] kthread+0x73/0x9b
Mar 18 10:54:28 dm1 kernel:  [<c013417a>] kthread+0x0/0x9b
Mar 18 10:54:28 dm1 kernel:  [<c01041f5>] kernel_thread_helper+0x5/0xb
Mar 18 10:54:28 dm1 kernel: Code: 08 0f 0b b7 00 51 60 2e c0 05 00 00 00 01 31 db c1 e8 0c 8b 14 85 20 b2 43 c0 4a 85 d2 89 14 85 20 b2 43 c0 74 05 4a 74 0a eb 17 <0f> 0b c1 00 51 60 2e c0 31 db 81 3d c0 cf 32 c0 c0 cf 32 c0 0f
Mar 18 10:54:28 dm1 kernel:  <0>Fatal exception: panic in 5 seconds


Version-Release number of selected component (if applicable):

RHEL 4 kernel version: 2.6.9-42.0.3.ELsmp


How reproducible: don't know -- intermittent


Steps to Reproduce:
1. The problem occurs when an I/O is sent over nbd and the reply for that I/O
comes back from the server before the sending routine has completed. This causes
pages to be freed before they get kunmapped, which results in a BUG. The bug
occurs on SMP systems as follows:

CPU0				CPU1
do_nbd_request
	add req to queuelist
	nbd_send_req
		send req head
		for each bio
			kmap
			send
				nbd_read_stat
					nbd_find_request
					nbd_end_request
			kunmap

When CPU1 finishes nbd_end_request, the request and all of its associated
bios are freed. When CPU0 then calls kunmap with an argument derived from
the last bio, it operates on a page that has already been released, which
can trigger the BUG shown above.
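
The interleaving above can be modelled in user space. The following is only a sketch under stated assumptions: it is not the nbd driver code, and it is not taken from the attached patches (which are no longer available on this page). Every name in it (struct device, active_req, send_request, end_request, run_model) is hypothetical. A flag stands in for kmap/kunmap, and a condition-variable handshake forces the worst case, in which the reply arrives after the send but before the unmap. With the guard disabled, the completion path releases the request while it is still mapped; with the guard enabled, the completion path waits until the sender has finished with the request.

```c
/* User-space model of the send/receive race. NOT kernel code; all names are illustrative. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct request {
    int mapped;      /* 1 between the stand-ins for kmap and kunmap */
    int sent;        /* 1 once the payload has been "sent" */
    int reply_seen;  /* 1 once the receiver has matched the reply to this request */
};

struct device {
    pthread_mutex_t lock;
    pthread_cond_t changed;      /* signalled whenever state below changes */
    struct request *active_req;  /* request the sender is still touching */
    int freed_while_mapped;      /* counts the bug: release before unmap */
};

struct args { struct device *dev; struct request *req; int guard; };

/* Models the transmit path (CPU0 in the diagram). */
static void send_request(struct device *dev, struct request *req)
{
    pthread_mutex_lock(&dev->lock);
    dev->active_req = req;  /* mark the request as in use by the sender */
    req->mapped = 1;        /* stands in for kmap */
    req->sent = 1;          /* stands in for the socket send */
    pthread_cond_broadcast(&dev->changed);
    /* Force the worst case: the reply is seen before we unmap.
     * pthread_cond_wait drops the lock, which opens the race window. */
    while (!req->reply_seen)
        pthread_cond_wait(&dev->changed, &dev->lock);
    req->mapped = 0;         /* stands in for kunmap */
    dev->active_req = NULL;  /* sender is done with the request */
    pthread_cond_broadcast(&dev->changed);
    pthread_mutex_unlock(&dev->lock);
}

/* Models the completion path (CPU1 in the diagram). */
static void end_request(struct device *dev, struct request *req, int guard)
{
    pthread_mutex_lock(&dev->lock);
    while (!req->sent)  /* a reply cannot precede the send */
        pthread_cond_wait(&dev->changed, &dev->lock);
    req->reply_seen = 1;
    pthread_cond_broadcast(&dev->changed);
    /* The guard: do not complete a request the sender is still using. */
    while (guard && dev->active_req == req)
        pthread_cond_wait(&dev->changed, &dev->lock);
    if (req->mapped)  /* released while still mapped: the bug */
        dev->freed_while_mapped++;
    pthread_mutex_unlock(&dev->lock);
}

static void *sender_thread(void *p)
{
    struct args *a = p;
    send_request(a->dev, a->req);
    return NULL;
}

static void *receiver_thread(void *p)
{
    struct args *a = p;
    end_request(a->dev, a->req, a->guard);
    return NULL;
}

/* Runs one sender and one receiver; returns how often the bug was hit. */
int run_model(int guard)
{
    struct device dev = { .active_req = NULL, .freed_while_mapped = 0 };
    struct request req = { 0, 0, 0 };
    struct args a = { &dev, &req, guard };
    pthread_t tx, rx;

    pthread_mutex_init(&dev.lock, NULL);
    pthread_cond_init(&dev.changed, NULL);
    int rc = pthread_create(&rx, NULL, receiver_thread, &a);
    assert(rc == 0);
    rc = pthread_create(&tx, NULL, sender_thread, &a);
    assert(rc == 0);
    (void)rc;
    pthread_join(tx, NULL);
    pthread_join(rx, NULL);
    pthread_cond_destroy(&dev.changed);
    pthread_mutex_destroy(&dev.lock);
    return dev.freed_while_mapped;
}
```

Calling run_model(0) exercises the unguarded ordering and run_model(1) the guarded one. The model holds one mutex for simplicity; a real driver would have to choose its own locking and wake-up mechanism, so this only illustrates the ordering constraint.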

Actual results: kernel panic

Expected results: no panic

Additional info: The two attached patches fix this problem. They were merged
into the mainline kernel on 19 Nov 2005.

Comment 1 Paul Clements 2007-03-23 16:27:00 UTC
Created attachment 150774 [details]
patch #1

Comment 2 Paul Clements 2007-03-23 16:28:26 UTC
Created attachment 150775 [details]
patch #2

Comment 3 RHEL Product and Program Management 2007-04-10 11:05:22 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 RHEL Product and Program Management 2007-04-18 22:25:49 UTC
This request was evaluated by Red Hat Kernel Team for inclusion in a Red
Hat Enterprise Linux maintenance release, and has moved to bugzilla 
status POST.

Comment 6 Jason Baron 2007-06-20 19:41:28 UTC
committed in stream U6 build 55.10. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/


Comment 9 errata-xmlrpc 2007-11-15 16:23:17 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0791.html


