Bug 154639 (IT_71391) - kernel thread current->mm dereference in grab_swap_token causes oops
Summary: kernel thread current->mm dereference in grab_swap_token causes oops
Keywords:
Status: CLOSED ERRATA
Alias: IT_71391
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: All
OS: Linux
medium
high
Target Milestone: ---
: ---
Assignee: Rik van Riel
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 231639
Reported: 2005-04-13 09:08 UTC by Keith Holder
Modified: 2007-11-30 22:07 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-06-08 15:14:07 UTC




Links
System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:420 normal SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 4 Update 1 2005-06-08 04:00:00 UTC

Description Keith Holder 2005-04-13 09:08:25 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.5) Gecko/20041110 Firefox/1.0

Description of problem:
Veritas has a kernel thread that performs asynchronous direct I/O on behalf of user programs. When the I/O data is available, the kernel thread calls get_user_pages() to make sure the user pages are present before moving the data into user buffers. A side effect is that, if the user program happens to be paged out, we end up in grab_swap_token() with current->mm set to NULL. Unfortunately, the code dereferences current->mm without checking whether it is NULL.

Stack backtrace:

grab_swap_token()
do_swap_page()
handle_pte_fault()
handle_mm_fault()
get_user_pages()
...
-----

A suitable fix would be to add the following to the start of grab_swap_token():


if (!current->mm)
        return;
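
To illustrate the proposed check outside the kernel, here is a minimal user-space sketch. It is not the RHEL4 source: the struct layouts, the `token_grabs` field, the `current_task` pointer (standing in for `current`), the function name `grab_swap_token_model` and its boolean return value are all assumptions made for this example. Only the `if (!current->mm) return;` guard comes from the report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct mm_struct {
    unsigned long token_grabs;      /* hypothetical counter, for illustration only */
};

struct task_struct {
    struct mm_struct *mm;           /* NULL for a kernel thread after daemonize() */
};

/* Stand-in for the kernel's `current` task pointer (assumption). */
static struct task_struct *current_task;

/* The address space currently holding the swap token in this model. */
static struct mm_struct *swap_token_mm;

/*
 * Model of the guarded function. The real grab_swap_token() returns void;
 * this sketch returns a bool so that callers can observe whether the
 * early return was taken.
 */
static bool grab_swap_token_model(void)
{
    /* The proposed fix: a task without an address space has nothing to protect. */
    if (!current_task->mm)
        return false;

    /* Without the guard above, these dereferences would fault when mm is NULL. */
    swap_token_mm = current_task->mm;
    current_task->mm->token_grabs++;
    return true;
}
```

With `current_task->mm` set to NULL, as for the kernel thread in the backtrace above, the model returns early and leaves `swap_token_mm` untouched; with a valid `mm` it proceeds as before.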



Version-Release number of selected component (if applicable):
kernel-smp-2.6.9-5.EL

How reproducible:
Sometimes

Steps to Reproduce:
1. Install the Veritas software stack onto the system
2. Run Veritas' Oracle Data Manager stress suite

Actual Results:  After several hours under heavy stress, the system oopses in grab_swap_token()

Expected Results:  The function should just return if current->mm is NULL *or* allow an mm_struct
pointer to be passed in as an argument. However, it would still need to check for the mm_struct
pointer being NULL.

Additional info:

Comment 1 Rik van Riel 2005-04-13 12:55:14 UTC
I'll submit a (trivial) patch for the RHEL4 kernel to ignore a current->mm of NULL.

Btw, note that for most kernel threads current->mm tends to be &init_mm...

Comment 2 Rik van Riel 2005-04-13 14:24:16 UTC
Btw, have you verified that adding the check really fixes the issue, or does the
kernel simply crash elsewhere?  I am not 100% sure that running any task with a
NULL ->mm is valid...

Comment 3 Keith Holder 2005-04-14 10:44:42 UTC
When calling daemonize() to create a kernel thread, it calls exit_mm().
This sets tsk->mm to NULL and the thread/process runs with a 'lazy_tlb'.
Also, I thought all kernel threads didn't have an address space, hence mm
is supposed to be NULL.

Comment 4 Rik van Riel 2005-04-14 10:54:12 UTC
You're right.  Hmmm, I could've sworn they got moved to &init_mm.

Anyway, the patch was submitted for inclusion into RHEL4 yesterday and has been
approved. Thank you for alerting us to this bug.

Comment 14 Keith Holder 2005-05-09 13:43:59 UTC
I have run a Veritas-specific stress test (odmstress) non-stop for over 100
hours with this fix (U1-kernel-2.6.9-6.43) and the testing was successful.

Comment 15 Tim Powers 2005-06-08 15:14:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2005-420.html


