Bug 160539 - [RHEL3] hidden bomb of kmap_atomic/kunmap_atomic bug?
Summary: [RHEL3] hidden bomb of kmap_atomic/kunmap_atomic bug?
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Dave Anderson
QA Contact: Brian Brock
Depends On:
Blocks: 168424
Reported: 2005-06-15 17:45 UTC by Issue Tracker
Modified: 2008-10-15 17:24 UTC (History)
4 users

Fixed In Version: RHSA-2006-0144
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2006-03-15 16:05:43 UTC
Target Upstream Version:

Attachments

System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2006:0144 qe-ready SHIPPED_LIVE Moderate: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 7 2006-03-15 05:00:00 UTC

Description Issue Tracker 2005-06-15 17:45:03 UTC
Escalated to Bugzilla from IssueTracker

Comment 23 Dave Anderson 2005-06-20 21:07:29 UTC
I don't think adding spin_lock_irqsave/restore in kmap_atomic() is going to fly.
It's worked that way since the beginning of time -- I'm sure as hell not going
to propose a spinlock be put in there!  ;-)

Anyway, this is the first time I've ever looked at this code, so help me get up
to speed.  The problem as described is in alloc_ldt() when reload is set:

static int alloc_ldt(mm_context_t *pc, int mincount, int reload)
{
        int oldsize, newsize, i;

        if (mincount <= pc->size)
                return 0;
        /*
         * LDT got larger - reallocate if necessary.
         */
        oldsize = pc->size;
        mincount = (mincount+511)&(~511);
        newsize = mincount*LDT_ENTRY_SIZE;
        for (i = 0; i < newsize; i += PAGE_SIZE) {
                int nr = i/PAGE_SIZE;
                BUG_ON(i >= 64*1024);
                if (!pc->ldt_pages[nr]) {
                        pc->ldt_pages[nr] = alloc_page(GFP_HIGHUSER);
                        if (!pc->ldt_pages[nr])
                                return -ENOMEM;
                }
        }
        pc->size = mincount;
        if (reload) {
                load_LDT(pc);
#ifdef CONFIG_SMP
                if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);
#endif
        }
        return 0;
}


1. alloc_ldt() called load_LDT() on CPU A.

2. alloc_ldt() called load_LDT() on CPU B, which returned, and then CPU B
   issued the flush_ldt() IPI:

   static void flush_ldt(void *mm)
   {
           if (current->active_mm)
                   load_LDT(&current->active_mm->context);
   }

3. The IPI was received on CPU A while it was looping in alloc_ldt(), which
   caused it to essentially re-enter load_LDT().

Therefore, we could have a situation in load_LDT() where CPU A called
__kunmap_atomic_type() to clear a kmap pte, but got interrupted by the
flush_ldt() IPI before it did the subsequent __kmap_atomic().  The "re-entrant"
call to load_LDT() set up all the ptes, and upon completion, CPU A returned
back to the interrupted load_LDT() instance to do the __kmap_atomic() -- which
then sees the non-zero status of the kmap pte that it thought it just cleared,
causing an oops on "line 56 of the header": 

/*
 * load one particular LDT into the current CPU
 */
void load_LDT(mm_context_t *pc)
{
        struct page **pages = pc->ldt_pages;
        int count = pc->size;
        int nr_pages, i;

        if (!count) {
                pages = &default_ldt_page;
                count = 5;
        }
        nr_pages = (count*LDT_ENTRY_SIZE + PAGE_SIZE-1) / PAGE_SIZE;

        for (i = 0; i < nr_pages; i++) {
                __kunmap_atomic_type(KM_LDT_PAGE0 - i);
                __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);
        }
        set_ldt_desc(smp_processor_id(),
                        (void *)__kmap_atomic_vaddr(KM_LDT_PAGE0), count);
        load_LDT_desc();
}

Since the only caller of alloc_ldt() with reload set is write_ldt(), and
write_ldt() is only called from sys_modify_ldt(), that call to
load_LDT() in alloc_ldt() can only happen in process context.  I'm wondering
whether it would suffice to simply surround the call to load_LDT() with
local_irq_disable() and local_irq_enable():

          if (reload) {
+                 local_irq_disable();
                  load_LDT(pc);
+                 local_irq_enable();
  #ifdef CONFIG_SMP
                  if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
                          smp_call_function(flush_ldt, 0, 1, 1);
  #endif
          }

Comment 24 Dave Anderson 2005-06-22 18:17:09 UTC
After unsuccessfully attempting to reproduce this problem with my own
hand-rolled multithreaded program -- with each thread doing nothing but
modify_ldt system calls (writes) -- I realized that the scenario that I
described, which presumably is the same scenario described by the reporter
below, is impossible:

cpu 1:
        smp_call_function(flush_ldt, 0, 1, 1);

cpu 2:
        __kunmap_atomic_type(KM_LDT_PAGE0 - i);
        IPI: flush_ldt                 <------- unexpected intr
                __kunmap_atomic_type(KM_LDT_PAGE0 - i);
                __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);  <--- the entry in
                                        question is used after this point
        __kmap_atomic(pages[i], KM_LDT_PAGE0 - i);          <---- oops !!

Two processes with the same mm_struct cannot be operating simultaneously
in load_LDT().  The crucial section which makes its own call to load_LDT(),
and then issues the IPI, is in alloc_ldt() here:

        if (reload) {
                if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);

However, alloc_ldt() can only be called by one task at a time via
write_ldt(), because the call is gated by the shared mm->context.sem semaphore:

        down(&mm->context.sem);
        if (ldt_info.entry_number >= mm->context.size) {
                error = alloc_ldt(&current->mm->context,
                                       ldt_info.entry_number+1, 1);
                if (error < 0)
                        goto out_unlock;
        }

The only other load_LDT() caller is switch_mm(), but that would be modifying
the kmap entries on another cpu.

So -- any other ideas?

Comment 25 Dave Anderson 2005-06-22 18:37:16 UTC
I have one -- I was presuming the other cpu sending the IPI was
a thread of the same process.  The true test would be to have
dissimilar processes issuing the IPIs.

The issue tracker entry (which for some reason didn't get forwarded to this
bugzilla) suggested:

> using save_flags/cli/restore_flags should be enough, since every cpu has
> its own local km_type.

Since by definition we have to be in process context while executing alloc_ldt(),
why wouldn't local_irq_disable() and local_irq_enable() suffice?

Comment 26 Dave Anderson 2005-06-22 21:22:53 UTC
As far as reproducing this goes, regardless of whether I:

  1. continuously fork new processes that do the modify_ldt() calls, or
  2. have the same set of processes spin doing modify_ldt() calls, or
  3. have a combination of the two test procedures above simultaneously running,

I can never get the IPI to be issued from alloc_ldt():

        if (reload) {
                if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);

It would seem that if test scenario 2 (spinning processes) is taking place,
at least some of the test processes would find themselves switching cpus.
I can get up to 6000 load_LDT calls per second (kmap'ing all 16 ldt entries),
but can't get the IPI to take place, which is essential for the oops.

I suppose I could hack up alloc_ldt() to always do the smp_call_function(),
but I would really like to be able to reproduce this legitimately. 

Comment 27 Dave Anderson 2005-06-23 19:35:40 UTC
Couple of data points...

I haven't been able to reproduce this problem with a "real" kernel
as of yet.  I've got a 4-cpu box with one single-threaded process
continuously doing modify_ldt write system calls, and one 3-threaded
process with each thread doing the same.  I can see with top that
the threaded processes are quite often running simultaneously on at
least 2 or 3 cpus, while the single-threaded process is running on a
different cpu.  Recall that in order for the flush_ldt IPI to even
be issued, the issuer must be a multi-threaded process with one of
its other threads running on another cpu:

        if (reload) {
                if (current->mm->cpu_vm_mask != (1<<smp_processor_id()))
                        smp_call_function(flush_ldt, 0, 1, 1);

That being said, I went ahead on a 2-cpu box and kludged up a kernel
that does the smp_call_function() above every time.  By doing that,
I was able to get a "kernel BUG in header file at line 56" oops, the
same as the 3 oopses in the supplied IT file.

FWIW, here are the backtraces, where the "ldt" process on cpu 0
issued the smp_call_function() to another "ldt" process on cpu 1,
which caught it in between the __kunmap_atomic_type() and
__kmap_atomic() calls in load_LDT():

crash> bt -a
PID: 6130   TASK: d4436000  CPU: 0   COMMAND: "ldt"
 #0 [d4437ed0] smp_call_function_interrupt at c011cbcf
 #1 [d4437ed8] call_call_function_interrupt at c03feca8
    EAX: 00000000  EBX: 00000001  ECX: 00000000  EDX: 00000001  EBP: 00002000
    DS:  0068      ESI: d4436000  ES:  0068      EDI: 00000001
    CS:  0060      EIP: c011caa4  ERR: fffffffb  EFLAGS: 00000297
 #2 [d4437f0c] smp_call_function at c011caa4
 #3 [d4437f48] alloc_ldt at c0111a96
 #4 [d4437f78] write_ldt at c011242b
 #5 [d4437fac] sys_modify_ldt at c01124a3
 #6 [d4437fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffe6e0  EDX: 00000010
    DS:  002b      ESI: bfffe7d4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffe6b4  EBP: bfffe708
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000296

PID: 6129   TASK: d4464000  CPU: 1   COMMAND: "ldt"
 #0 [d4465dd0] netconsole_netdump at e2dd0703
 #1 [d4465de4] try_crashdump at c0129053
 #2 [d4465df4] die at c010c6a2
 #3 [d4465e08] do_invalid_op at c010c8b2
 #4 [d4465ea8] error_code (via invalid_op) at c03fe1c0
    EAX: 00000025  EBX: fffaa000  ECX: 00000001  EDX: c038c018  EBP: c0009d50
    DS:  0068      ESI: 13df6363  ES:  0068      EDI: 00000363
    CS:  0060      EIP: c0129007  ERR: ffffffff  EFLAGS: 00010286
 #5 [d4465ee4] __out_of_line_bug at c0129007
 #6 [d4465ef0] load_LDT at c011274f
 #7 [d4465f48] alloc_ldt at c0111a37
 #8 [d4465f78] write_ldt at c011242b
 #9 [d4465fac] sys_modify_ldt at c01124a3
#10 [d4465fc0] system_call at c03fe068
    EAX: 0000007b  EBX: 00000001  ECX: bfffed00  EDX: 00000010
    DS:  002b      ESI: bfffedf4  ES:  002b      EDI: 00fb0798
    SS:  002b      ESP: bfffecd4  EBP: bfffed28
    CS:  0023      EIP: b75fe97e  ERR: 0000007b  EFLAGS: 00000292

So it's a "proof-of-concept" reproducer: if the issuer had been
multi-threaded with a cohort thread running on another cpu, the oops
would have occurred.

I'll run the same test with the local_irq_disable/enable wrappers.

Comment 29 Dave Anderson 2005-07-01 19:42:02 UTC
Can this fix wait for RHEL3-U7, or should a (belated) attempt to propose
it for RHEL3-U6 be made?

Comment 30 Ernie Petrides 2005-07-21 21:16:22 UTC
Putting in NEEDINFO.  Btw, U6 is now closed.

Comment 34 Dave Anderson 2005-08-29 17:47:47 UTC
Patch posted to rhkernel-list:

Comment 35 Ernie Petrides 2005-09-15 04:11:15 UTC
A fix for this problem has just been committed to the RHEL3 U7
patch pool this evening (in kernel version 2.4.21-37.2.EL).

Comment 38 Red Hat Bugzilla 2006-03-15 16:05:43 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
