Bug 159991 - [taroon patch] fix for indefinite postponement under __alloc_pages()
Summary: [taroon patch] fix for indefinite postponement under __alloc_pages()
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: kernel
Version: 3.0
Hardware: All
OS: Linux
Target Milestone: ---
Assignee: Ernie Petrides
QA Contact: Brian Brock
Depends On:
Blocks: 156320
Reported: 2005-06-09 22:03 UTC by Tim Burke
Modified: 2007-11-30 22:07 UTC (History)
1 user

Fixed In Version: RHSA-2005-663
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2005-09-28 15:20:45 UTC
Target Upstream Version:


System ID Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2005:663 qe-ready SHIPPED_LIVE Important: Updated kernel packages available for Red Hat Enterprise Linux 3 Update 6 2005-09-28 04:00:00 UTC

Description Tim Burke 2005-06-09 22:03:08 UTC
From: Ernie Petrides <petrides>
Date: Thu, 12 May 2005 21:02:45 -0400
Subject: [taroon patch] fix for indefinite postponement under __alloc_pages()
Tracking: 0978.petrides.rebal-laundry-zone.patch
Archives: 2005-May/msg00253.html
Status: committed in -32.5.EL


While trying to develop a reproducer for the repeated-OOM-kill problem,
I ran into a VM problem that effectively caused my test system to hang
(or more specifically, to make no visible progress nor allow ^C via ssh
sessions to kill run-away processes).

The scenario and reproducer are documented in the RHKL archives here:

After much consultation with Larry, it was determined that the two-process
test program managed to get the two-cpu test system into a condition of
"indefinite postponement" in concurrent loops of the following functions:


with the innermost function continually returning a non-zero "work done"
value.  This behavior comes from 0777.lwoodman.incorrect-oom-kill.patch
(committed to U5), which was a fix for inappropriate OOM killing when
progress was actually being made.  That fix makes rebalance_laundry_zone()
save a zone's inactive-laundry-page count before releasing a lock on the
zone.  Then, after reacquiring the lock, if the current count value differs
from the saved value, it is assumed that some progress has been made, and
a "work done" indicator is incremented.  This ultimately results in the
allocating process staying in the outermost loop to try again, and more
importantly, preventing do_try_to_free_pages() from calling out_of_memory().

The patch below fixes this problem by only bumping the "work done" value
if the current count has been reduced from the saved count.  It also
moves the last of the three tests under the zone lock (where it belongs).

Without this fix, the reproducer repeatedly "hung" my test system for
over an hour.  With this fix, the reproducer would be OOM-killed in 2-3

Please review/ack/nak as you see fit.

Thanks.  -ernie

--- linux-2.4.21/mm/vmscan.c.orig
+++ linux-2.4.21/mm/vmscan.c
@@ -847,27 +847,27 @@ int rebalance_laundry_zone(struct zone_s
                        if ((gfp_mask & __GFP_WAIT) && (work_done < max_work)) {
                               int timed_out;
                                /* Page is being freed, waiting on lru lock */
+                               local_count = zone->inactive_laundry_pages;
                                if (!atomic_inc_if_nonzero(&page->count)) {
-                                       local_count = zone->inactive_laundry_pages;
-                                       if (local_count != zone->inactive_laundry_pages)
+                                       if (zone->inactive_laundry_pages < local_count)
                                /* move page to tail so every caller won't wait on it */
                                list_add(&page->lru, &zone->inactive_laundry_list);
-                               local_count = zone->inactive_laundry_pages;
                                timed_out = wait_on_page_timeout(page, 5 * HZ);
-                               if (local_count != zone->inactive_laundry_pages)
+                               if (zone->inactive_laundry_pages < local_count)
                                 * If we timed out and the page has been in
@@ -902,10 +902,10 @@ int rebalance_laundry_zone(struct zone_s
                        try_to_release_page(page, 0);
-                       if (local_count != zone->inactive_laundry_pages)
-                               work_done++;
+                       if (zone->inactive_laundry_pages < local_count)
+                               work_done++;
                        if (unlikely((page->buffers != NULL)) &&
                                        PageInactiveLaundry(page)) {

Comment 2 Ernie Petrides 2005-06-10 04:07:43 UTC
A fix for this problem was committed to the RHEL3 U6 patch pool
on 26-May-2005 (in kernel version 2.4.21-32.5.EL).

Comment 7 Red Hat Bugzilla 2005-09-28 15:20:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.
