On Fri, 6 Apr 2012, Nikola Ciprich wrote:
> Hi, sorry it took me a bit longer.
> 
> here's my backport,

To 3.0.27, I presume; I've not tried it against 3.2.14, and haven't
checked whether that would be much the same or not.

> compiles fine, kernel boots without any problems.
> please review.

Thank you for doing the work, but I'm afraid it looks wrong to me.
I'd be more comfortable leaving it to Mel myself.  Comments below.

> n.
> 
> Signed-off-by: Nikola Ciprich <[email protected]>
> 
> (backport of upstream commit cc715d99e529d470dde2f33a6614f255adea71f3)
> 
>     mm: vmscan: forcibly scan highmem if there are too many buffer_heads pinning highmem
>     
>     Stuart Foster reported on bugzilla that copying large amounts of data
>     from NTFS caused an OOM kill on 32-bit X86 with 16G of memory.  Andrew
>     Morton correctly identified that the problem was NTFS was using 512
>     blocks meaning each page had 8 buffer_heads in low memory pinning it.
>     
>     In the past, direct reclaim used to scan highmem even if the allocating
>     process did not specify __GFP_HIGHMEM but not any more.  kswapd no longer
>     will reclaim from zones that are above the high watermark.  The intention
>     in both cases was to minimise unnecessary reclaim.  The downside is on
>     machines with large amounts of highmem that lowmem can be fully consumed
>     by buffer_heads with nothing trying to free them.
>     
>     The following patch is based on a suggestion by Andrew Morton to extend
>     the buffer_heads_over_limit case to force kswapd and direct reclaim to
>     scan the highmem zone regardless of the allocation request or watermarks.
>     
>     Addresses https://bugzilla.kernel.org/show_bug.cgi?id=42578
> 
> ---
> 
> diff -Naur linux-3.0/mm/vmscan.c linux-3.0-cc715d99e529d470dde2f33a6614f255adea71f3-backport/mm/vmscan.c
> --- linux-3.0/mm/vmscan.c     2012-04-05 23:09:28.364000004 +0200
> +++ linux-3.0-cc715d99e529d470dde2f33a6614f255adea71f3-backport/mm/vmscan.c   2012-04-05 23:25:30.989968627 +0200
> @@ -1581,6 +1581,14 @@

diff -p is often more helpful, and especially so for this patch:
here we are in shrink_active_list().

>                       putback_lru_page(page);
>                       continue;
>               }
> +             
> +             if (unlikely(buffer_heads_over_limit)) {
> +                     if (page_has_private(page) && trylock_page(page)) {
> +                             if (page_has_private(page))
> +                                     try_to_release_page(page, 0);
> +                             unlock_page(page);
> +                     }
> +             }
>  
>               if (page_referenced(page, 0, sc->mem_cgroup, &vm_flags)) {
>                       nr_rotated += hpage_nr_pages(page);

I don't think this does functional harm, but it duplicates the work
done by the pagevec_strip() you left in move_active_pages_to_lru(),
so the resulting source would be puzzling.

We could remove the pagevec_strip(), but really, it was just a
misunderstanding that led to my little buffer_heads_over_limit
cleanup in one place getting merged in with Mel's significant
buffer_heads_over_limit fix in another.

My cleanup doesn't deserve backporting to 3.0 or 3.2: I included it
in the 3.3 backport to avoid raised eyebrows, but once we get back
to kernels with pagevec_strip(), let's just leave this hunk out.

> @@ -2053,6 +2061,14 @@

Here we are in all_unreclaimable().

>       struct zoneref *z;
>       struct zone *zone;
>  
> +     /*
> +      * If the number of buffer_heads in the machine exceeds the maximum
> +      * allowed level, force direct reclaim to scan the highmem zone as
> +      * highmem pages could be pinning lowmem pages storing buffer_heads
> +      */
> +     if (buffer_heads_over_limit)
> +             sc->gfp_mask |= __GFP_HIGHMEM;
> +

But in Mel's patch that belongs to shrink_zones():
I don't see a reason to move it in the backport.

>       for_each_zone_zonelist_nodemask(zone, z, zonelist,
>                       gfp_zone(sc->gfp_mask), sc->nodemask) {
>               if (!populated_zone(zone))
> @@ -2514,7 +2530,8 @@

I think this hunk in balance_pgdat() is correct.

>                               (zone->present_pages +
>                                       KSWAPD_ZONE_BALANCE_GAP_RATIO-1) /
>                               KSWAPD_ZONE_BALANCE_GAP_RATIO);
> -                     if (!zone_watermark_ok_safe(zone, order,
> +                     if ((buffer_heads_over_limit && is_highmem_idx(i)) ||
> +                             !zone_watermark_ok_safe(zone, order,
>                                       high_wmark_pages(zone) + balance_gap,
>                                       end_zone, 0)) {
>                               shrink_zone(priority, zone, &sc);
> @@ -2543,6 +2560,17 @@

But this hunk in balance_pgdat() comes too late: it should set
end_zone in between the inactive_anon_is_low() shrink_active_list()
and the !zone_watermark_ok_safe() check that sets end_zone higher up,
before the previous hunk.

>                               continue;
>                       }
>  
> +                     /*
> +                      * If the number of buffer_heads in the machine
> +                      * exceeds the maximum allowed level and this node
> +                      * has a highmem zone, force kswapd to reclaim from
> +                      * it to relieve lowmem pressure.
> +                      */
> +                     if (buffer_heads_over_limit && is_highmem_idx(i)) {
> +                             end_zone = i;
> +                             break;
> +                     }
> +
>                       if (!zone_watermark_ok_safe(zone, order,
>                                       high_wmark_pages(zone), end_zone, 0)) {
>                               all_zones_ok = 0;

Hugh