Hi,

On Tue, 2005-03-08 at 09:28, Stephen C. Tweedie wrote:

> I think it should be OK just to move the page->mapping != mapping test
> above the page->index > end test.  Sure, if all the pages have been
> stolen by the time we see them, then we'll repeat without advancing
> "next"; but we're still making progress in that case because pages that
> were previously in this mapping *have* been removed, just by a different
> process.  If we're really concerned about that case we can play the
> trick that invalidate_mapping_pages() tries and do a "next++" in that
> case.
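
(For reference, the trick in invalidate_mapping_pages() is roughly the
following pattern; this is a simplified sketch of the idea, not the
exact code:)

        struct pagevec pvec;
        pgoff_t next = start;
        int i;

        pagevec_init(&pvec, 0);
        while (pagevec_lookup(&pvec, mapping, next, PAGEVEC_SIZE)) {
                for (i = 0; i < pagevec_count(&pvec); i++) {
                        struct page *page = pvec.pages[i];

                        if (TestSetPageLocked(page)) {
                                /* Can't deal with this page right now,
                                 * but bump "next" anyway so the lookup
                                 * cursor always moves forward. */
                                next++;
                                continue;
                        }
                        if (page->index > next)
                                next = page->index;
                        next++;
                        /* ... try to invalidate the page ... */
                        unlock_page(page);
                }
                pagevec_release(&pvec);
        }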

Here's a variant of akpm's last patch with that change, plus one extra
s/page->index/page_index/ that he missed.  I've moved the whole
next = page_index + 1 assignment and wrapping check up to just after the
mapping test, so even if we break out early all of that updating has
already been done reliably, and we don't end up setting "next" in two
places in the code.
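
After the patch, the per-page logic in the loop boils down to the
following (simplified from the diff below, with the unmap/invalidate
details omitted):

        lock_page(page);
        if (page->mapping != mapping) {
                /* Page was stolen from this mapping; skip it.  We still
                 * made progress, since pages that used to be in this
                 * mapping have gone away. */
                unlock_page(page);
                continue;
        }
        page_index = page->index;
        next = page_index + 1;  /* restart point always advances now */
        if (next == 0)          /* pgoff_t wrapped */
                wrapped = 1;
        if (page_index > end) {
                /* Past the range being invalidated: stop early. */
                unlock_page(page);
                break;
        }
        wait_on_page_writeback(page);
        /* ... unmap and invalidate the page as before ... */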

A 400-task fsstress run used to livelock reliably within minutes inside
the O_DIRECT code without any fix; with this patch applied it has now
been running for over an hour without problems.

--Stephen

Fix livelock in invalidate_inode_pages2_range

invalidate_inode_pages2_range can loop indefinitely if pagevec_lookup
returns only pages beyond the end of the range being invalidated.  Fix
it by advancing the restart pointer, "next", every time we see a page on
the correct mapping, not just when the page is within the range; and by
breaking out of the loop early when we encounter a page beyond the end
of the range.  Based on a proposed fix by
akpm.

Signed-off-by: Stephen Tweedie <[EMAIL PROTECTED]>


--- linux-2.6-ext3/mm/truncate.c.=K0002=.orig
+++ linux-2.6-ext3/mm/truncate.c
@@ -268,25 +268,31 @@ int invalidate_inode_pages2_range(struct
                        min(end - next, (pgoff_t)PAGEVEC_SIZE - 1) + 1)) {
                for (i = 0; !ret && i < pagevec_count(&pvec); i++) {
                        struct page *page = pvec.pages[i];
+                       pgoff_t page_index;
                        int was_dirty;
 
                        lock_page(page);
-                       if (page->mapping != mapping || page->index > end) {
+                       if (page->mapping != mapping) {
                                unlock_page(page);
                                continue;
                        }
-                       wait_on_page_writeback(page);
-                       next = page->index + 1;
+                       page_index = page->index;
+                       next = page_index + 1;
                        if (next == 0)
                                wrapped = 1;
+                       if (page_index > end) {
+                               unlock_page(page);
+                               break;
+                       }
+                       wait_on_page_writeback(page);
                        while (page_mapped(page)) {
                                if (!did_range_unmap) {
                                        /*
                                         * Zap the rest of the file in one hit.
                                         */
                                        unmap_mapping_range(mapping,
-                                           page->index << PAGE_CACHE_SHIFT,
-                                           (end - page->index + 1)
+                                           page_index << PAGE_CACHE_SHIFT,
+                                           (end - page_index + 1)
                                                        << PAGE_CACHE_SHIFT,
                                            0);
                                        did_range_unmap = 1;
@@ -295,7 +301,7 @@ int invalidate_inode_pages2_range(struct
                                         * Just zap this page
                                         */
                                        unmap_mapping_range(mapping,
-                                         page->index << PAGE_CACHE_SHIFT,
+                                         page_index << PAGE_CACHE_SHIFT,
                                          PAGE_CACHE_SIZE, 0);
                                }
                        }
