From: Jens Axboe <[email protected]>
Date: Fri, 21 May 2010 20:00:35 +0200

commit 6423104b6a1e6f0c18be60e8c33f02d263331d5e upstream.

Commit 69b62d01 fixed up most of the places where we would enter
busy schedule() spins when disabling the periodic background
writeback. This fixes up the sb timer so that it doesn't get
hammered on with the delay disabled, and ensures that it gets
rearmed if needed when /proc/sys/vm/dirty_writeback_centisecs
gets modified.

bdi_forker_task() also needs to check for !dirty_writeback_centisecs
and use schedule() appropriately; fix that up too.

Signed-off-by: Jens Axboe <[email protected]>
Tested-by: Xavier Roche <[email protected]>
Signed-off-by: Jonathan Nieder <[email protected]>
---
Hi Greg,

This patch teaches pdflush not to busy-loop in certain circumstances.
It was applied to mainline during the 2.6.35 merge window.  Please
consider applying it to the 2.6.32.y-longterm tree.

Daniel Kobras wrote[1]:

> I've updated my devel machine to the latest kernel version, and now I can
> reproduce the problem. It's not the fsync(), actually, but the stopping of
> the pdflush daemon that makes the flush-<major>:<minor> kernel threads enter a
> busy loop (spinning in bdi_writeback_task()). It can be triggered as simply
> as:
>
>       echo 0 > /proc/sys/vm/dirty_writeback_centisecs
>
> (Use echo 500 > /proc/sys/vm/dirty_writeback_centisecs to restore a sane
> behaviour.) Suspending pdflush activity this way used to work before.  Now it
> seems to be interpreted as a zero wait time between iterations in the flush
> threads. Thus, I'd say this is a regression in the kernel.

Xavier Roche could reproduce the bug in

 - 2.6.34.4
 - 2.6.32.57

and could not reproduce the bug in

 - 2.6.24
 - 3.2.9
 - 2.6.32 + the above patch

More precisely, Xavier writes:

> It took me a while (the test machine was randomly freezing because of an
> unrelated ACPI issue) but the patch does appear to fix the noflushd
> problem.
>
> With original 2.6.32-5-686 (fresh squeeze install on an x86 machine)
> after 15 minutes of uptime and noflushd running:
>
>   354 root      20   0     0    0    0 R 50.8  0.0   0:24.36 flush-8:0
> 13101 root      20   0     0    0    0 R 46.9  0.0   0:14.12 flush-8:32
>
> (/etc/init.d/noflushd stop "clears" the issue)
>
> With the patched kernel (linux-2.6_2.6.32.orig.tar.gz +
> writeback-fixups-for-dirty_writeback_centisecs.patch --
> linux-2.6_2.6.32-41.diff  not applied), I do not have anymore the flush
> problem.
>
> However, the ksoftirqd process shows up from time to time and eats some CPU:
>
>     4 root      20   0     0    0    0 R 28.2  0.0   0:06.55 ksoftirqd/0
>
> This is not an issue as the load average stays very low (0.10); but the
> problem disappears totally when using 3.2.9.
>
> [ Note: if a disk is going idle because of a previous "/sbin/hdparm
> -SXXX /dev/sdXXX", the 100% CPU issue comes back (but can be solved by
> awakening the disk), which is not the case in 3.2.9, strangely. ]

Thanks,
Jonathan

[1] http://bugs.debian.org/594923
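
For reviewers who want to check the fix locally, the reports above boil
down to toggling the writeback-interval sysctl and watching the flush
threads.  A minimal reproduction sketch (assumes root, a 2.6.32-era
kernel, and procps top; the flush-<major>:<minor> thread names follow
the convention in Daniel's report):

```shell
# Disable periodic background writeback; on an unpatched 2.6.32 kernel
# the flush-<major>:<minor> threads start busy-looping at ~100% CPU.
echo 0 > /proc/sys/vm/dirty_writeback_centisecs

# Look for flush threads stuck in the running (R) state.
top -b -n 1 | grep 'flush-'

# Restore the default interval (500 centisecs = 5 s) to recover.
echo 500 > /proc/sys/vm/dirty_writeback_centisecs
```

On a patched kernel the first echo should leave the flush threads
sleeping instead of spinning.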

 include/linux/backing-dev.h |    1 +
 mm/backing-dev.c            |   15 ++++++++++-----
 mm/page-writeback.c         |    1 +
 3 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index b449e738533a..61e43a6f3141 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -105,6 +105,7 @@ void bdi_start_writeback(struct backing_dev_info *bdi, struct super_block *sb,
                                long nr_pages);
 int bdi_writeback_task(struct bdi_writeback *wb);
 int bdi_has_dirty_io(struct backing_dev_info *bdi);
+void bdi_arm_supers_timer(void);
 
 extern spinlock_t bdi_lock;
 extern struct list_head bdi_list;
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 67a33a5a1a93..d82440108131 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -41,7 +41,6 @@ static struct timer_list sync_supers_timer;
 
 static int bdi_sync_supers(void *);
 static void sync_supers_timer_fn(unsigned long);
-static void arm_supers_timer(void);
 
 static void bdi_add_default_flusher_task(struct backing_dev_info *bdi);
 
@@ -242,7 +241,7 @@ static int __init default_bdi_init(void)
 
        init_timer(&sync_supers_timer);
        setup_timer(&sync_supers_timer, sync_supers_timer_fn, 0);
-       arm_supers_timer();
+       bdi_arm_supers_timer();
 
        err = bdi_init(&default_backing_dev_info);
        if (!err)
@@ -364,10 +363,13 @@ static int bdi_sync_supers(void *unused)
        return 0;
 }
 
-static void arm_supers_timer(void)
+void bdi_arm_supers_timer(void)
 {
        unsigned long next;
 
+       if (!dirty_writeback_interval)
+               return;
+
        next = msecs_to_jiffies(dirty_writeback_interval * 10) + jiffies;
        mod_timer(&sync_supers_timer, round_jiffies_up(next));
 }
@@ -375,7 +377,7 @@ static void arm_supers_timer(void)
 static void sync_supers_timer_fn(unsigned long unused)
 {
        wake_up_process(sync_supers_tsk);
-       arm_supers_timer();
+       bdi_arm_supers_timer();
 }
 
 static int bdi_forker_task(void *ptr)
@@ -418,7 +420,10 @@ static int bdi_forker_task(void *ptr)
 
                        spin_unlock_bh(&bdi_lock);
                        wait = msecs_to_jiffies(dirty_writeback_interval * 10);
-                       schedule_timeout(wait);
+                       if (wait)
+                               schedule_timeout(wait);
+                       else
+                               schedule();
                        try_to_freeze();
                        continue;
                }
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2c5d79236ead..52f71aebfc01 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -694,6 +694,7 @@ int dirty_writeback_centisecs_handler(ctl_table *table, int write,
        void __user *buffer, size_t *length, loff_t *ppos)
 {
        proc_dointvec(table, write, buffer, length, ppos);
+       bdi_arm_supers_timer();
        return 0;
 }
 
-- 
1.7.9.2
