Re: [PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced
On 03/09/2017 08:56 AM, Mel Gorman wrote:
> A pgdat tracks if recent reclaim encountered too many dirty, writeback
> or congested pages. The flags control whether kswapd writes pages back
> from reclaim context, tags pages for immediate reclaim when IO completes,
> whether processes block on wait_iff_congested and whether kswapd blocks
> when too many pages marked for immediate reclaim are encountered.
>
> The state is cleared in a check function with side-effects. With the patch
> "mm, vmscan: fix zone balance check in prepare_kswapd_sleep", the timing
> of when the bits get cleared changed. Due to the way the check works,
> it'll clear the bits if ZONE_DMA is balanced for a GFP_DMA allocation
> because it does not account for lowmem reserves properly.
>
> For the simoop workload, kswapd is not stalling when it should due to
> the premature clearing, writing pages from reclaim context like crazy and
> generally being unhelpful.
>
> This patch resets the pgdat bits related to page reclaim only when kswapd
> is going to sleep. The comparison with simoop is then
>
>                                        4.11.0-rc1             4.11.0-rc1             4.11.0-rc1
>                                           vanilla            fixcheck-v2               clear-v2
> Amean  p50-Read          21670074.18 (  0.00%)  20464344.18 (  5.56%)  19786774.76 (  8.69%)
> Amean  p95-Read          25456267.64 (  0.00%)  25721423.64 ( -1.04%)  24101956.27 (  5.32%)
> Amean  p99-Read          29369064.73 (  0.00%)  30174230.76 ( -2.74%)  27691872.71 (  5.71%)
> Amean  p50-Write             1390.30 (  0.00%)      1395.28 ( -0.36%)      1011.91 ( 27.22%)
> Amean  p95-Write           412901.57 (  0.00%)     37737.74 ( 90.86%)     34874.98 ( 91.55%)
> Amean  p99-Write          6668722.09 (  0.00%)    666489.04 ( 90.01%)    575449.60 ( 91.37%)
> Amean  p50-Allocation       78714.31 (  0.00%)     86286.22 ( -9.62%)     84246.26 ( -7.03%)
> Amean  p95-Allocation      175533.51 (  0.00%)    351812.27 (-100.42%)   400058.43 (-127.91%)
> Amean  p99-Allocation      247003.02 (  0.00%)   6291171.56 (-2447.00%) 10905600.00 (-4315.17%)
>
> Read latency is improved, write latency is mostly improved but allocation
> latency is regressed. kswapd is still reclaiming inefficiently, pages are
> being written back from writeback context and a host of other issues.
> However, given the change, it needed to be spelled out why the
> side-effect was moved.
>
> Signed-off-by: Mel Gorman

Acked-by: Vlastimil Babka
[PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced
A pgdat tracks if recent reclaim encountered too many dirty, writeback
or congested pages. The flags control whether kswapd writes pages back
from reclaim context, tags pages for immediate reclaim when IO completes,
whether processes block on wait_iff_congested and whether kswapd blocks
when too many pages marked for immediate reclaim are encountered.

The state is cleared in a check function with side-effects. With the patch
"mm, vmscan: fix zone balance check in prepare_kswapd_sleep", the timing
of when the bits get cleared changed. Due to the way the check works,
it'll clear the bits if ZONE_DMA is balanced for a GFP_DMA allocation
because it does not account for lowmem reserves properly.

For the simoop workload, kswapd is not stalling when it should due to
the premature clearing, writing pages from reclaim context like crazy and
generally being unhelpful.

This patch resets the pgdat bits related to page reclaim only when kswapd
is going to sleep. The comparison with simoop is then

                                       4.11.0-rc1             4.11.0-rc1             4.11.0-rc1
                                          vanilla            fixcheck-v2               clear-v2
Amean  p50-Read          21670074.18 (  0.00%)  20464344.18 (  5.56%)  19786774.76 (  8.69%)
Amean  p95-Read          25456267.64 (  0.00%)  25721423.64 ( -1.04%)  24101956.27 (  5.32%)
Amean  p99-Read          29369064.73 (  0.00%)  30174230.76 ( -2.74%)  27691872.71 (  5.71%)
Amean  p50-Write             1390.30 (  0.00%)      1395.28 ( -0.36%)      1011.91 ( 27.22%)
Amean  p95-Write           412901.57 (  0.00%)     37737.74 ( 90.86%)     34874.98 ( 91.55%)
Amean  p99-Write          6668722.09 (  0.00%)    666489.04 ( 90.01%)    575449.60 ( 91.37%)
Amean  p50-Allocation       78714.31 (  0.00%)     86286.22 ( -9.62%)     84246.26 ( -7.03%)
Amean  p95-Allocation      175533.51 (  0.00%)    351812.27 (-100.42%)   400058.43 (-127.91%)
Amean  p99-Allocation      247003.02 (  0.00%)   6291171.56 (-2447.00%) 10905600.00 (-4315.17%)

Read latency is improved, write latency is mostly improved but allocation
latency is regressed. kswapd is still reclaiming inefficiently, pages are
being written back from writeback context and a host of other issues.
However, given the change, it needed to be spelled out why the side-effect
was moved.

Signed-off-by: Mel Gorman
---
 mm/vmscan.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4ea444142c2e..17b1afbce88e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3091,17 +3091,17 @@ static bool zone_balanced(struct zone *zone, int order, int classzone_idx)
 	if (!zone_watermark_ok_safe(zone, order, mark, classzone_idx))
 		return false;
 
-	/*
-	 * If any eligible zone is balanced then the node is not considered
-	 * to be congested or dirty
-	 */
-	clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
-	clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
-	clear_bit(PGDAT_WRITEBACK, &zone->zone_pgdat->flags);
-
 	return true;
 }
 
+/* Clear pgdat state for congested, dirty or under writeback. */
+static void clear_pgdat_congested(pg_data_t *pgdat)
+{
+	clear_bit(PGDAT_CONGESTED, &pgdat->flags);
+	clear_bit(PGDAT_DIRTY, &pgdat->flags);
+	clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+}
+
 /*
  * Prepare kswapd for sleeping. This verifies that there are no processes
  * waiting in throttle_direct_reclaim() and that watermarks have been met.
@@ -3134,8 +3134,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
 		if (!managed_zone(zone))
 			continue;
 
-		if (zone_balanced(zone, order, classzone_idx))
+		if (zone_balanced(zone, order, classzone_idx)) {
+			clear_pgdat_congested(pgdat);
 			return true;
+		}
 	}
 
 	return false;
-- 
2.11.0
[PATCH 2/3] mm, vmscan: Only clear pgdat congested/dirty/writeback state when balanced
A pgdat tracks if recent reclaim encountered too many dirty, writeback
or congested pages. The flags control whether kswapd writes pages back
from reclaim context, tags pages for immediate reclaim when IO completes,
whether processes block on wait_iff_congested and whether kswapd blocks
when too many pages marked for immediate reclaim are encountered.

The state is cleared in a check function with side-effects. With the patch
"mm, vmscan: fix zone balance check in prepare_kswapd_sleep", the timing
of when the bits get cleared changed. Due to the way the check works,
it'll clear the bits if ZONE_DMA is balanced for a GFP_DMA allocation
because it does not account for lowmem reserves properly.

For the simoop workload, kswapd is not stalling when it should due to
the premature clearing, writing pages from reclaim context like crazy and
generally being unhelpful.

This patch resets the pgdat bits related to page reclaim only when kswapd
is going to sleep. The comparison with simoop is then

                                       4.10.0-rc7             4.10.0-rc7             4.10.0-rc7
                                   mmots-20170209            fixcheck-v1               clear-v1
Amean  p50-Read          22325202.49 (  0.00%)  20026926.55 ( 10.29%)  19491134.58 ( 12.69%)
Amean  p95-Read          26102988.80 (  0.00%)  27023360.00 ( -3.53%)  24294195.20 (  6.93%)
Amean  p99-Read          30935176.53 (  0.00%)  30994432.00 ( -0.19%)  30397053.16 (  1.74%)
Amean  p50-Write              976.44 (  0.00%)      1905.28 (-95.12%)      1077.22 (-10.32%)
Amean  p95-Write            15471.29 (  0.00%)     36210.09 (-134.05%)    36419.56 (-135.40%)
Amean  p99-Write            35108.62 (  0.00%)    479494.96 (-1265.75%)  102000.36 (-190.53%)
Amean  p50-Allocation       76382.61 (  0.00%)     87603.20 (-14.69%)     87485.22 (-14.54%)
Amean  p95-Allocation          12.39 (  0.00%)    244491.38 (-91.34%)    204588.52 (-60.11%)
Amean  p99-Allocation      187937.39 (  0.00%)   1745237.33 (-828.63%)   631657.74 (-236.10%)

Read latency is improved although write and allocation latency is impacted.
Even with the patch, kswapd is still reclaiming inefficiently, pages are
being written back from writeback context and a host of other issues.
However, given the change, it needed to be spelled out why the side-effect
was moved.

Signed-off-by: Mel Gorman
---
 mm/vmscan.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 92fc66bd52bc..b47b430ca7ea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3097,17 +3097,17 @@ static bool zone_balanced(struct zone *zone, int order, int classzone_idx)
 	if (!zone_watermark_ok_safe(zone, order, mark, classzone_idx))
 		return false;
 
-	/*
-	 * If any eligible zone is balanced then the node is not considered
-	 * to be congested or dirty
-	 */
-	clear_bit(PGDAT_CONGESTED, &zone->zone_pgdat->flags);
-	clear_bit(PGDAT_DIRTY, &zone->zone_pgdat->flags);
-	clear_bit(PGDAT_WRITEBACK, &zone->zone_pgdat->flags);
-
 	return true;
 }
 
+/* Clear pgdat state for congested, dirty or under writeback. */
+static void clear_pgdat_congested(pg_data_t *pgdat)
+{
+	clear_bit(PGDAT_CONGESTED, &pgdat->flags);
+	clear_bit(PGDAT_DIRTY, &pgdat->flags);
+	clear_bit(PGDAT_WRITEBACK, &pgdat->flags);
+}
+
 /*
  * Prepare kswapd for sleeping. This verifies that there are no processes
  * waiting in throttle_direct_reclaim() and that watermarks have been met.
@@ -3140,8 +3140,10 @@ static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order, int classzone_idx)
 		if (!managed_zone(zone))
 			continue;
 
-		if (zone_balanced(zone, order, classzone_idx))
+		if (zone_balanced(zone, order, classzone_idx)) {
+			clear_pgdat_congested(pgdat);
 			return true;
+		}
 	}
 
 	return false;
-- 
2.11.0