[linux-yocto] [PATCH 2/2] nfsd: Only set PF_LESS_THROTTLE when really needed.

2014-08-18 Thread Yang Shi
From: NeilBrown commit 8658452e4a588da603f6cb5ee2615deafcd82b71 upstream PF_LESS_THROTTLE has a very specific use case: to avoid deadlocks and live-locks while writing to the page cache in a loop-back NFS mount situation. It therefore makes sense to *only* set PF_LESS_THROTTLE in this situation

[linux-yocto] [PATCH 0/2] Integrate loopback nfs mount kernel code from 3.16

2014-08-18 Thread Yang Shi
Integrate Neil Brown's nfs loopback mount patches from 3.16. 46dbf93 nfsd: Only set PF_LESS_THROTTLE when really needed. 309c169 SUNRPC: track whether a request is coming from a loop-back interface. -- ___ linux-yocto mailing list linux-yocto@yoctoproj

[linux-yocto] [PATCH 1/2] SUNRPC: track whether a request is coming from a loop-back interface.

2014-08-18 Thread Yang Shi
From: NeilBrown commit ef11ce24875a8a540adc185e7bce3d7d49c8296f upstream If an incoming NFS request is coming from the local host, then nfsd will need to perform some special handling. So detect that possibility and make the source visible in rq_local. Signed-off-by: NeilBrown Signed-off-by:

[linux-yocto] [PATCH 27/28] sched/numa: Fix use of spin_{un}lock_irq() when interrupts are disabled

2014-08-18 Thread Yang Shi
From: Steven Rostedt commit e9dd685ce81815811fb4da72e6ab10a694ac8468 upstream As Peter Zijlstra told me, we have the following path: do_exit() -> exit_itimers() -> itimer_delete() -> spin_lock_irqsave(&timer->it_lock, &flags); timer_delete_hook(timer); kc->timer_del(timer) := p

[linux-yocto] [PATCH 26/28] arch/x86/mm/numa.c: use for_each_memblock()

2014-08-18 Thread Yang Shi
From: Emil Medve commit af4459d3636790735fccd83f0337c8380a0a4cc2 upstream Signed-off-by: Emil Medve Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "H. Peter Anvin" Cc: Yinghai Lu Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds Signed-off-by: Yang Shi --- arch/x86/mm/numa.c | 6 +++--

[linux-yocto] [PATCH 28/28] numa, sched: fix load_too_imbalanced logic inversion

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 1662867a9b2574bfdb9d4e97186aa131218d7210 upstream This function is supposed to return true if the new load imbalance is worse than the old one. It didn't. I can only hope brown paper bags are in style. Now things converge much better on both the 4 node and 8 node sys

[linux-yocto] [PATCH 25/28] mm: numa: add migrated transhuge pages to LRU the same way as base pages

2014-08-18 Thread Yang Shi
From: Mel Gorman commit 11de9927f9dd3cb0a0f18064fa4b6976fc37e79c upstream Migration of misplaced transhuge pages uses page_add_new_anon_rmap() when putting the page back, as it avoided atomic operations and added the new page to the correct LRU. A side-effect is that the page gets marked acti

[linux-yocto] [PATCH 19/28] sched/numa: Count pages on active node as local

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 792568ec6a31ca560ca4d528782cbc6cd2cea8b0 upstream The NUMA code is smart enough to distribute the memory of workloads that span multiple NUMA nodes across those NUMA nodes. However, it still has a pretty high scan rate for such workloads, because any memory that is lef

[linux-yocto] [PATCH 24/28] sched/numa: Decay ->wakee_flips instead of zeroing

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 096aa33863a5e48de52d2ff30e0801b7487944f4 upstream Affine wakeups have the potential to interfere with NUMA placement. If a task wakes up too many other tasks, affine wakeups will get disabled. However, regardless of how many other tasks it wakes up, it gets re-enabled

[linux-yocto] [PATCH 18/28] sched/numa: Initialize newidle balance stats in sd_numa_init()

2014-08-18 Thread Yang Shi
From: Jason Low commit 2b4cfe64dee0d84506b951d81bf55d9891744d25 upstream Also initialize the per-sd variables for newidle load balancing in sd_numa_init(). Signed-off-by: Jason Low Acked-by: morten.rasmus...@arm.com Cc: daniel.lezc...@linaro.org Cc: alex@linaro.org Cc: pre...@linux.vnet.ib

[linux-yocto] [PATCH 11/28] sched/numa: Move task_numa_free() to __put_task_struct()

2014-08-18 Thread Yang Shi
From: Mike Galbraith commit 156654f491dd8d52687a5fbe1637f472a52ce75b upstream Bad idea on -rt: [ 908.026136] [] rt_spin_lock_slowlock+0xaa/0x2c0 [ 908.026145] [] task_numa_free+0x31/0x130 [ 908.026151] [] finish_task_switch+0xce/0x100 [ 908.026156] [] thread_return+0x48/0x4ae [ 908.026

[linux-yocto] [PATCH 22/28] sched/numa: Allow task switch if load imbalance improves

2014-08-18 Thread Yang Shi
From: Rik van Riel commit e63da03639cc9e6e83b62e7ef8ffdbb92421416a upstream Currently the NUMA balancing code only allows moving tasks between NUMA nodes when the load on both nodes is in balance. This breaks down when the load was imbalanced to begin with. Allow tasks to be moved between NUMA

[linux-yocto] [PATCH 16/28] sched/numa: Fix task_numa_free() lockdep splat

2014-08-18 Thread Yang Shi
From: Mike Galbraith commit 60e69eed85bb7b5198ef70643b5895c26ad76ef7 upstream Sasha reported that lockdep claims that the following commit: made numa_group.lock interrupt unsafe: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()") While I don't see how that could be, gi

[linux-yocto] [PATCH 15/28] numa: use LAST_CPUPID_SHIFT to calculate LAST_CPUPID_MASK

2014-08-18 Thread Yang Shi
From: Srikar Dronamraju commit 834a964a098e7726fc296d7cd8f65ed3eeedd412 upstream LAST_CPUPID_MASK is calculated using LAST_CPUPID_WIDTH. However LAST_CPUPID_WIDTH itself can be 0. (when LAST_CPUPID_NOT_IN_PAGE_FLAGS is set). In such a case LAST_CPUPID_MASK turns out to be 0. But with recent

[linux-yocto] [PATCH 23/28] sched/numa: Update migrate_improves/degrades_locality()

2014-08-18 Thread Yang Shi
From: Rik van Riel commit b1ad065e65f56103db8b97edbd218a271ff5b1bb upstream Update the migrate_improves/degrades_locality() functions with knowledge of pseudo-interleaving. Do not consider moving tasks around within the set of group's active nodes as improving or degrading locality. Instead, le

[linux-yocto] [PATCH 21/28] sched/numa: Do not set preferred_node on migration to a second choice node

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 68d1b02a58f5d9f584c1fb2923ed60ec68cbbd9b upstream Setting the numa_preferred_node for a task in task_numa_migrate does nothing on a 2-node system. Either we migrate to the node that already was our preferred node, or we stay where we were. On a 4-node system, it can sl

[linux-yocto] [PATCH 17/28] Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt

2014-08-18 Thread Yang Shi
From: Tang Chen commit 8f28ed92d9314b98dc2033df770f5e6b85c5ffb7 upstream In document numa_memory_policy.txt, the following examples for flag MPOL_F_RELATIVE_NODES are incorrect. For example, consider a task that is attached to a cpuset with mems 2-5 that sets an Interleave polic

[linux-yocto] [PATCH 20/28] sched/numa: Retry placement more frequently when misplaced

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 5085e2a328849bdee6650b32d52c87c3788ab01c upstream When tasks have not converged on their preferred nodes yet, we want to retry fairly often, to make sure we do not migrate a task's memory to an undesirable location, only to have to move it again later. This patch reduc

[linux-yocto] [PATCH 13/28] mm: move mmu notifier call from change_protection to change_pmd_range

2014-08-18 Thread Yang Shi
From: Rik van Riel commit a5338093bfb462256f70f3450c08f73e59543e26 upstream The NUMA scanning code can end up iterating over many gigabytes of unpopulated memory, especially in the case of a freshly started KVM guest with lots of memory. This results in the mmu notifier code being called even w

[linux-yocto] [PATCH 12/28] mm, numa: reorganize change_pmd_range()

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 88a9ab6e3dfb5b10168130c255c6102c925343ab upstream Reorganize the order of ifs in change_pmd_range a little, in preparation for the next patch. [a...@linux-foundation.org: fix indenting, per David] Signed-off-by: Rik van Riel Cc: Peter Zijlstra Cc: Andrea Arcangeli R

[linux-yocto] [PATCH 10/28] numa: fix NULL pointer access and memory leak in unregister_one_node()

2014-08-18 Thread Yang Shi
From: Xishi Qiu commit 92d585ef067da7a966d6ce78c601bd1562b62619 upstream When doing socket hot remove, "node_devices[nid]" is set to NULL; acpi_processor_remove() -> try_offline_node() -> unregister_one_node() Then hot add a socket, but do not echo 1 > /sys/devices/system/cpu/

[linux-yocto] [PATCH 03/28] sched/numa: Track from which nodes NUMA faults are triggered

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 50ec8a401fed6d246ab65e6011d61ac91c34af70 upstream Track which nodes NUMA faults are triggered from, in other words the CPUs on which the NUMA faults happened. This uses a similar mechanism to what is used to track the memory involved in numa faults. The next patches us

[linux-yocto] [PATCH 05/28] sched/numa, mm: Use active_nodes nodemask to limit numa migrations

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 10f39042711ba21773763f267b4943a2c66c8bef upstream Use the active_nodes nodemask to make smarter decisions on NUMA migrations. In order to maximize performance of workloads that do not fit in one NUMA node, we want to satisfy the following criteria: 1) keep private m

[linux-yocto] [PATCH 07/28] sched/numa: Do statistics calculation using local variables only

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 35664fd41e1c8cc4f0b89f6a51db5af39ba50640 upstream The current code in task_numa_placement calculates the difference between the old and the new value, but also temporarily stores half of the old value in the per-process variables. The NUMA balancing code looks at those

[linux-yocto] [PATCH 06/28] sched/numa: Normalize faults_cpu stats and weigh by CPU use

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 7e2703e6099609adc93679c4d45cd6247f565971 upstream Tracing the code that decides the active nodes has made it abundantly clear that the naive implementation of the faults_from code has issues. Specifically, the garbage collector in some workloads will access orders of m

[linux-yocto] [PATCH 14/28] mm: numa: recheck for transhuge pages under lock during protection changes

2014-08-18 Thread Yang Shi
From: Mel Gorman commit 1ad9f620c3a22fa800489455ce517c29e576934e upstream Sasha reported the following bug using trinity: kernel BUG at mm/mprotect.c:149! invalid opcode: [#1] PREEMPT SMP DEBUG_PAGEALLOC Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 20

[linux-yocto] [PATCH 01/28] sched/numa, mm: Remove p->numa_migrate_deferred

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 52bf84aa206cd2c2516dfa3e03b578edf8a3242f upstream Excessive migration of pages can hurt the performance of workloads that span multiple NUMA nodes. However, it turns out that the p->numa_migrate_deferred knob is a really big hammer, which does reduce migration rates, b

[linux-yocto] [PATCH 09/28] sched/numa: Turn some magic numbers into #defines

2014-08-18 Thread Yang Shi
From: Rik van Riel commit be1e4e760d940c14d119bffef5eb007dfdf29046 upstream Cleanup suggested by Mel Gorman. Now the code contains some more hints on what statistics go where. Suggested-by: Mel Gorman Signed-off-by: Rik van Riel Acked-by: Mel Gorman Signed-off-by: Peter Zijlstra Cc: Chegu V

[linux-yocto] [PATCH 04/28] sched/numa: Build per numa_group active node mask from numa_faults_cpu statistics

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 20e07dea286a90f096a779706861472d296397c6 upstream The numa_faults_cpu statistics are used to maintain an active_nodes nodemask per numa_group. This allows us to be smarter about when to do numa migrations. Signed-off-by: Rik van Riel Acked-by: Mel Gorman Signed-off-b

[linux-yocto] [PATCH 02/28] sched/numa: Rename p->numa_faults to numa_faults_memory

2014-08-18 Thread Yang Shi
From: Rik van Riel commit ff1df896aef8e0ec1556a5c44f424bd45bfa2cbe upstream In order to get a more consistent naming scheme, making it clear which fault statistics track memory locality, and which track CPU locality, rename the memory fault statistics. Suggested-by: Mel Gorman Signed-off-by: R

[linux-yocto] [PATCH 08/28] sched/numa: Rename variables in task_numa_fault()

2014-08-18 Thread Yang Shi
From: Rik van Riel commit 58b46da336a9312b2e21bb576d1c2c484dbf6257 upstream We track both the node of the memory after a NUMA fault, and the node of the CPU on which the fault happened. Rename the local variables in task_numa_fault to make things more explicit. Suggested-by: Mel Gorman Signed-

[linux-yocto] [PATCH 0/28] Refresh NUMA kernel code

2014-08-18 Thread Yang Shi
Refresh kernel NUMA up to 3.16. Primarily merged: numa,sched,mm: pseudo-interleaving for automatic NUMA balancing https://lkml.org/lkml/2014/1/27/459 patch 1 - 9 fix numa vs kvm scalability issue https://lkml.org/lkml/2014/2/18/677 patch 12/13 sched,numa: reduce page migrations with pseudo-int