On 2016/03/16 18:23, Zhao Lei wrote:
> We discussed patch titled:
> [PATCH] Make core_pattern support namespace
> before.
>
> The above patch solves half of the problem of a custom core_dump pattern
> in a container, but there is another problem that limits a
> custom core_pattern in a container: it is the
On 2016/02/19 14:37, Ian Kent wrote:
On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote:
On 2016/02/19 5:45, Eric W. Biederman wrote:
> Personally I am a fan of the don't-be-clever, capture-a-kernel-thread
> approach, as it is very easy to see what, if any, exploitation
> opportunities there are. The justification for something more clever
> is trickier. Of course we do somethi
On 2016/02/18 11:57, Eric W. Biederman wrote:
>
> Ccing The containers list because a related discussion is happening there
> and somehow this thread has never made it there.
>
> Ian Kent writes:
>
>> On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote:
>>> On 11/15, Eric W. Biederman wrote:
On 2015/12/22 6:52, Eric W. Biederman wrote:
> Dongsheng Yang writes:
>
>> On 12/20/2015 05:47 PM, Eric W. Biederman wrote:
>>> Dongsheng Yang writes:
>>>
On 2015/12/20 18:47, Eric W. Biederman wrote:
> Dongsheng Yang writes:
>
>> On 12/20/2015 10:37 AM, Al Viro wrote:
>>> On Sun, Dec 20, 2015 at 10:14:29AM +0800, Dongsheng Yang wrote:
On 12/17/2015 07:23 PM, Dongsheng Yang wrote:
> Hi guys,
> We are working on making core dump b
On 2015/12/17 21:30, Vladimir Davydov wrote:
> The rationale of separate swap counter is given by Johannes Weiner.
>
> Signed-off-by: Vladimir Davydov
> ---
> Changes in v2:
> - Add rationale of separate swap counter provided by Johannes.
>
> Documentation/cgroup.txt | 33 +++
On 2015/12/18 3:43, Luck, Tony wrote:
As Tony requested, we may later need a knob to stop the "movable->normal"
fallback.
If the mirrored memory is small and the other is large,
I think we can enable both "non-mirrored -> normal" and "normal ->
non-mirrored".
Size of mirrored memory can
On 2015/12/17 13:48, Xishi Qiu wrote:
> On 2015/12/17 10:53, Kamezawa Hiroyuki wrote:
>
>> On 2015/12/17 11:47, Xishi Qiu wrote:
>>> On 2015/12/17 9:38, Izumi, Taku wrote:
>>>
>>>> Dear Xishi,
>>>>
>>>> Sorry for the late reply.
>>
On 2015/12/17 12:32, Johannes Weiner wrote:
On Thu, Dec 17, 2015 at 11:46:27AM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/16 20:09, Johannes Weiner wrote:
On Wed, Dec 16, 2015 at 12:18:30PM +0900, Kamezawa Hiroyuki wrote:
- swap-full notification via vmpressure or some such mechanism.
Why
6:44 PM
>>> To: Izumi, Taku/泉 拓
>>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux...@kvack.org;
>>> a...@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛
>>> 之; m...@csn.ul.ie; Hansen, Dave; m...@codeblueprint.co.uk
>>> Subject: Re: [PATCH v3 2/2] mm:
On 2015/12/16 20:09, Johannes Weiner wrote:
On Wed, Dec 16, 2015 at 12:18:30PM +0900, Kamezawa Hiroyuki wrote:
Hmm, my requests are
- set the same capabilities as mlock() to set swap.limit=0
Setting swap.max is already a privileged operation.
Sure.
- swap-full notification via
On 2015/12/16 2:21, Michal Hocko wrote:
I completely agree that malicious/untrusted users absolutely have to
be capped by the hard limit. Then the separate swap limit would work
for sure. But I am less convinced about the usefulness of the rigid (to
the global memory pressure) swap limit without th
On 2015/12/15 23:50, Johannes Weiner wrote:
On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/15 4:42, Vladimir Davydov wrote:
Anyway, if you don't trust a container you'd better set the hard memory
limit so that it can't hurt others no matter what
On 2015/12/15 20:02, Vladimir Davydov wrote:
On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/15 4:42, Vladimir Davydov wrote:
On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy
On 2015/12/15 17:30, Vladimir Davydov wrote:
On Tue, Dec 15, 2015 at 12:12:40PM +0900, Kamezawa Hiroyuki wrote:
On 2015/12/15 0:30, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy we charge memsw, which is dubious, because:
- memsw.limit must
On 2015/12/15 0:30, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy we charge memsw, which is dubious, because:
- memsw.limit must be >= memory.limit, so it is impossible to limit
swap usage less than memory usage. Taking into account the fact
On 2015/12/15 4:42, Vladimir Davydov wrote:
On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote:
On Thu 10-12-15 14:39:14, Vladimir Davydov wrote:
In the legacy hierarchy we charge memsw, which is dubious, because:
- memsw.limit must be >= memory.limit, so it is impossible to limit
On 2015/12/10 20:39, Vladimir Davydov wrote:
> In the legacy hierarchy we charge memsw, which is dubious, because:
>
> - memsw.limit must be >= memory.limit, so it is impossible to limit
> swap usage less than memory usage. Taking into account the fact that
> the primary limiting mechani
On 2015/10/31 4:42, Luck, Tony wrote:
If each memory controller has the same distance/latency, you (your firmware)
don't need
to allocate reliable memory per each memory controller.
If distance is a problem, another node should be allocated.
...is the behavior (splitting the zone) really required?
On 2015/10/30 17:23, Michal Hocko wrote:
On Fri 30-10-15 14:23:59, KAMEZAWA Hiroyuki wrote:
On 2015/10/30 0:17, mho...@kernel.org wrote:
[...]
@@ -3135,13 +3145,56 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
if (gfp_mask & __GFP_NORETRY)
goto nor
On 2015/10/23 10:44, Luck, Tony wrote:
> First part of each memory controller. I have two memory controllers on each
> node
>
If each memory controller has the same distance/latency, you (your firmware)
don't need
to allocate reliable memory per each memory controller.
If distance is problem, a
On 2015/10/30 0:17, mho...@kernel.org wrote:
> From: Michal Hocko
>
> wait_iff_congested has been used to throttle allocator before it retried
> another round of direct reclaim to allow the writeback to make some
> progress and prevent reclaim from looping over dirty/writeback pages
> without mak
On 2015/10/30 0:17, mho...@kernel.org wrote:
> From: Michal Hocko
>
> __alloc_pages_slowpath has traditionally relied on the direct reclaim
> and did_some_progress as an indicator that it makes sense to retry
> allocation rather than declaring OOM. shrink_zones had to rely on
> zone_reclaimable i
On 2015/10/22 3:17, Luck, Tony wrote:
+ if (reliable_kernelcore) {
+ for_each_memblock(memory, r) {
+ if (memblock_is_mirror(r))
+ continue;
Should we have a safety check here that there is some mirrored memory? If you
giv
On 2015/10/09 19:36, Xishi Qiu wrote:
On 2015/10/9 17:24, Kamezawa Hiroyuki wrote:
On 2015/10/09 15:46, Xishi Qiu wrote:
On 2015/10/9 22:56, Taku Izumi wrote:
Xeon E7 v3 based systems support Address Range Mirroring,
and a UEFI BIOS compliant with the UEFI 2.5 spec can notify which
ranges are
On 2015/10/09 15:46, Xishi Qiu wrote:
On 2015/10/9 22:56, Taku Izumi wrote:
Xeon E7 v3 based systems support Address Range Mirroring,
and a UEFI BIOS compliant with the UEFI 2.5 spec can notify which
ranges are reliable (mirrored) via the EFI memory map.
Now the Linux kernel utilizes this information and alloca
On 2015/10/09 14:52, Jiang Liu wrote:
On 2015/10/9 4:20, Andrew Morton wrote:
On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes
wrote:
On Wed, 19 Aug 2015, Patil, Kiran wrote:
Acked-by: Kiran Patil
Where's the call to preempt_disable() to prevent kernels with preemption
from makin
On 2015/09/22 15:23, Ingo Molnar wrote:
> So when memory hotplug removes a piece of physical memory from pagetable
> mappings, it also frees the underlying PGD entry.
>
> This complicates PGD management, so don't do this. We can keep the
> PGD mapped and the PUD table all clear - it's only a singl
On 2015/08/25 8:15, Paul Turner wrote:
On Mon, Aug 24, 2015 at 3:49 PM, Tejun Heo wrote:
Hello,
On Mon, Aug 24, 2015 at 03:03:05PM -0700, Paul Turner wrote:
Hmm... I was hoping for actual configurations and usage scenarios.
Preferably something people can set up and play with.
This is mu
On 2015/08/19 5:31, Tejun Heo wrote:
Hello, Paul.
On Mon, Aug 17, 2015 at 09:03:30PM -0700, Paul Turner wrote:
2) Control within an address-space. For subsystems with fungible resources,
e.g. CPU, it can be useful for an address space to partition its own
threads. Losing the capability to do
On 2015/08/10 17:14, Vladimir Davydov wrote:
On Sun, Aug 09, 2015 at 11:12:25PM +0900, Kamezawa Hiroyuki wrote:
On 2015/08/08 22:05, Vladimir Davydov wrote:
On Fri, Aug 07, 2015 at 10:38:16AM +0900, Kamezawa Hiroyuki wrote:
...
All ? hmm. It seems that mixture of record of global memory
On 2015/08/08 22:05, Vladimir Davydov wrote:
On Fri, Aug 07, 2015 at 10:38:16AM +0900, Kamezawa Hiroyuki wrote:
On 2015/08/06 17:59, Vladimir Davydov wrote:
On Wed, Aug 05, 2015 at 10:34:58AM +0900, Kamezawa Hiroyuki wrote:
I wonder, rather than collecting more data, rough calculation can
On 2015/08/06 17:59, Vladimir Davydov wrote:
On Wed, Aug 05, 2015 at 10:34:58AM +0900, Kamezawa Hiroyuki wrote:
Reading the discussion, I feel storing more data is difficult, too.
Yep, even with the current 16-bit memcg id. Things would get even worse
if we wanted to extend it one day (will we
On 2015/08/05 16:47, Michal Hocko wrote:
On Wed 05-08-15 09:39:40, KAMEZAWA Hiroyuki wrote:
[...]
so, for memory controller, we'll have
We currently have only current, low, high, max and events.
All other knobs are either deprecated or waiting for a usecase to emerge
before the
On 2015/08/03 21:04, Vladimir Davydov wrote:
> Hi,
>
> Currently, workingset detection logic is not memcg aware - inactive_age
> is maintained per zone. As a result, if memory cgroups are used,
> refaulted file pages are activated randomly. This patch set makes
> inactive_age per lruvec so that wo
On 2015/08/05 4:31, Tejun Heo wrote:
From 6abc8ca19df0078de17dc38340db3002ed489ce7 Mon Sep 17 00:00:00 2001
From: Tejun Heo
Date: Tue, 4 Aug 2015 15:20:55 -0400
Traditionally, each cgroup controller implemented whatever interface
it wanted, leading to interfaces which are widely inconsistent.
E
On 2015/06/30 11:45, Xishi Qiu wrote:
On 2015/6/29 15:32, Kamezawa Hiroyuki wrote:
On 2015/06/27 11:24, Xishi Qiu wrote:
This patch introduces a new migratetype called "MIGRATE_MIRROR"; it is used to
allocate mirrored pages.
When you cat /proc/pagetypeinfo, you can see the count of fre
On 2015/06/30 10:31, Xishi Qiu wrote:
On 2015/6/30 9:01, Kamezawa Hiroyuki wrote:
On 2015/06/30 8:11, Luck, Tony wrote:
@@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
*/
int __init_memblock memblock_mark_mirror(phys_addr_t base
On 2015/06/30 8:11, Luck, Tony wrote:
@@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size)
*/
int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size)
{
- system_has_some_mirror = true;
+ static_key_slow_inc(&sy
On 2015/06/27 11:25, Xishi Qiu wrote:
Before freeing bootmem, set mirrored pageblocks' migratetype to MIGRATE_MIRROR, so
they can be freed to the buddy system's MIGRATE_MIRROR list.
When setting up reserved memory, skip the mirrored memory.
Signed-off-by: Xishi Qiu
---
include/linux/memblock.h | 3 +++
mm/
On 2015/06/27 11:24, Xishi Qiu wrote:
This patch introduces a new migratetype called "MIGRATE_MIRROR"; it is used to
allocate mirrored pages.
When you cat /proc/pagetypeinfo, you can see the count of free mirrored blocks.
Signed-off-by: Xishi Qiu
My fear about this approach is that this may brea
xample)
Thanks,
-Kame
From 88213b0f76e2f603c5a38690cbd85a4df1e646ba Mon Sep 17 00:00:00 2001
From: KAMEZAWA Hiroyuki
Date: Mon, 29 Jun 2015 15:35:47 +0900
Subject: [PATCH] add a new config option for memory mirror
Add a new config option "CONFIG_MEMORY_MIRROR" for kernel assisted
memory mirroring.
In UEFI2.5 spec, Addr
On 2015/06/26 10:43, Xishi Qiu wrote:
On 2015/6/26 7:54, Kamezawa Hiroyuki wrote:
On 2015/06/25 18:44, Xishi Qiu wrote:
On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
On 2015/06/09 19:04, Xishi Qiu wrote:
On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
On 2015/06/04 22:04, Xishi Qiu wrote
On 2015/06/25 18:44, Xishi Qiu wrote:
On 2015/6/10 11:06, Kamezawa Hiroyuki wrote:
On 2015/06/09 19:04, Xishi Qiu wrote:
On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
On 2015/06/04 22:04, Xishi Qiu wrote:
Add the buddy system interface for address range mirroring feature.
Allocate mirrored
On 2015/06/16 2:20, Luck, Tony wrote:
On Mon, Jun 15, 2015 at 05:47:27PM +0900, Kamezawa Hiroyuki wrote:
So, there are 3 ideas.
(1) kernel only from MIRROR / user only from MOVABLE (Tony)
(2) kernel only from MIRROR / user from MOVABLE + MIRROR(ASAP) (AKPM
suggested)
This makes use
On 2015/06/11 5:40, Luck, Tony wrote:
I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE
HIGHMEM shouldn't matter - partial memory mirror only makes any sense on X86_64
systems ... 32-bit kernels
don't even boot on systems with 64GB, and the minimum rational confi
On 2015/06/09 19:09, Xishi Qiu wrote:
On 2015/6/9 15:06, Kamezawa Hiroyuki wrote:
On 2015/06/04 22:02, Xishi Qiu wrote:
Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
we should allocate mirrored memory for both user and kernel processes.
Signed-off-by: Xishi Qiu
On 2015/06/09 19:10, Xishi Qiu wrote:
On 2015/6/9 14:44, Kamezawa Hiroyuki wrote:
On 2015/06/04 21:56, Xishi Qiu wrote:
This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY"; it is
used to turn the feature on/off.
Signed-off-by: Xishi Qiu
---
mm/Kconfig | 8
On 2015/06/09 19:04, Xishi Qiu wrote:
On 2015/6/9 15:12, Kamezawa Hiroyuki wrote:
On 2015/06/04 22:04, Xishi Qiu wrote:
Add the buddy system interface for address range mirroring feature.
Allocate mirrored pages in the MIGRATE_MIRROR list. If there are no mirrored pages
left, use pages of other types
On 2015/06/04 22:04, Xishi Qiu wrote:
Add the buddy system interface for address range mirroring feature.
Allocate mirrored pages in the MIGRATE_MIRROR list. If there are no mirrored pages
left, use pages of other types.
Signed-off-by: Xishi Qiu
---
mm/page_alloc.c | 40 ++
On 2015/06/04 22:02, Xishi Qiu wrote:
Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means
we should allocate mirrored memory for both user and kernel processes.
Signed-off-by: Xishi Qiu
I can't see why we need this switch. If this is set, all GFP_HIGHUSER will
use
On 2015/06/04 22:02, Xishi Qiu wrote:
This patch introduces a new gfp flag called "__GFP_MIRROR", it is used to
allocate mirrored pages through buddy system.
Signed-off-by: Xishi Qiu
In Tony's original proposal, the motivation was to mirror all kernel memory.
Is the purpose of this patch mak
On 2015/06/04 21:57, Xishi Qiu wrote:
This patch introduces a new struct called "mirror_info"; it is used to store
the mirror address ranges reported by EFI or ACPI.
TBD: call add_mirror_info() to fill it.
Signed-off-by: Xishi Qiu
---
arch/x86/mm/numa.c | 3 +++
include/linux/mm
On 2015/06/04 21:56, Xishi Qiu wrote:
This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY"; it is
used to turn the feature on/off.
Signed-off-by: Xishi Qiu
---
mm/Kconfig | 8
1 file changed, 8 insertions(+)
diff --git a/mm/Kconfig b/mm/Kconfig
index 390214d..4f2a726 10
On 2015/06/04 21:58, Xishi Qiu wrote:
This patch introduces a new MIGRATE_TYPES entry called "MIGRATE_MIRROR"; it is used
to store the mirrored pages list.
When you cat /proc/pagetypeinfo, you can see the count of free mirrored blocks.
I guess you need to add Mel to CC.
e.g.
euler-linux:~ # cat /pro
On 2015/04/25 5:01, Andrew Morton wrote:
On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng wrote:
Since the change to the cpu <--> node mapping (map the cpu to the physical
node for all possible cpus at boot), the node of a cpu may not be present,
so we use the best near online node if the node is not prese
On 2015/04/02 10:36, Gu Zheng wrote:
Hi Kame, TJ,
On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote:
On 2015/04/01 12:02, Tejun Heo wrote:
On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote:
Now, hot-added cpus will have the lowest free cpu id.
Because of this, in most of
On 2015/04/01 12:02, Tejun Heo wrote:
On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote:
Now, hot-added cpus will have the lowest free cpu id.
Because of this, in most systems which have only cpu hot-add, cpu-ids are always
contiguous even after cpu hot-add.
In enterprise
On 2015/03/30 18:58, Gu Zheng wrote:
> Hi Kame-san,
>
> On 03/27/2015 12:31 AM, Kamezawa Hiroyuki wrote:
>
>> On 2015/03/26 13:55, Gu Zheng wrote:
>>> Hi Kame-san,
>>> On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote:
>>>
>>>> On 2015/03
On 2015/04/01 0:28, Tejun Heo wrote:
Hello, Kamezawa.
On Tue, Mar 31, 2015 at 03:09:05PM +0900, Kamezawa Hiroyuki wrote:
But this may be considered as an API change for most hot-add users.
Hmm... Why would it be? What can that possibly break?
Now, hot-added cpus will have the lowest free
On 2015/03/30 18:49, Gu Zheng wrote:
Hi Kame-san,
On 03/27/2015 12:42 AM, Kamezawa Hiroyuki wrote:
On 2015/03/27 0:18, Tejun Heo wrote:
Hello,
On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote:
wq generates the numa affinity (pool->node) for all the possible cpus'
per-cpu w
On 2015/03/27 0:18, Tejun Heo wrote:
Hello,
On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote:
wq generates the numa affinity (pool->node) for all the possible cpus'
per-cpu workqueues at the init stage; that means the affinity of currently un-present
ones may be incorrect, so we need to upd
On 2015/03/26 13:55, Gu Zheng wrote:
> Hi Kame-san,
> On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote:
>
>> On 2015/03/26 11:17, Gu Zheng wrote:
>>> Previously, we build the apicid <--> cpuid mapping when the cpu is present,
>>> but
>>> the rela
size: 192, default order: 1, min order: 0
>node 0: slabs: 6172, objs: 259224, free: 245741
>node 1: slabs: 3261, objs: 136962, free: 127656
>==
> So here we build the persistent [lapic id] <--> cpuid mapping when the cpu is
> first present, an
On 2015/03/26 11:17, Gu Zheng wrote:
> Yasuaki Ishimatsu found that with node online/offline, the cpu<->node
> relationship changes. Workqueue uses info which was
> established at boot time, but it may be changed by node hotplugging.
>
> Once pool->node points to a stale node, followin
On 2015/03/05 10:23, Gu Zheng wrote:
Hi Kamezawa-san,
On 03/04/2015 01:45 PM, Kamezawa Hiroyuki wrote:
On 2015/03/03 22:18, Tejun Heo wrote:
Hello, Kame.
On Tue, Mar 03, 2015 at 03:53:46PM +0900, Kamezawa Hiroyuki wrote:
relationship between proximity domain and lapic id doesn't c
On 2015/03/04 17:03, Xishi Qiu wrote:
On 2015/3/4 11:56, Gu Zheng wrote:
Hi Xishi,
On 03/04/2015 10:52 AM, Xishi Qiu wrote:
On 2015/3/4 10:22, Xishi Qiu wrote:
On 2015/3/3 18:20, Gu Zheng wrote:
Hi Xishi,
On 03/03/2015 11:30 AM, Xishi Qiu wrote:
When hot-removing a numa node, we will clea
On 2015/03/03 22:18, Tejun Heo wrote:
Hello, Kame.
On Tue, Mar 03, 2015 at 03:53:46PM +0900, Kamezawa Hiroyuki wrote:
relationship between proximity domain and lapic id doesn't change.
relationship between lapic-id and cpu-id changes.
pxm <-> memory address : no change
pxm
On 2015/03/03 1:28, Tejun Heo wrote:
Hello,
On Mon, Mar 02, 2015 at 05:41:05PM +0900, Kamezawa Hiroyuki wrote:
Let me start from explaining current behavior.
- cpu-id is determined when a new processor (lapicid/x2apicid) is found.
cpu-id<->nodeid relationship is _not_ recorded.
I
On 2015/02/27 20:54, Tejun Heo wrote:
Hello,
On Fri, Feb 27, 2015 at 06:04:52PM +0800, Gu Zheng wrote:
Yasuaki Ishimatsu found that with node online/offline, the cpu<->node
relationship changes. Workqueue uses info which was
established at boot time, but it may be changed by node h
(2014/12/17 12:22), Kamezawa Hiroyuki wrote:
(2014/12/17 10:36), Lai Jiangshan wrote:
On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote:
With node online/offline, the cpu<->node relationship changes.
Workqueue uses info which was established at boot time but
it may be changed b
(2014/12/17 10:36), Lai Jiangshan wrote:
On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote:
With node online/offline, the cpu<->node relationship changes.
Workqueue uses info which was established at boot time but
it may be changed by node hotplugging.
Once pool->node points to a s
node hotplug, this case should be handled.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 23 +++
1 file changed, 23 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f6ad05a..59d8be5 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqu
per cpu pools.
- clear per-cpu-pool's pool->node at node offlining.
- set per-cpu-pool's pool->node at node onlining.
- dropped modification to get_unbound_pool()
- dropped per-cpu-pool handling at cpu online/offline.
Reported-by: Yasuaki Ishimatsu
Signed-off-by: KAMEZAWA Hiroyuk
This is v4. Thank you for hints/comments on previous versions.
I think this version only contains necessary things and is not invasive.
Tested several patterns of node hotplug and it seems to work well.
Changes since v3
- removed changes against get_unbound_pool()
- removed codes in cpu offline eve
(2014/12/16 17:10), Kamezawa Hiroyuki wrote:
(2014/12/16 16:49), Lai Jiangshan wrote:
On 12/15/2014 07:18 PM, Kamezawa Hiroyuki wrote:
Workqueue keeps cpu<->node relationship including all possible cpus.
The original information was made at boot but it may change when
a new node is
(2014/12/16 16:49), Lai Jiangshan wrote:
On 12/15/2014 07:18 PM, Kamezawa Hiroyuki wrote:
Workqueue keeps cpu<->node relationship including all possible cpus.
The original information was made at boot but it may change when
a new node is added.
Update information if a new node is read
(2014/12/16 14:30), Lai Jiangshan wrote:
On 12/15/2014 07:14 PM, Kamezawa Hiroyuki wrote:
Unbound wq pool's node attribute is calculated at its allocation.
But it's now calculated based on possible cpu<->node information
which can be wrong after cpu hotplug/unplug.
If wrong p
(2014/12/16 14:32), Lai Jiangshan wrote:
On 12/15/2014 07:16 PM, Kamezawa Hiroyuki wrote:
The percpu workqueue pools are persistent and never freed.
But the cpu<->node relationship can be changed by cpu hotplug and pool->node
can point to an offlined node.
If pool->node points to
numa node affinity can be modified if a memory-less
node turns out to be a usual node by step 2.
This patch handles the event in CPU_ONLINE callback of workqueue.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 15 +++
1 file changed, 15 insertions(+)
diff --git a/kernel/wor
Workqueue keeps cpu<->node relationship including all possible cpus.
The original information was made at boot but it may change when
a new node is added.
Update information if a new node is ready with using node-hotplug callback.
Signed-off-by: KAMEZAWA Hiroyuki
---
include
e affinity at
cpu offlining and restore it at cpu onlining.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 11 ++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 7809154..2fd0bd7 100644
--- a/kernel/workqueue.c
+++ b/k
nbound_pool
alloc_unbound_pwq
wq_update_unbound_numa
called at CPU_ONLINE/CPU_DOWN_PREPARE
and the latest online cpu info can be applied to a new wq pool,
which replaces old one.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 38 ++
1 file chan
Lai-san, Tejun-san,
Thank you for the review, this is fix v3. This has been tested on a NUMA node hotplug
machine and seems to work well.
The problem is memory allocation failure because pool->node information can be
stale after node hotplug. Patches (1,2) try to fix the pool->node calculation.
Patch (3,4
(2014/12/15 14:19), Lai Jiangshan wrote:
On 12/15/2014 12:04 PM, Kamezawa Hiroyuki wrote:
(2014/12/15 12:34), Lai Jiangshan wrote:
On 12/15/2014 10:55 AM, Kamezawa Hiroyuki wrote:
(2014/12/15 11:48), Lai Jiangshan wrote:
On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote:
(2014/12/15 11:12
(2014/12/15 12:34), Lai Jiangshan wrote:
On 12/15/2014 10:55 AM, Kamezawa Hiroyuki wrote:
(2014/12/15 11:48), Lai Jiangshan wrote:
On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote:
(2014/12/15 11:12), Lai Jiangshan wrote:
On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote:
Although workqueue
(2014/12/15 11:48), Lai Jiangshan wrote:
On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote:
(2014/12/15 11:12), Lai Jiangshan wrote:
On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote:
Although workqueue detects the cpu<->node relationship at boot,
it is finally determined in cpu_up()
(2014/12/15 11:12), Lai Jiangshan wrote:
On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote:
Although workqueue detects the cpu<->node relationship at boot,
it is finally determined in cpu_up().
This patch tries to update pool->node using the online status of cpus.
1. When a node goes do
(2014/12/15 11:06), Lai Jiangshan wrote:
On 12/14/2014 12:35 AM, Kamezawa Hiroyuki wrote:
remove node-aware unbound pools if a node goes offline.
scan unbound workqueues and remove the numa-affine pool when
a node goes offline.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 29
ode attr.
3. When a cpu comes up, update the possible node cpumask workqueue is using for
sched.
4. Detect the best node for unbound pool's cpumask using the latest info.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 67 ++
1 fil
remove node-aware unbound pools if a node goes offline.
scan unbound workqueues and remove the numa-affine pool when
a node goes offline.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 29 +
1 file changed, 29 insertions(+)
diff --git a/kernel/workqueue.c b
Yasuaki Ishimatsu hit an allocation failure bug when the numa mapping
between CPU and node is changed. This was the last scene:
SLUB: Unable to allocate memory on node 2 (gfp=0x80d0)
cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min
order: 0
node 0: slabs: 6172, ob
Add a warning if pool->node is offline.
This patch was originally made for debugging.
I think adding a warning here can show what may happen.
Signed-off-by: KAMEZAWA Hiroyuki
---
kernel/workqueue.c | 16 +---
1 file changed, 13 insertions(+), 3 deletions(-)
diff --git a/kernel/workqueue.
ode hotplug.
Signed-off-by: KAMEZAWA Hiroyuki
+/*
+ * node should be cleared and
+ * cached pools per cpu should be freed at node unplug
+ */
+
+void workqueue_register_numanode(int nid)
+{
+}
+
+void workqueue_unregister_numanode(int nid)
+{
+}
+#endif
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplu
(2014/10/31 18:46), Tang Chen wrote:
> When we are doing memory hot-add, the following functions are called:
>
> add_memory()
> |--> hotadd_new_pgdat()
> |--> free_area_init_node()
>|--> free_area_init_core()
> |--> zone->present_pages = realsize; /* 1.
(2014/11/02 12:15), Johannes Weiner wrote:
> Now that the external page_cgroup data structure and its lookup is
> gone, let the generic bad_page() check for page->mem_cgroup sanity.
>
> Signed-off-by: Johannes Weiner
Acked-by: KAMEZAWA Hiroyuki
--
To unsubscribe from this list
Weiner
Acked-by: KAMEZAWA Hiroyuki
----
> 8 files changed, 41 insertions(+), 487 deletions(-)
>
Great!
Acked-by: KAMEZAWA Hiroyuki
BTW, shouldn't the init/Kconfig comments be updated?
(I'm sorry if it has been updated since your latest fix.)
ter the final LRU removal. Uncharge can simply clear the
> pointer and the PCG_USED/PageCgroupUsed sites can test that instead.
>
> Because this is the last page_cgroup flag, this patch reduces the
> memcg per-page overhead to a single pointer.
>
> Signed-off-by: Johannes We