Re: [PATCH 0/2] Make core_pattern support namespace

2016-03-19 Thread Kamezawa Hiroyuki
On 2016/03/16 18:23, Zhao Lei wrote: > We discussed patch titled: > [PATCH] Make core_pattern support namespace > before. > > Above patch can solve half problem of custom core_dump pattern > in container, but there are also another problem that limit > custom core_pattern in container, it is the

Re: call_usermodehelper in containers

2016-02-19 Thread Kamezawa Hiroyuki
On 2016/02/19 14:37, Ian Kent wrote: On Fri, 2016-02-19 at 12:08 +0900, Kamezawa Hiroyuki wrote: On 2016/02/19 5:45, Eric W. Biederman wrote: Personally I am a fan of the don't be clever and capture a kernel thread approach as it is very easy to see you what if any exploitation opportun

Re: call_usermodehelper in containers

2016-02-18 Thread Kamezawa Hiroyuki
On 2016/02/19 5:45, Eric W. Biederman wrote: > Personally I am a fan of the don't be clever and capture a kernel thread > approach as it is very easy to see you what if any exploitation > opportunities there are. The justifications for something more clever > is trickier. Of course we do somethi

Re: call_usermodehelper in containers

2016-02-17 Thread Kamezawa Hiroyuki
On 2016/02/18 11:57, Eric W. Biederman wrote: > > Ccing The containers list because a related discussion is happening there > and somehow this thread has never made it there. > > Ian Kent writes: > >> On Mon, 2013-11-18 at 18:28 +0100, Oleg Nesterov wrote: >>> On 11/15, Eric W. Biederman wrote:

Re: [Propose] Isolate core_pattern in mnt namespace.

2015-12-21 Thread Kamezawa Hiroyuki
On 2015/12/22 6:52, Eric W. Biederman wrote: > Dongsheng Yang writes: > >> On 12/20/2015 05:47 PM, Eric W. Biederman wrote: >>> Dongsheng Yang writes: >>> On 12/20/2015 10:37 AM, Al Viro wrote: > On Sun, Dec 20, 2015 at 10:14:29AM +0800, Dongsheng Yang wrote: >> On 12/17/2015 07:23

Re: [Propose] Isolate core_pattern in mnt namespace.

2015-12-20 Thread Kamezawa Hiroyuki
On 2015/12/20 18:47, Eric W. Biederman wrote: > Dongsheng Yang writes: > >> On 12/20/2015 10:37 AM, Al Viro wrote: >>> On Sun, Dec 20, 2015 at 10:14:29AM +0800, Dongsheng Yang wrote: On 12/17/2015 07:23 PM, Dongsheng Yang wrote: > Hi guys, > We are working on making core dump b

Re: [PATCH v2 7/7] Documentation: cgroup: add memory.swap.{current,max} description

2015-12-17 Thread Kamezawa Hiroyuki
On 2015/12/17 21:30, Vladimir Davydov wrote: > The rationale of separate swap counter is given by Johannes Weiner. > > Signed-off-by: Vladimir Davydov > --- > Changes in v2: > - Add rationale of separate swap counter provided by Johannes. > > Documentation/cgroup.txt | 33 +++

Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-17 Thread Kamezawa Hiroyuki
On 2015/12/18 3:43, Luck, Tony wrote: As Tony requested, we may need a knob to stop a fallback in "movable->normal", later. If the mirrored memory is small and the other is large, I think we can both enable "non-mirrored -> normal" and "normal -> non-mirrored". Size of mirrored memory can

Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-16 Thread Kamezawa Hiroyuki
On 2015/12/17 13:48, Xishi Qiu wrote: > On 2015/12/17 10:53, Kamezawa Hiroyuki wrote: > >> On 2015/12/17 11:47, Xishi Qiu wrote: >>> On 2015/12/17 9:38, Izumi, Taku wrote: >>> >>>> Dear Xishi, >>>> >>>>Sorry for late. >>

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-16 Thread Kamezawa Hiroyuki
On 2015/12/17 12:32, Johannes Weiner wrote: On Thu, Dec 17, 2015 at 11:46:27AM +0900, Kamezawa Hiroyuki wrote: On 2015/12/16 20:09, Johannes Weiner wrote: On Wed, Dec 16, 2015 at 12:18:30PM +0900, Kamezawa Hiroyuki wrote: - swap-full notification via vmpressure or something mechanism. Why

Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-16 Thread Kamezawa Hiroyuki
6:44 PM >>> To: Izumi, Taku/泉 拓 >>> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux...@kvack.org; >>> a...@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛 >>> 之; m...@csn.ul.ie; Hansen, Dave; m...@codeblueprint.co.uk >>> Subject: Re: [PATCH v3 2/2] mm:

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-16 Thread Kamezawa Hiroyuki
On 2015/12/16 20:09, Johannes Weiner wrote: On Wed, Dec 16, 2015 at 12:18:30PM +0900, Kamezawa Hiroyuki wrote: Hmm, my requests are - set the same capabilities as mlock() to set swap.limit=0 Setting swap.max is already privileged operation. Sure. - swap-full notification via

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-15 Thread Kamezawa Hiroyuki
On 2015/12/16 2:21, Michal Hocko wrote: I completely agree that malicious/untrusted users absolutely have to be capped by the hard limit. Then the separate swap limit would work for sure. But I am less convinced about usefulness of the rigid (to the global memory pressure) swap limit without th

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-15 Thread Kamezawa Hiroyuki
On 2015/12/15 23:50, Johannes Weiner wrote: On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote: On 2015/12/15 4:42, Vladimir Davydov wrote: Anyway, if you don't trust a container you'd better set the hard memory limit so that it can't hurt others no matter what

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-15 Thread Kamezawa Hiroyuki
On 2015/12/15 20:02, Vladimir Davydov wrote: On Tue, Dec 15, 2015 at 12:22:41PM +0900, Kamezawa Hiroyuki wrote: On 2015/12/15 4:42, Vladimir Davydov wrote: On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote: On Thu 10-12-15 14:39:14, Vladimir Davydov wrote: In the legacy hierarchy

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-15 Thread Kamezawa Hiroyuki
On 2015/12/15 17:30, Vladimir Davydov wrote: On Tue, Dec 15, 2015 at 12:12:40PM +0900, Kamezawa Hiroyuki wrote: On 2015/12/15 0:30, Michal Hocko wrote: On Thu 10-12-15 14:39:14, Vladimir Davydov wrote: In the legacy hierarchy we charge memsw, which is dubious, because: - memsw.limit must

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-14 Thread Kamezawa Hiroyuki
On 2015/12/15 0:30, Michal Hocko wrote: On Thu 10-12-15 14:39:14, Vladimir Davydov wrote: In the legacy hierarchy we charge memsw, which is dubious, because: - memsw.limit must be >= memory.limit, so it is impossible to limit swap usage less than memory usage. Taking into account the fact

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-14 Thread Kamezawa Hiroyuki
On 2015/12/15 4:42, Vladimir Davydov wrote: On Mon, Dec 14, 2015 at 04:30:37PM +0100, Michal Hocko wrote: On Thu 10-12-15 14:39:14, Vladimir Davydov wrote: In the legacy hierarchy we charge memsw, which is dubious, because: - memsw.limit must be >= memory.limit, so it is impossible to limit

Re: [PATCH 1/7] mm: memcontrol: charge swap to cgroup2

2015-12-10 Thread Kamezawa Hiroyuki
On 2015/12/10 20:39, Vladimir Davydov wrote: > In the legacy hierarchy we charge memsw, which is dubious, because: > > - memsw.limit must be >= memory.limit, so it is impossible to limit > swap usage less than memory usage. Taking into account the fact that > the primary limiting mechani

Re: [PATCH] mm: Introduce kernelcore=reliable option

2015-11-03 Thread Kamezawa Hiroyuki
On 2015/10/31 4:42, Luck, Tony wrote: If each memory controller has the same distance/latency, you (your firmware) don't need to allocate reliable memory per each memory controller. If distance is problem, another node should be allocated. ...is the behavior(splitting zone) really required ?

Re: [RFC 1/3] mm, oom: refactor oom detection

2015-10-30 Thread Kamezawa Hiroyuki
On 2015/10/30 17:23, Michal Hocko wrote: On Fri 30-10-15 14:23:59, KAMEZAWA Hiroyuki wrote: On 2015/10/30 0:17, mho...@kernel.org wrote: [...] @@ -3135,13 +3145,56 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order, if (gfp_mask & __GFP_NORETRY) goto nor

Re: [PATCH] mm: Introduce kernelcore=reliable option

2015-10-29 Thread Kamezawa Hiroyuki
On 2015/10/23 10:44, Luck, Tony wrote: > First part of each memory controller. I have two memory controllers on each > node > If each memory controller has the same distance/latency, you (your firmware) don't need to allocate reliable memory per each memory controller. If distance is problem, a

Re: [RFC 2/3] mm: throttle on IO only when there are too many dirty and writeback pages

2015-10-29 Thread Kamezawa Hiroyuki
On 2015/10/30 0:17, mho...@kernel.org wrote: > From: Michal Hocko > > wait_iff_congested has been used to throttle allocator before it retried > another round of direct reclaim to allow the writeback to make some > progress and prevent reclaim from looping over dirty/writeback pages > without mak

Re: [RFC 1/3] mm, oom: refactor oom detection

2015-10-29 Thread Kamezawa Hiroyuki
On 2015/10/30 0:17, mho...@kernel.org wrote: > From: Michal Hocko > > __alloc_pages_slowpath has traditionally relied on the direct reclaim > and did_some_progress as an indicator that it makes sense to retry > allocation rather than declaring OOM. shrink_zones had to rely on > zone_reclaimable i

Re: [PATCH] mm: Introduce kernelcore=reliable option

2015-10-22 Thread Kamezawa Hiroyuki
On 2015/10/22 3:17, Luck, Tony wrote: + if (reliable_kernelcore) { + for_each_memblock(memory, r) { + if (memblock_is_mirror(r)) + continue; Should we have a safety check here that there is some mirrored memory? If you giv

Re: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-13 Thread Kamezawa Hiroyuki
On 2015/10/09 19:36, Xishi Qiu wrote: On 2015/10/9 17:24, Kamezawa Hiroyuki wrote: On 2015/10/09 15:46, Xishi Qiu wrote: On 2015/10/9 22:56, Taku Izumi wrote: Xeon E7 v3 based systems supports Address Range Mirroring and UEFI BIOS complied with UEFI spec 2.5 can notify which ranges are

Re: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-09 Thread Kamezawa Hiroyuki
On 2015/10/09 15:46, Xishi Qiu wrote: On 2015/10/9 22:56, Taku Izumi wrote: Xeon E7 v3 based systems supports Address Range Mirroring and UEFI BIOS complied with UEFI spec 2.5 can notify which ranges are reliable (mirrored) via EFI memory map. Now Linux kernel utilize its information and alloca

Re: [Intel-wired-lan] [Patch V3 5/9] i40e: Use numa_mem_id() to better support memoryless node

2015-10-09 Thread Kamezawa Hiroyuki
On 2015/10/09 14:52, Jiang Liu wrote: On 2015/10/9 4:20, Andrew Morton wrote: On Wed, 19 Aug 2015 17:18:15 -0700 (PDT) David Rientjes wrote: On Wed, 19 Aug 2015, Patil, Kiran wrote: Acked-by: Kiran Patil Where's the call to preempt_disable() to prevent kernels with preemption from makin

Re: [PATCH 03/11] x86/mm/hotplug: Don't remove PGD entries in remove_pagetable()

2015-10-05 Thread Kamezawa Hiroyuki
On 2015/09/22 15:23, Ingo Molnar wrote: > So when memory hotplug removes a piece of physical memory from pagetable > mappings, it also frees the underlying PGD entry. > > This complicates PGD management, so don't do this. We can keep the > PGD mapped and the PUD table all clear - it's only a singl

Re: [PATCH 3/3] sched: Implement interface for cgroup unified hierarchy

2015-08-24 Thread Kamezawa Hiroyuki
On 2015/08/25 8:15, Paul Turner wrote: On Mon, Aug 24, 2015 at 3:49 PM, Tejun Heo wrote: Hello, On Mon, Aug 24, 2015 at 03:03:05PM -0700, Paul Turner wrote: Hmm... I was hoping for an actual configurations and usage scenarios. Preferably something people can set up and play with. This is mu

Re: [PATCH 3/3] sched: Implement interface for cgroup unified hierarchy

2015-08-18 Thread Kamezawa Hiroyuki
On 2015/08/19 5:31, Tejun Heo wrote: Hello, Paul. On Mon, Aug 17, 2015 at 09:03:30PM -0700, Paul Turner wrote: 2) Control within an address-space. For subsystems with fungible resources, e.g. CPU, it can be useful for an address space to partition its own threads. Losing the capability to do

Re: [PATCH 0/3] Make workingset detection logic memcg aware

2015-08-11 Thread Kamezawa Hiroyuki
On 2015/08/10 17:14, Vladimir Davydov wrote: On Sun, Aug 09, 2015 at 11:12:25PM +0900, Kamezawa Hiroyuki wrote: On 2015/08/08 22:05, Vladimir Davydov wrote: On Fri, Aug 07, 2015 at 10:38:16AM +0900, Kamezawa Hiroyuki wrote: ... All ? hmm. It seems that mixture of record of global memory

Re: [PATCH 0/3] Make workingset detection logic memcg aware

2015-08-09 Thread Kamezawa Hiroyuki
On 2015/08/08 22:05, Vladimir Davydov wrote: On Fri, Aug 07, 2015 at 10:38:16AM +0900, Kamezawa Hiroyuki wrote: On 2015/08/06 17:59, Vladimir Davydov wrote: On Wed, Aug 05, 2015 at 10:34:58AM +0900, Kamezawa Hiroyuki wrote: I wonder, rather than collecting more data, rough calculation can

Re: [PATCH 0/3] Make workingset detection logic memcg aware

2015-08-06 Thread Kamezawa Hiroyuki
On 2015/08/06 17:59, Vladimir Davydov wrote: On Wed, Aug 05, 2015 at 10:34:58AM +0900, Kamezawa Hiroyuki wrote: Reading discussion, I feel storing more data is difficult, too. Yep, even with the current 16-bit memcg id. Things would get even worse if we wanted to extend it one day (will we

Re: [PATCH v2 1/3] cgroup: define controller file conventions

2015-08-05 Thread Kamezawa Hiroyuki
On 2015/08/05 16:47, Michal Hocko wrote: On Wed 05-08-15 09:39:40, KAMEZAWA Hiroyuki wrote: [...] so, for memory controller, we'll have We currently have only current, low, high, max and events currently. All other knobs are either deprecated or waiting for a usecase to emerge before the

Re: [PATCH 0/3] Make workingset detection logic memcg aware

2015-08-04 Thread Kamezawa Hiroyuki
On 2015/08/03 21:04, Vladimir Davydov wrote: > Hi, > > Currently, workingset detection logic is not memcg aware - inactive_age > is maintained per zone. As a result, if memory cgroups are used, > refaulted file pages are activated randomly. This patch set makes > inactive_age per lruvec so that wo

Re: [PATCH v2 1/3] cgroup: define controller file conventions

2015-08-04 Thread Kamezawa Hiroyuki
On 2015/08/05 4:31, Tejun Heo wrote: From 6abc8ca19df0078de17dc38340db3002ed489ce7 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Tue, 4 Aug 2015 15:20:55 -0400 Traditionally, each cgroup controller implemented whatever interface it wanted leading to interfaces which are widely inconsistent. E

Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages

2015-06-30 Thread Kamezawa Hiroyuki
On 2015/06/30 11:45, Xishi Qiu wrote: On 2015/6/29 15:32, Kamezawa Hiroyuki wrote: On 2015/06/27 11:24, Xishi Qiu wrote: This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to allocate mirrored pages. When cat /proc/pagetypeinfo, you can see the count of fre

Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface

2015-06-29 Thread Kamezawa Hiroyuki
On 2015/06/30 10:31, Xishi Qiu wrote: On 2015/6/30 9:01, Kamezawa Hiroyuki wrote: On 2015/06/30 8:11, Luck, Tony wrote: @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) */ int __init_memblock memblock_mark_mirror(phys_addr_t base

Re: [RFC v2 PATCH 7/8] mm: add the buddy system interface

2015-06-29 Thread Kamezawa Hiroyuki
On 2015/06/30 8:11, Luck, Tony wrote: @@ -814,7 +814,7 @@ int __init_memblock memblock_clear_hotplug(phys_addr_t base, phys_addr_t size) */ int __init_memblock memblock_mark_mirror(phys_addr_t base, phys_addr_t size) { - system_has_some_mirror = true; + static_key_slow_inc(&sy

Re: [RFC v2 PATCH 4/8] mm: add mirrored memory to buddy system

2015-06-29 Thread Kamezawa Hiroyuki
On 2015/06/27 11:25, Xishi Qiu wrote: Before free bootmem, set mirrored pageblock's migratetype to MIGRATE_MIRROR, so they could free to buddy system's MIGRATE_MIRROR list. When set reserved memory, skip the mirrored memory. Signed-off-by: Xishi Qiu --- include/linux/memblock.h | 3 +++ mm/

Re: [RFC v2 PATCH 2/8] mm: introduce MIGRATE_MIRROR to manage the mirrored pages

2015-06-29 Thread Kamezawa Hiroyuki
On 2015/06/27 11:24, Xishi Qiu wrote: This patch introduces a new migratetype called "MIGRATE_MIRROR", it is used to allocate mirrored pages. When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. Signed-off-by: Xishi Qiu My fear about this approach is that this may brea

Re: [RFC v2 PATCH 1/8] mm: add a new config to manage the code

2015-06-28 Thread Kamezawa Hiroyuki
xample) Thanks, -Kame From 88213b0f76e2f603c5a38690cbd85a4df1e646ba Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Mon, 29 Jun 2015 15:35:47 +0900 Subject: [PATCH] add a new config option for memory mirror Add a new config option "CONFIG_MEMORY_MIRROR" for kernel assisted memory mirroring. In UEFI2.5 spec, Addr

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-26 Thread Kamezawa Hiroyuki
On 2015/06/26 10:43, Xishi Qiu wrote: On 2015/6/26 7:54, Kamezawa Hiroyuki wrote: On 2015/06/25 18:44, Xishi Qiu wrote: On 2015/6/10 11:06, Kamezawa Hiroyuki wrote: On 2015/06/09 19:04, Xishi Qiu wrote: On 2015/6/9 15:12, Kamezawa Hiroyuki wrote: On 2015/06/04 22:04, Xishi Qiu wrote

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-25 Thread Kamezawa Hiroyuki
On 2015/06/25 18:44, Xishi Qiu wrote: On 2015/6/10 11:06, Kamezawa Hiroyuki wrote: On 2015/06/09 19:04, Xishi Qiu wrote: On 2015/6/9 15:12, Kamezawa Hiroyuki wrote: On 2015/06/04 22:04, Xishi Qiu wrote: Add the buddy system interface for address range mirroring feature. Allocate mirrored

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-15 Thread Kamezawa Hiroyuki
On 2015/06/16 2:20, Luck, Tony wrote: On Mon, Jun 15, 2015 at 05:47:27PM +0900, Kamezawa Hiroyuki wrote: So, there are 3 ideas. (1) kernel only from MIRROR / user only from MOVABLE (Tony) (2) kernel only from MIRROR / user from MOVABLE + MIRROR(ASAP) (AKPM suggested) This makes use

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-15 Thread Kamezawa Hiroyuki
On 2015/06/11 5:40, Luck, Tony wrote: I guess, mirrored memory should be allocated if !__GFP_HIGHMEM or !__GFP_MOVABLE HIGHMEM shouldn't matter - partial memory mirror only makes any sense on X86_64 systems ... 32-bit kernels don't even boot on systems with 64GB, and the minimum rational confi

Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/09 19:09, Xishi Qiu wrote: On 2015/6/9 15:06, Kamezawa Hiroyuki wrote: On 2015/06/04 22:02, Xishi Qiu wrote: Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means we should allocate mirrored memory for both user and kernel processes. Signed-off-by: Xishi Qiu

Re: [RFC PATCH 01/12] mm: add a new config to manage the code

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/09 19:10, Xishi Qiu wrote: On 2015/6/9 14:44, Kamezawa Hiroyuki wrote: On 2015/06/04 21:56, Xishi Qiu wrote: This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is used to on/off the feature. Signed-off-by: Xishi Qiu --- mm/Kconfig | 8

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/09 19:04, Xishi Qiu wrote: On 2015/6/9 15:12, Kamezawa Hiroyuki wrote: On 2015/06/04 22:04, Xishi Qiu wrote: Add the buddy system interface for address range mirroring feature. Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages left, use other types pages

Re: [RFC PATCH 10/12] mm: add the buddy system interface

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/04 22:04, Xishi Qiu wrote: Add the buddy system interface for address range mirroring feature. Allocate mirrored pages in MIGRATE_MIRROR list. If there is no mirrored pages left, use other types pages. Signed-off-by: Xishi Qiu --- mm/page_alloc.c | 40 ++

Re: [RFC PATCH 08/12] mm: use mirrorable to switch allocate mirrored memory

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/04 22:02, Xishi Qiu wrote: Add a new interface in path /proc/sys/vm/mirrorable. When set to 1, it means we should allocate mirrored memory for both user and kernel processes. Signed-off-by: Xishi Qiu I can't see why do we need this switch. If this is set, all GFP_HIGHUSER will use

Re: [RFC PATCH 07/12] mm: introduce __GFP_MIRROR to allocate mirrored pages

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/04 22:02, Xishi Qiu wrote: This patch introduces a new gfp flag called "__GFP_MIRROR", it is used to allocate mirrored pages through buddy system. Signed-off-by: Xishi Qiu In Tony's original proposal, the motivation was to mirror all kernel memory. Is the purpose of this patch mak

Re: [RFC PATCH 02/12] mm: introduce mirror_info

2015-06-09 Thread Kamezawa Hiroyuki
On 2015/06/04 21:57, Xishi Qiu wrote: This patch introduces a new struct called "mirror_info", it is used to storage the mirror address range which reported by EFI or ACPI. TBD: call add_mirror_info() to fill it. Signed-off-by: Xishi Qiu --- arch/x86/mm/numa.c | 3 +++ include/linux/mm

Re: [RFC PATCH 01/12] mm: add a new config to manage the code

2015-06-08 Thread Kamezawa Hiroyuki
On 2015/06/04 21:56, Xishi Qiu wrote: This patch introduces a new config called "CONFIG_ACPI_MIRROR_MEMORY", it is used to on/off the feature. Signed-off-by: Xishi Qiu --- mm/Kconfig | 8 1 file changed, 8 insertions(+) diff --git a/mm/Kconfig b/mm/Kconfig index 390214d..4f2a726 10

Re: [RFC PATCH 03/12] mm: introduce MIGRATE_MIRROR to manage the mirrored pages

2015-06-08 Thread Kamezawa Hiroyuki
On 2015/06/04 21:58, Xishi Qiu wrote: This patch introduces a new MIGRATE_TYPES called "MIGRATE_MIRROR", it is used to storage the mirrored pages list. When cat /proc/pagetypeinfo, you can see the count of free mirrored blocks. I guess you need to add Mel to CC. e.g. euler-linux:~ # cat /pro

Re: [RESEND RFC PATCH 2/2] gfp: use the best near online node if the target node is offline

2015-04-27 Thread Kamezawa Hiroyuki
On 2015/04/25 5:01, Andrew Morton wrote: On Fri, 24 Apr 2015 17:58:33 +0800 Gu Zheng wrote: Since the change to the cpu <--> mapping (map the cpu to the physical node for all possible at the boot), the node of cpu may be not present, so we use the best near online node if the node is not prese

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-04-01 Thread Kamezawa Hiroyuki
On 2015/04/02 10:36, Gu Zheng wrote: Hi Kame, TJ, On 04/01/2015 04:30 PM, Kamezawa Hiroyuki wrote: On 2015/04/01 12:02, Tejun Heo wrote: On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: Now, hot-added cpus will have the lowest free cpu id. Because of this, in most of

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-04-01 Thread Kamezawa Hiroyuki
On 2015/04/01 12:02, Tejun Heo wrote: On Wed, Apr 01, 2015 at 11:55:11AM +0900, Kamezawa Hiroyuki wrote: Now, hot-added cpus will have the lowest free cpu id. Because of this, in most of systems which has only cpu-hot-add, cpu-ids are always contiguous even after cpu hot add. In enterprise

Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent

2015-03-31 Thread Kamezawa Hiroyuki
On 2015/03/30 18:58, Gu Zheng wrote: > Hi Kame-san, > > On 03/27/2015 12:31 AM, Kamezawa Hiroyuki wrote: > >> On 2015/03/26 13:55, Gu Zheng wrote: >>> Hi Kame-san, >>> On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: >>> >>>> On 2015/03

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-03-31 Thread Kamezawa Hiroyuki
On 2015/04/01 0:28, Tejun Heo wrote: Hello, Kamezawa. On Tue, Mar 31, 2015 at 03:09:05PM +0900, Kamezawa Hiroyuki wrote: But this may be considered as API change for most hot-add users. Hmm... Why would it be? What can that possibly break? Now, hot-added cpus will have the lowest free

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-03-30 Thread Kamezawa Hiroyuki
On 2015/03/30 18:49, Gu Zheng wrote: Hi Kame-san, On 03/27/2015 12:42 AM, Kamezawa Hiroyuki wrote: On 2015/03/27 0:18, Tejun Heo wrote: Hello, On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: wq generates the numa affinity (pool->node) for all the possible cpu's per cpu w

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-03-26 Thread Kamezawa Hiroyuki
On 2015/03/27 0:18, Tejun Heo wrote: Hello, On Thu, Mar 26, 2015 at 01:04:00PM +0800, Gu Zheng wrote: wq generates the numa affinity (pool->node) for all the possible cpu's per cpu workqueue at init stage, that means the affinity of currently un-present ones' may be incorrect, so we need to upd

Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent

2015-03-26 Thread Kamezawa Hiroyuki
On 2015/03/26 13:55, Gu Zheng wrote: > Hi Kame-san, > On 03/26/2015 11:19 AM, Kamezawa Hiroyuki wrote: > >> On 2015/03/26 11:17, Gu Zheng wrote: >>> Previously, we build the apicid <--> cpuid mapping when the cpu is present, >>> but >>> the rela

Re: [PATCH 1/2] x86/cpu hotplug: make apicid <--> cpuid mapping persistent

2015-03-25 Thread Kamezawa Hiroyuki
size: 192, default > order: > 1, min order: 0 >node 0: slabs: 6172, objs: 259224, free: 245741 >node 1: slabs: 3261, objs: 136962, free: 127656 >== > So here we build the persistent [lapic id] <--> cpuid mapping when the cpu > first > present, an

Re: [PATCH 0/2] workqueue: fix a bug when numa mapping is changed

2015-03-25 Thread Kamezawa Hiroyuki
On 2015/03/26 11:17, Gu Zheng wrote: > Yasuaki Ishimatsu found that with node online/offline, cpu<->node > relationship is established. Because workqueue uses a info which was > established at boot time, but it may be changed by node hotpluging. > > Once pool->node points to a stale node, followin

Re: [PATCH] workqueue: update numa affinity when node hotplug

2015-03-04 Thread Kamezawa Hiroyuki
On 2015/03/05 10:23, Gu Zheng wrote: Hi Kamazawa-san, On 03/04/2015 01:45 PM, Kamezawa Hiroyuki wrote: On 2015/03/03 22:18, Tejun Heo wrote: Hello, Kame. On Tue, Mar 03, 2015 at 03:53:46PM +0900, Kamezawa Hiroyuki wrote: relationship between proximity domain and lapic id doesn't c

Re: node-hotplug: is memset 0 safe in try_offline_node()?

2015-03-04 Thread Kamezawa Hiroyuki
On 2015/03/04 17:03, Xishi Qiu wrote: On 2015/3/4 11:56, Gu Zheng wrote: Hi Xishi, On 03/04/2015 10:52 AM, Xishi Qiu wrote: On 2015/3/4 10:22, Xishi Qiu wrote: On 2015/3/3 18:20, Gu Zheng wrote: Hi Xishi, On 03/03/2015 11:30 AM, Xishi Qiu wrote: When hot-remove a numa node, we will clea

Re: [PATCH] workqueue: update numa affinity when node hotplug

2015-03-03 Thread Kamezawa Hiroyuki
On 2015/03/03 22:18, Tejun Heo wrote: Hello, Kame. On Tue, Mar 03, 2015 at 03:53:46PM +0900, Kamezawa Hiroyuki wrote: relationship between proximity domain and lapic id doesn't change. relationship between lapic-id and cpu-id changes. pxm <-> memory address : no change pxm

Re: [PATCH] workqueue: update numa affinity when node hotplug

2015-03-02 Thread Kamezawa Hiroyuki
On 2015/03/03 1:28, Tejun Heo wrote: Hello, On Mon, Mar 02, 2015 at 05:41:05PM +0900, Kamezawa Hiroyuki wrote: Let me start from explaining current behavior. - cpu-id is determined when a new processor(lapicid/x2apicid) is founded. cpu-id<->nodeid relationship is _not_ recorded. I

Re: [PATCH] workqueue: update numa affinity when node hotplug

2015-03-02 Thread Kamezawa Hiroyuki
On 2015/02/27 20:54, Tejun Heo wrote: Hello, On Fri, Feb 27, 2015 at 06:04:52PM +0800, Gu Zheng wrote: Yasuaki Ishimatsu found that with node online/offline, cpu<->node relationship is established. Because workqueue uses a info which was established at boot time, but it may be changed by node h

Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug

2014-12-16 Thread Kamezawa Hiroyuki
(2014/12/17 12:22), Kamezawa Hiroyuki wrote: (2014/12/17 10:36), Lai Jiangshan wrote: On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote: With node online/offline, cpu<->node relationship is established. Workqueue uses a info which was established at boot time but it may be changed b

Re: [PATCH 1/2] workqueue: update numa affinity info at node hotplug

2014-12-16 Thread Kamezawa Hiroyuki
(2014/12/17 10:36), Lai Jiangshan wrote: On 12/17/2014 12:45 AM, Kamezawa Hiroyuki wrote: With node online/offline, cpu<->node relationship is established. Workqueue uses a info which was established at boot time but it may be changed by node hotpluging. Once pool->node points to a s

[PATCH 2/2] workqueue: update cpumask at CPU_ONLINE if necessary

2014-12-16 Thread Kamezawa Hiroyuki
node hotplug, this case should be handled. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 23 +++ 1 file changed, 23 insertions(+) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index f6ad05a..59d8be5 100644 --- a/kernel/workqueue.c +++ b/kernel/workqu

[PATCH 1/2] workqueue: update numa affinity info at node hotplug

2014-12-16 Thread Kamezawa Hiroyuki
per cpu pools. - clear per-cpu-pool's pool->node at node offlining. - set per-cpu-pool's pool->node at node onlining. - dropped modification to get_unbound_pool() - dropped per-cpu-pool handling at cpu online/offline. Reported-by: Yasuaki Ishimatsu Signed-off-by: KAMEZAWA Hiroyuk

[PATCH 0/2] workqueue: fix a bug when numa mapping is changed v4

2014-12-16 Thread Kamezawa Hiroyuki
This is v4. Thank you for hints/comments to previous versions. I think this version only contains necessary things and is not invasive. Tested several patterns of node hotplug and seems to work well. Changes since v3 - removed changes against get_unbound_pool() - removed codes in cpu offline eve

Re: [PATCH 3/4] workqueue: Update workqueue's possible cpumask when a new node coming up.

2014-12-16 Thread Kamezawa Hiroyuki
(2014/12/16 17:10), Kamezawa Hiroyuki wrote: (2014/12/16 16:49), Lai Jiangshan wrote: On 12/15/2014 07:18 PM, Kamezawa Hiroyuki wrote: Workqueue keeps cpu<->node relationship including all possible cpus. The original information was made at boot but it may change when a new node is

Re: [PATCH 3/4] workqueue: Update workqueue's possible cpumask when a new node coming up.

2014-12-16 Thread Kamezawa Hiroyuki
(2014/12/16 16:49), Lai Jiangshan wrote: On 12/15/2014 07:18 PM, Kamezawa Hiroyuki wrote: Workqueue keeps cpu<->node relationship including all possible cpus. The original information was made at boot but it may change when a new node is added. Update information if a new node is read

Re: [PATCH 1/4] workqueue:Fix unbound workqueue's node affinity detection

2014-12-15 Thread Kamezawa Hiroyuki
(2014/12/16 14:30), Lai Jiangshan wrote: On 12/15/2014 07:14 PM, Kamezawa Hiroyuki wrote: Unbound wq pool's node attribute is calculated at its allocation. But it's now calculated based on possible cpu<->node information which can be wrong after cpu hotplug/unplug. If wrong p

Re: [PATCH 2/4] workqueue: update per-cpu workqueue's node affinity at online-offline

2014-12-15 Thread Kamezawa Hiroyuki
(2014/12/16 14:32), Lai Jiangshan wrote: On 12/15/2014 07:16 PM, Kamezawa Hiroyuki wrote: The percpu workqueue pools are persistent and are never freed. But cpu<->node relationship can be changed by cpu hotplug and pool->node can point to an offlined node. If pool->node points to

[PATCH 4/4] workqueue: Handle cpu-node affinity change at CPU_ONLINE.

2014-12-15 Thread Kamezawa Hiroyuki
numa node affinity can be modified if memory-less node turns out to be an usual node by step 2. This patch handles the event in CPU_ONLINE callback of workqueue. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/kernel/wor

[PATCH 3/4] workqueue: Update workqueue's possible cpumask when a new node coming up.

2014-12-15 Thread Kamezawa Hiroyuki
Workqueue keeps cpu<->node relationship including all possible cpus. The original information was made at boot but it may change when a new node is added. Update information if a new node is ready with using node-hotplug callback. Signed-off-by: KAMEZAWA Hiroyuki --- include

[PATCH 2/4] workqueue: update per-cpu workqueue's node affinity at online-offline

2014-12-15 Thread Kamezawa Hiroyuki
e affinity at cpu offlining and restore it at cpu onlining. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/kernel/workqueue.c b/kernel/workqueue.c index 7809154..2fd0bd7 100644 --- a/kernel/workqueue.c +++ b/k

[PATCH 1/4] workqueue:Fix unbound workqueue's node affinity detection

2014-12-15 Thread Kamezawa Hiroyuki
nbound_pool alloc_unbound_pwq wq_update_unbound_numa called at CPU_ONLINE/CPU_DOWN_PREPARE and the latest online cpu info can be applied to a new wq pool, which replaces old one. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 38 ++ 1 file chan

[PATCH 0/4] workqueue: fix memory allocation after numa mapping is changed v3

2014-12-15 Thread Kamezawa Hiroyuki
Lai-san, Tejun-san, Thank you for the review; this is fix v3. This has been tested on a NUMA node hotplug machine and seems to work well. The problem is a memory allocation failure because pool->node information can be stale after node hotplug. Patches (1,2) try to fix the pool->node calculation. Patch (3,4

Re: [PATCH 4/4] workqueue: handle change in cpu-node relationship.

2014-12-14 Thread Kamezawa Hiroyuki
(2014/12/15 14:19), Lai Jiangshan wrote: On 12/15/2014 12:04 PM, Kamezawa Hiroyuki wrote: (2014/12/15 12:34), Lai Jiangshan wrote: On 12/15/2014 10:55 AM, Kamezawa Hiroyuki wrote: (2014/12/15 11:48), Lai Jiangshan wrote: On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote: (2014/12/15 11:12

Re: [PATCH 4/4] workqueue: handle change in cpu-node relationship.

2014-12-14 Thread Kamezawa Hiroyuki
(2014/12/15 12:34), Lai Jiangshan wrote: On 12/15/2014 10:55 AM, Kamezawa Hiroyuki wrote: (2014/12/15 11:48), Lai Jiangshan wrote: On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote: (2014/12/15 11:12), Lai Jiangshan wrote: On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote: Although workqueue

Re: [PATCH 4/4] workqueue: handle change in cpu-node relationship.

2014-12-14 Thread Kamezawa Hiroyuki
(2014/12/15 11:48), Lai Jiangshan wrote: On 12/15/2014 10:20 AM, Kamezawa Hiroyuki wrote: (2014/12/15 11:12), Lai Jiangshan wrote: On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote: Although workqueue detects relationship between cpu<->node at boot, it is finally determined in cpu_up()

Re: [PATCH 4/4] workqueue: handle change in cpu-node relationship.

2014-12-14 Thread Kamezawa Hiroyuki
(2014/12/15 11:12), Lai Jiangshan wrote: On 12/14/2014 12:38 AM, Kamezawa Hiroyuki wrote: Although workqueue detects relationship between cpu<->node at boot, it is finally determined in cpu_up(). This patch tries to update pool->node using online status of cpus. 1. When a node goes do

Re: [PATCH 3/4] workqueue: remove per-node unbound pool when node goes offline.

2014-12-14 Thread Kamezawa Hiroyuki
(2014/12/15 11:06), Lai Jiangshan wrote: On 12/14/2014 12:35 AM, Kamezawa Hiroyuki wrote: remove node aware unbound pools if node goes offline. scan unbound workqueue and remove numa affine pool when a node goes offline. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 29

[PATCH 4/4] workqueue: handle change in cpu-node relationship.

2014-12-13 Thread Kamezawa Hiroyuki
ode attr. 3. When a cpu comes up, update possible node cpumask workqueue is using for sched. 4. Detect the best node for unbound pool's cpumask using the latest info. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 67 ++ 1 fil

[PATCH 3/4] workqueue: remove per-node unbound pool when node goes offline.

2014-12-13 Thread Kamezawa Hiroyuki
remove node aware unbound pools if node goes offline. scan unbound workqueue and remove numa affine pool when a node goes offline. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 29 + 1 file changed, 29 insertions(+) diff --git a/kernel/workqueue.c b

[PATCH 0/4] workqueue: fix bug when numa mapping is changed v2.

2014-12-13 Thread Kamezawa Hiroyuki
Yasuaki Ishimatsu hit an allocation failure bug when the numa mapping between CPU and node is changed. This was the last scene: SLUB: Unable to allocate memory on node 2 (gfp=0x80d0) cache: kmalloc-192, object size: 192, buffer size: 192, default order: 1, min order: 0 node 0: slabs: 6172, ob

[PATCH 2/4] workqueue: add warning if pool->node is offline

2014-12-13 Thread Kamezawa Hiroyuki
Add warning if pool->node is offline. This patch was originally made for debugging. I think adding a warning here can show what may happen. Signed-off-by: KAMEZAWA Hiroyuki --- kernel/workqueue.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/kernel/workqueue.

[PATCH 1/4] workqueue: add a hook for node hotplug

2014-12-13 Thread Kamezawa Hiroyuki
ode hotplug. Signed-off-by: KAMEZAWA Hiroyuki node should be cleared and + * cached pools per cpu should be freed at node unplug + */ + +void workqueue_register_numanode(int nid) +{ +} + +void workqueue_unregister_numanode(int nid) +{ +} +#endif diff --git a/mm/memory_hotplug.c b/mm/memory_hotplu

Re: [PATCH 2/2] mem-hotplug: Fix wrong check for zone->pageset initialization in online_pages().

2014-11-04 Thread Kamezawa Hiroyuki
(2014/10/31 18:46), Tang Chen wrote: > When we are doing memory hot-add, the following functions are called: > > add_memory() > |--> hotadd_new_pgdat() > |--> free_area_init_node() >|--> free_area_init_core() > |--> zone->present_pages = realsize; /* 1.

Re: [patch 3/3] mm: move page->mem_cgroup bad page handling into generic code

2014-11-04 Thread Kamezawa Hiroyuki
(2014/11/02 12:15), Johannes Weiner wrote: > Now that the external page_cgroup data structure and its lookup is > gone, let the generic bad_page() check for page->mem_cgroup sanity. > > Signed-off-by: Johannes Weiner Acked-by: KAMEZAWA Hiroyuki -- To unsubscribe from this list

Re: [patch 2/3] mm: page_cgroup: rename file to mm/swap_cgroup.c

2014-11-04 Thread Kamezawa Hiroyuki
Weiner Acked-by: KAMEZAWA Hiroyuki

Re: [patch 1/3] mm: embed the memcg pointer directly into struct page

2014-11-04 Thread Kamezawa Hiroyuki
---- > 8 files changed, 41 insertions(+), 487 deletions(-) > Great! Acked-by: KAMEZAWA Hiroyuki BTW, shouldn't the init/Kconfig comments be updated? (I'm sorry if they have been updated since your latest fix.)

Re: [patch 4/4] mm: memcontrol: remove unnecessary PCG_USED pc->mem_cgroup valid flag

2014-10-21 Thread Kamezawa Hiroyuki
ter the final LRU removal. Uncharge can simply clear the > pointer and the PCG_USED/PageCgroupUsed sites can test that instead. > > Because this is the last page_cgroup flag, this patch reduces the > memcg per-page overhead to a single pointer. > > Signed-off-by: Johannes We
