Scheduling Scalability Update

2000-12-15 Thread Mike Kravetz
. The Scheduling Scalability page is at: http://lse.sourceforge.net/scheduling/ If you are interested in this work, please join the lse-tech mailing list at: http://sourceforge.net/projects/lse -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center

test9: running tasks not in run-queue

2000-11-08 Thread Mike Kravetz
d from the run-queue. Now, what usually happens is that wake_up_process_synchronous or wake_up_process will add the task back to the run-queue as soon as the scheduler drops the run-queue lock. Therefore, this does not seem to cause any problems. I'm curious, is this behavior by design OR are

Scheduler Scalability CFP

2000-11-16 Thread Mike Kravetz
://sourceforge.net/projects/lse Thanks, -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center 15450 SW Koll Parkway Beaverton, OR 97006-6063 (503)578-3494 - To unsubscribe from this list: send the line "unsubscribe linux-kernel" i

Re: linux scheduler limitations?

2001-03-29 Thread Mike Kravetz
scheduler patches located at: http://lse.sourceforge.net/scheduling/ I would be interested in your observations. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center

Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
veloped a 'token passing' benchmark which attempts to address these issues (called reflex at the above site). However, I would really like to get a pointer to a community acceptable workload/benchmark for these low thread cases. -- Mike Kravetz [EMAIL PROTECTED] IBM

Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
e. However, at this point one could argue that we have moved away from a 'realistic' low task count system load. > lmbench's lat_ctx for example, and other tools in lmbench trigger various > scheduler workloads as well. Thanks, I'll add these to our list. -- Mike Kravetz

Re: a quest for a better scheduler

2001-04-03 Thread Mike Kravetz
i-queue patch I developed, the scheduler always attempts to make the same global scheduling decisions as the current scheduler. -- Mike Kravetz [EMAIL PROTECTED] IBM Linux Technology Center

Re: a quest for a better scheduler

2001-04-04 Thread Mike Kravetz
ons, load balancing algorithms take considerable effort to get working in a reasonably well-performing manner. > > Could you make a port of your thing on recent kernels? There is a 2.4.2 patch on the web page. I'll put out a 2.4.3 patch as soon as I get some time. -- Mike Kravetz

Re: [PATCH 1/4] create mm/Kconfig for arch-independent memory options

2005-04-04 Thread Mike Kravetz
On Mon, Apr 04, 2005 at 10:50:09AM -0700, Dave Hansen wrote: diff -puN mm/Kconfig~A6-mm-Kconfig mm/Kconfig --- memhotplug/mm/Kconfig~A6-mm-Kconfig 2005-04-04 09:04:48.0 -0700 +++ memhotplug-dave/mm/Kconfig 2005-04-04 10:15:23.0 -0700 @@ -0,0 +1,25 @@ > +choice > + prompt

[PATCH] ppc64 Kconfig memory models

2005-04-05 Thread Mike Kravetz
and FLAT for others. -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig linux-2.6.12-rc2-mm1.work/arch/ppc64/Kconfig --- linux-2.6.12-rc2-mm1/arch/ppc64/Kconfig 2005-04-05 18:44:57.0 + +++ linux-2.6.12-rc2-mm1.work/arch

Re: [RFC][PATCH] Sparse Memory Handling (hot-add foundation)

2005-02-17 Thread Mike Kravetz
On Thu, Feb 17, 2005 at 04:03:53PM -0800, Dave Hansen wrote: > The attached patch Just tried to compile this and noticed that there is no definition of valid_section_nr(), referenced in sparse_init. -- Mike

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-10 Thread mike kravetz
On Thu, Mar 10, 2005 at 02:36:13AM -0800, Andrew Morton wrote: > > This patch causes the non-numa G5 to oops very early in boot in > smp_call_function(). > OK - Let me take a look. -- Mike

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-11 Thread mike kravetz
On Fri, Mar 11, 2005 at 07:51:38PM +1100, Paul Mackerras wrote: > > Anyway, the ultimate reason seems to be that the numa.c code is > assuming that an address value and a size value occupy the same number > of cells. On the G5 we have #address-cells = 2 but #size-cells = 1. > Previously this
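
The cell-count mismatch described in the quoted text can be illustrated with a small sketch. This is not the numa.c code; `read_cells`, `parse_reg`, and `struct mem_range` are hypothetical names, and the sketch assumes the cells are already in host byte order. The point is only that the address and the size are each read with their own cell count:

```c
#include <assert.h>
#include <stdint.h>

struct mem_range {
    uint64_t base;
    uint64_t size;
};

/*
 * Combine n_cells consecutive 32-bit cells, most significant first,
 * into one value and advance the cursor past them.
 * Assumption: the cells are already in host byte order.
 */
static uint64_t read_cells(const uint32_t **cursor, unsigned int n_cells)
{
    uint64_t value = 0;

    assert(n_cells >= 1 && n_cells <= 2);
    while (n_cells--) {
        value = (value << 32) | **cursor;
        (*cursor)++;
    }
    return value;
}

/* Parse one (address, size) pair using separate cell counts for each. */
static struct mem_range parse_reg(const uint32_t *reg,
                                  unsigned int addr_cells,
                                  unsigned int size_cells)
{
    struct mem_range range;

    range.base = read_cells(&reg, addr_cells);
    range.size = read_cells(&reg, size_cells);
    return range;
}
```

With #address-cells = 2 and #size-cells = 1, as on the G5, a three-cell entry yields a 64-bit base and a 32-bit size. Reading both fields with the same cell count would consume the wrong number of cells.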

Re: [PATCH] PPC64 NUMA memory fixup

2005-03-11 Thread mike kravetz
this on a machine known to break with the previous version (such as G5). -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.11/arch/ppc64/mm/numa.c linux-2.6.11.work/arch/ppc64/mm/numa.c --- linux-2.6.11/arch/ppc64/mm/numa.c 2005-03-02 07:38:38.0 + +++ linux-

[PATCH] PPC64 NUMA memory fixup (another try)

2005-03-16 Thread Mike Kravetz
and OpenPower 720. -- Signed-off-by: Mike Kravetz <[EMAIL PROTECTED]> diff -Naupr linux-2.6.11.4/arch/ppc64/mm/numa.c linux-2.6.11.4.work/arch/ppc64/mm/numa.c --- linux-2.6.11.4/arch/ppc64/mm/numa.c 2005-03-16 00:09:31.0 + +++ linux-2.6.11.4.work/arch/ppc64/mm/numa.c 2005-03-16

Re: [PATCH] ppc64: Add mem=X option, updated NUMA support

2005-03-23 Thread Mike kravetz
On Wed, Mar 23, 2005 at 11:11:10PM +1100, Michael Ellerman wrote: > > Can you test this on your 720 or whatever it was? And if anyone else > has an interesting NUMA machine they can test it on I'd love to hear > about it! > I've tested this with various config options on my 720. Appears to

Re: NUMA policy interface

2005-08-04 Thread Mike Kravetz
On Thu, Aug 04, 2005 at 03:19:52PM -0700, Christoph Lameter wrote: > This code already exist in the memory hotplug code base and Ray already > had a working implementation for page migration. The migration code will > also be necessary in order to relocate pages with ECC single bit failures >

Re: Bug: early_pfn_in_nid() called when not early

2006-12-13 Thread Mike Kravetz
On Wed, Dec 13, 2006 at 07:20:57PM +0100, Arnd Bergmann wrote: > After a lot of debugging in spufs, I found that a crash that we encountered > on Cell actually was caused by a change in the memory management. > > The patch that caused it is archived in http://lkml.org/lkml/2006/11/1/43, > and

RT scheduling: wakeup bug?

2007-10-01 Thread Mike Kravetz
I've been trying to track down some unexpected realtime latencies and believe one source is a bug in the wakeup code. Specifically, this is within the try_to_wake_up() routine. Within this routine there is the following code segment: /* * If a newly woken up RT task cannot

Re: -rt scheduling: wakeup bug?

2007-10-02 Thread Mike Kravetz
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote: > * Mike Kravetz <[EMAIL PROTECTED]> wrote: > > > > My observations/debugging/conclusions are based on an earlier version > > of the code. It appears the same code/issue still exists in the most > > v

Re: -rt scheduling: wakeup bug?

2007-10-03 Thread Mike Kravetz
On Tue, Oct 02, 2007 at 07:06:32AM +0200, Ingo Molnar wrote: > Index: linux-rt-rebase.q/kernel/sched.c > === > --- linux-rt-rebase.q.orig/kernel/sched.c > +++ linux-rt-rebase.q/kernel/sched.c > @@ -1819,6 +1819,13 @@ out_set_cpu: >

-rt more realtime scheduling issues

2007-10-05 Thread Mike Kravetz
Hi Ingo, After applying the fix to try_to_wake_up() I was still seeing some large latencies for realtime tasks. Some debug code pointed out two additional causes of these latencies. I have put fixes into my 'old' kernel and the scheduler related latencies have gone away. I'm pretty confident

Re: -rt more realtime scheduling issues

2007-10-08 Thread Mike Kravetz
On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote: > After applying the fix to try_to_wake_up() I was still seeing some large > latencies for realtime tasks. I've been looking for places in the code where reschedule IPIs should be sent in the case of 'overload' to redistribute Re

Re: -rt more realtime scheduling issues

2007-10-09 Thread Mike Kravetz
On Mon, Oct 08, 2007 at 11:04:12PM -0400, Steven Rostedt wrote: > On Mon, Oct 08, 2007 at 11:45:23AM -0700, Mike Kravetz wrote: > > Are these accurate statements? I'll start working on a reliable delivery > > mechanism for RealTime scheduling. But, I just want to make sure tha

Re: [PATCH RT] fix rt-task scheduling issue

2007-10-09 Thread Mike Kravetz
On Mon, Oct 08, 2007 at 10:46:21PM -0400, Steven Rostedt wrote: > Mike, > > Can you attach your Signed-off-by to this patch, please. > > > On Fri, Oct 05, 2007 at 07:15:48PM -0700, Mike Kravetz wrote: > > Hi Ingo, > > > > After applying the fix to try_to_wak

Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.

2007-10-09 Thread mike kravetz
On Tue, Oct 09, 2007 at 01:59:37PM -0400, Steven Rostedt wrote: > This has been complied tested (and no more ;-) > > The idea here is when we find a situation that we just scheduled in an > RT task and we either pushed a lesser RT task away or more than one RT > task was scheduled on this CPU
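
The general idea of pushing a waiting RT task to a CPU running lower-priority work can be sketched as a simple search. This is not the patch under discussion; `find_push_target` and the `cpu_prio` snapshot array are hypothetical, and the sketch assumes that a larger number means a more important task (the kernel's internal numbering may differ):

```c
#include <assert.h>

/*
 * Pick the CPU whose currently running task has the lowest priority,
 * provided that priority is strictly below task_prio.
 * Returns the CPU index, or -1 if no CPU is running lower-priority work.
 * Assumption: larger numbers mean higher priority in this sketch.
 */
static int find_push_target(const int *cpu_prio, int n_cpus,
                            int this_cpu, int task_prio)
{
    int best_cpu = -1;
    int best_prio = task_prio;
    int cpu;

    assert(n_cpus > 0);
    for (cpu = 0; cpu < n_cpus; cpu++) {
        if (cpu == this_cpu)
            continue;               /* the task is already waiting here */
        if (cpu_prio[cpu] < best_prio) {
            best_prio = cpu_prio[cpu];
            best_cpu = cpu;
        }
    }
    return best_cpu;
}
```

A real implementation would also have to deal with locking and with priorities changing while the scan is in progress, which this single-threaded sketch ignores.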

Re: [RFC PATCH RT] push waiting rt tasks to cpus with lower prios.

2007-10-09 Thread mike kravetz
On Tue, Oct 09, 2007 at 04:50:47PM -0400, Steven Rostedt wrote: > > I did something like this a while ago for another scheduling project. > > A couple 'possible' optimizations to think about are: > > 1) Only scan the remote runqueues once and keep a local copy of the > >remote priorities for

Re: [PATCH] RT: Fix special-case exception for preempting the local CPU

2007-10-10 Thread mike kravetz
On Wed, Oct 10, 2007 at 10:49:35AM -0400, Gregory Haskins wrote: > diff --git a/kernel/sched.c b/kernel/sched.c > index 3e75c62..b7f7a96 100644 > --- a/kernel/sched.c > +++ b/kernel/sched.c > @@ -1869,7 +1869,8 @@ out_activate: >* extra locking in this particular case, because >

Re: -rt more realtime scheduling issues

2007-10-10 Thread Mike Kravetz
On Wed, Oct 10, 2007 at 07:50:52AM -0400, Steven Rostedt wrote: > On Tue, Oct 09, 2007 at 11:49:53AM -0700, Mike Kravetz wrote: > > The more I try to understand the IPI handling the more confused I get. :( > > At first I was concerned about an IPI happening in the middle of the > &g

Re: [PATCH v2] mm: make start_isolate_page_range() fail if already isolated

2018-03-13 Thread Mike Kravetz
On 03/13/2018 02:14 PM, Andrew Morton wrote: > On Fri, 9 Mar 2018 14:47:31 -0800 Mike Kravetz > wrote: > >> start_isolate_page_range() is used to set the migrate type of a >> set of pageblocks to MIGRATE_ISOLATE while attempting to start >> a migration operation
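
The behaviour named in the subject line (fail if a block in the range is already isolated) can be sketched on a toy array. This is not the kernel implementation: `isolate_range` and `enum block_type` are hypothetical, the toy has only one non-isolated type, and the locking that a real implementation needs is omitted:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum block_type { BLOCK_MOVABLE, BLOCK_ISOLATE };

/*
 * Try to isolate blocks [start, end).
 * Returns 0 on success. If any block is already isolated, undo the
 * blocks isolated by this call and return -EBUSY, leaving the earlier
 * isolation owned by someone else untouched.
 */
static int isolate_range(enum block_type *blocks, size_t start, size_t end)
{
    size_t i;

    assert(start <= end);
    for (i = start; i < end; i++) {
        if (blocks[i] == BLOCK_ISOLATE)
            goto undo;
        blocks[i] = BLOCK_ISOLATE;
    }
    return 0;

undo:
    while (i > start)
        blocks[--i] = BLOCK_MOVABLE;
    return -EBUSY;
}
```

Failing early means a second caller cannot later "undo" an isolation that the first caller still depends on.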

Re: [PATCH 1/2] selftests/memfd/memfd_test.c: fix implicit declaration

2018-03-13 Thread Mike Kravetz
t system (and some other changes). To me, this seems like a step in the wrong direction. But, I could be totally wrong and perhaps self tests should primarily target the host system header files. -- Mike Kravetz

Re: [RFC PATCH V2 00/22] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling

2018-02-14 Thread Mike Kravetz
nge() fail if already isolated" should handle this situation IF we decide to expose alloc_gigantic_page (which I do not suggest). -- Mike Kravetz

Re: [RFC PATCH V2 00/22] Intel(R) Resource Director Technology Cache Pseudo-Locking enabling

2018-02-15 Thread Mike Kravetz
On 02/15/2018 12:39 PM, Reinette Chatre wrote: > On 2/14/2018 10:31 AM, Reinette Chatre wrote: >> On 2/14/2018 10:12 AM, Mike Kravetz wrote: >>> On 02/13/2018 07:46 AM, Reinette Chatre wrote: >>>> Adding MM maintainers to v2 to share the new MM change (patch

Re: [RFC PATCH 1/3] mm: make start_isolate_page_range() fail if already isolated

2018-02-15 Thread Mike Kravetz
On 02/12/2018 02:20 PM, Mike Kravetz wrote: > start_isolate_page_range() is used to set the migrate type of a > page block to MIGRATE_ISOLATE while attempting to start a > migration operation. It is assumed that only one thread is > attempting such an operation, and due to the li

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Mike Kravetz
4. > > There is a regression on arm32 in libhugetlbfs/truncate_above_4GB-2M-32 > that also exists in 4.14 and mainline. We'll investigate the root cause > and report upstream in mainline. I suspect the cause is "hugetlbfs: > check for pgoff value overflow", but have not ver

Re: [PATCH 4.15 000/105] 4.15.14-stable review

2018-03-28 Thread Mike Kravetz
On 03/28/2018 12:06 PM, Mike Kravetz wrote: > On 03/28/2018 11:44 AM, Dan Rue wrote: >> On Tue, Mar 27, 2018 at 06:26:40PM +0200, Greg Kroah-Hartman wrote: >>> This is the start of the stable review cycle for the 4.15.14 release. >>> There are 105 patches in this

[PATCH 0/1] fix regression in hugetlbfs overflow checking

2018-03-28 Thread Mike Kravetz
r than 4GB on 32 bit kernels. The above is in the commit message. 63489f8e8211 has been sent upstream and to stable, so cc'ing stable here as well. I would appreciate some more eyes on this code. There have been several fixes and we keep running into issues. Mike Kravetz (1): hugetlbfs: fix bu

[PATCH 1/1] hugetlbfs: fix bug in pgoff overflow checking

2018-03-28 Thread Mike Kravetz
y: Dan Rue Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 22 +- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index b9a254dcc0e7..8450a1d75dfa 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c

Re: [PATCH 0/1] fix regression in hugetlbfs overflow checking

2018-03-29 Thread Mike Kravetz
On 03/28/2018 09:16 PM, Mike Kravetz wrote: > Commit 63489f8e8211 ("hugetlbfs: check for pgoff value overflow") > introduced a regression in 32 bit kernels. When creating the mask > to check vm_pgoff, it incorrectly specified that the size of a loff_t > was the size of

Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned

2018-03-20 Thread Mike Kravetz
via shmget/shmat have their vm_ops replaced. Therefore, this split callout is never made. The shm vm_ops do indirectly call the original vm_ops routines as needed. Therefore, I would suggest a patch something like the following instead. If we move forward with the patch, we should include Laurent

Re: [PATCH] mm/hugetlb: prevent hugetlb VMA to be misaligned

2018-03-20 Thread Mike Kravetz
On 03/20/2018 02:26 PM, Mike Kravetz wrote: > Thanks Laurent! > > This bug was introduced by 31383c6865a5. Dan's changes for 31383c6865a5 > seem pretty straightforward. It simply replaces an explicit check when > splitting a vma to a new vm_ops split callout. Unfortunately, map

[PATCH v2] shm: add split function to shm_vm_ops

2018-03-21 Thread Mike Kravetz
->split() to vm_operations_struct") Signed-off-by: Mike Kravetz Reported by: Laurent Dufour Tested-by: Laurent Dufour Acked-by: Michal Hocko Cc: sta...@vger.kernel.org --- Changes in v2 * Updated commit message * Cc stable ipc/shm.c | 12 1 file changed, 12 insertions(+

Re: [PATCH v2] shm: add split function to shm_vm_ops

2018-03-21 Thread Mike Kravetz
On 03/21/2018 01:56 PM, Andrew Morton wrote: > On Wed, 21 Mar 2018 09:13:14 -0700 Mike Kravetz > wrote: >> >> +static int shm_split(struct vm_area_struct *vma, unsigned long addr) >> +{ >> +struct file *file = vma->vm_file; >> +struct

Re: [PATCH] hugetlbfs: check for pgoff value overflow

2018-03-08 Thread Mike Kravetz
On 03/07/2018 08:25 PM, Mike Kravetz wrote: > On 03/07/2018 05:35 PM, Yisheng Xie wrote: >> However, region_chg makes me a litter puzzle that when its return value < 0, >> sometime >> adds_in_progress is added like this case, while sometime it is not. so wh

[PATCH v2] hugetlbfs: check for pgoff value overflow

2018-03-08 Thread Mike Kravetz
fix to this code was incomplete and did not take the remap_file_pages system call into account. Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap") Cc: Reported-by: Nic Losby Signed-off-by: Mike Kravetz --- Changes in v2 * Use bitmask for overflow check as suggested b

Re: [PATCH v2] hugetlbfs: check for pgoff value overflow

2018-03-08 Thread Mike Kravetz
On 03/08/2018 02:15 PM, Andrew Morton wrote: > On Thu, 8 Mar 2018 13:05:02 -0800 Mike Kravetz > wrote: > >> A vma with vm_pgoff large enough to overflow a loff_t type when >> converted to a byte offset can be passed via the remap_file_pages >> system call. The
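
The overflow described in the quoted commit message can be checked with a bitmask, roughly as follows. This is only a sketch and not the kernel's macro: `pgoff_overflows` and `SKETCH_PAGE_SHIFT` are hypothetical, and the sketch assumes a 64-bit page offset, a signed 64-bit byte offset, and 4 KiB pages:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12

static_assert(SKETCH_PAGE_SHIFT > 0 && SKETCH_PAGE_SHIFT < 63,
              "page shift must leave room for an offset");

/*
 * Return true if converting the page offset to a byte offset
 * (pgoff << SKETCH_PAGE_SHIFT) would not fit in a signed 64-bit value.
 * The mask covers every bit that the shift would move into or past
 * the sign bit.
 */
static bool pgoff_overflows(uint64_t pgoff)
{
    const uint64_t mask = ~(uint64_t)0 << (63 - SKETCH_PAGE_SHIFT);

    return (pgoff & mask) != 0;
}
```

Because this sketch uses fixed-width 64-bit types, it sidesteps the question of how wide the page offset and the byte offset are on a given architecture, which is exactly the kind of detail a real mask has to get right.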

[PATCH v3] hugetlbfs: check for pgoff value overflow

2018-03-08 Thread Mike Kravetz
fix to this code was incomplete and did not take the remap_file_pages system call into account. Fixes: 045c7a3f53d9 ("hugetlbfs: fix offset overflow in hugetlbfs mmap") Cc: Reported-by: Nic Losby Signed-off-by: Mike Kravetz --- Changes in v3 * Use a simpler mask computation as suggested by

[PATCH v2] mm: make start_isolate_page_range() fail if already isolated

2018-03-09 Thread Mike Kravetz
functionality. Signed-off-by: Mike Kravetz --- Changes in v2 * Updated commit message and comments as suggested by Andrew Morton mm/page_alloc.c | 8 mm/page_isolation.c | 18 +- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/mm/page_alloc.c b/mm

Re: [PATCH v3] hugetlbfs: check for pgoff value overflow

2018-03-16 Thread Mike Kravetz
On 03/16/2018 03:17 AM, Michal Hocko wrote: > On Thu 08-03-18 16:27:26, Mike Kravetz wrote: > > OK, looks good to me. Hairy but seems to be the easiest way around this. > Acked-by: Michal Hocko > >> +/* >> + * Mask used when checking the page offset value passed

Re: [RFC PATCH V2 13/22] x86/intel_rdt: Support schemata write - pseudo-locking core

2018-02-20 Thread Mike Kravetz
llocation? case you are trying to move away from. Sorry, I have not been following development of this feature. If you would have to create a device to accept a user buffer, could you perhaps use the same device to create/hand out a contiguous mapping? -- Mike Kravetz

Re: [PATCH 5/6] mm, hugetlb: further simplify hugetlb allocation API

2018-02-21 Thread Mike Kravetz
h->surplus_huge_pages_node[page_to_nid(page)]++; > } > > out_unlock: I thought we had this corrected in a previous version of the patch. My apologies for not looking more closely at this version. FWIW, Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler

2018-05-14 Thread Mike Kravetz
25.6%, the > IPC (instruction per cycle) increased from 0.3 to 0.37, and the time > spent in user space is reduced ~19.3% Since this patch only addresses hugetlbfs huge pages, I would suggest making that more explicit in the commit message. Other than that, the changes look fine to me. >

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-03 Thread Mike Kravetz
On 05/03/2018 05:09 PM, TSUKADA Koutaro wrote: > On 2018/05/03 11:33, Mike Kravetz wrote: >> On 05/01/2018 11:54 PM, TSUKADA Koutaro wrote: >>> On 2018/05/02 13:41, Mike Kravetz wrote: >>>> What is the reason for not charging pages at allocation/reserve time? I

Re: [PATCH 21/24] selftests: memfd: return Kselftest Skip code for skipped tests

2018-05-04 Thread Mike Kravetz
ksft_skip We now KNOW that we are running as root because of the check above. We can delete this test, and rely on the later check to determine if the number of huge pages was actually increased. How about this instead (untested)? Signed-off-by: Mike Kravetz diff --git a/tools/testing/selftests/

Re: [PATCH v2 21/24] selftests: memfd: return Kselftest Skip code for skipped tests

2018-05-07 Thread Mike Kravetz
n it. > > In addition, return skip code when not enough huge pages are available to > run the test. > > Kselftest framework SKIP code is 4 and the framework prints appropriate > messages to indicate that the test is skipped. > > Signed-off-by: Shuah Khan (Samsung OSG) Tha

Re: [PATCH] selftests: memfd: split regular and hugetlbfs tests

2018-05-11 Thread Mike Kravetz
DONE > ok 1..2 selftests: memfd: run_fuse_test.sh [PASS] > selftests: memfd: run_hugetlbfs_test.sh > > Please run memfd with hugetlbfs test as root > not ok 1..3 selftests: memfd: run_hugetlbfs_test.sh [SKIP] > > Signed-off-by: Shuah Khan (Samsung OSG) Thanks for all your

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-01 Thread Mike Kravetz
not charged to a memcg. memcg charges in other code paths seem to happen at huge page allocation time. -- Mike Kravetz > > The page charged to memcg will finally be uncharged at free_huge_page. > > Modification of memcontrol.c is for updating of statistical information >

Re: [PATCH 2/3] mm: add find_alloc_contig_pages() interface

2018-05-02 Thread Mike Kravetz
On 04/21/2018 09:16 AM, Vlastimil Babka wrote: > On 04/17/2018 04:09 AM, Mike Kravetz wrote: >> find_alloc_contig_pages() is a new interface that attempts to locate >> and allocate a contiguous range of pages. It is provided as a more >> convenient interface than alloc

Re: [PATCH] memcg, hugetlb: pages allocated for hugetlb's overcommit will be charged to memcg

2018-05-02 Thread Mike Kravetz
On 05/01/2018 11:54 PM, TSUKADA Koutaro wrote: > On 2018/05/02 13:41, Mike Kravetz wrote: >> What is the reason for not charging pages at allocation/reserve time? I am >> not an expert in memcg accounting, but I would think the pages should be >> charged at allocation tim

[PATCH v2 0/4] Interface for higher order contiguous allocations

2018-05-03 Thread Mike Kravetz
urce Director Technology Cache Pseudo-Locking. Mike Kravetz (4): mm: change type of free_contig_range(nr_pages) to unsigned long mm: check for proper migrate type during isolation mm: add find_alloc_contig_pages() interface mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pa

[PATCH v2 2/4] mm: check for proper migrate type during isolation

2018-05-03 Thread Mike Kravetz
as there are two primary users. Contiguous range allocation which wants to enforce migration type checking. Memory offline (hotplug) which is not concerned about type checking. Signed-off-by: Mike Kravetz --- include/linux/page-isolation.h | 8 +++- mm/memory_hotplug.c| 2

[PATCH v2 4/4] mm/hugetlb: use find_alloc_contig_pages() to allocate gigantic pages

2018-05-03 Thread Mike Kravetz
Use the new find_alloc_contig_pages() interface for the allocation of gigantic pages and remove associated code in hugetlb.c. Signed-off-by: Mike Kravetz --- mm/hugetlb.c | 87 +--- 1 file changed, 6 insertions(+), 81 deletions(-) diff

[PATCH v2 1/4] mm: change type of free_contig_range(nr_pages) to unsigned long

2018-05-03 Thread Mike Kravetz
an unsigned int. However, this should be changed to an unsigned long to be consistent with other page counts. Signed-off-by: Mike Kravetz --- include/linux/gfp.h | 2 +- mm/cma.c| 2 +- mm/hugetlb.c| 2 +- mm/page_alloc.c | 6 +++--- 4 files changed, 6 insertions(+), 6

[PATCH v2 3/4] mm: add find_alloc_contig_pages() interface

2018-05-03 Thread Mike Kravetz
is employed if possible. There is no guarantee that the routine will succeed. So, the user must be prepared for failure and have a fall back plan. Signed-off-by: Mike Kravetz --- include/linux/gfp.h | 12 + mm/page_alloc.c | 136 +++- 2

Re: [PATCH -mm] mm, hugetlb: Pass fault address to no page handler

2018-05-16 Thread Mike Kravetz
a to consider? That gets back to Michal's question of a specific use case or generic optimization. Unless code is simple (as in this patch), seems like we should hold off on considering additional optimizations unless there is a specific use case. I'm still OK with this change. -- Mike Kravetz

Re: [PATCH -V2 -mm] mm, hugetlbfs: Pass fault address to no page handler

2018-05-17 Thread Mike Kravetz
hich generates heavy cache > pressure. At the same time, the cache miss rate reduced from ~36.3% > to ~25.6%, the IPC (instruction per cycle) increased from 0.3 to 0.37, > and the time spent in user space is reduced ~19.3%. > Agree with Michal that commit message looks better. I we

Re: [PATCH -mm] mm, huge page: Copy to access sub-page last when copy huge page

2018-05-18 Thread Mike Kravetz
sub-pages are to be copied. IIUC, you added the same algorithm for sub-page ordering to copy_huge_page() that was previously added to clear_huge_page(). Correct? If so, then perhaps a common helper could be used by both the clear and copy huge page routines. It would also make maintenance easier. -- Mike Kravetz

Re: [PATCH v2 1/4] mm: change type of free_contig_range(nr_pages) to unsigned long

2018-05-18 Thread Mike Kravetz
On 05/18/2018 02:12 AM, Vlastimil Babka wrote: > On 05/04/2018 01:29 AM, Mike Kravetz wrote: >> free_contig_range() is currently defined as: >> void free_contig_range(unsigned long pfn, unsigned nr_pages); >> change to, >> void free_contig_range(unsigned long

[PATCH] MAINTAINERS: Change hugetlbfs maintainer and update files

2018-05-18 Thread Mike Kravetz
. hugetlb.c and hugetlb.h are not 100% hugetlbfs, but a majority of their content is hugetlbfs related. Signed-off-by: Mike Kravetz --- MAINTAINERS | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 9051a9ca24a2..c7a5eb074eb1 100644

Re: [PATCH 2/2] mm,fork: introduce MADV_WIPEONFORK

2017-08-18 Thread Mike Kravetz
lar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO: > > https://man.openbsd.org/minherit.2 > > Reported-by: Florian Weimer > Reported-by: Colm MacCártaigh > Signed-off-by: Rik van Riel My primary concern with the first suggested patch was trying to define semantics if MADV

Re: [PATCH v2] mm/hugetlb.c: make huge_pte_offset() consistent and document behaviour

2017-08-18 Thread Mike Kravetz
ted behaviour of this function. > This is to set clear semantics for architecture specific implementations > of huge_pte_offset(). > > Signed-off-by: Punit Agrawal > Cc: Catalin Marinas > Cc: Naoya Horiguchi > Cc: Steve Capper > Cc: Will Deacon > Cc: Kirill A. Shutem

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-17 Thread Mike Kravetz
ence well enough to know if it would be possible for driver code to make CMA reservations. But, it looks doubtful. -- Mike Kravetz

[RFC PATCH 1/3] mm/map_contig: Add VM_CONTIG flag to vma struct

2017-10-11 Thread Mike Kravetz
Add the flag VM_CONTIG to vma structure to identify vmas which are backed by contiguous memory allocations. This flag is not propagated to child processes, so be sure to clear at fork time. Signed-off-by: Mike Kravetz --- include/linux/mm.h | 1 + kernel/fork.c | 2 +- 2 files changed, 2

[RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-11 Thread Mike Kravetz
-by: Mike Kravetz --- include/uapi/asm-generic/mman.h | 1 + mm/mmap.c | 94 + 2 files changed, 95 insertions(+) diff --git a/include/uapi/asm-generic/mman.h b/include/uapi/asm-generic/mman.h index 7162cd4cca73..e8046b4c4ac4 100644

[RFC PATCH 0/3] Add mmap(MAP_CONTIG) support

2017-10-11 Thread Mike Kravetz
. Also, the allocations should probably be done outside mmap_sem but that was the easiest place to do it in this quick and easy POC. I just wanted to throw out some code to get further ideas. It is far from complete. Mike Kravetz (3): mm/map_contig: Add VM_CONTIG flag to vma struct mm

[RFC PATCH 2/3] mm/map_contig: Use pre-allocated pages for VM_CONTIG mappings

2017-10-11 Thread Mike Kravetz
When populating mappings backed by contiguous memory allocations (VM_CONTIG), use the preallocated pages instead of allocating new. Signed-off-by: Mike Kravetz --- mm/memory.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-12 Thread Mike Kravetz
On 10/12/2017 07:37 AM, Michal Hocko wrote: > On Wed 11-10-17 18:46:11, Mike Kravetz wrote: >> Add new MAP_CONTIG flag to mmap system call. Check for flag in normal >> mmap flag processing. If present, pre-allocate a contiguous set of >> pages to back the mapping. The

[PATCH 1/3] mm:hugetlb: Define system call hugetlb size encodings in single file

2017-07-31 Thread Mike Kravetz
for these encodings. Put common definitions in a single header file. The primary uapi header files for mmap and shm will use these definitions as a basis for definitions specific to those system calls. Signed-off-by: Mike Kravetz --- include/uapi/asm-generic/hugetlb_encode.h | 34
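
As a rough illustration of what such a shared encoding looks like: to my understanding the uapi headers store log2 of the huge page size in a six-bit field starting at bit 26 of the flags word. The sketch below mirrors that layout with its own names (`SKETCH_ENCODE_SHIFT`, `SKETCH_ENCODE_MASK`, `encode_huge_size`, `decode_huge_size` are all hypothetical); check hugetlb_encode.h for the authoritative values:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the shift and mask used by the uapi headers. */
#define SKETCH_ENCODE_SHIFT 26
#define SKETCH_ENCODE_MASK  0x3fu

/* Encode a power-of-two page size as log2(size) in the flag field. */
static uint32_t encode_huge_size(uint64_t size)
{
    uint32_t log2_size = 0;

    assert(size != 0 && (size & (size - 1)) == 0);
    while ((size >>= 1) != 0)
        log2_size++;
    return log2_size << SKETCH_ENCODE_SHIFT;
}

/*
 * Recover the page size from a flags word, ignoring unrelated bits.
 * A field value of 0 conventionally means "default size"; this sketch
 * does not special-case it.
 */
static uint64_t decode_huge_size(uint32_t flags)
{
    uint32_t log2_size = (flags >> SKETCH_ENCODE_SHIFT) & SKETCH_ENCODE_MASK;

    return UINT64_C(1) << log2_size;
}
```

Keeping one definition of the shift and mask is what lets the mmap and shm flag names be derived from the same values instead of being maintained separately.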

[PATCH 0/3] Consolidate system call hugetlb page size encodings

2017-07-31 Thread Mike Kravetz
header file, and add to user (uapi/linux/shm.h) header file. Add definitions for all known huge page size encodings as in mmap. [1]https://lkml.org/lkml/2017/3/8/548 Mike Kravetz (3): mm:hugetlb: Define system call hugetlb size encodings in single file mm: arch: Consolidate mmap hugetlb

[PATCH 2/3] mm: arch: Consolidate mmap hugetlb size encodings

2017-07-31 Thread Mike Kravetz
). Include definitions for all known huge page sizes. Use the generic encoding definitions in hugetlb_encode.h as the basis for these definitions. Signed-off-by: Mike Kravetz --- arch/alpha/include/uapi/asm/mman.h | 11 --- arch/mips/include/uapi/asm/mman.h | 11 --- arch

[PATCH 3/3] mm:shm: Use new hugetlb size encoding definitions

2017-07-31 Thread Mike Kravetz
Use the common definitions from hugetlb_encode.h header file for encoding hugetlb size definitions in shmget system call flags. In addition, move these definitions from the internal (kernel) to user (uapi) header file. Suggested-by: Matthew Wilcox Signed-off-by: Mike Kravetz --- include/linux

[RFC] mmap(MAP_CONTIG)

2017-10-03 Thread Mike Kravetz
) with some kludges to use the pages at fault time. It is really ugly, which is why I am not sharing the code. Hoping for some comments/suggestions. [1] https://www.linuxplumbersconf.org/2017/ocw/proposals/4669 -- Mike Kravetz

Re: [RFC] mmap(MAP_CONTIG)

2017-10-04 Thread Mike Kravetz
On 10/04/2017 04:54 AM, Michal Nazarewicz wrote: > On Tue, Oct 03 2017, Mike Kravetz wrote: >> At Plumbers this year, Guy Shattah and Christoph Lameter gave a presentation >> titled 'User space contiguous memory allocation for DMA' [1]. The slides >> point out the performanc

Re: [RFC] mmap(MAP_CONTIG)

2017-10-04 Thread Mike Kravetz
On 10/04/2017 06:49 AM, Anshuman Khandual wrote: > On 10/04/2017 05:26 AM, Mike Kravetz wrote: >> At Plumbers this year, Guy Shattah and Christoph Lameter gave a presentation >> titled 'User space contiguous memory allocation for DMA' [1]. The slides >> point out the

Re: [RFC] mmap(MAP_CONTIG)

2017-10-04 Thread Mike Kravetz
e populated at mmap time, and the pages locked. Therefore, there should be no swap or migration. -- Mike Kravetz

Re: [PATCH 1/2] mm: Introduce wrapper to access mm->nr_ptes

2017-10-04 Thread Mike Kravetz
dex 5624918154db..1c08f0136667 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -813,7 +813,7 @@ static struct mm_struct *mm_init(struct mm_struct *mm, > struct task_struct *p, > init_rwsem(&mm->mmap_sem); > INIT_LIST_HEAD(&mm->mmlist); > mm->core_state

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
-allocate pages for their use, and this 'might' be something useful for contiguous allocations as well. I wonder if going down the path of a separate device/filesystem/etc for contiguous allocations might be a better option. It would keep the implementation somewhat separate. However, I would then be afraid that we end up with another 'separate/special vm' as in the case of hugetlbfs today. -- Mike Kravetz

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
On 10/16/2017 11:07 AM, Michal Hocko wrote: > On Mon 16-10-17 10:43:38, Mike Kravetz wrote: >> Just to be clear, the posix standard talks about a typed memory object. >> The suggested implementation has one create a connection to the memory >> object to receive a fd, then use

Re: [RFC PATCH 3/3] mm/map_contig: Add mmap(MAP_CONTIG) support

2017-10-16 Thread Mike Kravetz
On 10/16/2017 02:03 PM, Laura Abbott wrote: > On 10/16/2017 01:32 PM, Mike Kravetz wrote: >> On 10/16/2017 11:07 AM, Michal Hocko wrote: >>> On Mon 16-10-17 10:43:38, Mike Kravetz wrote: >>>> Just to be clear, the posix standard talks about a typed memory object. >

Re: [PATCH] mm/hugetlbfs: Remove the redundant -ENIVAL return from hugetlbfs_setattr()

2017-09-29 Thread Mike Kravetz
= hugetlb_vmtruncate(inode, attr->ia_size); > Thanks for noticing. I would hope the compiler is smarter than the code and optimizes this away. Reviewed-by: Mike Kravetz -- Mike Kravetz

Re: [PATCH] mm, hugetlb: fix "treat_as_movable" condition in htlb_alloc_mask

2017-09-29 Thread Mike Kravetz
upported(), which is only there if ARCH_ENABLE_HUGEPAGE_MIGRATION is defined. IIUC, this functionality was added for powerpc. Yet, powerpc does not define ARCH_ENABLE_HUGEPAGE_MIGRATION (unless I am missing something). -- Mike Kravetz

Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements

2017-12-20 Thread Mike Kravetz
the cgroup limit', the migration may fail because of this. I like your new code below as it explicitly takes reserve and cgroup accounting out of the picture for migration. Let me think about it for another day before providing a Reviewed-by. -- Mike Kravetz >> I don't think this is a bu

Re: [PATCH v3 0/9] memfd: add sealing to hugetlb-backed memory

2017-12-20 Thread Mike Kravetz
On 12/20/2017 04:26 PM, Andrew Morton wrote: > On Wed, 20 Dec 2017 16:10:51 +0100 Michal Hocko wrote: > >> On Wed 20-12-17 15:15:50, Marc-André Lureau wrote: >>> Hi >>> >>> On Wed, Nov 15, 2017 at 4:13 AM, Mike Kravetz >>> wrote: >>>

Re: [PATCH RFC 1/2] mm, hugetlb: unify core page allocation accounting and initialization

2017-11-28 Thread Mike Kravetz
page = alloc_fresh_huge_page_node(h, node); > - if (page) { > - ret = 1; > + page = __hugetlb_alloc_buddy_huge_page(h, gfp_mask, > + node, nodes_allowed); I don't have the greatest understanding of

Re: [PATCH RFC 2/2] mm, hugetlb: do not rely on overcommit limit during migration

2017-11-28 Thread Mike Kravetz
porary(hpage); > + ClearPageHugeTemporary(new_hpage); > + } > } > > unlock_page(hpage); > I'm still trying to wrap my head around all the different scenarios. In general, this new code only 'kicks in' if there is not a free pre-allocated huge page for migration. Right? So, if there are free huge pages they are 'consumed' during migration and the number of available pre-allocated huge pages is reduced? Or, is that not exactly how it works? Or does it depend on the purpose of the migration? The only reason I ask is because this new method of allocating a surplus page (if successful) results in no decrease of available huge pages. Perhaps all migrations should attempt to allocate surplus pages and not impact the pre-allocated number of available huge pages. Or, perhaps I am just confused. :) -- Mike Kravetz

Re: [PATCH] hugetlbfs: change put_page/unlock_page order in hugetlbfs_fallocate()

2017-11-28 Thread Mike Kravetz
ed, if only to prevent future breakage or someone copy-pasting this >> code. >> >> Fixes: 70c3547e36f5c ("hugetlbfs: add hugetlbfs_fallocate()") >> >> cc: Eric Biggers >> cc: Mike Kravetz >> >> Signed-off-by: Nadav Amit >> ---

Re: [RFC PATCH 3/5] mm, hugetlb: do not rely on overcommit limit during migration

2017-12-14 Thread Mike Kravetz
On 12/13/2017 11:40 PM, Michal Hocko wrote: > On Wed 13-12-17 15:35:33, Mike Kravetz wrote: >> On 12/04/2017 06:01 AM, Michal Hocko wrote: > [...] >>> Before migration >>> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/free_hugepages:0 >>> /

Re: [RFC PATCH 4/5] mm, hugetlb: get rid of surplus page accounting tricks

2017-12-14 Thread Mike Kravetz
On 12/13/2017 11:50 PM, Michal Hocko wrote: > On Wed 13-12-17 16:45:55, Mike Kravetz wrote: >> On 12/04/2017 06:01 AM, Michal Hocko wrote: >>> From: Michal Hocko >>> >>> alloc_surplus_huge_page increases the pool size and the number of >>> surplus

Re: [RFC PATCH 5/5] mm, hugetlb: further simplify hugetlb allocation API

2017-12-14 Thread Mike Kravetz
ir excessive prefix underscores to make names shorter > This patch will need to be modified to take into account the incremental diff to patch 4 in this series. Other than that, the changes look good. Reviewed-by: Mike Kravetz -- Mike Kravetz > Signed-off-by: Michal Hocko > --- > m

Re: [RFC PATCH 0/5] mm, hugetlb: allocation API and migration improvements

2017-12-21 Thread Mike Kravetz
On 12/20/2017 11:28 PM, Michal Hocko wrote: > On Wed 20-12-17 14:43:03, Mike Kravetz wrote: >> On 12/20/2017 01:53 AM, Michal Hocko wrote: >>> On Wed 20-12-17 05:33:36, Naoya Horiguchi wrote: >>>> I have one comment on the code path from mbind(2). >>>
