Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Andrea Arcangeli
Hello, On Mon, Oct 29, 2018 at 11:08:34AM +0100, Michal Hocko wrote: > This seems like a separate issue which should better be debugged. Please > open a new thread describing the problem and the state of the node. Yes, in my view it should be evaluated separately too, because it's overall less co

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Michal Hocko
On Mon 29-10-18 20:42:53, Balbir Singh wrote: > On Mon, Oct 29, 2018 at 10:00:35AM +0100, Michal Hocko wrote: [...] > > These hugetlb allocations might be disruptive and that is an expected > > behavior because this is an explicit requirement from an admin to > > pre-allocate large pages for the fu

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Balbir Singh
On Mon, Oct 29, 2018 at 10:00:35AM +0100, Michal Hocko wrote: > On Mon 29-10-18 16:17:52, Balbir Singh wrote: > [...] > > I wonder if alloc_pool_huge_page() should also trim out its logic > > of __GFP_THISNODE for the same reasons as mentioned here. I like > > that we round robin to alloc the pool

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-29 Thread Michal Hocko
On Mon 29-10-18 16:17:52, Balbir Singh wrote: [...] > I wonder if alloc_pool_huge_page() should also trim out its logic > of __GFP_THISNODE for the same reasons as mentioned here. I like > that we round robin to alloc the pool pages, but __GFP_THISNODE > might be an overkill for that case as well.

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-28 Thread Balbir Singh
On Tue, Sep 25, 2018 at 02:03:25PM +0200, Michal Hocko wrote: > From: Andrea Arcangeli > > THP allocation might be really disruptive when allocated on NUMA system > with the local node full or hard to reclaim. Stefan has posted an > allocation stall report on 4.12 based SLES kernel which suggests

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-28 Thread David Rientjes
On Mon, 22 Oct 2018, Zi Yan wrote: > Hi David, > Hi! > On 22 Oct 2018, at 17:04, David Rientjes wrote: > > > On Tue, 16 Oct 2018, Mel Gorman wrote: > > > > > I consider this to be an unfortunate outcome. On the one hand, we have a > > > problem that three people can trivially reproduce with k

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-23 Thread Mel Gorman
On Tue, Oct 23, 2018 at 08:57:45AM +0100, Mel Gorman wrote: > Note that I accept it's trivial to fragment memory in a harmful way. > I've prototyped a test case yesterday that uses fio in the following way > to fragment memory > > o fio of many small files (64K) > o create initial pages using writ

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-23 Thread Mel Gorman
On Mon, Oct 22, 2018 at 02:04:32PM -0700, David Rientjes wrote: > On Tue, 16 Oct 2018, Mel Gorman wrote: > > > I consider this to be an unfortunate outcome. On the one hand, we have a > > problem that three people can trivially reproduce with known test cases > > and a patch shown to resolve the p

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread Zi Yan
Hi David, On 22 Oct 2018, at 17:04, David Rientjes wrote: On Tue, 16 Oct 2018, Mel Gorman wrote: I consider this to be an unfortunate outcome. On the one hand, we have a problem that three people can trivially reproduce with known test cases and a patch shown to resolve the problem. Two of t

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Tue, 16 Oct 2018, Mel Gorman wrote: > I consider this to be an unfortunate outcome. On the one hand, we have a > problem that three people can trivially reproduce with known test cases > and a patch shown to resolve the problem. Two of those three people work > on distributions that are exposed

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Mon, 15 Oct 2018, Andrea Arcangeli wrote: > > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > > wrote: > > > Would it be possible to test with my > > > patch[*] that does not try reclaim to address the thrashing issue? > > > > Yes please. > > It'd also be great if a testcase rep

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-22 Thread David Rientjes
On Mon, 15 Oct 2018, Andrea Arcangeli wrote: > > At the risk of beating a dead horse that has already been beaten, what are > > the plans for this patch when the merge window opens? It would be rather > > unfortunate for us to start incurring a 14% increase in access latency and > > 40% increa

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-17 Thread Mel Gorman
On Tue, Oct 16, 2018 at 03:37:15PM -0700, Andrew Morton wrote: > On Tue, 16 Oct 2018 08:46:06 +0100 Mel Gorman wrote: > > I consider this to be an unfortunate outcome. On the one hand, we have a > > problem that three people can trivially reproduce with known test cases > > and a patch shown to re

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-17 Thread Michal Hocko
On Tue 16-10-18 16:16:43, Andrew Morton wrote: > On Tue, 16 Oct 2018 19:11:49 -0400 Andrea Arcangeli > wrote: > > > This was a severe regression > > compared to previous kernels that made important workloads unusable > > and it starts when __GFP_THISNODE was added to THP allocations under > > MA

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Andrew Morton
On Tue, 16 Oct 2018 19:11:49 -0400 Andrea Arcangeli wrote: > This was a severe regression > compared to previous kernels that made important workloads unusable > and it starts when __GFP_THISNODE was added to THP allocations under > MADV_HUGEPAGE. It is not a significant risk to go to the previou

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Andrea Arcangeli
Hello, On Tue, Oct 16, 2018 at 03:37:15PM -0700, Andrew Morton wrote: > we'll still make it into 4.19.1. Am reluctant to merge this while > discussion, testing and possibly more development are ongoing. I think there can be definitely more developments primarily to make the compact deferred logi

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Andrew Morton
On Tue, 16 Oct 2018 08:46:06 +0100 Mel Gorman wrote: > On Mon, Oct 15, 2018 at 03:44:59PM -0700, Andrew Morton wrote: > > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > > wrote: > > > > > At the risk of beating a dead horse that has already been beaten, what > > > are > > > the pl

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-16 Thread Mel Gorman
On Mon, Oct 15, 2018 at 03:44:59PM -0700, Andrew Morton wrote: > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > wrote: > > > At the risk of beating a dead horse that has already been beaten, what are > > the plans for this patch when the merge window opens? > > I'll hold onto it unt

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
Hello Andrew, On Mon, Oct 15, 2018 at 03:44:59PM -0700, Andrew Morton wrote: > On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes > wrote: > > Would it be possible to test with my > > patch[*] that does not try reclaim to address the thrashing issue? > > Yes please. It'd also be great i

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrea Arcangeli
On Mon, Oct 15, 2018 at 03:30:17PM -0700, David Rientjes wrote: > At the risk of beating a dead horse that has already been beaten, what are > the plans for this patch when the merge window opens? It would be rather > unfortunate for us to start incurring a 14% increase in access latency and >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread Andrew Morton
On Mon, 15 Oct 2018 15:30:17 -0700 (PDT) David Rientjes wrote: > At the risk of beating a dead horse that has already been beaten, what are > the plans for this patch when the merge window opens? I'll hold onto it until we've settled on something. Worst case, Andrea's original is easily backp

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-15 Thread David Rientjes
On Wed, 10 Oct 2018, David Rientjes wrote: > > I think "madvise vs mbind" is more an issue of "no-permission vs > > permission" required. And if the processes ends up swapping out all > > other process with their memory already allocated in the node, I think > > some permission is correct to be re

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-10 Thread David Rientjes
On Tue, 9 Oct 2018, Andrea Arcangeli wrote: > I think "madvise vs mbind" is more an issue of "no-permission vs > permission" required. And if the processes ends up swapping out all > other process with their memory already allocated in the node, I think > some permission is correct to be required,

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-10 Thread David Rientjes
On Tue, 9 Oct 2018, Andrea Arcangeli wrote: > On Tue, Oct 09, 2018 at 03:17:30PM -0700, David Rientjes wrote: > > causes workloads to severely regress both in fault and access latency when > > we know that direct reclaim is unlikely to make direct compaction free an > > entire pageblock. It's m

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-10 Thread Vlastimil Babka
On 10/10/18 12:51 AM, Andrea Arcangeli wrote: > Yes there's one case where reclaim is "pointless", but it happens once > and then COMPACT_DEFERRED is returned and __GFP_NORETRY will skip > reclaim then. > > So you're right when we hit fragmentation there's one and only one > "pointless" reclaim in

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote: > On Tue 09-10-18 14:00:34, Mel Gorman wrote: > > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > > [Sorry for being slow in responding but I was mostly offline last few > > > days] > > > > > > On Tue 09-10-18 10:48:2

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Tue, Oct 09, 2018 at 03:17:30PM -0700, David Rientjes wrote: > causes workloads to severely regress both in fault and access latency when > we know that direct reclaim is unlikely to make direct compaction free an > entire pageblock. It's more likely than not that the reclaim was > pointless

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Andrea Arcangeli
On Mon, Oct 08, 2018 at 01:41:09PM -0700, David Rientjes wrote: > The page allocator is expecting __GFP_NORETRY for thp allocations per its > comment: > > /* > * Checks for costly allocations with __GFP_NORETRY, which > * includes THP page fault allocat

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread David Rientjes
On Tue, 9 Oct 2018, Mel Gorman wrote: > > The page allocator is expecting __GFP_NORETRY for thp allocations per its > > comment: > > > > /* > > * Checks for costly allocations with __GFP_NORETRY, which > > * includes THP page fault allocations > >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
On Tue, Oct 09, 2018 at 04:25:10PM +0200, Michal Hocko wrote: > On Tue 09-10-18 14:00:34, Mel Gorman wrote: > > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > > [Sorry for being slow in responding but I was mostly offline last few > > > days] > > > > > > On Tue 09-10-18 10:48:2

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Michal Hocko
On Tue 09-10-18 14:00:34, Mel Gorman wrote: > On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > > [Sorry for being slow in responding but I was mostly offline last few > > days] > > > > On Tue 09-10-18 10:48:25, Mel Gorman wrote: > > [...] > > > This goes back to my point that the M

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Vlastimil Babka
On 10/8/18 10:41 PM, David Rientjes wrote: > + /* > + * If faulting a hugepage, it is very unlikely that > + * thrashing the zonelist is going to assist compaction > + * in freeing an entire pageblock. There are no

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
On Tue, Oct 09, 2018 at 02:27:45PM +0200, Michal Hocko wrote: > [Sorry for being slow in responding but I was mostly offline last few > days] > > On Tue 09-10-18 10:48:25, Mel Gorman wrote: > [...] > > This goes back to my point that the MADV_HUGEPAGE hint should not make > > promises about local

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Michal Hocko
[Sorry for being slow in responding but I was mostly offline last few days] On Tue 09-10-18 10:48:25, Mel Gorman wrote: [...] > This goes back to my point that the MADV_HUGEPAGE hint should not make > promises about locality and that introducing MADV_LOCAL for specialised > libraries may be more

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-09 Thread Mel Gorman
On Mon, Oct 08, 2018 at 01:41:09PM -0700, David Rientjes wrote: > On Fri, 5 Oct 2018, Andrea Arcangeli wrote: > > > I tried to add just __GFP_NORETRY but it changes nothing. Try it > > yourself if you think that can resolve the swap storm and excessive > > reclaim CPU overhead... and see if it wor

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-08 Thread David Rientjes
On Fri, 5 Oct 2018, Andrea Arcangeli wrote: > I tried to add just __GFP_NORETRY but it changes nothing. Try it > yourself if you think that can resolve the swap storm and excessive > reclaim CPU overhead... and see if it works. I didn't intend to > reinvent the wheel with __GFP_COMPACT_ONLY, if __

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hello, On Thu, Oct 04, 2018 at 04:05:26PM -0700, David Rientjes wrote: > The source of the problem needs to be addressed: memory compaction. We > regress because we lose __GFP_NORETRY and pointlessly try reclaim, but I commented in detail about the __GFP_NORETRY topic in the other email so I w

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Andrea Arcangeli
Hi, On Fri, Oct 05, 2018 at 01:35:15PM -0700, David Rientjes wrote: > Why is it ever appropriate to do heavy reclaim and swap activity to > allocate a transparent hugepage? This is exactly what the __GFP_NORETRY > check for high-order allocations is attempting to avoid, and it explicitly > sta

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread David Rientjes
On Fri, 5 Oct 2018, Mel Gorman wrote: > > This causes, on average, a 13.9% access latency regression on Haswell, and > > the regression would likely be more severe on Naples and Rome. > > > > That assumes that fragmentation prevents easy allocation which may very > well be the case. While it wo

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-05 Thread Mel Gorman
On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > On Tue, 25 Sep 2018, Michal Hocko wrote: > > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > > index da858f794eb6..149b6f4cf023 100644 > > --- a/mm/mempolicy.c > > +++ b/mm/mempolicy.c > > @@ -2046,8 +2046,36 @@ alloc_pages_vma(gfp_t

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread David Rientjes
On Thu, 4 Oct 2018, Andrea Arcangeli wrote: > Hello David, > Hi Andrea, > On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > > There are ways to address this without introducing regressions for > > existing users of MADV_HUGEPAGE: introduce an madvise() mode to accept > > remot

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread Andrea Arcangeli
Hello David, On Thu, Oct 04, 2018 at 01:16:32PM -0700, David Rientjes wrote: > There are ways to address this without introducing regressions for > existing users of MADV_HUGEPAGE: introduce an madvise() mode to accept > remote thp allocations, which users of this library would never set, or >

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-10-04 Thread David Rientjes
On Tue, 25 Sep 2018, Michal Hocko wrote: > diff --git a/mm/mempolicy.c b/mm/mempolicy.c > index da858f794eb6..149b6f4cf023 100644 > --- a/mm/mempolicy.c > +++ b/mm/mempolicy.c > @@ -2046,8 +2046,36 @@ alloc_pages_vma(gfp_t gfp, int order, struct > vm_area_struct *vma, > nmask = poli

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-25 Thread Michal Hocko
On Tue 25-09-18 13:20:08, Mel Gorman wrote: > On Tue, Sep 25, 2018 at 02:03:25PM +0200, Michal Hocko wrote: > > From: Andrea Arcangeli > > > > THP allocation might be really disruptive when allocated on NUMA system > > with the local node full or hard to reclaim. Stefan has posted an > > allocati

Re: [PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-25 Thread Mel Gorman
On Tue, Sep 25, 2018 at 02:03:25PM +0200, Michal Hocko wrote: > From: Andrea Arcangeli > > THP allocation might be really disruptive when allocated on NUMA system > with the local node full or hard to reclaim. Stefan has posted an > allocation stall report on 4.12 based SLES kernel which suggests

[PATCH 1/2] mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings

2018-09-25 Thread Michal Hocko
From: Andrea Arcangeli THP allocation might be really disruptive when allocated on NUMA system with the local node full or hard to reclaim. Stefan has posted an allocation stall report on 4.12 based SLES kernel which suggests the same issue: [245513.362669] kvm: page allocation stalls for 194572