Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-25 Thread Tim Chen
On 08/24/2017 01:44 PM, Mel Gorman wrote: > On Thu, Aug 24, 2017 at 11:16:15AM -0700, Linus Torvalds wrote: >> On Thu, Aug 24, 2017 at 10:49 AM, Tim Chen >> wrote: >>> >>> These changes look fine. We are testing them now. >>> Does the second patch in the series look okay to you? >> >> I didn't r

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-24 Thread Mel Gorman
On Thu, Aug 24, 2017 at 11:16:15AM -0700, Linus Torvalds wrote: > On Thu, Aug 24, 2017 at 10:49 AM, Tim Chen wrote: > > > > These changes look fine. We are testing them now. > > Does the second patch in the series look okay to you? > > I didn't really have any reaction to that one, as long as Me

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-24 Thread Linus Torvalds
On Thu, Aug 24, 2017 at 10:49 AM, Tim Chen wrote: > > These changes look fine. We are testing them now. > Does the second patch in the series look okay to you? I didn't really have any reaction to that one, as long as Mel&co are ok with it, I'm fine with it. Linus

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-24 Thread Tim Chen
On 08/23/2017 04:30 PM, Linus Torvalds wrote: > On Wed, Aug 23, 2017 at 11:17 AM, Linus Torvalds > wrote: >> On Wed, Aug 23, 2017 at 8:58 AM, Tim Chen wrote: >>> >>> Will you still consider the original patch as a fail safe mechanism? >> >> I don't think we have much choice, although I would *rea

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Linus Torvalds
On Wed, Aug 23, 2017 at 11:17 AM, Linus Torvalds wrote: > On Wed, Aug 23, 2017 at 8:58 AM, Tim Chen wrote: >> >> Will you still consider the original patch as a fail safe mechanism? > > I don't think we have much choice, although I would *really* want to > get this root-caused rather than just pa

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Liang, Kan
> > On Wed, Aug 23, 2017 at 8:58 AM, Tim Chen > wrote: > > > > Will you still consider the original patch as a fail safe mechanism? > > I don't think we have much choice, although I would *really* want to get this > root-caused rather than just papering over the symptoms. > > Maybe still worth

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Linus Torvalds
On Wed, Aug 23, 2017 at 8:58 AM, Tim Chen wrote: > > Will you still consider the original patch as a fail safe mechanism? I don't think we have much choice, although I would *really* want to get this root-caused rather than just papering over the symptoms. Maybe still worth testing that "sched/n

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Mel Gorman
On Tue, Aug 22, 2017 at 11:19:12AM -0700, Linus Torvalds wrote: > On Tue, Aug 22, 2017 at 10:23 AM, Liang, Kan wrote: > > > > Although the patch doesn't trigger watchdog, the spin lock wait time > > is not small (0.45s). > > It may get worse again on larger systems. > > Yeah, I don't think Mel's

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Tim Chen
On 08/23/2017 07:49 AM, Liang, Kan wrote: >> On Tue, Aug 22, 2017 at 12:55 PM, Liang, Kan wrote: >>> So I propose testing the attached trivial patch. >>> >>> It doesn’t work. >>> The call stack is the same. >> >> So I would have expected the stack trace to be the same, and I would even >> exp

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Liang, Kan
> Subject: Re: [PATCH 1/2] sched/wait: Break up long wake list walk > > On Tue, Aug 22, 2017 at 2:24 PM, Andi Kleen wrote: > > > > I believe in this case it's used by threads, so a reference count > > limit wouldn't help. > > For the first migration

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-23 Thread Liang, Kan
> On Tue, Aug 22, 2017 at 12:55 PM, Liang, Kan wrote: > > > >> So I propose testing the attached trivial patch. > > > > It doesn’t work. > > The call stack is the same. > > So I would have expected the stack trace to be the same, and I would even > expect the CPU usage to be fairly similar, becau

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 3:52 PM, Linus Torvalds wrote: > > The *other* memory policies look fairly sane. They basically have a > fairly well-defined preferred node for the policy (although the > "MPOL_INTERLEAVE" looks wrong for a hugepage). But > MPOL_PREFERRED/MPOL_F_LOCAL really looks complete

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 2:24 PM, Andi Kleen wrote: > > I believe in this case it's used by threads, so a reference count limit > wouldn't help. For the first migration try, yes. But if it's some kind of "try and try again" pattern, the second time you try and there are people waiting for the page

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Andi Kleen
On Tue, Aug 22, 2017 at 04:08:52PM -0500, Christopher Lameter wrote: > On Tue, 22 Aug 2017, Andi Kleen wrote: > > > We only see it on 4S+ today. But systems are always getting larger, > > so what's a large system today, will be a normal medium scale system > > tomorrow. > > > > BTW we also collect

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Christopher Lameter
On Tue, 22 Aug 2017, Andi Kleen wrote: > We only see it on 4S+ today. But systems are always getting larger, > so what's a large system today, will be a normal medium scale system > tomorrow. > > BTW we also collected PT traces for the long hang cases, but it was > hard to find a consistent patter

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 1:53 PM, Peter Zijlstra wrote: > > So _the_ problem with yield() is when you hit this with a RT task it > will busy spin and possibly not allow the task that actually has the > lock to make progress at all. I thought we had explicitly defined yield() to not do that. But I

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Peter Zijlstra
On Tue, Aug 22, 2017 at 01:42:13PM -0700, Linus Torvalds wrote:
> +void wait_on_page_bit_or_yield(struct page *page, int bit_nr)
> +{
> +	if (PageWaiters(page)) {
> +		yield();
> +		return;
> +	}
> +	wait_on_page_bit(page, bit_nr);
> +}

So _the_ problem with yield

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 12:55 PM, Liang, Kan wrote: > >> So I propose testing the attached trivial patch. > > It doesn’t work. > The call stack is the same. So I would have expected the stack trace to be the same, and I would even expect the CPU usage to be fairly similar, because you'd see repea

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Liang, Kan
> So I propose testing the attached trivial patch.

It doesn’t work.
The call stack is the same.

   100.00%  (821af140)
            |
            ---wait_on_page_bit
               __migration_entry_wait
               migration_entry_wait
               do_swap_page
               __h

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Andi Kleen
> > Still, generating such a migration storm would be fairly tricky I think. > > Well, Mel seems to have been unable to generate a load that reproduces > the long page waitqueues. And I don't think we've had any other > reports of this either. It could be that it requires a fairly large system. O

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 12:08 PM, Peter Zijlstra wrote: > > So that migration stuff has a filter on, we need two consecutive numa > faults from the same page_cpupid 'hash', see > should_numa_migrate_memory(). Hmm. That is only called for MPOL_F_MORON. We don't actually know what policy the probl

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 11:56 AM, Peter Zijlstra wrote: > > Won't we now prematurely terminate the wait when we get a spurious > wakeup? I think there's two answers to that: (a) do we even care? (b) what spurious wakeup? The "do we even care" question is because wait_on_page_bit by definitio

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Peter Zijlstra
On Tue, Aug 22, 2017 at 11:19:12AM -0700, Linus Torvalds wrote: > And since everybody touches it, as a result everybody eventually > thinks that page should be migrated to their NUMA node. So that migration stuff has a filter on, we need two consecutive numa faults from the same page_cpupid 'hash'

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Peter Zijlstra
On Tue, Aug 22, 2017 at 11:19:12AM -0700, Linus Torvalds wrote:
> diff --git a/mm/filemap.c b/mm/filemap.c
> index a49702445ce0..75c29a1f90fb 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -991,13 +991,11 @@ static inline int wait_on_page_bit_common(wait_queue_head_t *q,
>

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 11:19 AM, Linus Torvalds wrote: > > So I propose testing the attached trivial patch. It may not do > anything at all. But the existing code is actually doing extra work > just to be fragile, in case the scenario above can happen. Side note: the patch compiles for me. But t

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Linus Torvalds
On Tue, Aug 22, 2017 at 10:23 AM, Liang, Kan wrote: > > Although the patch doesn't trigger watchdog, the spin lock wait time > is not small (0.45s). > It may get worse again on larger systems. Yeah, I don't think Mel's patch is great - because I think we could do so much better. What I like abou

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-22 Thread Liang, Kan
> > Covering both paths would be something like the patch below which > > spins until the page is unlocked or it should reschedule. It's not > > even boot tested as I spent what time I had on the test case that I > > hoped would be able to prove it really works. > > I will give it a try. Althoug

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-21 Thread Liang, Kan
> > Because that code sequence doesn't actually depend on > > "wait_on_page_lock()" for _correctness_ anyway, afaik. Anybody who > > does "migration_entry_wait()" _has_ to retry anyway, since the page > > table contents may have changed by waiting. > > > > So I'm not proud of the attached patch, an

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-21 Thread Mel Gorman
On Fri, Aug 18, 2017 at 12:14:12PM -0700, Linus Torvalds wrote: > On Fri, Aug 18, 2017 at 11:54 AM, Mel Gorman > wrote: > > > > One option to mitigate (but not eliminate) the problem is to record when > > the page lock is contended and pass in TNF_PAGE_CONTENDED (new flag) to > > task_numa_fault()

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 1:29 PM, Liang, Kan wrote:
> Here is the profiling with THP disabled for wait_on_page_bit_common and
> wake_up_page_bit.
>
> The call stack of wait_on_page_bit_common
> # Overhead  Trace output
> # ........  ..............
> #
>    100.00%  (821aefca)
>

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 1:05 PM, Andi Kleen wrote:
>
> I think what's happening is that it allows more parallelism during wakeup:
>
> Normally it's like
>
> CPU 1                          CPU 2          CPU 3
> .
>
> LOCK
> wake up tasks on other CPUs    woken up

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Liang, Kan
> >> > >> That indicates that it may be a hot page and it's possible that the > >> page is locked for a short time but waiters accumulate. What happens > >> if you leave NUMA balancing enabled but disable THP? > > > > No, disabling THP doesn't help the case. > > Interesting. That particular code

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 12:58 PM, Andi Kleen wrote: >> which is hacky, but there's a rationale for it: >> >> (a) avoid the crazy long wait queues ;) >> >> (b) we know that migration is *supposed* to be CPU-bound (not IO >> bound), so yielding the CPU and retrying may just be the right thing >> t

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Andi Kleen
> I was really hoping that we'd root-cause this and have a solution (and > then apply Tim's patch as a "belt and suspenders" kind of thing), but One thing I wanted to point out is that Tim's patch seems to make several schedule intensive micro benchmarks faster. I think what's happening is that

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Andi Kleen
> which is hacky, but there's a rationale for it: > > (a) avoid the crazy long wait queues ;) > > (b) we know that migration is *supposed* to be CPU-bound (not IO > bound), so yielding the CPU and retrying may just be the right thing > to do. So this would degenerate into a spin when the conte

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 11:54 AM, Mel Gorman wrote: > > One option to mitigate (but not eliminate) the problem is to record when > the page lock is contended and pass in TNF_PAGE_CONTENDED (new flag) to > task_numa_fault(). Well, finding it contended is fairly easy - just look at the page wait qu

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Mel Gorman
On Fri, Aug 18, 2017 at 10:48:23AM -0700, Linus Torvalds wrote: > On Fri, Aug 18, 2017 at 9:53 AM, Liang, Kan wrote: > > > >> On Fri, Aug 18, 2017 Mel Gorman wrote: > >> > >> That indicates that it may be a hot page and it's possible that the page is > >> locked for a short time but waiters accumu

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 9:53 AM, Liang, Kan wrote: > >> On Fri, Aug 18, 2017 Mel Gorman wrote: >> >> That indicates that it may be a hot page and it's possible that the page is >> locked for a short time but waiters accumulate. What happens if you leave >> NUMA balancing enabled but disable THP?

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Linus Torvalds
On Fri, Aug 18, 2017 at 5:23 AM, Mel Gorman wrote:
>
>         new_page = alloc_pages_node(node,
> -               (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE),
> +               (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE) & ~__GFP_DIRECT_RECLAIM,
>                 HPAGE_PMD_ORDER);

That can't make any d

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Liang, Kan
> On Fri, Aug 18, 2017 at 02:20:38PM +, Liang, Kan wrote: > > > Nothing fancy other than needing a comment if it works. > > > > > > > No, the patch doesn't work. > > > > That indicates that it may be a hot page and it's possible that the page is > locked for a short time but waiters accumula

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Andi Kleen
> Still, I don't think this problem is THP specific. If there is a hot > regular page getting migrated, we'll also see many threads get > queued up quickly. THP may have made the problem worse as migrating > it takes a longer time, meaning more threads could get queued up. Also THP probably make

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Tim Chen
On 08/18/2017 07:46 AM, Mel Gorman wrote: > On Fri, Aug 18, 2017 at 02:20:38PM +, Liang, Kan wrote: >>> Nothing fancy other than needing a comment if it works. >>> >> >> No, the patch doesn't work. >> > > That indicates that it may be a hot page and it's possible that the page is > locked for

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Mel Gorman
On Fri, Aug 18, 2017 at 02:20:38PM +, Liang, Kan wrote: > > Nothing fancy other than needing a comment if it works. > > > > No, the patch doesn't work. > That indicates that it may be a hot page and it's possible that the page is locked for a short time but waiters accumulate. What happens

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Liang, Kan
> On Thu, Aug 17, 2017 at 01:44:40PM -0700, Linus Torvalds wrote: > > On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote: > > > > > > Here is the call stack of wait_on_page_bit_common when the queue is > > > long (entries >1000). > > > > > > # Overhead Trace output > > > #

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Liang, Kan
> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote:
> >
> > Here is the call stack of wait_on_page_bit_common when the queue is
> > long (entries >1000).
> >
> > # Overhead  Trace output
> > # ........  ..............
> > #
> >    100.00%  (931aefca)
> >    |
> >

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-18 Thread Mel Gorman
On Thu, Aug 17, 2017 at 01:44:40PM -0700, Linus Torvalds wrote:
> On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote:
> >
> > Here is the call stack of wait_on_page_bit_common
> > when the queue is long (entries >1000).
> >
> > # Overhead  Trace output
> > # ........  ..............
> > #
> >

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-17 Thread Linus Torvalds
On Thu, Aug 17, 2017 at 1:18 PM, Liang, Kan wrote:
>
> Here is the call stack of wait_on_page_bit_common
> when the queue is long (entries >1000).
>
> # Overhead  Trace output
> # ........  ..............
> #
>    100.00%  (931aefca)
>    |
>    ---wait_on_page_bit
>

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-17 Thread Liang, Kan
> > Here is the wake_up_page_bit call stack when the workaround is running, > which > > is collected by perf record -g -a -e probe:wake_up_page_bit -- sleep 10 > > It's actually not really wake_up_page_bit() that is all that > interesting, it would be more interesting to see which path it is tha

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-17 Thread Linus Torvalds
On Thu, Aug 17, 2017 at 9:17 AM, Liang, Kan wrote: > > We tried both 12 and 16 bit table and that didn't make a difference. > The long wake ups are mostly on the same page when we do instrumentation Ok. > Here is the wake_up_page_bit call stack when the workaround is running, which > is collecte

RE: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-17 Thread Liang, Kan
> On Mon, Aug 14, 2017 at 5:52 PM, Tim Chen > wrote: > > We encountered workloads that have very long wake up list on large > > systems. A waker takes a long time to traverse the entire wake list > > and execute all the wake functions. > > > > We saw page wait list that are up to 3700+ entries lo

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-16 Thread Eric W. Biederman
Linus Torvalds writes: > On Tue, Aug 15, 2017 at 3:57 PM, Linus Torvalds > wrote: >> >> Oh, and the page wait-queue really needs that key argument too, which >> is another thing that swait queue code got rid of in the name of >> simplicity. > > Actually, it gets worse. > > Because the page wait

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Linus Torvalds
On Tue, Aug 15, 2017 at 3:57 PM, Linus Torvalds wrote: > > Oh, and the page wait-queue really needs that key argument too, which > is another thing that swait queue code got rid of in the name of > simplicity. Actually, it gets worse. Because the page wait queues are hashed, it's not an all-or-n

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Linus Torvalds
On Tue, Aug 15, 2017 at 3:56 PM, Linus Torvalds wrote: > > Except they really don't actually work for this case, exactly because > they also simplify away "minor" details like exclusive vs > non-exclusive etc. > > The page wait-queue very much has a mix of "wake all" and "wake one" > semantics.

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Linus Torvalds
On Tue, Aug 15, 2017 at 3:47 PM, Davidlohr Bueso wrote: > > Or you can always use wake_qs; which exists _exactly_ for the issues you > are running into Except they really don't actually work for this case, exactly because they also simplify away "minor" details like exclusive vs non-exclusive etc

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Davidlohr Bueso
On Mon, 14 Aug 2017, Linus Torvalds wrote:
> On Mon, Aug 14, 2017 at 8:15 PM, Andi Kleen wrote:
>> But what should we do when some other (non page) wait queue runs into the
>> same problem?
>
> Hopefully the same: root-cause it.

Or you can always use wake_qs; which exists _exactly_ for the issues you

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Linus Torvalds
On Tue, Aug 15, 2017 at 12:41 PM, Linus Torvalds wrote: > > So if we have unnecessarily collisions because we have waiters looking > at different bits of the same page, we could just hash in the bit > number that we're waiting for too. Oh, nope, we can't do that, because we only have one "PageWat

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Linus Torvalds
On Tue, Aug 15, 2017 at 12:05 PM, Tim Chen wrote: > > We have a test case but it is a customer workload. We'll try to get > a bit more info. Ok. Being a customer workload is lovely in the sense that it is actually a real load, not just a microbecnhmark. But yeah, it makes it harder to describe

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-15 Thread Tim Chen
On 08/14/2017 08:28 PM, Linus Torvalds wrote: > On Mon, Aug 14, 2017 at 8:15 PM, Andi Kleen wrote: >> But what should we do when some other (non page) wait queue runs into the >> same problem? > > Hopefully the same: root-cause it. > > Once you have a test-case, it should generally be fairly sim

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 8:15 PM, Andi Kleen wrote: > But what should we do when some other (non page) wait queue runs into the > same problem? Hopefully the same: root-cause it. Once you have a test-case, it should generally be fairly simple to do with profiles, just seeing who the caller is whe

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-14 Thread Andi Kleen
> That is what the patch does now, and that is why I dislike the patch. > > So I _am_ NAK'ing the patch if nobody is willing to even try alternatives. Ok, perhaps larger hash table is the right solution for this one. But what should we do when some other (non page) wait queue runs into the same

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 7:27 PM, Andi Kleen wrote: > > We could try it and it may even help in this case and it may > be a good idea in any case on such a system, but: > > - Even with a large hash table it might be that by chance all CPUs > will be queued up on the same page > - There are a lot of

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-14 Thread Andi Kleen
On Mon, Aug 14, 2017 at 06:48:06PM -0700, Linus Torvalds wrote: > On Mon, Aug 14, 2017 at 5:52 PM, Tim Chen wrote: > > We encountered workloads that have very long wake up list on large > > systems. A waker takes a long time to traverse the entire wake list and > > execute all the wake functions.

Re: [PATCH 1/2] sched/wait: Break up long wake list walk

2017-08-14 Thread Linus Torvalds
On Mon, Aug 14, 2017 at 5:52 PM, Tim Chen wrote: > We encountered workloads that have very long wake up list on large > systems. A waker takes a long time to traverse the entire wake list and > execute all the wake functions. > > We saw page wait list that are up to 3700+ entries long in tests of