On Wed, Aug 05, 2020 at 10:46:12PM -0700, Hugh Dickins wrote:
> On Mon, 27 Jul 2020, Greg KH wrote:
> >
> > Linus just pointed me at this thread.
> >
> > If you could run:
> > echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> > and run the same workload to see if anything shows up in the log when
> > xhci crashes, that would be great.
On Fri, Aug 07, 2020 at 11:41:09AM -0700, Hugh Dickins wrote:
> My home testing was, effectively, on top of c6fe44d96fc1 (v5.8 plus
> your two patches): I did not have in the io_uring changes from the
> current tree. In glancing there, I noticed one new and one previous
> instance of SetPageWaiters
On Fri, Aug 7, 2020 at 11:41 AM Hugh Dickins wrote:
>
> +
> +	/*
> +	 * If we hoped to pass PG_locked on to the next locker, but found
> +	 * no exclusive taker, then we must clear it before dropping q->lock.
> +	 * Otherwise unlock_page() can race trylock_page_bit_common()
On Thu, 6 Aug 2020, Linus Torvalds wrote:
> On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote:
> >
> > It wasn't clear to me whether Hugh thought it was an improvement or
> > not that trylock was now less likely to jump the queue. There're
> > the usual "fair is the opposite of throughput" kind of arguments.
On Thu, Aug 6, 2020 at 11:00 AM Matthew Wilcox wrote:
>
> It wasn't clear to me whether Hugh thought it was an improvement or
> not that trylock was now less likely to jump the queue. There're
> the usual "fair is the opposite of throughput" kind of arguments.
Yeah, it could go either way. But o
On Thu, Aug 06, 2020 at 10:07:07AM -0700, Linus Torvalds wrote:
> On Wed, Aug 5, 2020 at 10:21 PM Hugh Dickins wrote:
> > Something I was interested to realize in looking at this: trylock_page()
> > on a contended lock is now much less likely to jump the queue and
> > succeed than before, since yo
On Wed, Aug 5, 2020 at 10:21 PM Hugh Dickins wrote:
>
> Something I was interested to realize in looking at this: trylock_page()
> on a contended lock is now much less likely to jump the queue and
> succeed than before, since your lock holder hands off the page lock to
> the next holder: much smal
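A toy illustration of that handoff effect, written as standalone userspace C11 atomics (the names locked, exclusive_waiters and unlock_with_handoff() are invented for this sketch; none of this is the kernel's page-flag code): once unlock hands the bit straight to a queued exclusive waiter instead of clearing it, there is no window in which a concurrent trylock can barge through.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for PG_locked and the count of queued exclusive waiters. */
static atomic_bool locked;
static atomic_int exclusive_waiters;

/* A "barging" trylock: succeeds whenever the bit happens to be clear. */
static bool trylock(void)
{
	bool expected = false;
	return atomic_compare_exchange_strong(&locked, &expected, true);
}

/* Unlock with handoff: if an exclusive waiter is queued, the bit is never
 * cleared; ownership passes straight to that waiter. */
static void unlock_with_handoff(void)
{
	if (atomic_load(&exclusive_waiters) > 0) {
		atomic_fetch_sub(&exclusive_waiters, 1);
		/* wake the waiter here; it proceeds as the new owner */
		return;
	}
	atomic_store(&locked, false);
}

int main(void)
{
	atomic_store(&locked, true);
	atomic_store(&exclusive_waiters, 1);
	unlock_with_handoff();
	printf("trylock right after unlock: %s\n",
	       trylock() ? "succeeded (jumped the queue)" : "failed (handed off)");
	return 0;
}

With the old behaviour the unlocker always cleared the bit first, so a trylock arriving before the woken waiter ran could still win; with handoff that window is gone, which is exactly the fairness-versus-throughput trade-off discussed above.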
On Mon, 27 Jul 2020, Greg KH wrote:
>
> Linus just pointed me at this thread.
>
> If you could run:
> echo -n 'module xhci_hcd =p' > /sys/kernel/debug/dynamic_debug/control
> and run the same workload to see if anything shows up in the log when
> xhci crashes, that would be great.
Thanks,
Nice to see the +130.0% this morning.
I got back on to this on Monday, here's some follow-up.
On Sun, 26 Jul 2020, Hugh Dickins wrote:
>
> The comparison runs have not yet completed (except for the one started
> early), but they have all got past the most interesting tests, and it's
> clear that
On Mon, Aug 3, 2020 at 6:14 AM Michal Hocko wrote:
>
> I hope I got it right and this is the latest version of your patches. Btw.
> do you still think that increasing PAGE_WAIT_TABLE_BITS is reasonable?
I suspect it's still very reasonable, but I'd love to have numbers for it.
> In the meantime
Hi,
Sorry for being late to the discussion (I was offline or very busy with other
stuff).
I hope I got it right and this is the latest version of your patches. Btw.
do you still think that increasing PAGE_WAIT_TABLE_BITS is reasonable?
In the meantime I have learned that the customer suffering from the
On Sun, Jul 26, 2020 at 01:30:32PM -0700, Hugh Dickins wrote:
> On Sat, 25 Jul 2020, Hugh Dickins wrote:
> > On Sat, 25 Jul 2020, Hugh Dickins wrote:
> > > On Sat, 25 Jul 2020, Linus Torvalds wrote:
> > > > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> > > > >
> > > > > Heh. I too thought
On Sun, 26 Jul 2020, Linus Torvalds wrote:
> On Sun, Jul 26, 2020 at 1:30 PM Hugh Dickins wrote:
> >
> > I've deduced nothing useful from the logs, will have to leave that
> > to others here with more experience of them. But my assumption now
> > is that you have successfully removed one bottlene
On Sun, Jul 26, 2020 at 1:30 PM Hugh Dickins wrote:
>
> I've deduced nothing useful from the logs, will have to leave that
> to others here with more experience of them. But my assumption now
> is that you have successfully removed one bottleneck, so the tests
> get somewhat further and now stick
On Sat, 25 Jul 2020, Hugh Dickins wrote:
> On Sat, 25 Jul 2020, Hugh Dickins wrote:
> > On Sat, 25 Jul 2020, Linus Torvalds wrote:
> > > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> > > >
> > > > Heh. I too thought about this. And just in case, your patch looks
> > > > correct to
Linus,
I was greatly confused and tried to confuse you.
Somehow I misunderstood your last version and didn't bother to read it
again until now.
Sorry for noise and thanks for your explanations.
Oleg.
On 07/25, Linus Torvalds wrote:
>
> On Sat, Jul 25, 2020 at 12:28 PM Oleg Nesterov wrote:
>
On Sat, 25 Jul 2020, Hugh Dickins wrote:
> On Sat, 25 Jul 2020, Linus Torvalds wrote:
> > On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> > >
> > > Heh. I too thought about this. And just in case, your patch looks correct
> > > to me. But I can't really comment this behavioural change. Perh
On Sat, 25 Jul 2020, Linus Torvalds wrote:
> On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
> >
> > Heh. I too thought about this. And just in case, your patch looks correct
> > to me. But I can't really comment this behavioural change. Perhaps it
> > should come in a separate patch?
>
> We
On Sat, Jul 25, 2020 at 12:28 PM Oleg Nesterov wrote:
>
> What I tried to say. AFAICS before that commit we had (almost) the same
> behaviour you propose now: unlock_page/etc wakes all the non-exclusive
> waiters up.
>
> No?
Yes, but no.
We'd wake them _up_ fairly aggressively, but then they'd b
Firstly, to avoid the confusion, let me repeat I think your patch is fine.
I too thought that non-exclusive waiters do not care about the bit state
and thus wake_page_function() can simply wake them all up.
But then I did "git blame", found your commit 3510ca20ece0150 and came to
conclusion there
On Sat, Jul 25, 2020 at 3:14 AM Oleg Nesterov wrote:
>
> Heh. I too thought about this. And just in case, your patch looks correct
> to me. But I can't really comment this behavioural change. Perhaps it
> should come in a separate patch?
We could do that. At the same time, I think both parts chan
On 07/24, Linus Torvalds wrote:
>
> I just realized that one thing we could do is to not even test the
> page bit for the shared case in the wakeup path.
>
> Because anybody who uses the non-exclusive "wait_on_page_locked()" or
> "wait_on_page_writeback()" isn't actually interested in the bit state
On 07/24, Linus Torvalds wrote:
>
> And in fact, once you do it on top of that, it becomes obvious that we
> can share even more code: move the WQ_FLAG_WOKEN logic _into_ the
> trylock_page_bit_common() function.
Ah, indeed, somehow I didn't realize this,
> I added your reviewed-by, but maybe you
On Fri, Jul 24, 2020 at 7:08 PM Hugh Dickins wrote:
>
> But whatever, what happens on the next run, with these latest patches,
> will be more important; and I'll follow this next run with a run on
> the baseline without them, to compare results.
So the loads you are running are known to have sens
On Fri, 24 Jul 2020, Linus Torvalds wrote:
> On Fri, Jul 24, 2020 at 10:32 AM Linus Torvalds
> wrote:
> > Ok, that makes sense. Except you did it on top of the original patch
> > without the fix to set WQ_FLAG_WOKEN for the non-wakeup case.
>
> Hmm.
>
> I just realized that one thing we could d
On Fri, Jul 24, 2020 at 10:32 AM Linus Torvalds
wrote:
> Ok, that makes sense. Except you did it on top of the original patch
> without the fix to set WQ_FLAG_WOKEN for the non-wakeup case.
Hmm.
I just realized that one thing we could do is to not even test the
page bit for the shared case in th
On Fri, Jul 24, 2020 at 8:24 AM Oleg Nesterov wrote:
>
> not sure this makes any sense, but this looks like another user of
> trylock_page_bit_common(), see the patch below on top of 1/2.
Ok, that makes sense. Except you did it on top of the original patch
without the fix to set WQ_FLAG_WOKEN for
On 07/23, Linus Torvalds wrote:
>
> But I'll walk over my patch mentally one more time. Here's the current
> version, anyway.
Both patches look correct to me, feel free to add
Reviewed-by: Oleg Nesterov
> @@ -1013,18 +1014,40 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigne
On 07/23, Linus Torvalds wrote:
>
> IOW, I think we should do something like this (this is on top of my
> patch, since it has that wake_page_function() change in it, but notice
> how we have the exact same issue in our traditional
> autoremove_wake_function() usage).
...
> +static inline void lis
On Thu, Jul 23, 2020 at 5:47 PM Linus Torvalds
wrote:
>
> On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote:
> >
> > I say that for full disclosure, so you don't wrack your brains
> > too much, when it may still turn out to be a screwup on my part.
>
> Sounds unlikely.
>
> If that patch applied
On Thu, Jul 23, 2020 at 5:07 PM Hugh Dickins wrote:
>
> I say that for full disclosure, so you don't wrack your brains
> too much, when it may still turn out to be a screwup on my part.
Sounds unlikely.
If that patch applied even reasonably closely, I don't see how you'd
see a list corruption th
On Thu, 23 Jul 2020, Linus Torvalds wrote:
> On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins wrote:
> > On Thu, 23 Jul 2020, Linus Torvalds wrote:
> > >
> > > I'll send a new version after I actually test it.
> >
> > I'll give it a try when you're happy with it.
>
> Ok, what I described is what I've
On Thu, Jul 23, 2020 at 4:11 PM Hugh Dickins wrote:
>
> On Thu, 23 Jul 2020, Linus Torvalds wrote:
> >
> > I'll send a new version after I actually test it.
>
> I'll give it a try when you're happy with it.
Ok, what I described is what I've been running for a while now. But I
don't put much stres
On Thu, 23 Jul 2020, Linus Torvalds wrote:
>
> I'll send a new version after I actually test it.
I'll give it a try when you're happy with it. I did try yesterday's
with my swapping loads on home machines (3 of 4 survived 16 hours),
and with some google stresstests on work machines (0 of 10 survi
On Thu, Jul 23, 2020 at 10:32 AM Linus Torvalds
wrote:
>
> So here's a v2, now as a "real" commit with a commit message and everything.
Oh, except it's broken.
Switching from the "am I still on the list" logic to the
"WQ_FLAG_WOKEN is set if woken" logic was all well and good, but I
missed the c
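For reference, the two conventions being swapped here can be sketched in isolation (userspace C11 atomics; wait_entry, wake() and was_woken() are invented names, not kernel API): the old scheme infers "I was woken" from having been taken off the wait list, while the new one has the waker publish a WQ_FLAG_WOKEN-style bit with a release store that the waiter checks with an acquire load, without touching the waitqueue lock.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define WOKEN 0x02

struct wait_entry {
	atomic_int flags;	/* new scheme: waker ORs in WOKEN */
	bool on_list;		/* old scheme: removal from the list implied wakeup */
};

/* Waker side: publish the flag before issuing the actual wakeup. */
static void wake(struct wait_entry *w)
{
	w->on_list = false;	/* stand-in for list_del_init() on the queue */
	atomic_fetch_or_explicit(&w->flags, WOKEN, memory_order_release);
	/* the actual wake-up of the sleeping task would follow here */
}

/* Waiter side, new scheme: decide without taking the queue lock;
 * the acquire load pairs with the release store above. */
static bool was_woken(struct wait_entry *w)
{
	return atomic_load_explicit(&w->flags, memory_order_acquire) & WOKEN;
}

int main(void)
{
	struct wait_entry w = { .on_list = true };
	wake(&w);
	printf("woken=%d, still on list=%d\n", was_woken(&w), w.on_list);
	return 0;
}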
On Thu, Jul 23, 2020 at 11:22 AM Linus Torvalds
wrote:
>
> So I think that is a separate issue, generic to our finish_wait() uses.
IOW, I think we should do something like this (this is on top of my
patch, since it has that wake_page_function() change in it, but notice
how we have the exact same
On Thu, Jul 23, 2020 at 11:01 AM Oleg Nesterov wrote:
>
> > +	 *
> > +	 * We _really_ should have a "list_del_init_careful()" to
> > +	 * properly pair with the unlocked "list_empty_careful()"
> > +	 * in finish_wait().
> > +	 */
> > +	smp_mb();
> > +	list_del_init(&wai
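A helper by that name does exist in mainline today, paired with list_empty_careful(). The following is a simplified userspace rendition of the idea, not the kernel's <linux/list.h> code: the deleter re-initialises the entry and publishes ->next last with a release store, so a lockless emptiness check on the waiter side pairs with it through an acquire load and never sees a half-re-initialised entry.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct list_head {
	_Atomic(struct list_head *) next;
	struct list_head *prev;
};

/* Unlink and re-initialise an entry; the unlink itself is assumed to happen
 * under the waitqueue lock.  ->next is re-pointed at the entry last, with
 * release semantics, to order the re-initialisation for lockless readers. */
static void list_del_init_careful(struct list_head *entry)
{
	struct list_head *next = atomic_load_explicit(&entry->next,
						      memory_order_relaxed);
	struct list_head *prev = entry->prev;

	next->prev = prev;
	atomic_store_explicit(&prev->next, next, memory_order_relaxed);

	entry->prev = entry;
	atomic_store_explicit(&entry->next, entry, memory_order_release);
}

/* Lockless check on the waiter side, as finish_wait() would do. */
static bool list_empty_careful(struct list_head *entry)
{
	struct list_head *next = atomic_load_explicit(&entry->next,
						      memory_order_acquire);
	return next == entry && entry->prev == entry;
}

int main(void)
{
	struct list_head head, waiter;

	/* two-element circular list: head <-> waiter */
	atomic_store(&head.next, &waiter);
	head.prev = &waiter;
	atomic_store(&waiter.next, &head);
	waiter.prev = &head;

	list_del_init_careful(&waiter);
	printf("waiter empty: %d, head empty: %d\n",
	       list_empty_careful(&waiter), list_empty_careful(&head));
	return 0;
}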
On 07/23, Linus Torvalds wrote:
>
> So here's a v2, now as a "real" commit with a commit message and everything.
I am already sleeping, will read it tomorrow, but at first glance...
> @@ -1013,18 +1014,40 @@ static int wake_page_function(wait_queue_entry_t *wait, unsigned mode, int sync,
>
On Thu, Jul 23, 2020 at 5:47 AM Oleg Nesterov wrote:
>
> I still can't convince myself that I fully understand this patch but I see
> nothing really wrong after a quick glance...
I guess my comments should be extended further then.
Is there anything in particular you think is unclear?
> > +
On 07/22, Linus Torvalds wrote:
>
> Comments? Oleg, this should fix the race you talked about too.
Yes.
I still can't convince myself that I fully understand this patch but I see
nothing really wrong after a quick glance...
> + * We can no longer use 'wait' after we've done the
> + * li
On Wed 22-07-20 11:29:20, Linus Torvalds wrote:
> On Tue, Jul 21, 2020 at 8:33 AM Linus Torvalds
> wrote:
> >
> > More likely, it's actually *caused* by that commit 11a19c7b099f, and
> > what might be happening is that other CPU's are just adding new
> > waiters to the list *while* we're waking th
On Wed, Jul 22, 2020 at 4:42 PM Linus Torvalds
wrote:
>
> NOTE NOTE NOTE! This is both somewhat subtle code, and ENTIRELY
> UNTESTED.
It seems to boot.
It adds more lines than it removes, but a lot of it is comments, and
while it's somewhat subtle, I think it's actually conceptually simpler
than
On Wed, Jul 22, 2020 at 3:10 PM Linus Torvalds
wrote:
>
> > +	bool first_time = true;
> > 	bool thrashing = false;
> > 	bool delayacct = false;
> > 	unsigned long pflags;
> > @@ -1134,7 +1135,12 @@ static inline int wait_on_page_bit_commo
> > 	spin_lock_ir
On Wed, Jul 22, 2020 at 2:29 PM Hugh Dickins wrote:
>
> -#define PAGE_WAIT_TABLE_BITS 8
> +#define PAGE_WAIT_TABLE_BITS 10
Well, that seems harmless even on small machines.
> +	bool first_time = true;
> 	bool thrashing = false;
> 	bool delayacct = false;
> 	unsigned lo
On Wed, 22 Jul 2020, Linus Torvalds wrote:
>
> I do wonder if we should make that PAGE_WAIT_TABLE_SIZE be larger. 256
> entries seems potentially ridiculously small, and aliasing not only
> increases the waitqueue length, it also potentially causes more
> contention on the waitqueue spinlock (whic
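For readers without mm/filemap.c at hand: the wait table is a fixed array of 2^PAGE_WAIT_TABLE_BITS waitqueues, and struct page pointers are hashed into it, so unrelated pages can collide on the same queue and its spinlock; raising the define (the 8 -> 10 diff quoted earlier) simply means fewer collisions. A toy sketch of the hashing idea, with a Fibonacci-style multiplicative hash standing in for the kernel's hash_ptr():

#include <stdio.h>
#include <stdint.h>

#define PAGE_WAIT_TABLE_BITS 8
#define PAGE_WAIT_TABLE_SIZE (1 << PAGE_WAIT_TABLE_BITS)

/* Fold a pointer value down to PAGE_WAIT_TABLE_BITS bits. */
static unsigned int wait_table_bucket(const void *page)
{
	uint64_t h = (uint64_t)(uintptr_t)page * 0x9E3779B97F4A7C15ull;
	return (unsigned int)(h >> (64 - PAGE_WAIT_TABLE_BITS));
}

int main(void)
{
	/* Addresses stand in for struct page pointers. */
	static int fake_pages[4];

	for (int i = 0; i < 4; i++)
		printf("page %p -> waitqueue %u of %d\n",
		       (void *)&fake_pages[i], wait_table_bucket(&fake_pages[i]),
		       PAGE_WAIT_TABLE_SIZE);
	return 0;
}

With only 256 buckets, a large machine thrashing many pages will inevitably stack unrelated waiters onto one bucket, lengthening the wake walks and the spinlock hold times discussed above.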
On Tue, Jul 21, 2020 at 8:33 AM Linus Torvalds
wrote:
>
> More likely, it's actually *caused* by that commit 11a19c7b099f, and
> what might be happening is that other CPU's are just adding new
> waiters to the list *while* we're waking things up, because somebody
> else already got the page lock a
On Tue 21-07-20 08:33:33, Linus Torvalds wrote:
> On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote:
> >
> > The lockup is in page_unlock in do_read_fault and I suspect that this is
> > yet another effect of a very long waitqueue chain which has been
> > addressed by 11a19c7b099f ("sched/wait: I
On Mon, Jul 20, 2020 at 11:33 PM Michal Hocko wrote:
>
> The lockup is in page_unlock in do_read_fault and I suspect that this is
> yet another effect of a very long waitqueue chain which has been
> addressed by 11a19c7b099f ("sched/wait: Introduce wakeup boomark in
> wake_up_page_bit") previously
On Tue 21-07-20 15:17:49, Chris Down wrote:
> I understand the pragmatic considerations here, but I'm quite concerned
> about the maintainability and long-term ability to reason about a patch like
> this. For example, how do we know when this patch is safe to remove? Also,
> what other precedent d
I understand the pragmatic considerations here, but I'm quite concerned about
the maintainability and long-term ability to reason about a patch like this.
For example, how do we know when this patch is safe to remove? Also, what other
precedent does this set for us covering for poor userspace b
On Tue, Jul 21, 2020 at 03:38:35PM +0200, Michal Hocko wrote:
> On Tue 21-07-20 09:23:44, Qian Cai wrote:
> > On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote:
> > > On Tue 21-07-20 07:44:07, Qian Cai wrote:
> > > >
> > > >
> > > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:
>
On Tue 21-07-20 09:23:44, Qian Cai wrote:
> On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote:
> > On Tue 21-07-20 07:44:07, Qian Cai wrote:
> > >
> > >
> > > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:
> > > >
> > > > Are these really important? I believe I can dig that out f
On Tue, Jul 21, 2020 at 02:17:52PM +0200, Michal Hocko wrote:
> On Tue 21-07-20 07:44:07, Qian Cai wrote:
> >
> >
> > > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:
> > >
> > > Are these really important? I believe I can dig that out from the bug
> > > report but I didn't really consider th
On Tue 21-07-20 07:44:07, Qian Cai wrote:
>
>
> > On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:
> >
> > Are these really important? I believe I can dig that out from the bug
> > report but I didn't really consider that important enough.
>
> Please dig them out. We have also been running tho
> On Jul 21, 2020, at 7:25 AM, Michal Hocko wrote:
>
> Are these really important? I believe I can dig that out from the bug
> report but I didn't really consider that important enough.
Please dig them out. We have also been running those things on “large” powerpc
as well and never saw such
On Tue 21-07-20 07:10:14, Qian Cai wrote:
>
>
> > On Jul 21, 2020, at 2:33 AM, Michal Hocko wrote:
> >
> > on a large ppc machine. The very likely cause is a suboptimal
> > configuration when systemd-udev spawns way too many workers to bring the
> > system up.
>
> This is strange. The problem
> On Jul 21, 2020, at 2:33 AM, Michal Hocko wrote:
>
> on a large ppc machine. The very likely cause is a suboptimal
> configuration when systemd-udev spawns way too many workers to bring the
> system up.
This is strange. The problem description is missing quite a few important
details. For