BTW, another issue the signal boost can cause is around locks. If the
scheduling thread is holding a lock when it signals that a task is
available, and there aren't enough free cores, then the receiving thread
gets boosted, takes the CPU away from the scheduling thread, tries to
acquire the lock, fails, and blocks. The scheduling thread then (one hopes)
gets rescheduled and releases the lock, after which the receiving thread
wakes up, grabs the lock, grabs the task, and finally starts running. When
we hit this pattern, scheduling a single task costs three context switches.
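
To make the pattern concrete, it looks roughly like this (a sketch only,
with a made-up task queue; not actual Chrome code):

    #include <windows.h>

    #include <deque>
    #include <functional>

    // Illustrative sketch only, not Chrome code.
    using Task = std::function<void()>;

    CRITICAL_SECTION g_queue_lock;  // guards g_tasks; initialized elsewhere.
    std::deque<Task> g_tasks;
    HANDLE g_task_ready = ::CreateEvent(nullptr, /*bManualReset=*/FALSE,
                                        /*bInitialState=*/FALSE, nullptr);

    void PostTask(Task task) {
      EnterCriticalSection(&g_queue_lock);
      g_tasks.push_back(std::move(task));
      // Signaling while still holding the lock: the boosted worker can
      // preempt this thread right here, try to take g_queue_lock, fail,
      // and block -- the three-context-switch dance described above.
      SetEvent(g_task_ready);
      LeaveCriticalSection(&g_queue_lock);
    }

    void PostTaskWithoutTheBounce(Task task) {
      EnterCriticalSection(&g_queue_lock);
      g_tasks.push_back(std::move(task));
      LeaveCriticalSection(&g_queue_lock);
      // Signal only after releasing the lock, so a boosted worker that
      // preempts us can actually acquire it and make progress.
      SetEvent(g_task_ready);
    }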

Gabriel and I brainstormed a few ways to investigate both the consequences
of priority boosting and why Chrome does so many context switches.

On Tue, Aug 28, 2018 at 9:26 AM Sami Kyostila <skyos...@chromium.org> wrote:

> I think I've seen instances of this problem even with the old IPC system:
> the sending thread is likely to get descheduled because the receiving
> thread is woken up before the sender has finished running. We kicked
> around an idea once about buffering message sends and only flushing them
> once the current task has finished -- maybe it's time to revisit something
> like that?
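>
> Something along these lines, maybe (a very rough sketch with invented
> names; not an actual API proposal):
>
>     #include <functional>
>     #include <utility>
>     #include <vector>
>
>     // Rough sketch, invented names. Each sending thread buffers messages
>     // while its current task runs and hands them over in one batch when
>     // the task returns, so the receiver is woken at most once per task.
>     class BufferedSender {
>      public:
>       using Message = std::function<void()>;
>
>       // Called any number of times during a task; never wakes the
>       // receiver, so the sender keeps its timeslice.
>       void Send(Message message) { pending_.push_back(std::move(message)); }
>
>       // Called by the task runner once the current task has finished:
>       // one wake-up for the whole batch instead of one per Send().
>       template <typename Dispatch>
>       void Flush(Dispatch dispatch_and_wake) {
>         if (pending_.empty())
>           return;
>         dispatch_and_wake(std::move(pending_));
>         pending_.clear();
>       }
>
>      private:
>       std::vector<Message> pending_;
>     };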
>
> - Sami
>
> ke 22. elok. 2018 klo 1.15 Bruce Dawson (brucedaw...@chromium.org)
> kirjoitti:
>
>> I've definitely been bitten by this. On one game engine I worked on, the
>> engine would signal all of the worker threads whenever a task was ready.
>> Due to the priority boosting, all of them would wake up and try to acquire
>> the scheduler lock, which was held by the thread that had just signaled
>> them and which was, reliably, no longer running. And oh, by the way, it
>> was a spin lock, so the waiting workers spun instead of sleeping while the
>> main thread couldn't get scheduled to release it. The call to SetEvent()
>> would frequently take 20 ms to return.
>>
>> There were a lot of problems with this:
>>
>>    - Don't signal all of your worker threads when you have just one task
>>    - Don't use a spin lock
>>
>> In this case the priority boosting made the problem critical, but it
>> wasn't the underlying issue.
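>>
>> For reference, a minimal sketch of the saner pattern -- wake exactly one
>> worker per task, and use a blocking lock so a boosted waiter that loses
>> the race simply sleeps (illustrative only, not that engine's actual code):
>>
>>     #include <condition_variable>
>>     #include <deque>
>>     #include <functional>
>>     #include <mutex>
>>
>>     // Illustrative only. One task -> one wake-up, and a blocking mutex
>>     // instead of a spin lock, so a worker that can't get the lock goes
>>     // to sleep rather than burning its boosted timeslice.
>>     struct WorkQueue {
>>       std::mutex lock;
>>       std::condition_variable cv;
>>       std::deque<std::function<void()>> tasks;
>>
>>       void Post(std::function<void()> task) {
>>         {
>>           std::lock_guard<std::mutex> hold(lock);
>>           tasks.push_back(std::move(task));
>>         }
>>         cv.notify_one();  // not notify_all(): don't stampede every worker
>>       }
>>
>>       std::function<void()> Take() {
>>         std::unique_lock<std::mutex> hold(lock);
>>         cv.wait(hold, [this] { return !tasks.empty(); });
>>         auto task = std::move(tasks.front());
>>         tasks.pop_front();
>>         return task;
>>       }
>>     };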
>>
>> I commented on the bug. I do think this is worth exploring, but there are
>> probably cases where we rely on this priority boost to avoid starvation or
>> improve response times. It's possible that we'd see better results by
>> somehow reducing the number of cross-thread/cross-process messages we send
>> in the first place.
>>
>> Also, note that on systems with enough cores the priority boost can
>> become irrelevant - two communicating threads will migrate to different
>> cores and both will continue running. So, our workstations will behave
>> fundamentally differently from customer machines. Yay.
>>
>> On Mon, Aug 20, 2018 at 4:37 PM Gabriel Charette <g...@chromium.org>
>> wrote:
>>
>>> Hello scheduler devs (and *v8/chromium-mojo* friends -- sorry for
>>> cross-posting; see related note below).
>>>
>>> Some kernels give a boost to a thread when the resource it was waiting
>>> on is signaled (lock, event, pipe, file I/O, etc.). Some platforms document
>>> this
>>> <https://docs.microsoft.com/en-us/windows/desktop/procthread/priority-boosts>;
>>> on others we've anecdotally observed things that make us believe they do.
>>>
>>> I think this might be hurting Chrome's task system.
>>>
>>> The Chrome semantics when signaling a thread are often "hey, you have
>>> work, you should run soon", not "hey, please do this work ASAP", I think...
>>> This is certainly the case for TaskScheduler use cases; I'm not so sure
>>> about input use cases (e.g. the 16 thread hops to respond to input IIRC;
>>> the boost probably helps that chain a lot..?).
>>> But in a case where there are many messages (e.g. *mojo*), this means
>>> many context switches (send one message; switch; process one message;
>>> switch back; etc.).
>>>
>>> https://crbug.com/872248#c4 suggests that MessageLoop::ScheduleWork()
>>> is really expensive (though there may be sampling bias there --
>>> investigation in progress).
>>>
>>> https://crbug.com/872248 also suggests that the Blink main thread is
>>> descheduled while it's trying to signal workers to help it on a parallel
>>> task (I observed this firsthand when working in *v8* this winter but
>>> didn't know what to make of it at the time: trace1
>>> <https://drive.google.com/file/d/1YFC8lh67rCEQOMA2_A8i7BlFw_NHkCma/view?usp=sharing>,
>>> trace2
>>> <https://drive.google.com/file/d/1prrkIlNApLNeu-ppL_5PQT8a2opgKubb/view?usp=sharing>
>>> ).
>>>
>>> On Windows we can tweak this with
>>> ::SetProcessPriorityBoost/SetThreadPriorityBoost(). Not sure about POSIX. I
>>> might try to experiment with this (feels scary..!).
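>>>
>>> Something like this, presumably (sketch only; note the inverted
>>> parameter, where TRUE means *disable* boosting):
>>>
>>>     #include <windows.h>
>>>
>>>     // Experiment sketch: opt the whole process, and optionally
>>>     // individual threads, out of dynamic priority boosting.
>>>     void DisablePriorityBoostForExperiment() {
>>>       // TRUE really does mean "disable boosting" here.
>>>       ::SetProcessPriorityBoost(::GetCurrentProcess(),
>>>                                 /*bDisablePriorityBoost=*/TRUE);
>>>
>>>       // Per-thread override, e.g. to keep the boost on latency-critical
>>>       // threads while turning it off elsewhere.
>>>       ::SetThreadPriorityBoost(::GetCurrentThread(),
>>>                                /*bDisablePriorityBoost=*/TRUE);
>>>     }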
>>>
>>> In the meantime I figured it would at least be good to inform all of you
>>> so you no longer scratch your heads at these occasional unexplained
>>> delays in traces.
>>>
>>> Cheers!
>>> Gab
>>>
