Hi Chen,

This is an interesting problem. I don't think it's particularly specific to 
Flight, except that Flight services are likely to face problems like this, so 
I think it's still worth discussing here.

The Java and C++ implementations of gRPC/Flight use a thread-pool-based model, 
yes. One way Flight could help you more would be to expose the asynchronous 
nature of the underlying implementation. In particular, in Java, that would 
let you queue a request without tying up a thread: you could just enqueue the 
request reader/response sender objects and return immediately, letting the 
thread go back to the thread pool. Off the top of my head, this would be quite 
achievable to implement [*].
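
Very roughly, it could look like the sketch below. The names here 
(ResponseSender, PendingCall, handleGetStream) are made-up stand-ins, not 
actual Flight APIs: the handler only enqueues the call and returns, and a 
separate worker pool drains the queue on its own schedule.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.LinkedBlockingQueue;

    // Hypothetical sketch only: ResponseSender/PendingCall stand in for
    // Flight's request/response objects; this is not the current Flight API.
    final class AsyncEnqueueSketch {
      interface ResponseSender { void send(byte[] chunk); void completed(); }
      record PendingCall(byte[] ticket, ResponseSender sender) {}

      private final BlockingQueue<PendingCall> pending = new LinkedBlockingQueue<>();
      private final ExecutorService workers = Executors.newFixedThreadPool(8);

      // Called on the server's handler thread: enqueue and return immediately,
      // so the handler thread goes straight back to the server's thread pool.
      void handleGetStream(byte[] ticket, ResponseSender sender) {
        pending.add(new PendingCall(ticket, sender));
      }

      // Workers drain the queue at whatever rate/ordering we choose.
      void start() {
        for (int i = 0; i < 8; i++) {
          workers.submit(() -> {
            while (!Thread.currentThread().isInterrupted()) {
              try {
                PendingCall call = pending.take();
                // ... produce the data, call call.sender().send(...), then:
                call.sender().completed();
              } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
              }
            }
          });
        }
      }
    }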

> - The size of requests is heterogeneous, meaning some requests process a few 
> MBs of data and may just take a few hundred milliseconds, while others may 
> need to process hundreds of GBs of data taking hours.

Are you able to estimate or calculate this cost up front? If so, you could 
limit based on threads + request cost. Then hopefully each whale would stay 
under its 5% quota of threads/concurrent RPC calls.
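
A minimal sketch of that kind of per-user, cost-weighted admission check, 
assuming you can attach some estimated cost to each request (class and method 
names here are made up):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Semaphore;

    // Hypothetical sketch: admit a request only if the user's estimated
    // "cost" (however you measure it) fits in their per-server budget.
    final class CostQuota {
      private final Map<String, Semaphore> budgets = new ConcurrentHashMap<>();
      private final int budgetPerUser;

      CostQuota(int budgetPerUser) { this.budgetPerUser = budgetPerUser; }

      boolean tryAdmit(String user, int estimatedCost) {
        Semaphore budget =
            budgets.computeIfAbsent(user, u -> new Semaphore(budgetPerUser));
        return budget.tryAcquire(estimatedCost); // reject (or queue) if over budget
      }

      void release(String user, int estimatedCost) {
        budgets.get(user).release(estimatedCost);
      }
    }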

If not, perhaps one way to approximate it is to do something like what cloud 
providers do for cheap (burstable) instances: only allow a user to use their 
full RPC quota (per server) for a short time, then throttle them down for a 
period after that. I think that won't help when scaling up, though, unless you 
also only grant the full quota after some time has passed since server 
startup.
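
Roughly, that amounts to a per-user credit balance that refills slowly; a 
sketch with made-up numbers:

    import java.util.HashMap;
    import java.util.Map;

    // Rough sketch of the "burst credits" idea; the numbers are made up. Each
    // user has a credit balance that refills slowly, so they can go full speed
    // for a short time and are then throttled down to the refill rate.
    // Starting the balance at zero (instead of full) would also cover the
    // scale-up case, since new servers would not grant the full quota right
    // away.
    final class BurstCredits {
      private static final double MAX_CREDITS = 100.0;   // burst capacity
      private static final double REFILL_PER_SEC = 5.0;  // sustained rate

      private static final class Bucket {
        double credits = MAX_CREDITS;
        long lastRefillNanos = System.nanoTime();
      }

      private final Map<String, Bucket> buckets = new HashMap<>();

      synchronized boolean tryAdmit(String user, double cost) {
        Bucket b = buckets.computeIfAbsent(user, u -> new Bucket());
        long now = System.nanoTime();
        double elapsedSec = (now - b.lastRefillNanos) / 1e9;
        b.credits = Math.min(MAX_CREDITS, b.credits + elapsedSec * REFILL_PER_SEC);
        b.lastRefillNanos = now;
        if (b.credits < cost) {
          return false; // throttled: out of burst credits
        }
        b.credits -= cost;
        return true;
      }
    }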

Another solution might be to keep the quota, but also round-robin request 
handling: even if a user is technically under quota, if there are requests 
from other users who haven't been served yet, prioritize those first. This is 
where I feel there is almost certainly existing literature (queueing theory?) 
that I'm not familiar enough with to reference, but which is where I would 
start investigating.
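
A rough sketch of that round-robin over per-user queues (again, just 
illustrative, not an existing Flight facility):

    import java.util.ArrayDeque;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Queue;

    // Sketch of round-robin dispatch over per-user queues: each pass serves at
    // most one request per user, so a user with a deep backlog cannot starve
    // the others. (A real dispatcher would also need backpressure and cleanup.)
    final class RoundRobinDispatcher<R> {
      // Users in service order; queues in the map are never empty.
      private final Map<String, Queue<R>> perUser = new LinkedHashMap<>();

      synchronized void enqueue(String user, R request) {
        perUser.computeIfAbsent(user, u -> new ArrayDeque<>()).add(request);
      }

      // Take one request from the user at the front, then rotate that user to
      // the back if they still have work queued. Returns null if idle.
      synchronized R next() {
        var it = perUser.entrySet().iterator();
        if (!it.hasNext()) {
          return null;
        }
        var entry = it.next();
        String user = entry.getKey();
        Queue<R> queue = entry.getValue();
        R request = queue.poll();
        it.remove();
        if (!queue.isEmpty()) {
          perUser.put(user, queue);
        }
        return request;
      }
    }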

[*]: But we'd end up duplicating all interfaces, so I'd like to also evaluate 
rewriting the current synchronous interface in terms of the asynchronous 
interface.

On Mon, Jan 23, 2023, at 23:01, Chen Song wrote:
> Hi All
> 
> I have a question on best practices on building scheduling / throttling 
> mechanisms on Flight data services in a multi-tenant environment.
> 
> My high level understandings are
> 
> - The Flight data service uses a thread-pool-based model, i.e., each data 
> server normally runs with a fixed-size thread pool. During processing, each 
> request occupies a thread for its entire lifecycle.
> 
> - The size of requests is heterogeneous, meaning some requests process a few 
> MBs of data and may just take a few hundred milliseconds, while others may 
> need to process hundreds of GBs of data taking hours.
> 
> - For simplicity, let's use threads as the unit of resource shared among 
> multiple users across data servers, to facilitate discussion.
> 
> 
> One natural way to start is to only allow a user to use a share of the 
> thread pool per server. For example, each user is allowed to use up to 5% of 
> threads in the thread pool on a server.
> 
> - This mechanism, however, has a fairness defect when there are many whale 
> users (users who send many more concurrent requests than the total number of 
> threads allowed for that user across all servers). Using the example above, 
> if there are 20 such users (each taking 5% of the thread pool) at all times, 
> they will use up all threads in the fleet very quickly.
> 
> - Adding more servers doesn't solve this issue, as each whale user will take 
> threads from the new servers quickly as well.
> 
> 
> In other words, how do we ensure fairness and avoid starving regular users 
> when there are many whale users?
> 
> My question is: is there any best practice in Flight data services to handle 
> this with local scheduling/throttling? Or can this only be solved with global 
> throttling, e.g., tracking concurrent requests from a user in a centralized 
> place, with each Flight metadata or data service fetching the per-user 
> concurrencies periodically?
> 
> 
> Thanks in advance.
> 
> 
> --
> Chen Song
