Weston Pace created ARROW-15079:
-----------------------------------

             Summary: [C++] Add scheduler to constrain memory of exec plans
                 Key: ARROW-15079
                 URL: https://issues.apache.org/jira/browse/ARROW-15079
             Project: Apache Arrow
          Issue Type: New Feature
          Components: C++
            Reporter: Weston Pace


This is a high-level JIRA that will probably consist of a number of subtasks.  
Currently the exec plan supports backpressure.  This helps to limit the amount 
of data read in by a single query.  Spillover can also help reduce the amount 
of memory used by a single query.

However, if multiple queries are run concurrently, then the system will 
eventually run out of memory.  I'd like to propose a simple scheduler to start 
to solve this problem.

 * Initially, the scheduler will be given a configurable max memory target.  We 
will assume the user is responsible for ensuring this memory is not used 
elsewhere on the system.  For example, a user may configure 12GB of RAM for the 
scheduler and the user is responsible for ensuring that 12GB of RAM is not 
consumed by other processes.

In future JIRA issues we can add more sophistication such as detecting and 
adapting to the system memory levels.

 * The scheduler will track memory usage by querying the RSS (resident set 
size) of the process.  An alternative approach would be to use a dedicated 
memory pool.  A pool would be more flexible (it allows for multiple schedulers 
/ query engines in a single process) but wouldn't capture the non-pool 
allocations (although these should be small).

 * The ExecPlan will be modified to submit its tasks to the scheduler instead 
of directly to the executor (the scheduler will submit tasks to the executor).  
Tasks will be prioritized.  If a task has higher priority, then lower priority 
tasks will be pushed to disk.  Prioritization will initially be based on a 
task's position in the ExecPlan: tasks closer to the sink will be higher 
priority.

 * The scheduler is separate from spillover, although it may interact with 
spillover mechanisms to ask for more spillover as memory pressure increases.
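
To make the points above more concrete, here is a rough sketch of how the 
pieces might fit together.  It is only an illustration under assumptions, not 
a proposed API: MemoryAwareScheduler, ScheduledTask, memory_probe and 
request_spill are hypothetical names that do not exist in Arrow.  The memory 
probe is injected as a callback (a real one might read the process RSS from 
the OS), tasks run inline instead of being handed to an executor, and "pushing 
lower priority tasks to disk" is reduced to a single spill request callback.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical sketch; none of these names are existing Arrow APIs.
struct ScheduledTask {
  int priority;               // larger value = closer to the sink
  std::function<void()> run;  // the work to perform
};

struct ByPriority {
  // std::priority_queue keeps the "largest" element on top, so comparing
  // with < makes the highest priority task come out first.
  bool operator()(const ScheduledTask& a, const ScheduledTask& b) const {
    return a.priority < b.priority;
  }
};

class MemoryAwareScheduler {
 public:
  MemoryAwareScheduler(std::size_t max_bytes,
                       std::function<std::size_t()> memory_probe,
                       std::function<void()> request_spill)
      : max_bytes_(max_bytes),
        memory_probe_(std::move(memory_probe)),
        request_spill_(std::move(request_spill)) {}

  // The plan submits tasks here instead of directly to the executor.
  void Submit(int priority, std::function<void()> fn) {
    queue_.push(ScheduledTask{priority, std::move(fn)});
  }

  // Runs queued tasks, highest priority first, and returns how many ran.
  // Before each task the memory probe is checked.  If usage is above the
  // target, more spillover is requested once; if usage is still above the
  // target afterwards, the remaining tasks are left in the queue.
  std::size_t Drain() {
    std::size_t ran = 0;
    while (!queue_.empty()) {
      if (memory_probe_() > max_bytes_) {
        request_spill_();
        if (memory_probe_() > max_bytes_) break;
      }
      ScheduledTask task = queue_.top();  // copy, since top() is const
      queue_.pop();
      task.run();  // a real scheduler would hand this to the executor
      ++ran;
    }
    return ran;
  }

  std::size_t pending() const { return queue_.size(); }

 private:
  std::size_t max_bytes_;
  std::function<std::size_t()> memory_probe_;
  std::function<void()> request_spill_;
  std::priority_queue<ScheduledTask, std::vector<ScheduledTask>, ByPriority>
      queue_;
};
```

Injecting the probe keeps the sketch agnostic between the RSS and memory pool 
approaches: either could be plugged in as the callback.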



