[jira] [Closed] (ARROW-14163) [C++] Naive spillover implementation for join

Weston Pace (Jira) Mon, 23 May 2022 20:33:08 -0700


     [ 
https://issues.apache.org/jira/browse/ARROW-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Weston Pace closed ARROW-14163.
-------------------------------
    Resolution: Won't Fix

It appears this will be addressed by ARROW-16389

> [C++] Naive spillover implementation for join
> ---------------------------------------------
>
>                 Key: ARROW-14163
>                 URL: https://issues.apache.org/jira/browse/ARROW-14163
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Weston Pace
>            Assignee: Sasha Krassovsky
>            Priority: Major
>              Labels: query-engine
>             Fix For: 9.0.0
>
>
> A join is a pipeline breaker.  I believe the proposed join operators assume 
> that the data can fit into memory and queue all incoming batches.  For 
> example, if I understand correctly, 
> https://github.com/apache/arrow/pull/11150 queues the right side until the 
> left side had finished.
> There are many clever and interesting ways that this can be optimized  
> (divide & conquer, recursive query, prioritize reading the left side and 
> pause the right side read).  This issue is intentionally not clever or 
> interesting.
> Instead, I think it would be good to take advantage of this opportunity to 
> start fleshing out our spillover capabilities.  A very simplistic 
> implementation could be a standalone node which has 2 inputs and 2 outputs.  
> The node queues up all incoming data on the "right" input and lets the "left" 
> input pass through.  Then, when the left input has finished the node will 
> release the right input.
> This node could then implement a basic spillover mechanism (e.g. IPC to disk) 
> and start to flesh out the abstractions that we will eventually want to 
> handle different spillover strategies  (abort on spill, spill to disk, and 
> spill to s3 are all I can think of at the moment).



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

[jira] [Closed] (ARROW-14163) [C++] Naive spillover implementation for join

Reply via email to