[ 
https://issues.apache.org/jira/browse/ARROW-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Keane resolved ARROW-15517.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 12316
[https://github.com/apache/arrow/pull/12316]

> [R] Use WriteNode in write_dataset()
> ------------------------------------
>
>                 Key: ARROW-15517
>                 URL: https://issues.apache.org/jira/browse/ARROW-15517
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, write_dataset() uses the Scanner interface, which can't handle 
> everything that an ExecPlan can. So if your arrow_dplyr_query contains 
> aggregations or (more importantly) joins, you have to materialize 
> the Table in memory before you can write it to disk. The WriteNode added in 
> ARROW-13542 is a special sink node that can be placed at the end of an 
> ExecPlan, so data should be able to stream to disk in more cases, and 
> write_dataset() will benefit from future improvements to ExecPlan memory 
> usage and spillover.
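
As a rough illustration of the workflow described above (not taken from the
issue or the pull request), a query containing a join could be passed straight
to write_dataset() without collecting it first. This sketch assumes an arrow
release that includes this change (8.0.0 or later, per the Fix For field); the
data, column names, and temporary paths are made up for the example.

```r
library(arrow)
library(dplyr)

# Hypothetical example data and temporary locations (not from the issue)
in_dir  <- tempfile("orders_")
out_dir <- tempfile("joined_")

orders    <- data.frame(id = c(1L, 2L, 3L), customer = c("a", "b", "a"))
customers <- data.frame(customer = c("a", "b"), region = c("east", "west"))

# Write the input as a dataset so the query below starts from a Dataset
write_dataset(orders, in_dir)

# The join is part of the lazy query; write_dataset() is called directly
# on that query, with no collect() or compute() step in between.
open_dataset(in_dir) %>%
  left_join(Table$create(customers), by = "customer") %>%
  write_dataset(out_dir)

# Read the written dataset back to inspect the result
result <- open_dataset(out_dir) %>% collect()
```

Whether the data actually streams to disk rather than being held in memory
depends on the query and on the ExecPlan implementation in the installed
version; the sketch only shows the call pattern.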


