[ https://issues.apache.org/jira/browse/ARROW-15517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Keane resolved ARROW-15517.
------------------------------------
    Resolution: Fixed

Issue resolved by pull request 12316
[https://github.com/apache/arrow/pull/12316]

> [R] Use WriteNode in write_dataset()
> ------------------------------------
>
>                 Key: ARROW-15517
>                 URL: https://issues.apache.org/jira/browse/ARROW-15517
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 8.0.0
>
>          Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> Currently, write_dataset uses the Scanner interface, which can't handle
> everything that the ExecPlan does. So if your arrow_dplyr_query contains
> things like aggregations or (more importantly) joins, you have to materialize
> the Table in memory before you can write to disk. The WriteNode added in
> ARROW-13542 is a special sink node that can be put at the end of an ExecPlan,
> so data should be able to stream to disk in more cases, and will benefit from
> future improvements to ExecPlan memory usage and spillover.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)