[GitHub] [arrow] alamb commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-11-14 Thread GitBox
alamb commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-727190329 @andygrove your description makes sense. Thank you for the ideas that this PR sparked. I agree that a CSV --> Parquet converter would be really useful and cool. We will li

[GitHub] [arrow] alamb commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-11-13 Thread GitBox
alamb commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-726756633 @andygrove I wonder what, if anything, you plan to do with this PR now.

[GitHub] [arrow] alamb commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-09-28 Thread GitBox
alamb commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-700198208 BTW @jorgecarleitao -- I really like your ideas regarding using async streams in `ExecutionPlan` -- I think it sounds like a very elegant way to implement back pressure (and avoid

[GitHub] [arrow] alamb commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-09-28 Thread GitBox
alamb commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-700197576 > When I run the TPC-H query I am testing against a data set that has 240 Parquet files. If we just try and run everything at once with async/await and have tokio do the scheduling,