[jira] [Resolved] (ARROW-16302) [C++] Null values in partitioning field for FilenamePartitioning

2022-05-26 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16302. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 12977

[jira] [Comment Edited] (ARROW-16670) [R] Behaviour of R-specific key/value metadata in the query engine

2022-05-26 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542659#comment-17542659 ] Weston Pace edited comment on ARROW-16670 at 5/26/22 8:33 PM: -- {quote}I

[jira] [Commented] (ARROW-16670) [R] Behaviour of R-specific key/value metadata in the query engine

2022-05-26 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542659#comment-17542659 ] Weston Pace commented on ARROW-16670: - {blockquote} I wonder if ignoring the R metadata for query

[jira] [Commented] (ARROW-16660) [C#] Add support for Time32Array and Time64Array

2022-05-26 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542658#comment-17542658 ] Weston Pace commented on ARROW-16660: - Done. I've also given you permission to assign issues to

[jira] [Assigned] (ARROW-16660) [C#] Add support for Time32Array and Time64Array

2022-05-26 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-16660: --- Assignee: Rishabh Rana > [C#] Add support for Time32Array and Time64Array >

[jira] [Resolved] (ARROW-16659) [C++] Remove ambiguous constructor for VectorKernel

2022-05-25 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16659. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13235

[jira] [Resolved] (ARROW-16646) [C++] HashJoin node can crash if a key column is a scalar

2022-05-25 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16646. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13236

[jira] [Commented] (ARROW-16642) [C++] An Error Occured While Reading Parquet File Using C++ - GetRecordBatchReader -Corrupt snappy compressed data.

2022-05-25 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542245#comment-17542245 ] Weston Pace commented on ARROW-16642: - You might need to provide a few more details on how you are

[jira] [Commented] (ARROW-16609) [C++] xxhash not installed into dist/lib/include when building C++

2022-05-25 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542118#comment-17542118 ] Weston Pace commented on ARROW-16609: - I think [~apitrou]'s output would be valuable here. I

[jira] [Resolved] (ARROW-15583) [C++] The Substrait consumer could potentially use a massive amount of RAM if the producer uses large anchors

2022-05-25 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-15583. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 12852

[jira] [Created] (ARROW-16646) [C++] HashJoin node can crash if a key column is a scalar

2022-05-24 Thread Weston Pace (Jira)
Weston Pace created ARROW-16646: --- Summary: [C++] HashJoin node can crash if a key column is a scalar Key: ARROW-16646 URL: https://issues.apache.org/jira/browse/ARROW-16646 Project: Apache Arrow

[jira] [Closed] (ARROW-14163) [C++] Naive spillover implementation for join

2022-05-23 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-14163. Resolution: Won't Fix. It appears this will be addressed by ARROW-16389. > [C++] Naive spillover

[jira] [Assigned] (ARROW-16637) [C++] Add row-based utilities for encoding a batch and merging row tables

2022-05-23 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-16637: --- Assignee: Weston Pace > [C++] Add row-based utilities for encoding a batch and merging row

[jira] [Created] (ARROW-16637) [C++] Add row-based utilities for encoding a batch and merging row tables

2022-05-23 Thread Weston Pace (Jira)
Weston Pace created ARROW-16637: --- Summary: [C++] Add row-based utilities for encoding a batch and merging row tables Key: ARROW-16637 URL: https://issues.apache.org/jira/browse/ARROW-16637 Project:

[jira] [Updated] (ARROW-16590) [C++] Consolidate files dealing with row-major storage, add some helper methods

2022-05-23 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16590: Description: We've built up a number of utilities that are based around a row-major encoding.

[jira] [Commented] (ARROW-16632) [Website] Announce Acero Engine

2022-05-23 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541066#comment-17541066 ] Weston Pace commented on ARROW-16632: - This is a good outline. > 2 Acero's current state Acero's

[jira] [Created] (ARROW-16626) [C++] Name the C++ streaming execution engine

2022-05-20 Thread Weston Pace (Jira)
Weston Pace created ARROW-16626: --- Summary: [C++] Name the C++ streaming execution engine Key: ARROW-16626 URL: https://issues.apache.org/jira/browse/ARROW-16626 Project: Apache Arrow Issue

[jira] [Resolved] (ARROW-15779) [Python] Create python bindings for Substrait consumer

2022-05-20 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-15779. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 12672

[jira] [Resolved] (ARROW-15534) [C++] Add convenience function to substrait consumer to create plan instead of declaration

2022-05-20 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-15534. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13181

[jira] [Commented] (ARROW-16609) [C++] xxhash not installed into dist/lib/include when building C++

2022-05-19 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17539746#comment-17539746 ] Weston Pace commented on ARROW-16609: - {{arrow/util/hashing.h}} (which should probably be named

[jira] [Resolved] (ARROW-15498) [C++][Compute] Implement Bloom filter pushdown between hash joins

2022-05-17 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-15498. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 12289

[jira] [Commented] (ARROW-16549) [C++] Simplify AggregateNodeOptions aggregates/targets

2022-05-17 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17538287#comment-17538287 ] Weston Pace commented on ARROW-16549: - Moving {{names}} inside of the {{Aggregate}} makes a lot of

[jira] [Created] (ARROW-16590) [C++] Consolidate files dealing with row-major storage, add some helper methods

2022-05-16 Thread Weston Pace (Jira)
Weston Pace created ARROW-16590: --- Summary: [C++] Consolidate files dealing with row-major storage, add some helper methods Key: ARROW-16590 URL: https://issues.apache.org/jira/browse/ARROW-16590

[jira] [Commented] (ARROW-16574) [C++] TSAN failure in arrow-ipc-read-write-test

2022-05-16 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537686#comment-17537686 ] Weston Pace commented on ARROW-16574: - I hope to get to it this week but if someone else wants to

[jira] [Created] (ARROW-16587) [Python][C++] Add more failure tests to execution engine / substrait consumer

2022-05-16 Thread Weston Pace (Jira)
Weston Pace created ARROW-16587: --- Summary: [Python][C++] Add more failure tests to execution engine / substrait consumer Key: ARROW-16587 URL: https://issues.apache.org/jira/browse/ARROW-16587 Project:

[jira] [Assigned] (ARROW-16498) [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-13 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-16498: --- Assignee: Weston Pace > [C++] Fix potential deadlock in arrow::compute::TaskScheduler >

[jira] [Resolved] (ARROW-16498) [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-13 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16498. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13091

[jira] [Created] (ARROW-16574) TSAN failure in arrow-ipc-read-write-test

2022-05-13 Thread Weston Pace (Jira)
Weston Pace created ARROW-16574: --- Summary: TSAN failure in arrow-ipc-read-write-test Key: ARROW-16574 URL: https://issues.apache.org/jira/browse/ARROW-16574 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-16550) [C++] Add support for Substrait cast expression

2022-05-12 Thread Weston Pace (Jira)
Weston Pace created ARROW-16550: --- Summary: [C++] Add support for Substrait cast expression Key: ARROW-16550 URL: https://issues.apache.org/jira/browse/ARROW-16550 Project: Apache Arrow Issue

[jira] [Created] (ARROW-16549) [C++] Simplify AggregateNodeOptions aggregates/targets

2022-05-12 Thread Weston Pace (Jira)
Weston Pace created ARROW-16549: --- Summary: [C++] Simplify AggregateNodeOptions aggregates/targets Key: ARROW-16549 URL: https://issues.apache.org/jira/browse/ARROW-16549 Project: Apache Arrow

[jira] [Commented] (ARROW-16531) [Python] Lint rules do not seem to be getting enforced

2022-05-12 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17536217#comment-17536217 ] Weston Pace commented on ARROW-16531: - Yikes. Thanks for looking into this. I can upgrade my

[jira] [Commented] (ARROW-15591) [C++] Add support for aggregation to the Substrait consumer

2022-05-12 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535917#comment-17535917 ] Weston Pace commented on ARROW-15591: - First of all, how did you get that cool red text? The

[jira] [Commented] (ARROW-16389) [C++] Support hash-join on larger than memory datasets

2022-05-11 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535807#comment-17535807 ] Weston Pace commented on ARROW-16389: - Attaching a design for this created by [~michalno]:

[jira] [Commented] (ARROW-16531) [Python] Lint rules do not seem to be getting enforced

2022-05-11 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535008#comment-17535008 ] Weston Pace commented on ARROW-16531: - I've attached the full list of errors. They seem to all come

[jira] [Updated] (ARROW-16531) [Python] Lint rules do not seem to be getting enforced

2022-05-11 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16531: Attachment: python-lint-errs > [Python] Lint rules do not seem to be getting enforced >

[jira] [Created] (ARROW-16531) [Python] Lint rules do not seem to be getting enforced

2022-05-11 Thread Weston Pace (Jira)
Weston Pace created ARROW-16531: --- Summary: [Python] Lint rules do not seem to be getting enforced Key: ARROW-16531 URL: https://issues.apache.org/jira/browse/ARROW-16531 Project: Apache Arrow

[jira] [Closed] (ARROW-15297) [C++] The write node options shouldn't require a schema

2022-05-11 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-15297. --- Resolution: Fixed > [C++] The write node options shouldn't require a schema >

[jira] [Commented] (ARROW-15297) [C++] The write node options shouldn't require a schema

2022-05-11 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17535000#comment-17535000 ] Weston Pace commented on ARROW-15297: - Yes, I think so. > [C++] The write node options shouldn't

[jira] [Created] (ARROW-16525) [C++] Tee node not properly marking node finished

2022-05-10 Thread Weston Pace (Jira)
Weston Pace created ARROW-16525: --- Summary: [C++] Tee node not properly marking node finished Key: ARROW-16525 URL: https://issues.apache.org/jira/browse/ARROW-16525 Project: Apache Arrow Issue

[jira] [Created] (ARROW-16524) [C++] Add generic multi-output node

2022-05-10 Thread Weston Pace (Jira)
Weston Pace created ARROW-16524: --- Summary: [C++] Add generic multi-output node Key: ARROW-16524 URL: https://issues.apache.org/jira/browse/ARROW-16524 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-16523) [C++] Move ExecPlan scheduling into the plan

2022-05-10 Thread Weston Pace (Jira)
Weston Pace created ARROW-16523: --- Summary: [C++] Move ExecPlan scheduling into the plan Key: ARROW-16523 URL: https://issues.apache.org/jira/browse/ARROW-16523 Project: Apache Arrow Issue

[jira] [Created] (ARROW-16522) [C++] Evolution of exec plan

2022-05-10 Thread Weston Pace (Jira)
Weston Pace created ARROW-16522: --- Summary: [C++] Evolution of exec plan Key: ARROW-16522 URL: https://issues.apache.org/jira/browse/ARROW-16522 Project: Apache Arrow Issue Type: Improvement

[jira] [Closed] (ARROW-16506) Pyarrow 8.0.0 write_dataset writes data in different order with use_threads=True

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-16506. --- Resolution: Duplicate > Pyarrow 8.0.0 write_dataset writes data in different order with >

[jira] [Commented] (ARROW-16506) Pyarrow 8.0.0 write_dataset writes data in different order with use_threads=True

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534486#comment-17534486 ] Weston Pace commented on ARROW-16506: - FYI ARROW-16518 was recently filed and I was reminded of

[jira] [Commented] (ARROW-10883) [C++][Dataset] Preserve order when writing dataset

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534485#comment-17534485 ] Weston Pace commented on ARROW-10883: - I deleted the link to ARROW-12873 because I don't know that

[jira] [Commented] (ARROW-16513) [C++] Add a compute function to hash inputs

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534479#comment-17534479 ] Weston Pace commented on ARROW-16513: - Yes it is. Thanks. > [C++] Add a compute function to hash

[jira] [Closed] (ARROW-16513) [C++] Add a compute function to hash inputs

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-16513. --- Resolution: Duplicate > [C++] Add a compute function to hash inputs >

[jira] [Commented] (ARROW-16518) [Python] Ensure _exec_plan.execplan preserves order of inputs

2022-05-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534457#comment-17534457 ] Weston Pace commented on ARROW-16518: - I doubt anyone expects or wants their data to be scrambled

[jira] [Resolved] (ARROW-16426) [C++] Add TeeNode to execution engine

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16426. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13040

[jira] [Commented] (ARROW-16513) [C++] Add a compute function to hash inputs

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534022#comment-17534022 ] Weston Pace commented on ARROW-16513: - CC [~michalno] > [C++] Add a compute function to hash inputs

[jira] [Created] (ARROW-16513) [C++] Add a compute function to hash inputs

2022-05-09 Thread Weston Pace (Jira)
Weston Pace created ARROW-16513: --- Summary: [C++] Add a compute function to hash inputs Key: ARROW-16513 URL: https://issues.apache.org/jira/browse/ARROW-16513 Project: Apache Arrow Issue Type:

[jira] [Commented] (ARROW-15901) [C++] Support flat custom output field names in Substrait

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534005#comment-17534005 ] Weston Pace commented on ARROW-15901: - Added ARROW-16512 for the follow-up > [C++] Support flat

[jira] [Created] (ARROW-16512) [C++] Support nested custom output field names in Substrait

2022-05-09 Thread Weston Pace (Jira)
Weston Pace created ARROW-16512: --- Summary: [C++] Support nested custom output field names in Substrait Key: ARROW-16512 URL: https://issues.apache.org/jira/browse/ARROW-16512 Project: Apache Arrow

[jira] [Updated] (ARROW-16475) [Python] Allow pc.call_function to use field references (needed for UDFs)

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16475: Summary: [Python] Allow pc.call_function to use field references (needed for UDFs) (was:

[jira] [Commented] (ARROW-16475) [Python] Publically expose Expression._call

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534003#comment-17534003 ] Weston Pace commented on ARROW-16475: - {{pc.call_function}} works for me. I'll update the title. >

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534002#comment-17534002 ] Weston Pace commented on ARROW-15081: - Yes, the per-file metadata was only accounting for around

[jira] [Assigned] (ARROW-14163) [C++] Naive spillover implementation for join

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-14163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-14163: --- Assignee: Sasha Krassovsky > [C++] Naive spillover implementation for join >

[jira] [Commented] (ARROW-16452) [R] After dataset scan, some RAM is left consumed until a garbage collection pass

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17534000#comment-17534000 ] Weston Pace commented on ARROW-16452: - I'm pretty sure R's gc has to be called on R's thread so I

[jira] [Closed] (ARROW-16392) [C++] Substrait consumer cannot handle file URIs that contain a Windows drive letter

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace closed ARROW-16392. --- Resolution: Duplicate > [C++] Substrait consumer cannot handle file URIs that contain a Windows

[jira] [Commented] (ARROW-16424) [C++] Update uri_path parsing in FromProto

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533998#comment-17533998 ] Weston Pace commented on ARROW-16424: - When writing the python Substrait consumer we disabled one of

[jira] [Commented] (ARROW-16506) Pyarrow 8.0.0 write_dataset writes data in different order with use_threads=True

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533996#comment-17533996 ] Weston Pace commented on ARROW-16506: - This is expected behavior at the moment. I'm guessing at

[jira] [Commented] (ARROW-16475) [Python] Publically expose Expression._call

2022-05-09 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17533991#comment-17533991 ] Weston Pace commented on ARROW-16475: - I don't think so but I could be doing something wrong:

[jira] [Created] (ARROW-16498) [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-06 Thread Weston Pace (Jira)
Weston Pace created ARROW-16498: --- Summary: [C++] Fix potential deadlock in arrow::compute::TaskScheduler Key: ARROW-16498 URL: https://issues.apache.org/jira/browse/ARROW-16498 Project: Apache Arrow

[jira] [Created] (ARROW-16496) [C++] Add roundtrip support to plans + relations

2022-05-06 Thread Weston Pace (Jira)
Weston Pace created ARROW-16496: --- Summary: [C++] Add roundtrip support to plans + relations Key: ARROW-16496 URL: https://issues.apache.org/jira/browse/ARROW-16496 Project: Apache Arrow Issue

[jira] [Updated] (ARROW-16496) [C++] Add roundtrip support to plans + relations

2022-05-06 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16496: Labels: substrait (was: ) > [C++] Add roundtrip support to plans + relations >

[jira] [Created] (ARROW-16482) [C++] Automatically convert from Substrait options to Arrow options object

2022-05-05 Thread Weston Pace (Jira)
Weston Pace created ARROW-16482: --- Summary: [C++] Automatically convert from Substrait options to Arrow options object Key: ARROW-16482 URL: https://issues.apache.org/jira/browse/ARROW-16482 Project:

[jira] [Created] (ARROW-16483) [C++] Automatically convert from Substrait options to Arrow options object

2022-05-05 Thread Weston Pace (Jira)
Weston Pace created ARROW-16483: --- Summary: [C++] Automatically convert from Substrait options to Arrow options object Key: ARROW-16483 URL: https://issues.apache.org/jira/browse/ARROW-16483 Project:

[jira] [Assigned] (ARROW-15849) [C++] Add a method that accepts a Substrait plan and returns a RecordBatchReader

2022-05-05 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace reassigned ARROW-15849: --- Assignee: Sanjiban Sengupta > [C++] Add a method that accepts a Substrait plan and returns

[jira] [Created] (ARROW-16475) [Python] Publically expose Expression._call

2022-05-04 Thread Weston Pace (Jira)
Weston Pace created ARROW-16475: --- Summary: [Python] Publically expose Expression._call Key: ARROW-16475 URL: https://issues.apache.org/jira/browse/ARROW-16475 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-16463) [C++] Add support for non-local filesystem URIs in the Substrait consumer

2022-05-04 Thread Weston Pace (Jira)
Weston Pace created ARROW-16463: --- Summary: [C++] Add support for non-local filesystem URIs in the Substrait consumer Key: ARROW-16463 URL: https://issues.apache.org/jira/browse/ARROW-16463 Project:

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-04 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531783#comment-17531783 ] Weston Pace commented on ARROW-15081: - Moving to one file instead of many files will save you

[jira] [Commented] (ARROW-16409) [C++][Python][R] Deprecate "scanner" (but keep "scan node") from public API

2022-05-04 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531752#comment-17531752 ] Weston Pace commented on ARROW-16409: - That is an important behavior. In R, which has already

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-03 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531476#comment-17531476 ] Weston Pace commented on ARROW-15081: - One mystery solved, a few more remained, I managed to

[jira] [Updated] (ARROW-16452) [R] After dataset scan, some RAM is left consumed until a garbage collection pass

2022-05-03 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16452: Description: This might be "not a bug" but I wonder if we can do something better here. When I

[jira] [Created] (ARROW-16452) [R] After dataset scan, some RAM is left consumed until a garbage collection pass

2022-05-03 Thread Weston Pace (Jira)
Weston Pace created ARROW-16452: --- Summary: [R] After dataset scan, some RAM is left consumed until a garbage collection pass Key: ARROW-16452 URL: https://issues.apache.org/jira/browse/ARROW-16452

[jira] [Created] (ARROW-16451) [C++] ParquetFileFragment caches parquet file metadata and there is no way to disable this

2022-05-03 Thread Weston Pace (Jira)
Weston Pace created ARROW-16451: --- Summary: [C++] ParquetFileFragment caches parquet file metadata and there is no way to disable this Key: ARROW-16451 URL: https://issues.apache.org/jira/browse/ARROW-16451

[jira] [Commented] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset

2022-05-03 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531411#comment-17531411 ] Weston Pace commented on ARROW-16421: - Right now the destruction of the record batch generator

[jira] [Commented] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset

2022-05-03 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531410#comment-17531410 ] Weston Pace commented on ARROW-16421: - It's rather spread out and not at all obvious. For example,

[jira] [Commented] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset

2022-05-03 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531338#comment-17531338 ] Weston Pace commented on ARROW-16421: - The scanner does need to close its files. It takes care of

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-02 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17531027#comment-17531027 ] Weston Pace commented on ARROW-15081: - I'm going to keep looking into this but this doesn't seem to

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-02 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530967#comment-17530967 ] Weston Pace commented on ARROW-15081: - That should be new enough. I'll try and reproduce and see

[jira] [Commented] (ARROW-16433) [Release][C++] parquet-arrow-test test fails on windows

2022-05-02 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530951#comment-17530951 ] Weston Pace commented on ARROW-16433: - Does the verification script build with mingw or MSVC? >

[jira] [Commented] (ARROW-15081) [R][C++] Arrow crashes (OOM) on R client with large remote parquet files

2022-05-02 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530949#comment-17530949 ] Weston Pace commented on ARROW-15081: - 8.0.0 should behave better when it comes to reading large

[jira] [Commented] (ARROW-15582) [C++] Add support for registering tricky functions with the Substrait consumer (or add a bunch of substrait meta functions)

2022-05-02 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530945#comment-17530945 ] Weston Pace commented on ARROW-15582: - I'd have a slight preference for #2 (Arrow has two ternary

[jira] [Resolved] (ARROW-16416) [C++] Support cast-function in Substrait

2022-04-29 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16416. Fix Version/s: 9.0.0. Resolution: Fixed. Issue resolved by pull request 13032

[jira] [Commented] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset

2022-04-29 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530225#comment-17530225 ] Weston Pace commented on ARROW-16421: - Windows is notoriously stubborn about deleting files that

[jira] [Commented] (ARROW-15590) [C++] Add support for joins to the Substrait consumer

2022-04-29 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530201#comment-17530201 ] Weston Pace commented on ARROW-15590: - The Substrait spec (the website) doesn't always match the

[jira] [Comment Edited] (ARROW-15590) [C++] Add support for joins to the Substrait consumer

2022-04-29 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-15590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17530201#comment-17530201 ] Weston Pace edited comment on ARROW-15590 at 4/29/22 7:16 PM: -- The

[jira] [Commented] (ARROW-16409) [C++][Python][R] Deprecate "scanner" (but keep "scan node") from public API

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529607#comment-17529607 ] Weston Pace commented on ARROW-16409: - {{select}} sounds reasonable but right now the {{scanner()}}

[jira] [Created] (ARROW-16410) [C++] Scanner -> ScanNode

2022-04-28 Thread Weston Pace (Jira)
Weston Pace created ARROW-16410: --- Summary: [C++] Scanner -> ScanNode Key: ARROW-16410 URL: https://issues.apache.org/jira/browse/ARROW-16410 Project: Apache Arrow Issue Type: Improvement

[jira] [Created] (ARROW-16409) [C++][Python][R] Deprecate "scanner" (but keep "scan node") from public API

2022-04-28 Thread Weston Pace (Jira)
Weston Pace created ARROW-16409: --- Summary: [C++][Python][R] Deprecate "scanner" (but keep "scan node") from public API Key: ARROW-16409 URL: https://issues.apache.org/jira/browse/ARROW-16409 Project:

[jira] [Resolved] (ARROW-16390) [C++] Dataset initialization could segfault if called simultaneously

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace resolved ARROW-16390. - Fix Version/s: 9.0.0 Resolution: Fixed Issue resolved by pull request 13019

[jira] [Commented] (ARROW-16391) pd.read_parquet using filters consumes too much memory

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529576#comment-17529576 ] Weston Pace commented on ARROW-16391: - Release candidates for 8.0.0 are currently being voted on.

[jira] [Comment Edited] (ARROW-16392) [C++] Substrait consumer cannot handle file URIs that contain a Windows drive letter

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529519#comment-17529519 ] Weston Pace edited comment on ARROW-16392 at 4/28/22 4:30 PM: -- Right now if

[jira] [Commented] (ARROW-16392) [C++] Substrait consumer cannot handle file URIs that contain a Windows drive letter

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529519#comment-17529519 ] Weston Pace commented on ARROW-16392: - Right now if we see {{file:///}} we drop the first 7

[jira] [Updated] (ARROW-16392) [C++] Substrait consumer cannot handle file URIs that contain a Windows drive letter

2022-04-28 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weston Pace updated ARROW-16392: Summary: [C++] Substrait consumer cannot handle file URIs that contain a Windows drive letter

[jira] [Commented] (ARROW-16320) Dataset re-partitioning consumes considerable amount of memory

2022-04-27 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529181#comment-17529181 ] Weston Pace commented on ARROW-16320: - The writing behavior you described seemed odd so I modified

[jira] [Commented] (ARROW-16391) pd.read_parquet using filters consuming too much memory

2022-04-27 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529178#comment-17529178 ] Weston Pace commented on ARROW-16391: - Ah, looking at your use case a bit more carefully I don't

[jira] [Commented] (ARROW-16391) pd.read_parquet using filters consuming too much memory

2022-04-27 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-16391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529175#comment-17529175 ] Weston Pace commented on ARROW-16391: - We've worked on this as part of 8.0.0 (see ARROW-15410). Are

[jira] [Created] (ARROW-16390) [C++] Dataset initialization could segfault if called simultaneously

2022-04-27 Thread Weston Pace (Jira)
Weston Pace created ARROW-16390: --- Summary: [C++] Dataset initialization could segfault if called simultaneously Key: ARROW-16390 URL: https://issues.apache.org/jira/browse/ARROW-16390 Project: Apache
