[GitHub] [arrow-datafusion] jorgecarleitao opened a new issue #533: Add extension plugin to parse SQL into logical plan

2021-06-09 Thread GitBox
jorgecarleitao opened a new issue #533: URL: https://github.com/apache/arrow-datafusion/issues/533 As a user of DataFusion, I would like to be able to install custom parsing rules of SQL to DataFusion, so that I can plan custom nodes from SQL. This would allow me to extend datafusion

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #532: reuse datafusion physical planner in ballista building from protobuf

2021-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#issuecomment-858336030 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/532?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow] projjal commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-06-09 Thread GitBox
projjal commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-858352950 @kszucs Thanks for fixing this. Is there anything left to be done in this patch? -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #532: reuse datafusion physical planner in ballista building from protobuf

2021-06-09 Thread GitBox
codecov-commenter commented on pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#issuecomment-858336030 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/532?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #532: use logical planner in ballista building from protobuf

2021-06-09 Thread GitBox
Jimexist opened a new pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532 # Which issue does this PR close? Closes #. # Rationale for this change use logical planner in ballista building from protobuf, to reduce code duplication # What ch

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #530: add error handling and boundary checking for window frames

2021-06-09 Thread GitBox
codecov-commenter commented on pull request #530: URL: https://github.com/apache/arrow-datafusion/pull/530#issuecomment-858286720 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/530?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow-datafusion] viirya opened a new issue #531: `cargo build` cannot build the project

2021-06-09 Thread GitBox
viirya opened a new issue #531: URL: https://github.com/apache/arrow-datafusion/issues/531 **Describe the bug** Cannot build the project from a clean checkout. **To Reproduce** 1. git clone the project 2. cargo build Then the error: ``` ... Compi

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648833804 ## File path: cpp/src/arrow/util/thread_pool_test.cc ## @@ -452,6 +454,42 @@ TEST_F(TestThreadPool, QuickShutdown) { add_tester.CheckNotAllComputed(

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648833585 ## File path: cpp/src/arrow/util/thread_pool_test.cc ## @@ -452,6 +454,42 @@ TEST_F(TestThreadPool, QuickShutdown) { add_tester.CheckNotAllComputed(

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648833516 ## File path: cpp/src/arrow/util/thread_pool.h ## @@ -321,37 +335,139 @@ class ARROW_EXPORT ThreadPool : public Executor { // tasks are finished.

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648832182 ## File path: cpp/src/arrow/util/future_test.cc ## @@ -106,12 +107,18 @@ template class SimpleExecutor { public: explicit SimpleExecutor(int nfu

[GitHub] [arrow-rs] jorgecarleitao commented on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-09 Thread GitBox
jorgecarleitao commented on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-858270139 At the time I had limited knowledge of what I was doing. I think that we could refactor this so that the arrays are shared and the datatypes are copied, since we do not re

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #530: add error handling and boundary checking for window frames

2021-06-09 Thread GitBox
Jimexist opened a new pull request #530: URL: https://github.com/apache/arrow-datafusion/pull/530 # Which issue does this PR close? Closes #529 Closes #528 # Rationale for this change # What changes are included in this PR? # Are there any user-facin

[GitHub] [arrow-datafusion] Jimexist opened a new issue #529: With window frame present and frame type = RANGE, order by must be present with 1 column

2021-06-09 Thread GitBox
Jimexist opened a new issue #529: URL: https://github.com/apache/arrow-datafusion/issues/529 **Describe the bug** With window frame present and frame type = RANGE, order by must be present with 1 column **To Reproduce** Steps to reproduce the behavior: **Expected beh

[GitHub] [arrow-datafusion] Jimexist opened a new issue #528: With window frame present and frame type = RANGE, the current implementation cannot handle numeric bounds

2021-06-09 Thread GitBox
Jimexist opened a new issue #528: URL: https://github.com/apache/arrow-datafusion/issues/528 **Describe the bug** With window frame present and frame type = RANGE, the current implementation cannot handle numeric bounds **To Reproduce** Steps to reproduce the behavior:

[GitHub] [arrow] github-actions[bot] commented on pull request #10500: ARROW-13031: [JS] Support arm in closure compiler on macOS

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10500: URL: https://github.com/apache/arrow/pull/10500#issuecomment-858250811 https://issues.apache.org/jira/browse/ARROW-13031

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-857974778 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/439?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #527: remove redundant `into_iter()` calls

2021-06-09 Thread GitBox
codecov-commenter commented on pull request #527: URL: https://github.com/apache/arrow-datafusion/pull/527#issuecomment-858246805 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/527?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow] domoritz opened a new pull request #10500: ARROW-13031: [JS] Support arm in closure compiler on macOS

2021-06-09 Thread GitBox
domoritz opened a new pull request #10500: URL: https://github.com/apache/arrow/pull/10500 Includes https://github.com/google/closure-compiler-npm/pull/215

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #527: remove redundant into_iter() calls

2021-06-09 Thread GitBox
Jimexist opened a new pull request #527: URL: https://github.com/apache/arrow-datafusion/pull/527 # Which issue does this PR close? Closes #. # Rationale for this change remove redundant into_iter() calls # What changes are included in this PR?

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #438: use iterator for partition kernel instead of generating vec

2021-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #438: URL: https://github.com/apache/arrow-rs/pull/438#issuecomment-857406876 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/438?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow] westonpace commented on pull request #10404: ARROW-12876: [R] Fix build flags on Raspberry Pi

2021-06-09 Thread GitBox
westonpace commented on pull request #10404: URL: https://github.com/apache/arrow/pull/10404#issuecomment-858115530 @thisisnic Not quite. We don't want to link to libatomic on all non-windows non-apple architectures. Most Linux architectures will have atomics provided by gcc itself.

[GitHub] [arrow] nealrichardson commented on pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

2021-06-09 Thread GitBox
nealrichardson commented on pull request #10445: URL: https://github.com/apache/arrow/pull/10445#issuecomment-858109294 > I've used `arrow.use_altrep` but I can use `arrow.altrep` all the same. SGTM > > I've put `GetBoolOption("arrow.altrep", true)` after the `array->null_

[GitHub] [arrow] bkietz closed pull request #10475: ARROW-13001: [Go][Parquet] fix build failure on s390x

2021-06-09 Thread GitBox
bkietz closed pull request #10475: URL: https://github.com/apache/arrow/pull/10475

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #384: Implement faster arrow array reader

2021-06-09 Thread GitBox
nevi-me commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r648677116 ## File path: parquet/src/arrow/array_reader.rs ## @@ -1499,12 +1499,12 @@ impl<'a> ArrayReaderBuilder { arrow_type,

[GitHub] [arrow] github-actions[bot] commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858078349 Revision: 7a2818bf352eab0adb662921eaa2db30654857f4 Submitted crossbow builds: [ursacomputing/crossbow @ actions-492](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858077945 https://issues.apache.org/jira/browse/ARROW-12738

[GitHub] [arrow] thisisnic commented on a change in pull request #10404: ARROW-12876: [R] Fix build flags on Raspberry Pi

2021-06-09 Thread GitBox
thisisnic commented on a change in pull request #10404: URL: https://github.com/apache/arrow/pull/10404#discussion_r648654943 ## File path: r/configure ## @@ -185,6 +185,11 @@ else fi fi +# If on Raspberry Pi, need to manually link against latomic Review comment:

[GitHub] [arrow] thisisnic commented on pull request #10404: ARROW-12876: [R] Fix build flags on Raspberry Pi

2021-06-09 Thread GitBox
thisisnic commented on pull request #10404: URL: https://github.com/apache/arrow/pull/10404#issuecomment-858072175 @westonpace Like this (https://github.com/apache/arrow/pull/10404/commits/710ff09085dc6d5736c6e213b2605c9c23c5b89b) or did you mean something else?

[GitHub] [arrow-rs] alamb commented on pull request #441: Cherry pick add lexicographically partition points and ranges to active_release

2021-06-09 Thread GitBox
alamb commented on pull request #441: URL: https://github.com/apache/arrow-rs/pull/441#issuecomment-858051089 Needs https://github.com/apache/arrow-rs/pull/442 to merge first

[GitHub] [arrow-rs] alamb merged pull request #433: Cherry pick Derive Eq and PartialEq for SortOptions to active_release

2021-06-09 Thread GitBox
alamb merged pull request #433: URL: https://github.com/apache/arrow-rs/pull/433

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #384: Implement faster arrow array reader

2021-06-09 Thread GitBox
codecov-commenter edited a comment on pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#issuecomment-851063613 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/384?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #524: Expose ExecutionContext.register_csv to the python bindings

2021-06-09 Thread GitBox
codecov-commenter commented on pull request #524: URL: https://github.com/apache/arrow-datafusion/pull/524#issuecomment-858047325 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/524?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow] xhochy commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-09 Thread GitBox
xhochy commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858046624 @github-actions crossbow submit -g conda

[GitHub] [arrow] xhochy opened a new pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-09 Thread GitBox
xhochy opened a new pull request #10499: URL: https://github.com/apache/arrow/pull/10499

[GitHub] [arrow] github-actions[bot] commented on pull request #10498: ARROW-13027: [C++] Fix ASAN stack traces in CI

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10498: URL: https://github.com/apache/arrow/pull/10498#issuecomment-858037852 https://issues.apache.org/jira/browse/ARROW-13027

[GitHub] [arrow] westonpace commented on pull request #10498: ARROW-13027: [C++] Fix ASAN stack traces in CI

2021-06-09 Thread GitBox
westonpace commented on pull request #10498: URL: https://github.com/apache/arrow/pull/10498#issuecomment-858021084 Good catch. The dockerfile also has a variable for llvm version so I don't have to hard code 12.

[GitHub] [arrow-rs] yordan-pavlov commented on pull request #384: Implement faster arrow array reader

2021-06-09 Thread GitBox
yordan-pavlov commented on pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#issuecomment-858010763 thanks for the heads up @alamb, I have rebased and cleaned up the code in preparation for merging, but still waiting for review by @nevi-me and @jorgecarleitao

[GitHub] [arrow] pitrou commented on a change in pull request #10498: ARROW-13027: [C++] Fix ASAN stack traces in CI

2021-06-09 Thread GitBox
pitrou commented on a change in pull request #10498: URL: https://github.com/apache/arrow/pull/10498#discussion_r648596421 ## File path: docker-compose.yml ## @@ -372,6 +372,7 @@ services: ARROW_S3: "OFF" ARROW_USE_ASAN: "ON" ARROW_USE_UBSAN: "ON" + AS

[GitHub] [arrow] pitrou closed pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
pitrou closed pull request #10496: URL: https://github.com/apache/arrow/pull/10496

[GitHub] [arrow] westonpace opened a new pull request #10498: ARROW-13027: [C++] Fix ASAN stack traces in CI

2021-06-09 Thread GitBox
westonpace opened a new pull request #10498: URL: https://github.com/apache/arrow/pull/10498 Before change: ``` Direct leak of 65536 byte(s) in 1 object(s) allocated from: #0 0x522f09 in #1 0x7f28ae5826f4 in #2 0x7f28ae57fa5d in #3 0x7f28ae58cb0f in

[GitHub] [arrow] github-actions[bot] commented on pull request #10497: [R] Add bindings for pmin() and pmax()

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10497: URL: https://github.com/apache/arrow/pull/10497#issuecomment-857984283 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you ope

[GitHub] [arrow] bkietz closed pull request #10472: ARROW-12975: [C++][Python] if_else kernel doesn't support upcasting

2021-06-09 Thread GitBox
bkietz closed pull request #10472: URL: https://github.com/apache/arrow/pull/10472

[GitHub] [arrow-rs] codecov-commenter commented on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-09 Thread GitBox
codecov-commenter commented on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-857974778 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/439?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+A

[GitHub] [arrow] pitrou commented on pull request #10485: ARROW-13015: [C++] Create benchmark for file iteration

2021-06-09 Thread GitBox
pitrou commented on pull request #10485: URL: https://github.com/apache/arrow/pull/10485#issuecomment-857953433 Also, the problem with making the block size smaller is that you will get tiny columns in many cases. The current value is probably a lower bound for efficiency, but files with d

[GitHub] [arrow] pitrou commented on pull request #10485: ARROW-13015: [C++] Create benchmark for file iteration

2021-06-09 Thread GitBox
pitrou commented on pull request #10485: URL: https://github.com/apache/arrow/pull/10485#issuecomment-857951624 I would expect performance characteristics to be similar to the regular parallel CSV reader.

[GitHub] [arrow-datafusion] alamb commented on pull request #521: Return errors properly from RepartitionExec

2021-06-09 Thread GitBox
alamb commented on pull request #521: URL: https://github.com/apache/arrow-datafusion/pull/521#issuecomment-857951285 Rebased

[GitHub] [arrow] westonpace commented on pull request #10485: ARROW-13015: [C++] Create benchmark for file iteration

2021-06-09 Thread GitBox
westonpace commented on pull request #10485: URL: https://github.com/apache/arrow/pull/10485#issuecomment-857950376 For context, I want to work on parallelizing the streaming CSV reader. I'd like to investigate smaller block sizes for the earlier stages since they perform effectively ran

[GitHub] [arrow-datafusion] alamb merged pull request #501: Add `partition by` constructs in window functions and modify logical planning

2021-06-09 Thread GitBox
alamb merged pull request #501: URL: https://github.com/apache/arrow-datafusion/pull/501

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #501: Add `partition by` constructs in window functions and modify logical planning

2021-06-09 Thread GitBox
alamb commented on a change in pull request #501: URL: https://github.com/apache/arrow-datafusion/pull/501#discussion_r648562970 ## File path: datafusion/src/sql/planner.rs ## @@ -1121,52 +1121,53 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { // then, win

[GitHub] [arrow-datafusion] alamb merged pull request #493: Define the unittests using pytest

2021-06-09 Thread GitBox
alamb merged pull request #493: URL: https://github.com/apache/arrow-datafusion/pull/493

[GitHub] [arrow-rs] alamb commented on pull request #438: use iterator for partition kernel instead of generating vec

2021-06-09 Thread GitBox
alamb commented on pull request #438: URL: https://github.com/apache/arrow-rs/pull/438#issuecomment-857944078 Looks like this PR needs a rebase.

[GitHub] [arrow-datafusion] nevi-me commented on issue #525: Add Delta Lake TableProvider

2021-06-09 Thread GitBox
nevi-me commented on issue #525: URL: https://github.com/apache/arrow-datafusion/issues/525#issuecomment-857942853 > I am also planning to promote datafusion as the default query engine for executing native delta lake queries in `delta-rs`. This will make it easier for us to provide delta

[GitHub] [arrow-rs] alamb opened a new pull request #442: Cherry pick refactor lexico sort for future code reuse to active_release

2021-06-09 Thread GitBox
alamb opened a new pull request #442: URL: https://github.com/apache/arrow-rs/pull/442 Automatic cherry-pick of a37dd4f * Originally appeared in https://github.com/apache/arrow-rs/pull/423: refactor lexico sort for future code reuse

[GitHub] [arrow-rs] alamb opened a new pull request #441: Cherry pick add lexicographically partition points and ranges to active_release

2021-06-09 Thread GitBox
alamb opened a new pull request #441: URL: https://github.com/apache/arrow-rs/pull/441 Automatic cherry-pick of 0c00776 * Originally appeared in https://github.com/apache/arrow-rs/pull/424: add lexicographically partition points and ranges

[GitHub] [arrow-rs] alamb closed issue #428: Add partitioning kernel for sorted arrays

2021-06-09 Thread GitBox
alamb closed issue #428: URL: https://github.com/apache/arrow-rs/issues/428

[GitHub] [arrow-rs] alamb merged pull request #424: add lexicographically partition points and ranges

2021-06-09 Thread GitBox
alamb merged pull request #424: URL: https://github.com/apache/arrow-rs/pull/424

[GitHub] [arrow-rs] alamb commented on pull request #419: Remove DictionaryArray::keys_array method

2021-06-09 Thread GitBox
alamb commented on pull request #419: URL: https://github.com/apache/arrow-rs/pull/419#issuecomment-857935812 I'll wait a few more days before merging this in to see if there is any more feedback. As this is not compatible, I won't backport to the 4.x line

[GitHub] [arrow-rs] alamb closed pull request #422: disable lexsort bound check

2021-06-09 Thread GitBox
alamb closed pull request #422: URL: https://github.com/apache/arrow-rs/pull/422

[GitHub] [arrow-rs] alamb commented on pull request #422: disable lexsort bound check

2021-06-09 Thread GitBox
alamb commented on pull request #422: URL: https://github.com/apache/arrow-rs/pull/422#issuecomment-857934571 Closing per comments above. Thanks @Jimexist !

[GitHub] [arrow-rs] alamb merged pull request #436: Update release Readme.md

2021-06-09 Thread GitBox
alamb merged pull request #436: URL: https://github.com/apache/arrow-rs/pull/436

[GitHub] [arrow-rs] alamb merged pull request #421: Reenable MIRI check on PRs

2021-06-09 Thread GitBox
alamb merged pull request #421: URL: https://github.com/apache/arrow-rs/pull/421

[GitHub] [arrow-rs] alamb closed issue #345: MIRI CI check fails intermittently with `thread 'main' panicked at 'invalid time'`

2021-06-09 Thread GitBox
alamb closed issue #345: URL: https://github.com/apache/arrow-rs/issues/345

[GitHub] [arrow] pachadotdev opened a new pull request #10497: [R] Add bindings for pmin() and pmax()

2021-06-09 Thread GitBox
pachadotdev opened a new pull request #10497: URL: https://github.com/apache/arrow/pull/10497 ``` r library(arrow) #> See arrow_info() for available features #> #> Attaching package: 'arrow' #> The following object is masked from 'package:utils': #> #> timestamp

[GitHub] [arrow-rs] alamb commented on pull request #421: Reenable MIRI check on PRs

2021-06-09 Thread GitBox
alamb commented on pull request #421: URL: https://github.com/apache/arrow-rs/pull/421#issuecomment-857933162 Fourth success: https://github.com/apache/arrow-rs/pull/421/checks?check_run_id=2778308761 I am merging this one in :)

[GitHub] [arrow-rs] alamb merged pull request #435: Cherry pick Sort by float lists to active_release

2021-06-09 Thread GitBox
alamb merged pull request #435: URL: https://github.com/apache/arrow-rs/pull/435

[GitHub] [arrow-rs] alamb merged pull request #434: Cherry pick Fix bug with null buffer offset in boolean not kernel to active_release

2021-06-09 Thread GitBox
alamb merged pull request #434: URL: https://github.com/apache/arrow-rs/pull/434

[GitHub] [arrow-rs] alamb merged pull request #411: Cherry pick Reduce memory usage of concat (large)utf8 to active_release

2021-06-09 Thread GitBox
alamb merged pull request #411: URL: https://github.com/apache/arrow-rs/pull/411

[GitHub] [arrow-rs] alamb merged pull request #432: Cherry pick Fix out of bounds read in bit chunk iterator to active_release

2021-06-09 Thread GitBox
alamb merged pull request #432: URL: https://github.com/apache/arrow-rs/pull/432

[GitHub] [arrow-rs] alamb merged pull request #431: Cherry pick Add set_bit to BooleanBufferBuilder to allow mutating bit in index to active_release

2021-06-09 Thread GitBox
alamb merged pull request #431: URL: https://github.com/apache/arrow-rs/pull/431

[GitHub] [arrow-rs] alamb merged pull request #430: Cherry pick Respect max rowgroup size in Arrow writer to active_release

2021-06-09 Thread GitBox
alamb merged pull request #430: URL: https://github.com/apache/arrow-rs/pull/430

[GitHub] [arrow-rs] alamb merged pull request #429: Cherry pick add more tests for window::shift and handle boundary cases to active_release

2021-06-09 Thread GitBox
alamb merged pull request #429: URL: https://github.com/apache/arrow-rs/pull/429

[GitHub] [arrow-rs] alamb opened a new issue #440: Code search is not working in github

2021-06-09 Thread GitBox
alamb opened a new issue #440: URL: https://github.com/apache/arrow-rs/issues/440 **Describe the bug** I want to be able to search for code in the github repo **To Reproduce** Search for code that exists in arrow-rs, for example [`get_batch`](https://github.com/apache/arrow-rs/

[GitHub] [arrow] pitrou commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
pitrou commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648544911 ## File path: cpp/src/arrow/util/thread_pool_test.cc ## @@ -452,6 +454,42 @@ TEST_F(TestThreadPool, QuickShutdown) { add_tester.CheckNotAllComputed();

[GitHub] [arrow] pitrou commented on a change in pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
pitrou commented on a change in pull request #10496: URL: https://github.com/apache/arrow/pull/10496#discussion_r648543624 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -2539,167 +2581,136 @@ struct TrimStateUTF8 { } }; -template -struct UTF8TrimBase

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648543567 ## File path: cpp/src/arrow/util/thread_pool_test.cc ## @@ -452,6 +454,42 @@ TEST_F(TestThreadPool, QuickShutdown) { add_tester.CheckNotAllComputed(

[GitHub] [arrow] github-actions[bot] commented on pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
github-actions[bot] commented on pull request #10496: URL: https://github.com/apache/arrow/pull/10496#issuecomment-857917015 https://issues.apache.org/jira/browse/ARROW-12951

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648541270 ## File path: cpp/src/arrow/util/thread_pool_test.cc ## @@ -452,6 +454,42 @@ TEST_F(TestThreadPool, QuickShutdown) { add_tester.CheckNotAllComputed(

[GitHub] [arrow] lidavidm commented on a change in pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
lidavidm commented on a change in pull request #10496: URL: https://github.com/apache/arrow/pull/10496#discussion_r648536998 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -2539,167 +2581,136 @@ struct TrimStateUTF8 { } }; -template -struct UTF8TrimBa

[GitHub] [arrow-rs] kszucs commented on pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-09 Thread GitBox
kszucs commented on pull request #439: URL: https://github.com/apache/arrow-rs/pull/439#issuecomment-857908771 @jorgecarleitao what is the rationale behind wrapping `(FFI_ArrowArray, FFI_ArrowSchema)` with `ArrowArray` and the children with `ArrowArrayChild`?

[GitHub] [arrow-rs] Pand9 edited a comment on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-09 Thread GitBox
Pand9 edited a comment on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-857903895 I've also just encountered it. Common element with this reproduction is BOOLEAN field. It worked without BOOLEAN as well. After quick investigation of the looping code, I'v

[GitHub] [arrow-rs] kszucs opened a new pull request #439: [WIP] FFI bridge for Schema, Field and DataType

2021-06-09 Thread GitBox
kszucs opened a new pull request #439: URL: https://github.com/apache/arrow-rs/pull/439 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow] westonpace commented on a change in pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on a change in pull request #10401: URL: https://github.com/apache/arrow/pull/10401#discussion_r648533784 ## File path: cpp/src/arrow/util/future_test.cc ## @@ -106,12 +107,18 @@ template class SimpleExecutor { public: explicit SimpleExecutor(int nfu

[GitHub] [arrow-rs] Pand9 commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-09 Thread GitBox
Pand9 commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-857903895 I've also just encountered it. Common element with this reproduction is BOOLEAN field. It worked without BOOLEAN as well.

[GitHub] [arrow] westonpace commented on pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
westonpace commented on pull request #10401: URL: https://github.com/apache/arrow/pull/10401#issuecomment-857902006 @pitrou Unfortunately, it passes when run locally in that way. My development system is also Ubuntu 20.04 so I've tried various stress runs but haven't had any luck. I'm ho

[GitHub] [arrow] pitrou commented on pull request #10401: ARROW-12878: [C++] Generalize thread pool to allow for different queuing strategies / worker loops

2021-06-09 Thread GitBox
pitrou commented on pull request #10401: URL: https://github.com/apache/arrow/pull/10401#issuecomment-857898290 @westonpace I would recommend running the ASAN build locally using `archery docker ...`.

[GitHub] [arrow] lidavidm commented on pull request #10494: ARROW-12948: [C++][Python] Add slice_replace kernel

2021-06-09 Thread GitBox
lidavidm commented on pull request #10494: URL: https://github.com/apache/arrow/pull/10494#issuecomment-857897413 Needs rebasing onto #10496.

[GitHub] [arrow] pitrou closed pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-09 Thread GitBox
pitrou closed pull request #10176: URL: https://github.com/apache/arrow/pull/10176

[GitHub] [arrow] pitrou commented on pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
pitrou commented on pull request #10496: URL: https://github.com/apache/arrow/pull/10496#issuecomment-857889228 This reduces the code size of `compute/kernels/scalar_string.cc.o` by about 5% (in release mode). Not a terrific improvement, but a worthwhile cleanup IMHO.

[GitHub] [arrow] westonpace commented on issue #10492: Doc update ? For Reading and Writing the Apache Parquet Format

2021-06-09 Thread GitBox
westonpace commented on issue #10492: URL: https://github.com/apache/arrow/issues/10492#issuecomment-857888278 > Cannot submit a bug since it's not especially a direct issue but it's more something not complete or up to date in the documentation Please do create a JIRA issue. Arrow uses

[GitHub] [arrow] westonpace edited a comment on issue #10492: Doc update ? For Reading and Writing the Apache Parquet Format

2021-06-09 Thread GitBox
westonpace edited a comment on issue #10492: URL: https://github.com/apache/arrow/issues/10492#issuecomment-857888278 > Cannot submit a bug since it's not especially a direct issue but it's more something not complete or up to date in the documentation Please do create a JIRA issue.

[GitHub] [arrow] pitrou opened a new pull request #10496: ARROW-12951: [C++] Reduce generated code size for string kernels

2021-06-09 Thread GitBox
pitrou opened a new pull request #10496: URL: https://github.com/apache/arrow/pull/10496 Factor out type-agnostic string operations (such as finding a split pattern) in separate classes to avoid generating several versions of them when generating the typed kernel execution classes.
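The refactoring described in #10496 (moving type-agnostic work such as finding a split pattern out of the templated, per-type kernel classes) can be sketched roughly as below. This is a hypothetical illustration, not Arrow's actual code: the names `SplitPatternFinder` and `SplitKernel` are invented here. The point it shows is that the non-template finder is compiled once, while only the thin offset-dependent wrapper is instantiated per offset width (32-bit for String-like, 64-bit for LargeString-like values).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical sketch (not Arrow's actual classes): the pattern search does
// not depend on the offset type, so it lives in a non-template class and its
// code is generated once rather than once per string type.
class SplitPatternFinder {
 public:
  explicit SplitPatternFinder(std::string_view pattern) : pattern_(pattern) {}

  // Returns the start positions of non-overlapping matches of the pattern in `s`.
  std::vector<std::size_t> Find(std::string_view s) const {
    std::vector<std::size_t> hits;
    if (pattern_.empty()) return hits;
    std::size_t pos = s.find(pattern_);
    while (pos != std::string_view::npos) {
      hits.push_back(pos);
      pos = s.find(pattern_, pos + pattern_.size());
    }
    return hits;
  }

 private:
  std::string_view pattern_;  // the referenced characters must outlive the finder
};

// Only the offset-dependent code is templated. Instantiating it for
// std::int32_t and std::int64_t offsets reuses the single finder above.
template <typename OffsetType>
struct SplitKernel {
  // For each value delimited by `offsets` within `data`, counts how many
  // pieces splitting on the finder's pattern would produce.
  static std::vector<std::int64_t> CountPieces(const std::vector<OffsetType>& offsets,
                                               std::string_view data,
                                               const SplitPatternFinder& finder) {
    std::vector<std::int64_t> counts;
    for (std::size_t i = 0; i + 1 < offsets.size(); ++i) {
      assert(offsets[i] <= offsets[i + 1]);  // offsets must be non-decreasing
      std::string_view value = data.substr(
          static_cast<std::size_t>(offsets[i]),
          static_cast<std::size_t>(offsets[i + 1] - offsets[i]));
      counts.push_back(static_cast<std::int64_t>(finder.Find(value).size()) + 1);
    }
    return counts;
  }
};
```

In this sketch, adding another offset width only adds another small `CountPieces` instantiation; the search loop itself is not duplicated, which is the kind of saving the PR's code-size comment refers to.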

[GitHub] [arrow] rok commented on pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-09 Thread GitBox
rok commented on pull request #10176: URL: https://github.com/apache/arrow/pull/10176#issuecomment-857861996 Actually @jorisvandenbossche what do you mean by "doesn't fit into the nanosecond range"? Like `"1970-01-01T00:00:59.1234567890123"`?

[GitHub] [arrow] rok commented on a change in pull request #10476: ARROW-12499: [C++][Compute] Add ScalarAggregateOptions to Any and All kernels

2021-06-09 Thread GitBox
rok commented on a change in pull request #10476: URL: https://github.com/apache/arrow/pull/10476#discussion_r648471718 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -166,32 +168,48 @@ struct BooleanAnyImpl : public ScalarAggregator { Status MergeFrom(

[GitHub] [arrow] rok commented on pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-09 Thread GitBox
rok commented on pull request #10176: URL: https://github.com/apache/arrow/pull/10176#issuecomment-857833357 Just noticed I missed this: > @jorisvandenbossche > Also, it would maybe be good to add a test for a timestamp that doesn't fit into the nanosecond range? Do you mind if

[GitHub] [arrow] revit13 closed pull request #8286: ARROW-9960: [C++] Enable external material and rotation for encryption keys

2021-06-09 Thread GitBox
revit13 closed pull request #8286: URL: https://github.com/apache/arrow/pull/8286

[GitHub] [arrow] zeroshade commented on a change in pull request #10379: ARROW-12851: [Go][Parquet] Add Golang Parquet encoding package

2021-06-09 Thread GitBox
zeroshade commented on a change in pull request #10379: URL: https://github.com/apache/arrow/pull/10379#discussion_r648390799 ## File path: go/parquet/internal/encoding/delta_bit_packing.go ## @@ -0,0 +1,514 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] zeroshade commented on a change in pull request #10379: ARROW-12851: [Go][Parquet] Add Golang Parquet encoding package

2021-06-09 Thread GitBox
zeroshade commented on a change in pull request #10379: URL: https://github.com/apache/arrow/pull/10379#discussion_r648390452 ## File path: go/parquet/internal/encoding/decoder.go ## @@ -0,0 +1,178 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co
