[GitHub] [arrow] edponce commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731546683 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped) se

[GitHub] [arrow-rs] novemberkilo commented on a change in pull request #832: Use kernel utility for parsing timestamps in csv reader.

2021-10-18 Thread GitBox
novemberkilo commented on a change in pull request #832: URL: https://github.com/apache/arrow-rs/pull/832#discussion_r731542422 ## File path: arrow/src/csv/reader.rs ## @@ -1371,6 +1372,95 @@ mod tests { ); } +/// Interprets a naive_datetime (with no explici

[GitHub] [arrow] edponce commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731539352 ## File path: python/pyarrow/tests/test_fs.py ## @@ -587,11 +587,14 @@ def test_subtree_filesystem(): subfs = SubTreeFileSystem('/base', localfs)

[GitHub] [arrow] cyb70289 commented on pull request #11458: ARROW-14341: [C++] Improve decimal benchmark

2021-10-18 Thread GitBox
cyb70289 commented on pull request #11458: URL: https://github.com/apache/arrow/pull/11458#issuecomment-946408775 A bit surprised that gcc is much slower (~0.5x) than clang in Add256 and Multiply256 tests on xeon gold 5218. No obvious difference is observed on arm64 neoverse n1 between g

[GitHub] [arrow] AlenkaF commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
AlenkaF commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731534033 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped) se

[GitHub] [arrow] edponce commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731530446 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped) se

[GitHub] [arrow-datafusion] Dandandan edited a comment on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

2021-10-18 Thread GitBox
Dandandan edited a comment on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946377551 In Spark, repartition is using `coalesce` by setting parameter `shuffle=true`. I think it might be cleaner to keep the `RepartitionExec` and `CoalescePartit

[GitHub] [arrow] github-actions[bot] commented on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946402305 Revision: 9e51f11644cc8ea87514265fd71ec02321d39ac2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1001](https://github.com/ursacomputing/crossbow

[GitHub] [arrow] github-actions[bot] commented on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946401995 Revision: 9e51f11644cc8ea87514265fd71ec02321d39ac2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1000](https://github.com/ursacomputing/crossbow

[GitHub] [arrow] cyb70289 removed a comment on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
cyb70289 removed a comment on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946398267 @github-actions crossbow submit wheel-macos-*-arm64 wheel-macos-*-universal2 -- This is an automated message from the Apache Git Service. To respond to the message, pl

[GitHub] [arrow] cyb70289 commented on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
cyb70289 commented on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946401898 @github-actions crossbow submit wheel-macos-*-arm64 wheel-macos-*-universal2

[GitHub] [arrow] cyb70289 edited a comment on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
cyb70289 edited a comment on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946398267 @github-actions crossbow submit wheel-macos-*-arm64 wheel-macos-*-universal2

[GitHub] [arrow] AlenkaF commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
AlenkaF commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731527086 ## File path: python/pyarrow/tests/test_fs.py ## @@ -587,11 +587,14 @@ def test_subtree_filesystem(): subfs = SubTreeFileSystem('/base', localfs)

[GitHub] [arrow] github-actions[bot] commented on pull request #11458: ARROW-14341: [C++] Improve decimal benchmark

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11458: URL: https://github.com/apache/arrow/pull/11458#issuecomment-946399777

[GitHub] [arrow] cyb70289 opened a new pull request #11458: ARROW-14341: [C++] Improve decimal benchmark

2021-10-18 Thread GitBox
cyb70289 opened a new pull request #11458: URL: https://github.com/apache/arrow/pull/11458 Separate add/mul/div binary math benchmarks.

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731524772 ## File path: python/pyarrow/tests/test_fs.py ## @@ -587,11 +587,14 @@ def test_subtree_filesystem(): subfs = SubTreeFileSystem('/base', l

[GitHub] [arrow] cyb70289 commented on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
cyb70289 commented on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946398267 @github-actions crossbow submit wheel-macos--arm64 wheel-macos--universal2

[GitHub] [arrow] github-actions[bot] commented on pull request #11457: ARROW-13784: [Python] Table.from_arrays should raise an error when array is empty but names is not

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11457: URL: https://github.com/apache/arrow/pull/11457#issuecomment-946396145

[GitHub] [arrow] AlenkaF opened a new pull request #11457: ARROW-13784: [Python] Table.from_arrays should raise an error when array is empty but names is not

2021-10-18 Thread GitBox
AlenkaF opened a new pull request #11457: URL: https://github.com/apache/arrow/pull/11457 We already check that the list of arrays and list of names should have the same length in `table.pxi`. For a special case when a list of arrays is of length 0 but length of list of names is greater

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731521020 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped)

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731520373 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped)

[GitHub] [arrow-datafusion] houqp edited a comment on issue #958: Add support for parsing timestamps from CSV files

2021-10-18 Thread GitBox
houqp edited a comment on issue #958: URL: https://github.com/apache/arrow-datafusion/issues/958#issuecomment-946388821 @novemberkilo since apache/arrow-rs#832 doesn't break any public API, it will be released as part of arrow 6.x. @alamb already has a PR ready to merge for arrow-rs 6.x

[GitHub] [arrow-datafusion] houqp commented on issue #958: Add support for parsing timestamps from CSV files

2021-10-18 Thread GitBox
houqp commented on issue #958: URL: https://github.com/apache/arrow-datafusion/issues/958#issuecomment-946388821 @novemberkilo since apache/arrow-rs#832 doesn't break any public API, it will be released as part of arrow 6.x. @alamb already has a PR ready to merge for arrow-rs 6.x integr

[GitHub] [arrow-datafusion] houqp closed issue #204: Add support for partition pruning

2021-10-18 Thread GitBox
houqp closed issue #204: URL: https://github.com/apache/arrow-datafusion/issues/204

[GitHub] [arrow-datafusion] houqp commented on issue #204: Add support for partition pruning

2021-10-18 Thread GitBox
houqp commented on issue #204: URL: https://github.com/apache/arrow-datafusion/issues/204#issuecomment-946386345 thanks @rdettai !

[GitHub] [arrow] ursabot edited a comment on pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11294: URL: https://github.com/apache/arrow/pull/11294#issuecomment-946342917 Benchmark runs are scheduled for baseline = f2f663be0a87e13c9cd5403dea51379deb4cf04d and contender = 9abd2b140813dfa941a592764ea07d38d2f0644e. 9abd2b140813dfa941a592764e

[GitHub] [arrow-datafusion] Dandandan commented on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

2021-10-18 Thread GitBox
Dandandan commented on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946377551 In Spark, repartition is using `coalesce` by setting parameter `shuffle=true`. I think it might be cleaner to keep the `Repartition` and `CoalescePartitions separa

[GitHub] [arrow] ursabot edited a comment on pull request #11399: ARROW-14291: [CI][C++] Add cpp/examples/ files to lint targets

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11399: URL: https://github.com/apache/arrow/pull/11399#issuecomment-946366091 Benchmark runs are scheduled for baseline = d9ef519f458fc5989fd15f0af49f069c34110c35 and contender = b3767f78aacf7537b967f6c2f2f89cface3f6952. b3767f78aacf7537b967f6c2f2

[GitHub] [arrow-datafusion] houqp commented on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

2021-10-18 Thread GitBox
houqp commented on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946372811 I think the main difference is coalesce doesn't perform any shuffle while repartition does it depending on the partitioning scheme. This distinction comes from spark's

[GitHub] [arrow] ursabot edited a comment on pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-946364232 Benchmark runs are scheduled for baseline = 6e1293b8b492d02a31961997173128651be62b9a and contender = d9ef519f458fc5989fd15f0af49f069c34110c35. d9ef519f458fc5989fd15f0af4

[GitHub] [arrow] ursabot edited a comment on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946362721 Benchmark runs are scheduled for baseline = 3723534d3f2cd95f5080366df956997b9c27107e and contender = 6e1293b8b492d02a31961997173128651be62b9a. 6e1293b8b492d02a3196199717

[GitHub] [arrow] ursabot commented on pull request #11399: ARROW-14291: [CI][C++] Add cpp/examples/ files to lint targets

2021-10-18 Thread GitBox
ursabot commented on pull request #11399: URL: https://github.com/apache/arrow/pull/11399#issuecomment-946366091 Benchmark runs are scheduled for baseline = d9ef519f458fc5989fd15f0af49f069c34110c35 and contender = b3767f78aacf7537b967f6c2f2f89cface3f6952. b3767f78aacf7537b967f6c2f2f89cfac

[GitHub] [arrow] kou closed pull request #11399: ARROW-14291: [CI][C++] Add cpp/examples/ files to lint targets

2021-10-18 Thread GitBox
kou closed pull request #11399: URL: https://github.com/apache/arrow/pull/11399

[GitHub] [arrow] ursabot edited a comment on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946362096 Benchmark runs are scheduled for baseline = 9abd2b140813dfa941a592764ea07d38d2f0644e and contender = 3723534d3f2cd95f5080366df956997b9c27107e. 3723534d3f2cd95f5080366df9

[GitHub] [arrow] ursabot commented on pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
ursabot commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-946364232 Benchmark runs are scheduled for baseline = 6e1293b8b492d02a31961997173128651be62b9a and contender = d9ef519f458fc5989fd15f0af49f069c34110c35. d9ef519f458fc5989fd15f0af49f069c3

[GitHub] [arrow] edponce commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r731491178 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -417,6 +419,231 @@ struct StringTransformExecWithState } }; +struct StringBinaryT

[GitHub] [arrow] kou closed pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
kou closed pull request #11350: URL: https://github.com/apache/arrow/pull/11350

[GitHub] [arrow] kou commented on pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
kou commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-946363711 OK. I merge this.

[GitHub] [arrow] github-actions[bot] commented on pull request #11448: ARROW-14364: [CI][C++] Support LLVM 13

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11448: URL: https://github.com/apache/arrow/pull/11448#issuecomment-946363721 Revision: 68c49b7f97380af3d68423b1a44af13c9e662ab7 Submitted crossbow builds: [ursacomputing/crossbow @ actions-999](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] kou commented on pull request #11448: ARROW-14364: [CI][C++] Support LLVM 13

2021-10-18 Thread GitBox
kou commented on pull request #11448: URL: https://github.com/apache/arrow/pull/11448#issuecomment-946363433 @github-actions crossbow submit -g nightly

[GitHub] [arrow] ursabot commented on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
ursabot commented on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946362721 Benchmark runs are scheduled for baseline = 3723534d3f2cd95f5080366df956997b9c27107e and contender = 6e1293b8b492d02a31961997173128651be62b9a. 6e1293b8b492d02a31961997173128651

[GitHub] [arrow] kou closed pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
kou closed pull request #11453: URL: https://github.com/apache/arrow/pull/11453

[GitHub] [arrow] kou commented on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
kou commented on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946362394 +1 No canceled Azure Pipelines job.

[GitHub] [arrow] ursabot commented on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
ursabot commented on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946362096 Benchmark runs are scheduled for baseline = 9abd2b140813dfa941a592764ea07d38d2f0644e and contender = 3723534d3f2cd95f5080366df956997b9c27107e. 3723534d3f2cd95f5080366df956997b9

[GitHub] [arrow] kou closed pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
kou closed pull request #11454: URL: https://github.com/apache/arrow/pull/11454

[GitHub] [arrow] kou commented on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
kou commented on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946361684 +1

[GitHub] [arrow] github-actions[bot] commented on pull request #11456: ARROW-14361: [C++] Add default simd level

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11456: URL: https://github.com/apache/arrow/pull/11456#issuecomment-946356476 https://issues.apache.org/jira/browse/ARROW-14361

[GitHub] [arrow] michalursa commented on pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
michalursa commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-946350905 > @michalursa Can we merge this? @kou Yes, I believe so. This is a relatively simple change (more precisely 3 simple fixes put together) and I don't see any errors that

[GitHub] [arrow] ursabot commented on pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
ursabot commented on pull request #11294: URL: https://github.com/apache/arrow/pull/11294#issuecomment-946342917 Benchmark runs are scheduled for baseline = f2f663be0a87e13c9cd5403dea51379deb4cf04d and contender = 9abd2b140813dfa941a592764ea07d38d2f0644e. 9abd2b140813dfa941a592764ea07d38d

[GitHub] [arrow] westonpace closed pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
westonpace closed pull request #11294: URL: https://github.com/apache/arrow/pull/11294

[GitHub] [arrow] westonpace commented on pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
westonpace commented on pull request #11294: URL: https://github.com/apache/arrow/pull/11294#issuecomment-946342174 CI failures are unrelated (could not find LLVM) and these tests passed on earlier versions. I will proceed with merging.

[GitHub] [arrow] ursabot edited a comment on pull request #11414: MINOR: [R] cleanup some notes in our checks

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11414: URL: https://github.com/apache/arrow/pull/11414#issuecomment-946186348 Benchmark runs are scheduled for baseline = c6fdeaf9fb85622242963dc28660e9592088986c and contender = f2f663be0a87e13c9cd5403dea51379deb4cf04d. f2f663be0a87e13c9cd5403dea

[GitHub] [arrow-datafusion] yjshen commented on pull request #1104: Add Roadmap to Documentation

2021-10-18 Thread GitBox
yjshen commented on pull request #1104: URL: https://github.com/apache/arrow-datafusion/pull/1104#issuecomment-946339508 > also cc @yjshen in case we missed any item needed from your native spark executor work. Thanks, @houqp. I think what I need most is covered by the `Resource Man

[GitHub] [arrow] edponce commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r731465028 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -417,6 +419,231 @@ struct StringTransformExecWithState } }; +struct StringBinaryT

[GitHub] [arrow-datafusion] xudong963 edited a comment on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

2021-10-18 Thread GitBox
xudong963 edited a comment on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946313708 Thanks for your feedback! @alamb Sorry, I didn't notice `RepartitionExec` before. After looking through `RepartitionExec`, I agree with you! Tho

[GitHub] [arrow] github-actions[bot] commented on pull request #11455: ARROW-13668: [Python] Add `write_batch` and `write` methods to `ParquetWriter`

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11455: URL: https://github.com/apache/arrow/pull/11455#issuecomment-946318508 https://issues.apache.org/jira/browse/ARROW-13668

[GitHub] [arrow] save-buffer opened a new pull request #11455: ARROW-13668: [Python] Add `write_batch` and `write` methods to `ParquetWriter`

2021-10-18 Thread GitBox
save-buffer opened a new pull request #11455: URL: https://github.com/apache/arrow/pull/11455 Also adds a small test to make sure these methods do the right thing

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1143: Add output_partitions_size for CoalescePartitionsExec

2021-10-18 Thread GitBox
xudong963 commented on pull request #1143: URL: https://github.com/apache/arrow-datafusion/pull/1143#issuecomment-946313708 Thanks for your feedback! @alamb I didn't notice `RepartitionExec` before. After looking through `RepartitionExec`, I agree with you! Thoughts? @houqp

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1104: Add Roadmap to Documentation

2021-10-18 Thread GitBox
xudong963 commented on a change in pull request #1104: URL: https://github.com/apache/arrow-datafusion/pull/1104#discussion_r731440525 ## File path: docs/source/specification/roadmap.md ## @@ -0,0 +1,93 @@ + + +# Roadmap + +This document describes high level goals of the DataFu

[GitHub] [arrow] ursabot edited a comment on pull request #11338: ARROW-14239: [R] Don't use rlang::as_label

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11338: URL: https://github.com/apache/arrow/pull/11338#issuecomment-946004212 Benchmark runs are scheduled for baseline = 776d81c2c992acbe9e72bf26a908dd4a137d8ad1 and contender = c6fdeaf9fb85622242963dc28660e9592088986c. c6fdeaf9fb85622242963dc286

[GitHub] [arrow] github-actions[bot] commented on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946287838 Revision: e26426e55c621a0bcabd85bfeaf5c0f9a4a754a7 Submitted crossbow builds: [ursacomputing/crossbow @ actions-998](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946287218 https://issues.apache.org/jira/browse/ARROW-14369

[GitHub] [arrow] kou commented on pull request #11454: ARROW-14369: [C++][Python] Use std::move() explicitly for g++ 4.8.5

2021-10-18 Thread GitBox
kou commented on pull request #11454: URL: https://github.com/apache/arrow/pull/11454#issuecomment-946287206 @github-actions crossbow submit -g nightly

[GitHub] [arrow] JayjeetAtGithub commented on pull request #10913: ARROW-13607: [C++] Add Skyhook to Arrow

2021-10-18 Thread GitBox
JayjeetAtGithub commented on pull request #10913: URL: https://github.com/apache/arrow/pull/10913#issuecomment-946275767 @kou @lidavidm @westonpace You must be seeing all the commits squashed into a single commit, that is because the git user on my laptop got messed up and when I rebased a

[GitHub] [arrow] kou commented on pull request #11448: ARROW-14364: [CI][C++] Support LLVM 13

2021-10-18 Thread GitBox
kou commented on pull request #11448: URL: https://github.com/apache/arrow/pull/11448#issuecomment-946270999 @kszucs Can we remove empty `dev/tasks/conda-recipes/azure.yml` ? https://github.com/apache/arrow/commit/15f4e56c8f3959d3bdf67e9a2f9e23fd1a5131f8#diff-59bfcc47236440735af1377f7ba0a9

[GitHub] [arrow] github-actions[bot] commented on pull request #10913: ARROW-13607: [C++] Add Skyhook to Arrow

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #10913: URL: https://github.com/apache/arrow/pull/10913#issuecomment-946268666 Revision: 1161d327ac22fbf4570b6b684315d3f8d112cbb7 Submitted crossbow builds: [ursacomputing/crossbow @ actions-997](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] kou commented on pull request #10913: ARROW-13607: [C++] Add Skyhook to Arrow

2021-10-18 Thread GitBox
kou commented on pull request #10913: URL: https://github.com/apache/arrow/pull/10913#issuecomment-946268311 @github-actions crossbow submit test-skyhook-integration

[GitHub] [arrow] kou commented on pull request #11350: ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node

2021-10-18 Thread GitBox
kou commented on pull request #11350: URL: https://github.com/apache/arrow/pull/11350#issuecomment-946267294 @michalursa Can we merge this?

[GitHub] [arrow] kou commented on pull request #11448: ARROW-14364: [CI][C++] Support LLVM 13

2021-10-18 Thread GitBox
kou commented on pull request #11448: URL: https://github.com/apache/arrow/pull/11448#issuecomment-946266754 Canceled Azure Pipelines jobs: https://github.com/apache/arrow/pull/11453

[GitHub] [arrow] kou commented on pull request #11448: ARROW-14364: [CI][C++] Support LLVM 13

2021-10-18 Thread GitBox
kou commented on pull request #11448: URL: https://github.com/apache/arrow/pull/11448#issuecomment-946266576 The centos-7-amd64 failure: ARROW-14369

[GitHub] [arrow] ursabot edited a comment on pull request #11338: ARROW-14239: [R] Don't use rlang::as_label

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11338: URL: https://github.com/apache/arrow/pull/11338#issuecomment-946004212 Benchmark runs are scheduled for baseline = 776d81c2c992acbe9e72bf26a908dd4a137d8ad1 and contender = c6fdeaf9fb85622242963dc28660e9592088986c. c6fdeaf9fb85622242963dc286

[GitHub] [arrow] github-actions[bot] commented on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946262053 Revision: fa20ad7f0f16beb922ee8b9335461b90029e9771 Submitted crossbow builds: [ursacomputing/crossbow @ actions-996](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
github-actions[bot] commented on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946261574 https://issues.apache.org/jira/browse/ARROW-14368

[GitHub] [arrow] kou commented on pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
kou commented on pull request #11453: URL: https://github.com/apache/arrow/pull/11453#issuecomment-946261506 @github-actions crossbow submit -g nightly

[GitHub] [arrow] kou opened a new pull request #11453: ARROW-14368: [CI] Use ubuntu-latest for Azure Pipelines

2021-10-18 Thread GitBox
kou opened a new pull request #11453: URL: https://github.com/apache/arrow/pull/11453 ubuntu-16.04 isn't available. https://docs.microsoft.com/en-us/azure/devops/release-notes/2021/pipelines/sprint-193-update > The removal of the image is planned for October 18th.

[GitHub] [arrow] edponce commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r731395418 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -417,6 +419,231 @@ struct StringTransformExecWithState } }; +struct StringBinaryT

[GitHub] [arrow] edponce commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r731394018 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -417,6 +419,231 @@ struct StringTransformExecWithState } }; +struct StringBinaryT

[GitHub] [arrow] edponce commented on a change in pull request #11023: ARROW-12712: [C++] String repeat kernel

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11023: URL: https://github.com/apache/arrow/pull/11023#discussion_r731389134 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -417,6 +419,231 @@ struct StringTransformExecWithState } }; +struct StringBinaryT

[GitHub] [arrow] ursabot edited a comment on pull request #11450: ARROW-14348: [R] add group_vars.RecordBatchReader method

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11450: URL: https://github.com/apache/arrow/pull/11450#issuecomment-945982590 Benchmark runs are scheduled for baseline = 41529c76fe80d1fe8e60b52c0da3669c901a45bb and contender = 776d81c2c992acbe9e72bf26a908dd4a137d8ad1. 776d81c2c992acbe9e72bf26a9

[GitHub] [arrow-datafusion] Hoeze commented on issue #847: Implement parquet page-level skipping with column index, using min/max stats

2021-10-18 Thread GitBox
Hoeze commented on issue #847: URL: https://github.com/apache/arrow-datafusion/issues/847#issuecomment-946251755 Also, pyspark 3.2 supports Parquet column index: https://issues.apache.org/jira/browse/SPARK-26345

[GitHub] [arrow] edponce edited a comment on pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce edited a comment on pull request #11447: URL: https://github.com/apache/arrow/pull/11447#issuecomment-946226846 You can resolve the lint errors using Archery. I will send you some useful commands. https://arrow.apache.org/docs/developers/archery.html

[GitHub] [arrow] edponce commented on a change in pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce commented on a change in pull request #11447: URL: https://github.com/apache/arrow/pull/11447#discussion_r731375126 ## File path: python/pyarrow/_fs.pyx ## @@ -833,6 +833,12 @@ cdef class SubTreeFileSystem(FileSystem): FileSystem.init(self, wrapped) se

[GitHub] [arrow] edponce edited a comment on pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce edited a comment on pull request #11447: URL: https://github.com/apache/arrow/pull/11447#issuecomment-946226846 You can resolve the lint errors using Archery. I will send you some useful commands.

[GitHub] [arrow] edponce commented on pull request #11447: ARROW-11238: [Python] Make SubTreeFileSystem print method more informative

2021-10-18 Thread GitBox
edponce commented on pull request #11447: URL: https://github.com/apache/arrow/pull/11447#issuecomment-946226846 You can resolve the linter errors using Archery. I will send you some useful commands.

[GitHub] [arrow] JayjeetAtGithub commented on a change in pull request #10913: ARROW-13607: [C++] Add Skyhook to Arrow

2021-10-18 Thread GitBox
JayjeetAtGithub commented on a change in pull request #10913: URL: https://github.com/apache/arrow/pull/10913#discussion_r731368396 ## File path: cpp/src/skyhook/skyhook.pc.in ## @@ -0,0 +1,25 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contribut

[GitHub] [arrow] ursabot edited a comment on pull request #11450: ARROW-14348: [R] add group_vars.RecordBatchReader method

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11450: URL: https://github.com/apache/arrow/pull/11450#issuecomment-945982590 Benchmark runs are scheduled for baseline = 41529c76fe80d1fe8e60b52c0da3669c901a45bb and contender = 776d81c2c992acbe9e72bf26a908dd4a137d8ad1. 776d81c2c992acbe9e72bf26a9

[GitHub] [arrow] ursabot edited a comment on pull request #11390: ARROW-8453: [Go][Integration] Support and enable recursive nested type integration tests

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11390: URL: https://github.com/apache/arrow/pull/11390#issuecomment-945862314 Benchmark runs are scheduled for baseline = 8a540a1edb755e2c465202315058494ed3e72b39 and contender = 41529c76fe80d1fe8e60b52c0da3669c901a45bb. 41529c76fe80d1fe8e60b52c0d

[GitHub] [arrow] westonpace commented on pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
westonpace commented on pull request #11294: URL: https://github.com/apache/arrow/pull/11294#issuecomment-946193222 Rebased and addressed PR comments.

[GitHub] [arrow] ursabot edited a comment on pull request #11414: MINOR: [R] cleanup some notes in our checks

2021-10-18 Thread GitBox
ursabot edited a comment on pull request #11414: URL: https://github.com/apache/arrow/pull/11414#issuecomment-946186348 Benchmark runs are scheduled for baseline = c6fdeaf9fb85622242963dc28660e9592088986c and contender = f2f663be0a87e13c9cd5403dea51379deb4cf04d. f2f663be0a87e13c9cd5403dea

[GitHub] [arrow] westonpace commented on a change in pull request #11294: ARROW-14192: [C++][Dataset] Backpressure broken on ordered scans

2021-10-18 Thread GitBox
westonpace commented on a change in pull request #11294: URL: https://github.com/apache/arrow/pull/11294#discussion_r731338879 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -422,6 +422,42 @@ def test_scanner(dataset, dataset_reader): assert table.num_rows == sca
