[GitHub] [arrow] ursabot edited a comment on pull request #9272: [WIP] Benchmark placebo

2021-03-12 Thread GitBox
ursabot edited a comment on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-797872526 Benchmark runs are scheduled for baseline = 356c300c5ee1e2b23a83652514af11e3a731d596 and contender = 0f7cd4b8cb71cd5a7135404b2abc6e77de3aea7f. Results will be available as

[GitHub] [arrow] ursabot commented on pull request #9272: [WIP] Benchmark placebo

2021-03-12 Thread GitBox
ursabot commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-797872526 Benchmark runs are scheduled for baseline = 356c300c5ee1e2b23a83652514af11e3a731d596 and contender = 0f7cd4b8cb71cd5a7135404b2abc6e77de3aea7f. Results will be available as each b

[GitHub] [arrow] ElenaHenderson commented on pull request #9272: [WIP] Benchmark placebo

2021-03-12 Thread GitBox
ElenaHenderson commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-797872509 @ursabot please benchmark This is an automated message from the Apache Git Service. To respond to the messa

[GitHub] [arrow] nealrichardson commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-12 Thread GitBox
nealrichardson commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-797843873 I don't know why this PR now is showing commits from master on it; it seems related to whatever outage GitHub had today. Hopefully when master moves forward next and we can

[GitHub] [arrow] github-actions[bot] commented on pull request #9689: [WIP] Restore simpler ARROW_R_WITH_ARROW wrapping

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9689: URL: https://github.com/apache/arrow/pull/9689#issuecomment-797839819 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] nealrichardson opened a new pull request #9689: [WIP] Restore simpler ARROW_R_WITH_ARROW wrapping

2021-03-12 Thread GitBox
nealrichardson opened a new pull request #9689: URL: https://github.com/apache/arrow/pull/9689 Without the ARROW_R_WITH_ARROW wrapping around the code in arrowExports.cpp, if Arrow C++ is not available, the build fails--as expected. However, it does so with on the order of 10,000 lines of

[GitHub] [arrow] nealrichardson commented on a change in pull request #9674: ARROW-11925 [R]: Add `between` method for arrow_dplyr_query

2021-03-12 Thread GitBox
nealrichardson commented on a change in pull request #9674: URL: https://github.com/apache/arrow/pull/9674#discussion_r593532623 ## File path: r/tests/testthat/test-dplyr-filter.R ## @@ -155,6 +155,59 @@ test_that("filter() with %in%", { ) }) +test_that("filter() with bet

[GitHub] [arrow] westonpace commented on issue #9636: Is there an API to deserialize ListVector into double[] efficiently ?

2021-03-12 Thread GitBox
westonpace commented on issue #9636: URL: https://github.com/apache/arrow/issues/9636#issuecomment-797809421 cc @emkornfield @liyafan82 This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [arrow] nealrichardson commented on a change in pull request #9688: ARROW-11945: [R] filter doesn't accept negative numbers as valid

2021-03-12 Thread GitBox
nealrichardson commented on a change in pull request #9688: URL: https://github.com/apache/arrow/pull/9688#discussion_r593480391 ## File path: r/src/scalar.cpp ## @@ -68,6 +68,12 @@ SEXP Scalar__as_vector(const std::shared_ptr& scalar) { return Array__as_vector(array); }

[GitHub] [arrow] github-actions[bot] commented on pull request #9688: ARROW-11945: [R] filter doesn't accept negative numbers as valid

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9688: URL: https://github.com/apache/arrow/pull/9688#issuecomment-797786663 https://issues.apache.org/jira/browse/ARROW-11945 This is an automated message from the Apache Git Ser

[GitHub] [arrow] nealrichardson opened a new pull request #9688: ARROW-11945: [R] filter doesn't accept negative numbers as valid

2021-03-12 Thread GitBox
nealrichardson opened a new pull request #9688: URL: https://github.com/apache/arrow/pull/9688 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[GitHub] [arrow] kou commented on pull request #9687: ARROW-11949: [Ruby] Accept raw Ruby objects as sort key and options

2021-03-12 Thread GitBox
kou commented on pull request #9687: URL: https://github.com/apache/arrow/pull/9687#issuecomment-797785079 This includes some unrelated YARD document warning fixes. Sorry. This is an automated message from the Apache Git Serv

[GitHub] [arrow] github-actions[bot] commented on pull request #9687: ARROW-11949: [Ruby] Accept raw Ruby objects as sort key and options

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9687: URL: https://github.com/apache/arrow/pull/9687#issuecomment-797784771 https://issues.apache.org/jira/browse/ARROW-11949 This is an automated message from the Apache Git Ser

[GitHub] [arrow] kou opened a new pull request #9687: ARROW-11949: [Ruby] Accept raw Ruby objects as sort key and options

2021-03-12 Thread GitBox
kou opened a new pull request #9687: URL: https://github.com/apache/arrow/pull/9687 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] boshek commented on a change in pull request #9674: ARROW-11925 [R]: Add `between` method for arrow_dplyr_query

2021-03-12 Thread GitBox
boshek commented on a change in pull request #9674: URL: https://github.com/apache/arrow/pull/9674#discussion_r593477389 ## File path: r/tests/testthat/test-dplyr-filter.R ## @@ -155,6 +155,15 @@ test_that("filter() with %in%", { ) }) +test_that("filter() with between()",

[GitHub] [arrow] alamb commented on pull request #9682: ARROW-7364: [Rust] Add cast options to cast kernel [WIP]

2021-03-12 Thread GitBox
alamb commented on pull request #9682: URL: https://github.com/apache/arrow/pull/9682#issuecomment-797767814 👍 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] westonpace commented on issue #9679: [Python] How to append rows of a table to a memory mapped file?

2021-03-12 Thread GitBox
westonpace commented on issue #9679: URL: https://github.com/apache/arrow/issues/9679#issuecomment-797763542 Do you mind closing this now that you've accepted the SO answer? If you want to discuss resizing strategies for memory maps and maybe what you are planning at a higher level in hop

[GitHub] [arrow] lidavidm commented on pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
lidavidm commented on pull request #9685: URL: https://github.com/apache/arrow/pull/9685#issuecomment-797761274 Thanks for the feedback. I've fixed things and opened up the PR (the issue in #9680 apparently only affects my local build, not CI).

[GitHub] [arrow] lidavidm commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

2021-03-12 Thread GitBox
lidavidm commented on pull request #9677: URL: https://github.com/apache/arrow/pull/9677#issuecomment-797757740 We could also split out read_partitioning and write_partitioning functions, perhaps, or add a similar flag, and accept the API break. ---

[GitHub] [arrow] kou commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kou commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593444798 ## File path: dev/tasks/python-wheels/travis.linux.arm64.yml ## @@ -0,0 +1,91 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contri

[GitHub] [arrow] github-actions[bot] commented on pull request #9686: ARROW-9749: [C++][GLib][Python][R][Ruby][Dataset] WIP: Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9686: URL: https://github.com/apache/arrow/pull/9686#issuecomment-797740699 https://issues.apache.org/jira/browse/ARROW-9749 This is an automated message from the Apache Git Serv

[GitHub] [arrow] lidavidm opened a new pull request #9686: ARROW-9749: [C++][GLib][Python][R][Ruby][Dataset] WIP: Introduce FragmentScanOptions, consolidate ScanContext/ScanOptions

2021-03-12 Thread GitBox
lidavidm opened a new pull request #9686: URL: https://github.com/apache/arrow/pull/9686 - ScanContext/ScanOptions have been merged, since they were essentially always passed together. - For scan options that are specific to a scan (e.g. CSV conversion options), a new FragmentScanOption

[GitHub] [arrow] jorisvandenbossche commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

2021-03-12 Thread GitBox
jorisvandenbossche commented on pull request #9677: URL: https://github.com/apache/arrow/pull/9677#issuecomment-797737946 > We could; we'd have to do that recursively, right? In case of a nested dictionary. (…though is that handled anyways?) I don't think we can parse nested types f

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593422984 ## File path: python/requirements-wheel-test.txt ## @@ -1,10 +1,11 @@ cffi cython hypothesis -numpy==1.19.4 -pandas<1.1.0; python_version < "3

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593421714 ## File path: python/requirements-wheel-test.txt ## @@ -1,10 +1,11 @@ cffi cython hypothesis -numpy==1.19.4 -pandas<1.1.0; python_version < "3

[GitHub] [arrow] lidavidm commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

2021-03-12 Thread GitBox
lidavidm commented on pull request #9677: URL: https://github.com/apache/arrow/pull/9677#issuecomment-797723929 > What you currently pushed here breaks other tests? Yup, I realized that right after I pushed. > Maybe we could also check if the schema has any dictionary type?

[GitHub] [arrow] seddonm1 commented on pull request #9682: ARROW-7364: [Rust] Add cast options to cast kernel [WIP]

2021-03-12 Thread GitBox
seddonm1 commented on pull request #9682: URL: https://github.com/apache/arrow/pull/9682#issuecomment-797723955 Thanks both. I am away this weekend but will incorporate these changes and will notify early next week for final PR with more tests.

[GitHub] [arrow] jorisvandenbossche commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

2021-03-12 Thread GitBox
jorisvandenbossche commented on pull request #9677: URL: https://github.com/apache/arrow/pull/9677#issuecomment-797722892 What you currently pushed here breaks other tests? Maybe we could also check if the schema has any dictionary type?

[GitHub] [arrow] seddonm1 commented on a change in pull request #9682: ARROW-7364: [Rust] Add cast options to cast kernel [WIP]

2021-03-12 Thread GitBox
seddonm1 commented on a change in pull request #9682: URL: https://github.com/apache/arrow/pull/9682#discussion_r593418530 ## File path: rust/arrow/src/compute/kernels/cast.rs ## @@ -939,133 +1244,184 @@ where from.as_any() .downcast_ref::>()

[GitHub] [arrow] seddonm1 commented on a change in pull request #9682: ARROW-7364: [Rust] Add cast options to cast kernel [WIP]

2021-03-12 Thread GitBox
seddonm1 commented on a change in pull request #9682: URL: https://github.com/apache/arrow/pull/9682#discussion_r593418012 ## File path: rust/arrow/src/compute/kernels/cast.rs ## @@ -849,7 +1137,11 @@ const EPOCH_DAYS_FROM_CE: i32 = 719_163; /// We do not perform this check on

[GitHub] [arrow] lidavidm commented on pull request #9677: ARROW-11260: [C++][Dataset] Don't require dictionaries when specifying explicit partition schema

2021-03-12 Thread GitBox
lidavidm commented on pull request #9677: URL: https://github.com/apache/arrow/pull/9677#issuecomment-797720183 I tried updating `ds.partitioning()` to give a PartitioningFactory when a schema (but not dictionaries) is given, but that breaks things (e.g. writers expecting `ds.partitioning(

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593392419 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +364,126 @@ class TransformingGenerator { std::shared_ptr state_; }; +/// \brief Tr

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593366303 ## File path: cpp/src/arrow/util/iterator.h ## @@ -503,12 +521,12 @@ class FlattenIterator { explicit FlattenIterator(Iterator> it) : parent_(std::mov

[GitHub] [arrow] rok commented on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
rok commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797635910 > My instinct is that, rather than unifying first and then determining unique values/counting/hashing, what if we could do the aggregation on each chunk first and then unify the resul

[GitHub] [arrow] rok removed a comment on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
rok removed a comment on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797629920 > I'm not familiar with this C++ code so I'll let others comment (cc @pitrou @bkietz @michalursa). It looks like the issue is only with ChunkedArrays where the chunks have dif

[GitHub] [arrow] rok commented on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
rok commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797629920 > I'm not familiar with this C++ code so I'll let others comment (cc @pitrou @bkietz @michalursa). It looks like the issue is only with ChunkedArrays where the chunks have different d

[GitHub] [arrow] rok closed pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
rok closed pull request #9683: URL: https://github.com/apache/arrow/pull/9683 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [arrow] zeroshade edited a comment on pull request #9671: ARROW-7905: [Go][Parquet] Initial Chunk of Parquet port to Go

2021-03-12 Thread GitBox
zeroshade edited a comment on pull request #9671: URL: https://github.com/apache/arrow/pull/9671#issuecomment-797540716 @sbinet @emkornfield I've rebased this to include the changes to bump to go1.15 so now this is ready for reviews. After this gets merged, I'll push the next chunk of code

[GitHub] [arrow] bkietz commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
bkietz commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593319536 ## File path: cpp/src/arrow/util/iterator.h ## @@ -503,12 +521,12 @@ class FlattenIterator { explicit FlattenIterator(Iterator> it) : parent_(std::move(it

[GitHub] [arrow] nealrichardson commented on a change in pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
nealrichardson commented on a change in pull request #9685: URL: https://github.com/apache/arrow/pull/9685#discussion_r593318541 ## File path: r/tests/testthat/test-dataset.R ## @@ -295,6 +295,29 @@ test_that("CSV dataset", { ) }) +test_that("compressed CSV dataset", { +

[GitHub] [arrow] nealrichardson commented on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
nealrichardson commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797605376 I'm not familiar with this C++ code so I'll let others comment (cc @pitrou @bkietz @michalursa). It looks like the issue is only with ChunkedArrays where the chunks have di

[GitHub] [arrow] bkietz commented on a change in pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
bkietz commented on a change in pull request #9685: URL: https://github.com/apache/arrow/pull/9685#discussion_r593289272 ## File path: python/pyarrow/_dataset.pyx ## @@ -1383,8 +1385,22 @@ cdef class CsvFileFormat(FileFormat): def parse_options(self, ParseOptions parse_opt

[GitHub] [arrow] bkietz closed pull request #9670: ARROW-8658: [C++][Dataset] Implement subtree pruning for FileSystemDataset

2021-03-12 Thread GitBox
bkietz closed pull request #9670: URL: https://github.com/apache/arrow/pull/9670 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] bkietz commented on pull request #9670: ARROW-8658: [C++][Dataset] Implement subtree pruning for FileSystemDataset

2021-03-12 Thread GitBox
bkietz commented on pull request #9670: URL: https://github.com/apache/arrow/pull/9670#issuecomment-797579411 +1, merging This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [arrow] ursabot edited a comment on pull request #9670: ARROW-8658: [C++][Dataset] Implement subtree pruning for FileSystemDataset

2021-03-12 Thread GitBox
ursabot edited a comment on pull request #9670: URL: https://github.com/apache/arrow/pull/9670#issuecomment-797508371 Benchmark runs are scheduled for baseline = 2d140c3eeecca3ff7823edc8c9562ebd6a1c336a and contender = 2aee5b629fa74bf1568aeb78f420641bab9c93c4. Results will be available as

[GitHub] [arrow] ianmcook commented on a change in pull request #9681: ARROW-11880: [R] Handle empty or NULL transmute() args properly

2021-03-12 Thread GitBox
ianmcook commented on a change in pull request #9681: URL: https://github.com/apache/arrow/pull/9681#discussion_r592808241 ## File path: r/tests/testthat/test-dataset.R ## @@ -846,6 +846,38 @@ test_that("mutate() with NULL inputs", { ) }) +test_that("empty mutate()", { +

[GitHub] [arrow] maartenbreddels commented on pull request #8468: ARROW-10306: [C++] Add string replacement kernel

2021-03-12 Thread GitBox
maartenbreddels commented on pull request #8468: URL: https://github.com/apache/arrow/pull/8468#issuecomment-797553590 @pitrou this is ready for review (assuming you agree with the above plan of doing a refactor later on) Th

[GitHub] [arrow] github-actions[bot] commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797546572 Revision: 599f9fbfde615553c08eebb31e9cb2d4923192a1 Submitted crossbow builds: [ursacomputing/crossbow @ actions-223](https://github.com/ursacomputing/crossbow/br

[GitHub] [arrow] zeroshade commented on pull request #9671: ARROW-7905: [Go][Parquet] Initial Chunk of Parquet port to Go

2021-03-12 Thread GitBox
zeroshade commented on pull request #9671: URL: https://github.com/apache/arrow/pull/9671#issuecomment-797540716 @sbinet @emkornfield I've updated this to include the changes to bump to go1.15 so now this is ready for reviews. After this gets merged, I'll push the next chunk of code. Sorry

[GitHub] [arrow] github-actions[bot] commented on pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9685: URL: https://github.com/apache/arrow/pull/9685#issuecomment-797524480 https://issues.apache.org/jira/browse/ARROW-10372 This is an automated message from the Apache Git Ser

[GitHub] [arrow] kszucs commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797515456 @github-actions crossbow submit wheel-manylinux2014-*-arm64 This is an automated message from the Apache Git Servic

[GitHub] [arrow] ursabot commented on pull request #9670: ARROW-8658: [C++][Dataset] Implement subtree pruning for FileSystemDataset

2021-03-12 Thread GitBox
ursabot commented on pull request #9670: URL: https://github.com/apache/arrow/pull/9670#issuecomment-797508371 Benchmark runs are scheduled for baseline = 2d140c3eeecca3ff7823edc8c9562ebd6a1c336a and contender = 2aee5b629fa74bf1568aeb78f420641bab9c93c4. Results will be available as each b

[GitHub] [arrow] lidavidm commented on pull request #9670: ARROW-8658: [C++][Dataset] Implement subtree pruning for FileSystemDataset

2021-03-12 Thread GitBox
lidavidm commented on pull request #9670: URL: https://github.com/apache/arrow/pull/9670#issuecomment-797508195 MacOS tests are fixed now that the sorting on subtrees is fully defined. This is an automated message from the Ap

[GitHub] [arrow] lidavidm commented on pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
lidavidm commented on pull request #9685: URL: https://github.com/apache/arrow/pull/9685#issuecomment-797506844 Leaving as draft for now since I observed the Python tests hang without ARROW-11937/https://github.com/apache/arrow/pull/9680 fixed.

[GitHub] [arrow] lidavidm opened a new pull request #9685: ARROW-10372: [Dataset][C++][Python][R] Support reading compressed CSV

2021-03-12 Thread GitBox
lidavidm opened a new pull request #9685: URL: https://github.com/apache/arrow/pull/9685 This adds support for reading compressed CSV datasets in C++/Python/R via an option on CsvFileFormat. It does not autodetect the type of compression (but perhaps this could be added, by inspecting File

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8990: ARROW-10959: [C++] Add scalar string join kernel

2021-03-12 Thread GitBox
maartenbreddels commented on a change in pull request #8990: URL: https://github.com/apache/arrow/pull/8990#discussion_r593172558 ## File path: cpp/src/arrow/compute/kernels/scalar_string_test.cc ## @@ -428,6 +433,26 @@ TYPED_TEST(TestStringKernels, StrptimeDoesNotProvideDefau

[GitHub] [arrow] kszucs commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593170288 ## File path: .env ## @@ -33,9 +33,13 @@ COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 BUILDKIT_INLINE_CACHE=1 +# different architecture notations +ARCH=a

[GitHub] [arrow] pitrou commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
pitrou commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593148810 ## File path: python/requirements-wheel-test.txt ## @@ -1,10 +1,11 @@ cffi cython hypothesis -numpy==1.19.4 -pandas<1.1.0; python_version < "3.8" -pandas;

[GitHub] [arrow] pitrou commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
pitrou commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593147811 ## File path: .env ## @@ -33,9 +33,13 @@ COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 BUILDKIT_INLINE_CACHE=1 +# different architecture notations +ARCH=a

[GitHub] [arrow] pitrou commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
pitrou commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593147612 ## File path: .env ## @@ -33,9 +33,13 @@ COMPOSE_DOCKER_CLI_BUILD=1 DOCKER_BUILDKIT=1 BUILDKIT_INLINE_CACHE=1 +# different architecture notations +ARCH=a

[GitHub] [arrow] kszucs commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593145497 ## File path: ci/docker/python-wheel-manylinux-201x.dockerfile ## @@ -62,12 +65,14 @@ ARG build_type=release ENV CMAKE_BUILD_TYPE=${build_type} \ VCPKG

[GitHub] [arrow] kszucs commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593145300 ## File path: dev/tasks/linux-packages/github.linux.amd64.yml ## @@ -73,7 +73,6 @@ jobs: sudo apt update sudo apt install -y \

[GitHub] [arrow] maartenbreddels commented on a change in pull request #8468: ARROW-10306: [C++] Add string replacement kernel

2021-03-12 Thread GitBox
maartenbreddels commented on a change in pull request #8468: URL: https://github.com/apache/arrow/pull/8468#discussion_r593128772 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -1194,6 +1198,197 @@ void AddSplit(FunctionRegistry* registry) { #endif } +//

[GitHub] [arrow] maartenbreddels commented on pull request #9000: ARROW-10557: [C++] Add scalar string slicing/substring kernel

2021-03-12 Thread GitBox
maartenbreddels commented on pull request #9000: URL: https://github.com/apache/arrow/pull/9000#issuecomment-797451974 @pitrou this is ready for review, the failure seems unrelated (minio) Sorry for taking so long to get back at this, I hope we can get this, and my other open string-

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593123657 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +364,126 @@ class TransformingGenerator { std::shared_ptr state_; }; +/// \brief Tr

[GitHub] [arrow] github-actions[bot] commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797448591 Revision: 4e56f9d977a75a1f4d6d30f5a9defce990d37743 Submitted crossbow builds: [ursacomputing/crossbow @ actions-222](https://github.com/ursacomputing/crossbow/br

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593116560 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -235,29 +541,209 @@ class ReadaheadGenerator { /// The source generator must be async-reentrant

[GitHub] [arrow] kszucs commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593112806 ## File path: python/requirements-wheel-test.txt ## @@ -1,10 +1,11 @@ cffi cython hypothesis -numpy==1.19.4 -pandas<1.1.0; python_version < "3.8" -pandas;

[GitHub] [arrow] kszucs commented on a change in pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on a change in pull request #9285: URL: https://github.com/apache/arrow/pull/9285#discussion_r593112188 ## File path: dev/tasks/python-wheels/github.linux.amd64.yml ## @@ -95,7 +87,18 @@ jobs: env: CROSSBOW_GITHUB_TOKEN: {{ '${{ secrets.CROS

[GitHub] [arrow] kszucs commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kszucs commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797438464 @github-actions crossbow submit wheel-manylinux* This is an automated message from the Apache Git Service. To respo

[GitHub] [arrow] rok commented on pull request #9683: ARROW-10403: [C++] Implement unique kernel for dictionary type

2021-03-12 Thread GitBox
rok commented on pull request #9683: URL: https://github.com/apache/arrow/pull/9683#issuecomment-797434923 @nealrichardson what do you think about this approach? It introduces overhead because it transposes dictionary indices but it gives us `value_counts`. ---

[GitHub] [arrow] westonpace commented on pull request #9678: ARROW-11907: [C++] Use our own executor in S3FileSystem

2021-03-12 Thread GitBox
westonpace commented on pull request #9678: URL: https://github.com/apache/arrow/pull/9678#issuecomment-797434279 Sure, it might take me a day or two. I want to learn the S3 stuff so I'll try running some of this. I'll try and get to it by the end of Monday.

[GitHub] [arrow] alamb commented on a change in pull request #9600: ARROW-11822: [Rust][Datafusion] Support case sensitive for function

2021-03-12 Thread GitBox
alamb commented on a change in pull request #9600: URL: https://github.com/apache/arrow/pull/9600#discussion_r593100188 ## File path: rust/datafusion/src/execution/context.rs ## @@ -495,13 +495,30 @@ impl QueryPlanner for DefaultQueryPlanner { } } +/// The style of case

[GitHub] [arrow] alamb commented on a change in pull request #9682: ARROW-7364: [Rust] Add cast options to cast kernel [WIP]

2021-03-12 Thread GitBox
alamb commented on a change in pull request #9682: URL: https://github.com/apache/arrow/pull/9682#discussion_r593082721 ## File path: rust/arrow/src/compute/kernels/cast.rs ## @@ -262,98 +275,126 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result { return

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593084175 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -65,6 +100,14 @@ Future<> VisitAsyncGenerator(AsyncGenerator generator, return Loop(LoopBody{

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593082909 ## File path: cpp/src/arrow/util/iterator_test.cc ## @@ -570,8 +616,48 @@ TEST(ReadaheadIterator, NextError) { // --

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593082143 ## File path: cpp/src/arrow/util/iterator_test.cc ## @@ -589,15 +675,255 @@ TEST(TestAsyncUtil, Collect) { ASSERT_EQ(expected, collected_val); } +T

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593081844 ## File path: cpp/src/arrow/util/vector.h ## @@ -81,5 +84,53 @@ std::vector FilterVector(std::vector values, Predicate&& predicate) { return values;

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593081752 ## File path: cpp/src/arrow/util/vector.h ## @@ -81,5 +84,53 @@ std::vector FilterVector(std::vector values, Predicate&& predicate) { return values;

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593081442 ## File path: cpp/src/arrow/util/type_fwd.h ## @@ -17,14 +17,14 @@ #pragma once +#include Review comment: Done. ## File path: cpp/

[GitHub] [arrow] westonpace commented on a change in pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9643: URL: https://github.com/apache/arrow/pull/9643#discussion_r593081103 ## File path: cpp/src/arrow/csv/reader.cc ## @@ -805,7 +798,7 @@ class SerialTableReader : public BaseTableReader {

[GitHub] [arrow] westonpace commented on pull request #9684: ARROW-11942: [C++] If tasks are submitted quickly the thread pool may fail to spin up new threads

2021-03-12 Thread GitBox
westonpace commented on pull request #9684: URL: https://github.com/apache/arrow/pull/9684#issuecomment-797397577 cc @pitrou

[GitHub] [arrow] github-actions[bot] commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797377305 Revision: fca5a0f2b6e4c4b116fea44369ba71afe866acf2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-221](https://github.com/ursacomputing/crossbow/br

[GitHub] [arrow] westonpace commented on pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-03-12 Thread GitBox
westonpace commented on pull request #9533: URL: https://github.com/apache/arrow/pull/9533#issuecomment-797349344 Actually, I'm going to do the split in ARROW-11883, since that PR adds a bunch of tests and I can avoid a rebase headache this way. I think this particular PR is ready for re

[GitHub] [arrow] kou commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kou commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797347475 @github-actions crossbow submit -g linux

[GitHub] [arrow] westonpace commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-03-12 Thread GitBox
westonpace commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r593013479 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +179,94 @@ class TransformingGenerator { std::shared_ptr state_; }; +template +cla

[GitHub] [arrow] github-actions[bot] commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
github-actions[bot] commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797317650 Revision: 68833b79ee9911ebfc5fd3e986b34c2d7d1f1f04 Submitted crossbow builds: [ursacomputing/crossbow @ actions-220](https://github.com/ursacomputing/crossbow/br

[GitHub] [arrow] kou commented on pull request #9285: ARROW-10349: [Python] Build and publish aarch64 wheels

2021-03-12 Thread GitBox
kou commented on pull request #9285: URL: https://github.com/apache/arrow/pull/9285#issuecomment-797317090 @github-actions crossbow submit centos-7-amd64