[GitHub] [arrow] sagnikc-dremio commented on pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-02-23 Thread GitBox
sagnikc-dremio commented on pull request #9450: URL: https://github.com/apache/arrow/pull/9450#issuecomment-784875144 @maartenbreddels @kou Hi Maarten/Sutou, I am facing some difficulties with respect to adding the utf8proc dependency for the case conversion of utf8 strings. Can you

[GitHub] [arrow] ursabot edited a comment on pull request #9272: [WIP] Benchmark placebo

2021-02-23 Thread GitBox
ursabot edited a comment on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-784764126 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

[GitHub] [arrow] liyafan82 commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-02-23 Thread GitBox
liyafan82 commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-784850125 To avoid the direct dependency on the lz4 library, I have extracted the concrete compression codec implementations to a separate module. Will continue to work on the integration

[GitHub] [arrow] ursabot edited a comment on pull request #9272: [WIP] Benchmark placebo

2021-02-23 Thread GitBox
ursabot edited a comment on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-784764126 Benchmark runs are scheduled for baseline = 356c300c5ee1e2b23a83652514af11e3a731d596 and contender = 0f7cd4b8cb71cd5a7135404b2abc6e77de3aea7f. Results will be available as

[GitHub] [arrow] Dandandan commented on a change in pull request #9548: ARROW-11733: [Rust][DataFusion] Implement hash partitioning

2021-02-23 Thread GitBox
Dandandan commented on a change in pull request #9548: URL: https://github.com/apache/arrow/pull/9548#discussion_r581664638 ## File path: rust/datafusion/src/physical_plan/repartition.rs ## @@ -305,6 +347,33 @@ mod tests { Ok(()) } +#[tokio::test(flavor = "m

[GitHub] [arrow] Dandandan commented on a change in pull request #9548: ARROW-11733: [Rust][DataFusion] Implement hash partitioning

2021-02-23 Thread GitBox
Dandandan commented on a change in pull request #9548: URL: https://github.com/apache/arrow/pull/9548#discussion_r581663717 ## File path: rust/datafusion/src/physical_plan/repartition.rs ## @@ -120,23 +120,71 @@ impl ExecutionPlan for RepartitionExec { let (sen

[GitHub] [arrow] sagnikc-dremio commented on a change in pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-02-23 Thread GitBox
sagnikc-dremio commented on a change in pull request #9450: URL: https://github.com/apache/arrow/pull/9450#discussion_r581661888 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -249,25 +297,73 @@ const char* lower_utf8(gdv_int64 context, const char* data, gdv_int

[GitHub] [arrow] pprudhvi commented on a change in pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-02-23 Thread GitBox
pprudhvi commented on a change in pull request #9450: URL: https://github.com/apache/arrow/pull/9450#discussion_r581658737 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -249,25 +297,73 @@ const char* lower_utf8(gdv_int64 context, const char* data, gdv_int32 dat

[GitHub] [arrow] github-actions[bot] commented on pull request #9559: ARROW-11757: [C++][Gandiva] castTIMESTAMP(utf8) function should return rounded milliseconds if input has higher precision than mil

2021-02-23 Thread GitBox
github-actions[bot] commented on pull request #9559: URL: https://github.com/apache/arrow/pull/9559#issuecomment-784806700 https://issues.apache.org/jira/browse/ARROW-11757

[GitHub] [arrow] projjal opened a new pull request #9559: ARROW-11757: [C++][Gandiva] castTIMESTAMP(utf8) function should return rounded milliseconds if input has higher precision than millis

2021-02-23 Thread GitBox
projjal opened a new pull request #9559: URL: https://github.com/apache/arrow/pull/9559

[GitHub] [arrow] sagnikc-dremio commented on a change in pull request #9450: ARROW-11565: [C++][Gandiva] Modify upper()/lower() logic to make them work for utf8 strings

2021-02-23 Thread GitBox
sagnikc-dremio commented on a change in pull request #9450: URL: https://github.com/apache/arrow/pull/9450#discussion_r581644640 ## File path: cpp/src/gandiva/precompiled/string_ops.cc ## @@ -219,28 +225,74 @@ const char* upper_utf8(gdv_int64 context, const char* data, gdv_int

[GitHub] [arrow] emkornfield commented on pull request #9504: ARROW-2229: [C++][Python] Add WriteCsv functionality.

2021-02-23 Thread GitBox
emkornfield commented on pull request #9504: URL: https://github.com/apache/arrow/pull/9504#issuecomment-784785345 @pitrou I think I addressed all your comments. Not sure what is going on with the R CI builds?

[GitHub] [arrow] emkornfield commented on a change in pull request #9504: ARROW-2229: [C++][Python] Add WriteCsv functionality.

2021-02-23 Thread GitBox
emkornfield commented on a change in pull request #9504: URL: https://github.com/apache/arrow/pull/9504#discussion_r581622101 ## File path: python/pyarrow/_csv.pyx ## @@ -763,3 +765,86 @@ def open_csv(input_file, read_options=None, parse_options=None, move(c_

[GitHub] [arrow] ursabot commented on pull request #9272: [WIP] Benchmark placebo

2021-02-23 Thread GitBox
ursabot commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-784764126 Benchmark runs are scheduled for baseline = 356c300c5ee1e2b23a83652514af11e3a731d596 and contender = 0f7cd4b8cb71cd5a7135404b2abc6e77de3aea7f. Results will be available as each b

[GitHub] [arrow] github-actions[bot] commented on pull request #9558: ARROW-11727: [C++][FlightRPC] Estimate latency quantiles with TDigest

2021-02-23 Thread GitBox
github-actions[bot] commented on pull request #9558: URL: https://github.com/apache/arrow/pull/9558#issuecomment-784759935 https://issues.apache.org/jira/browse/ARROW-11727

[GitHub] [arrow] cyb70289 opened a new pull request #9558: ARROW-11727: [C++][FlightRPC] Estimate latency quantiles with TDigest

2021-02-23 Thread GitBox
cyb70289 opened a new pull request #9558: URL: https://github.com/apache/arrow/pull/9558 Current code uses P-Square algorithm from boost accumulator library to estimate latency quantiles. P-Square is very bad at estimating skewed quantiles like 0.99. This patch replaces boost accumulat

[GitHub] [arrow] cyb70289 commented on a change in pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
cyb70289 commented on a change in pull request #9556: URL: https://github.com/apache/arrow/pull/9556#discussion_r581583765 ## File path: cpp/src/arrow/util/utf8.h ## @@ -240,54 +244,26 @@ inline bool ValidateAsciiSw(const uint8_t* data, int64_t len) { } } -#ifdef ARROW_H

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581582449 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dataset a

[GitHub] [arrow] cyb70289 edited a comment on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
cyb70289 edited a comment on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784702171 > ```diff > +#if XSIMD_ARM_INSTR_SET >= XSIMD_ARM8_64_NEON_VERSION > +return vmaxvq_u8(rhs) != 0; > +#else > ``` ~~For Arm64, the `#if` sh

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581572604 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dataset a

[GitHub] [arrow] cyb70289 commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
cyb70289 commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784702171 > ```diff > +#if XSIMD_ARM_INSTR_SET >= XSIMD_ARM8_64_NEON_VERSION > +return vmaxvq_u8(rhs) != 0; > +#else > ``` For Arm64, the `#if` should be t

[GitHub] [arrow] kou commented on a change in pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
kou commented on a change in pull request #9556: URL: https://github.com/apache/arrow/pull/9556#discussion_r581548334 ## File path: cpp/cmake_modules/ThirdpartyToolchain.cmake ## @@ -1904,6 +1919,30 @@ if(ARROW_WITH_RAPIDJSON) include_directories(SYSTEM ${RAPIDJSON_INCLUDE_D

[GitHub] [arrow] cyb70289 commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
cyb70289 commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784671792 Add xsimd to /LICENSE.txt?

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581537939 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dataset a

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581514795 ## File path: r/tests/testthat/test-dplyr-mutate.R ## @@ -0,0 +1,311 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributo

[GitHub] [arrow] dianaclarke commented on a change in pull request #9557: ARROW-11746: [Developer][Archery] Fix prefer real time check

2021-02-23 Thread GitBox
dianaclarke commented on a change in pull request #9557: URL: https://github.com/apache/arrow/pull/9557#discussion_r581503176 ## File path: dev/archery/tests/test_benchmarks.py ## @@ -88,7 +143,7 @@ def test_omits_aggregates(): "name": name, "unit": "ns",

[GitHub] [arrow] seddonm1 commented on pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
seddonm1 commented on pull request #9551: URL: https://github.com/apache/arrow/pull/9551#issuecomment-784611213 thanks @alamb rebased 👍

[GitHub] [arrow] github-actions[bot] commented on pull request #9544: ARROW-11753: [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved

2021-02-23 Thread GitBox
github-actions[bot] commented on pull request #9544: URL: https://github.com/apache/arrow/pull/9544#issuecomment-784606419 https://issues.apache.org/jira/browse/ARROW-11753

[GitHub] [arrow] alamb edited a comment on pull request #9531: ARROW-11697: [Rust][DataFusion] Add provider for user defined function

2021-02-23 Thread GitBox
alamb edited a comment on pull request #9531: URL: https://github.com/apache/arrow/pull/9531#issuecomment-784596915

[GitHub] [arrow] westonpace commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r581464660 ## File path: cpp/src/arrow/util/iterator_test.cc ## @@ -726,6 +730,142 @@ TEST(TestAsyncUtil, CompleteBackgroundStressTest) { } } +template +clas

[GitHub] [arrow] alamb commented on pull request #9531: ARROW-11697: [Rust][DataFusion] Add provider for user defined function

2021-02-23 Thread GitBox
alamb commented on pull request #9531: URL: https://github.com/apache/arrow/pull/9531#issuecomment-784596915 @wqc200 here is a way to use `std::sync::Mutex` to accomplish changing database name: ``` use std::sync::{Arc, Mutex}; use arrow::array::ArrayRef; use arrow::arr

[GitHub] [arrow] westonpace commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r581463629 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +179,90 @@ class TransformingGenerator { std::shared_ptr state_; }; +template +cla

[GitHub] [arrow] westonpace commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r581463408 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +179,90 @@ class TransformingGenerator { std::shared_ptr state_; }; +template +cla

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581458393 ## File path: r/R/dplyr.R ## @@ -423,26 +482,115 @@ ungroup.arrow_dplyr_query <- function(x, ...) { } ungroup.Dataset <- ungroup.ArrowTabular <- fo

[GitHub] [arrow] alamb commented on a change in pull request #9537: ARROW-11719: [Rust][Datafusion] support creating memory table with merged schema

2021-02-23 Thread GitBox
alamb commented on a change in pull request #9537: URL: https://github.com/apache/arrow/pull/9537#discussion_r581457207 ## File path: rust/datafusion/src/datasource/memory.rs ## @@ -365,4 +366,91 @@ mod tests { Ok(()) } + +#[test] +fn test_schema_validat

[GitHub] [arrow] alamb commented on pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
alamb commented on pull request #9551: URL: https://github.com/apache/arrow/pull/9551#issuecomment-784578819 This PR sadly does need a rebase

[GitHub] [arrow] jonkeane commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
jonkeane commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581452621 ## File path: r/R/dplyr.R ## @@ -423,26 +482,115 @@ ungroup.arrow_dplyr_query <- function(x, ...) { } ungroup.Dataset <- ungroup.ArrowTabular <- force

[GitHub] [arrow] alamb commented on a change in pull request #9548: ARROW-11733: [Rust][DataFusion] Implement hash partitioning

2021-02-23 Thread GitBox
alamb commented on a change in pull request #9548: URL: https://github.com/apache/arrow/pull/9548#discussion_r581451845 ## File path: rust/datafusion/src/physical_plan/repartition.rs ## @@ -305,6 +347,33 @@ mod tests { Ok(()) } +#[tokio::test(flavor = "multi

[GitHub] [arrow] westonpace commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r581450731 ## File path: cpp/src/arrow/vendored/ProducerConsumerQueue.h ## @@ -0,0 +1,214 @@ +// Vendored from git tag v2021.02.15.00 + +/* + * Copyright (c) Facebo

[GitHub] [arrow] alamb commented on pull request #9548: ARROW-11733: [Rust][DataFusion] Implement hash partitioning

2021-02-23 Thread GitBox
alamb commented on pull request #9548: URL: https://github.com/apache/arrow/pull/9548#issuecomment-784571729 The integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581449337 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dat

[GitHub] [arrow] alamb commented on pull request #9536: ARROW-11718: [Rust] Don't write IPC footers on drop

2021-02-23 Thread GitBox
alamb commented on pull request #9536: URL: https://github.com/apache/arrow/pull/9536#issuecomment-784571131 I plan to merge this PR on this upcoming weekend barring any additional suggested changes

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581448403 ## File path: r/R/dplyr.R ## @@ -423,26 +482,115 @@ ungroup.arrow_dplyr_query <- function(x, ...) { } ungroup.Dataset <- ungroup.ArrowTabular <- fo

[GitHub] [arrow] alamb closed pull request #9526: ARROW-11688: [Rust] Casts between Utf8 and LargeUtf8

2021-02-23 Thread GitBox
alamb closed pull request #9526: URL: https://github.com/apache/arrow/pull/9526

[GitHub] [arrow] alamb commented on a change in pull request #9495: ARROW-11627: [Rust] Make allocator be a generic over type T

2021-02-23 Thread GitBox
alamb commented on a change in pull request #9495: URL: https://github.com/apache/arrow/pull/9495#discussion_r581444906 ## File path: rust/arrow/src/alloc/alignment.rs ## @@ -0,0 +1,119 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[GitHub] [arrow] alamb closed pull request #9544: ARROW-11753: [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved

2021-02-23 Thread GitBox
alamb closed pull request #9544: URL: https://github.com/apache/arrow/pull/9544

[GitHub] [arrow] alamb commented on pull request #9544: ARROW-11432: [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved

2021-02-23 Thread GitBox
alamb commented on pull request #9544: URL: https://github.com/apache/arrow/pull/9544#issuecomment-784565782 I filed https://issues.apache.org/jira/browse/ARROW-11753 just for the test and will merge this PR in for that. Thanks again @TurnOfACard -

[GitHub] [arrow] westonpace commented on a change in pull request #9519: ARROW-11680: [C++] Add vendored version of folly's spsc queue

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9519: URL: https://github.com/apache/arrow/pull/9519#discussion_r581443079 ## File path: cpp/src/arrow/util/queue.h ## @@ -0,0 +1,29 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

[GitHub] [arrow] alamb commented on pull request #9544: ARROW-11432: [Rust][DataFusion] Add tests for when Datafusion qualified field names resolved

2021-02-23 Thread GitBox
alamb commented on pull request #9544: URL: https://github.com/apache/arrow/pull/9544#issuecomment-784564950 The integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717

[GitHub] [arrow] seddonm1 commented on pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
seddonm1 commented on pull request #9551: URL: https://github.com/apache/arrow/pull/9551#issuecomment-784561772 > LGTM! > I think from time to time we can check on the status of the tpch queries. Feel free to split the PRs in the way you think makes sense @seddonm1 👍 @Dandand

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581437454 ## File path: r/R/dplyr.R ## @@ -423,26 +482,115 @@ ungroup.arrow_dplyr_query <- function(x, ...) { } ungroup.Dataset <- ungroup.ArrowTabular <- force

[GitHub] [arrow] westonpace commented on a change in pull request #9532: ARROW-11174: [C++][Dataset] Make expressions available to projection

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9532: URL: https://github.com/apache/arrow/pull/9532#discussion_r581365819 ## File path: cpp/src/arrow/compute/kernels/scalar_nested.cc ## @@ -63,9 +63,15 @@ const FunctionDoc list_value_length_doc{ Result ProjectResolve(Kernel

[GitHub] [arrow] pitrou commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784558707 Ok, I submitted a fix for the `any()` issue in xsimd: https://github.com/xtensor-stack/xsimd/pull/419

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581432453 ## File path: r/tests/testthat/test-dplyr-mutate.R ## @@ -0,0 +1,311 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more cont

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581431268 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dat

[GitHub] [arrow] ianmcook commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
ianmcook commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581367505 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dataset a

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581427567 ## File path: r/tests/testthat/helper-expectation.R ## @@ -59,3 +59,66 @@ verify_output <- function(...) { } testthat::verify_output(...) } +

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581425493 ## File path: r/R/dplyr.R ## @@ -423,26 +482,115 @@ ungroup.arrow_dplyr_query <- function(x, ...) { } ungroup.Dataset <- ungroup.ArrowTabular <- fo

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581424395 ## File path: r/R/dplyr.R ## @@ -309,8 +344,27 @@ collect.arrow_dplyr_query <- function(x, as_data_frame = TRUE, ...) { # See dataset.R for Dat

[GitHub] [arrow] nealrichardson commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581423955 ## File path: r/R/dplyr.R ## @@ -73,6 +80,22 @@ print.arrow_dplyr_query <- function(x, ...) { invisible(x) } +get_field_names <- function(selec

[GitHub] [arrow] seddonm1 commented on a change in pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
seddonm1 commented on a change in pull request #9551: URL: https://github.com/apache/arrow/pull/9551#discussion_r581423358 ## File path: rust/datafusion/src/physical_plan/functions.rs ## @@ -171,44 +180,51 @@ impl FromStr for BuiltinScalarFunction { type Err = DataFusionEr

[GitHub] [arrow] jonkeane commented on a change in pull request #9521: ARROW-11683: [R] Support dplyr::mutate()

2021-02-23 Thread GitBox
jonkeane commented on a change in pull request #9521: URL: https://github.com/apache/arrow/pull/9521#discussion_r581372997 ## File path: r/tests/testthat/helper-expectation.R ## @@ -59,3 +59,66 @@ verify_output <- function(...) { } testthat::verify_output(...) } + +expec

[GitHub] [arrow] kou commented on a change in pull request #9519: ARROW-11680: [C++] Add vendored version of folly's spsc queue

2021-02-23 Thread GitBox
kou commented on a change in pull request #9519: URL: https://github.com/apache/arrow/pull/9519#discussion_r581422485 ## File path: cpp/src/arrow/util/queue.h ## @@ -0,0 +1,29 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[GitHub] [arrow] Dandandan commented on a change in pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
Dandandan commented on a change in pull request #9551: URL: https://github.com/apache/arrow/pull/9551#discussion_r581421495 ## File path: rust/datafusion/src/physical_plan/functions.rs ## @@ -171,44 +180,51 @@ impl FromStr for BuiltinScalarFunction { type Err = DataFusionE

[GitHub] [arrow] bkietz closed pull request #9457: ARROW-11573: [Developer][Archery] Google benchmark now reports run type

2021-02-23 Thread GitBox
bkietz closed pull request #9457: URL: https://github.com/apache/arrow/pull/9457

[GitHub] [arrow] bkietz commented on a change in pull request #9533: ARROW-11590: [C++] Move CSV background generator to IO thread pool

2021-02-23 Thread GitBox
bkietz commented on a change in pull request #9533: URL: https://github.com/apache/arrow/pull/9533#discussion_r581379184 ## File path: cpp/src/arrow/util/async_generator.h ## @@ -177,6 +179,90 @@ class TransformingGenerator { std::shared_ptr state_; }; +template +class S

[GitHub] [arrow] seddonm1 commented on pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
seddonm1 commented on pull request #9551: URL: https://github.com/apache/arrow/pull/9551#issuecomment-784538869 @Dandandan I have added `substr` to this PR

[GitHub] [arrow] nealrichardson closed pull request #9555: ARROW-11743: [R] Use pkgdown's new found ability to autolink Jiras

2021-02-23 Thread GitBox
nealrichardson closed pull request #9555: URL: https://github.com/apache/arrow/pull/9555

[GitHub] [arrow] seddonm1 commented on pull request #9551: ARROW-11738: [Rust][DataFusion] Fix Concat and Trim Functions

2021-02-23 Thread GitBox
seddonm1 commented on pull request #9551: URL: https://github.com/apache/arrow/pull/9551#issuecomment-784490422 @Dandandan I can add `substr` to this PR if that would help you with TPCH-q22?

[GitHub] [arrow] github-actions[bot] commented on pull request #9557: ARROW-11746: [Developer][Archery] Fix prefer real time check

2021-02-23 Thread GitBox
github-actions[bot] commented on pull request #9557: URL: https://github.com/apache/arrow/pull/9557#issuecomment-784477305 https://issues.apache.org/jira/browse/ARROW-11746

[GitHub] [arrow] bkietz commented on a change in pull request #9554: ARROW-11741: [C++] Fix decimal casts on big endian platforms

2021-02-23 Thread GitBox
bkietz commented on a change in pull request #9554: URL: https://github.com/apache/arrow/pull/9554#discussion_r581340233 ## File path: cpp/src/arrow/compute/kernels/codegen_internal.h ## @@ -663,13 +663,14 @@ struct ScalarUnaryNotNullStateful { static void Exec(const ThisT

[GitHub] [arrow] alamb commented on pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
alamb commented on pull request #9543: URL: https://github.com/apache/arrow/pull/9543#issuecomment-784469122 Merged. 🎉 Thanks @abreis

[GitHub] [arrow] alamb closed pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
alamb closed pull request #9543: URL: https://github.com/apache/arrow/pull/9543

[GitHub] [arrow] alamb commented on pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
alamb commented on pull request #9543: URL: https://github.com/apache/arrow/pull/9543#issuecomment-784468236 The integration failure looks like https://issues.apache.org/jira/browse/ARROW-11717

[GitHub] [arrow] pitrou commented on pull request #9554: ARROW-11741: [C++] Fix decimal casts on big endian platforms

2021-02-23 Thread GitBox
pitrou commented on pull request #9554: URL: https://github.com/apache/arrow/pull/9554#issuecomment-784448840

[GitHub] [arrow] westonpace commented on a change in pull request #9519: ARROW-11680: [C++] Add vendored version of folly's spsc queue

2021-02-23 Thread GitBox
westonpace commented on a change in pull request #9519: URL: https://github.com/apache/arrow/pull/9519#discussion_r581334121 ## File path: LICENSE.txt ## @@ -2119,6 +2119,11 @@ DEALINGS IN THE SOFTWARE. ---

[GitHub] [arrow] bkietz commented on pull request #9554: ARROW-11741: [C++] Fix decimal casts on big endian platforms

2021-02-23 Thread GitBox
bkietz commented on pull request #9554: URL: https://github.com/apache/arrow/pull/9554#issuecomment-784445517 Isn't it impossible for conformant data?

[GitHub] [arrow] pitrou commented on pull request #9554: ARROW-11741: [C++] Fix decimal casts on big endian platforms

2021-02-23 Thread GitBox
pitrou commented on pull request #9554: URL: https://github.com/apache/arrow/pull/9554#issuecomment-78883 That would be quite unlikely indeed :-)

[GitHub] [arrow] bkietz commented on pull request #9554: ARROW-11741: [C++] Fix decimal casts on big endian platforms

2021-02-23 Thread GitBox
bkietz commented on pull request #9554: URL: https://github.com/apache/arrow/pull/9554#issuecomment-784443332 When would any of these elements not be aligned to a 16 byte boundary?

[GitHub] [arrow] dianaclarke opened a new pull request #9557: ARROW-11746: [Developer][Archery] Fix prefer real time check

2021-02-23 Thread GitBox
dianaclarke opened a new pull request #9557: URL: https://github.com/apache/arrow/pull/9557 See: https://issues.apache.org/jira/browse/ARROW-11746 Google Benchmark adds `/real_time` to the end of the benchmark name to indicate if the `real_time` observation should be preferred over t
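As a rough illustration of the naming convention described in that pull request, here is a minimal Python sketch. It is not Archery's actual code: `prefers_real_time` and `pick_time` are hypothetical helpers, and the sketch assumes an observation is a plain dict with the `name`, `real_time`, and `cpu_time` fields found in Google Benchmark's JSON output.

```python
def prefers_real_time(name):
    """Return True if a Google Benchmark name carries a `/real_time` component.

    Hypothetical helper for illustration only. The first path component is
    the benchmark function name, so it is skipped.
    """
    return "real_time" in name.split("/")[1:]


def pick_time(observation):
    """Pick the preferred timing from one observation dict.

    Assumes the dict has `name`, `real_time`, and `cpu_time` keys.
    """
    if prefers_real_time(observation["name"]):
        return observation["real_time"]
    return observation["cpu_time"]


if __name__ == "__main__":
    obs = {"name": "BM_Example/1024/real_time", "real_time": 12.5, "cpu_time": 11.0}
    print(pick_time(obs))
```

Matching on a whole path component rather than a substring keeps a benchmark whose function name merely contains `real_time` from being misclassified.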

[GitHub] [arrow] nealrichardson commented on a change in pull request #9555: ARROW-11743: [R] Add dev version of pkgdown for Jira autolinking

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9555: URL: https://github.com/apache/arrow/pull/9555#discussion_r581300715 ## File path: r/DESCRIPTION ## @@ -97,3 +97,4 @@ Collate: 'reexports-tidyselect.R' 'schema.R' 'util.R' + Review comment: ```su

[GitHub] [arrow] pitrou commented on a change in pull request #9504: ARROW-2229: [C++][Python] Add WriteCsv functionality.

2021-02-23 Thread GitBox
pitrou commented on a change in pull request #9504: URL: https://github.com/apache/arrow/pull/9504#discussion_r581283432 ## File path: cpp/cmake_modules/SetupCxxFlags.cmake ## @@ -629,3 +629,8 @@ if(MSVC) set(CMAKE_SHARED_LINKER_FLAGS "${CMAKE_SHARED_LINKER_FLAGS} ${MSVC_

[GitHub] [arrow] abreis commented on pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
abreis commented on pull request #9543: URL: https://github.com/apache/arrow/pull/9543#issuecomment-784418876 CI failure seems unrelated. Note that the first commit already passed CI, and this second commit only changes a few error strings, so it should be safe to merge.

[GitHub] [arrow] pitrou commented on a change in pull request #9458: ARROW-11575: [Developer][Archery] Expose execution time in benchmark results

2021-02-23 Thread GitBox
pitrou commented on a change in pull request #9458: URL: https://github.com/apache/arrow/pull/9458#discussion_r581280930 ## File path: dev/archery/archery/benchmark/core.py ## @@ -27,11 +27,14 @@ def median(values): class Benchmark: -def __init__(self, name, unit, less

[GitHub] [arrow] pitrou commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784388250 @edponce You might want to take a short look at this.

[GitHub] [arrow] alamb commented on pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
alamb commented on pull request #9543: URL: https://github.com/apache/arrow/pull/9543#issuecomment-784361313 I plan to merge this once the CI goes green

[GitHub] [arrow] pitrou edited a comment on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou edited a comment on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784342153 Looks like a possible fix for xsimd could be (untested): ```diff diff --git a/include/xsimd/types/xsimd_neon_bool.hpp b/include/xsimd/types/xsimd_neon_bool.hpp index

[GitHub] [arrow] github-actions[bot] commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
github-actions[bot] commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784355315 https://issues.apache.org/jira/browse/ARROW-11744

[GitHub] [arrow] dianaclarke commented on a change in pull request #9458: ARROW-11575: [Developer][Archery] Expose execution time in benchmark results

2021-02-23 Thread GitBox
dianaclarke commented on a change in pull request #9458: URL: https://github.com/apache/arrow/pull/9458#discussion_r581212464 ## File path: dev/archery/archery/benchmark/core.py ## @@ -27,11 +27,14 @@ def median(values): class Benchmark: -def __init__(self, name, unit,

[GitHub] [arrow] pitrou commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784342153 Looks like a possible fix for xsimd could be: ```diff diff --git a/include/xsimd/types/xsimd_neon_bool.hpp b/include/xsimd/types/xsimd_neon_bool.hpp index 0d2baf2..75bf3e1

[GitHub] [arrow] emkornfield commented on pull request #9504: ARROW-2229: [C++][Python] Add WriteCsv functionality.

2021-02-23 Thread GitBox
emkornfield commented on pull request #9504: URL: https://github.com/apache/arrow/pull/9504#issuecomment-784334926 @pitrou rebased.

[GitHub] [arrow] pitrou commented on pull request #9504: ARROW-2229: [C++][Python] Add WriteCsv functionality.

2021-02-23 Thread GitBox
pitrou commented on pull request #9504: URL: https://github.com/apache/arrow/pull/9504#issuecomment-784332306 Can you rebase from master to fix Windows builds?

[GitHub] [arrow] abreis commented on a change in pull request #9543: ARROW-11725: [Rust][DataFusion] Make use of the new divide_scalar kernel in arrow

2021-02-23 Thread GitBox
abreis commented on a change in pull request #9543: URL: https://github.com/apache/arrow/pull/9543#discussion_r581191828 ## File path: rust/datafusion/src/physical_plan/expressions/binary.rs ## @@ -209,12 +211,37 @@ macro_rules! binary_primitive_array_op { }}; } +/// In

[GitHub] [arrow] pitrou commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784330951 (in any case, looks like boolean reduction with Neon is tricky: https://stackoverflow.com/questions/31197216/optimizing-horizontal-boolean-reduction-in-arm-neon)

[GitHub] [arrow] dianaclarke commented on a change in pull request #9458: ARROW-11575: [Developer][Archery] Expose execution time in benchmark results

2021-02-23 Thread GitBox
dianaclarke commented on a change in pull request #9458: URL: https://github.com/apache/arrow/pull/9458#discussion_r581190754 ## File path: dev/archery/archery/benchmark/core.py ## @@ -27,11 +27,14 @@ def median(values): class Benchmark: -def __init__(self, name, unit,

[GitHub] [arrow] dianaclarke commented on a change in pull request #9458: ARROW-11575: [Developer][Archery] Expose execution time in benchmark results

2021-02-23 Thread GitBox
dianaclarke commented on a change in pull request #9458: URL: https://github.com/apache/arrow/pull/9458#discussion_r581177513 ## File path: dev/archery/archery/benchmark/core.py ## @@ -27,11 +27,14 @@ def median(values): class Benchmark: -def __init__(self, name, unit,

[GitHub] [arrow] pitrou commented on pull request #9556: ARROW-11744: [C++] Add xsimd dependency

2021-02-23 Thread GitBox
pitrou commented on pull request #9556: URL: https://github.com/apache/arrow/pull/9556#issuecomment-784316245 Looking at the source, it seems the implementation of `xsimd::batch_bool::any` may be buggy: https://github.com/xtensor-stack/xsimd/blob/master/include/xsimd/types/xsimd_neon_bool.

[GitHub] [arrow] nealrichardson commented on a change in pull request #9555: ARROW-11743: [R] Add dev version of pkgdown for Jira autolinking

2021-02-23 Thread GitBox
nealrichardson commented on a change in pull request #9555: URL: https://github.com/apache/arrow/pull/9555#discussion_r581171391 ## File path: r/DESCRIPTION ## @@ -97,3 +97,5 @@ Collate: 'reexports-tidyselect.R' 'schema.R' 'util.R' +Remotes: Review comment:
