[GitHub] [arrow] codecov-io edited a comment on pull request #9592: ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

2021-03-05 Thread GitBox
codecov-io edited a comment on pull request #9592: URL: https://github.com/apache/arrow/pull/9592#issuecomment-787041283 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9592?src=pr=h1) Report > Merging [#9592](https://codecov.io/gh/apache/arrow/pull/9592?src=pr=desc) (209622a) into

[GitHub] [arrow] nevi-me commented on pull request #9592: ARROW-11803: [Rust] [Parquet] Support v2 LogicalType

2021-03-05 Thread GitBox
nevi-me commented on pull request #9592: URL: https://github.com/apache/arrow/pull/9592#issuecomment-791885806 > One seems to be failing for me locally: @alamb I was writing to the same file from 2 tests, so it looks like it was a timing issue. I've now fixed this.
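A collision like the one described above, where two tests write to the same file, can usually be avoided by giving each test its own scratch location. The following is a minimal Python sketch of that idea. It is not the fix applied in the PR (which is Rust and is not shown in this excerpt), and the helper names `write_and_read` and `run_isolated` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_and_read(directory, payload):
    """Write payload to a file inside directory, then read it back."""
    path = os.path.join(directory, "data.bin")
    with open(path, "wb") as f:
        f.write(payload)
    with open(path, "rb") as f:
        return f.read()


def run_isolated(payload):
    """Hypothetical test body: every call gets a private temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return write_and_read(tmp, payload)


# Two "tests" run concurrently, but they no longer share a file path,
# so neither can observe the other's partially written data.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(run_isolated, [b"first", b"second"]))
```

Because the file name is the same but the directory differs per call, the ordering of the two tests stops mattering.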

[GitHub] [arrow] nevi-me closed pull request #9642: ARROW-11881: [Rust][DataFusion] Fix clippy lint

2021-03-05 Thread GitBox
nevi-me closed pull request #9642: URL: https://github.com/apache/arrow/pull/9642 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] jorgecarleitao commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-05 Thread GitBox
jorgecarleitao commented on pull request #9624: URL: https://github.com/apache/arrow/pull/9624#issuecomment-791864271 I agree @alamb : I think that this requires much more work because we do this in multiple places. @ericwburden thank you very much for your interest in arrow and

[GitHub] [arrow] github-actions[bot] commented on pull request #9644: ARROW-11887: [C++] Add asynchronous read to streaming CSV reader

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9644: URL: https://github.com/apache/arrow/pull/9644#issuecomment-791862179 https://issues.apache.org/jira/browse/ARROW-11887

[GitHub] [arrow] westonpace opened a new pull request #9644: ARROW-11887: [C++] Add asynchronous read to streaming CSV reader

2021-03-05 Thread GitBox
westonpace opened a new pull request #9644: URL: https://github.com/apache/arrow/pull/9644 This moves the read to the IO context and runs continuations on the CPU thread pool. It does not add any parallelism. The resulting reader is not reentrant or async-reentrant and it does not do

[GitHub] [arrow] nealrichardson closed pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson closed pull request #9610: URL: https://github.com/apache/arrow/pull/9610

[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791854899 Revision: 1b1680f2e8b1cc05f19ed460905db2b349d6d1e3 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791854799 @github-actions crossbow submit test-r-rstudio* test-r-minimal-build

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588825758 ## File path: dev/tasks/r/azure.linux.yml ## @@ -50,7 +50,12 @@ jobs: # we have to export this (right?) because we need it in the build env

[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588825318 ## File path: dev/tasks/r/azure.linux.yml ## @@ -50,7 +50,12 @@ jobs: # we have to export this (right?) because we need it in the build env

[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791851303 Revision: 96e91046ebcc2cd83289c63fe4f78c8e7841d40f Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791851206 @github-actions crossbow submit test-r-minimal-build

[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791841697 Revision: b5ed1c0dace3dfa93b17d8545fd8a93a96dc4153 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791841345 @github-actions crossbow submit test-r-minimal-build

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588816147 ## File path: dev/tasks/tasks.yml ## @@ -1789,6 +1789,18 @@ tasks: r_image: r-base r_tag: 3.6-opensuse42 not_cran: "TRUE" + +

[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791839980 > Sorry, I just merged the dataset mutate() PR which added a cpp function. Can you please rebase and update the annotation on that function? Otherwise this is looking good, will

[GitHub] [arrow] lidavidm commented on pull request #9386: ARROW-11373: [Python][Docs] Add example of specifying type for a column when reading csv file

2021-03-05 Thread GitBox
lidavidm commented on pull request #9386: URL: https://github.com/apache/arrow/pull/9386#issuecomment-791838356 I touched up the whitespace and merged this. Thanks @jaroszan! (If you make a JIRA account, let us know and we can assign the issue to you there, too.)

[GitHub] [arrow] lidavidm closed pull request #9386: ARROW-11373: [Python][Docs] Add example of specifying type for a column when reading csv file

2021-03-05 Thread GitBox
lidavidm closed pull request #9386: URL: https://github.com/apache/arrow/pull/9386

[GitHub] [arrow] timkpaine commented on pull request #9611: ARROW-11833: [C++] Fix missing architecture flag for vendored fast_float

2021-03-05 Thread GitBox
timkpaine commented on pull request #9611: URL: https://github.com/apache/arrow/pull/9611#issuecomment-791820886 this has now been merged upstream

[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791817416 Sorry, I just merged the dataset mutate() PR which added a cpp function. Can you please rebase and update the annotation on that function? Otherwise this is looking good,

[GitHub] [arrow] nealrichardson commented on pull request #9634: ARROW-11864: [R] Document arrow.int64_downcast option

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9634: URL: https://github.com/apache/arrow/pull/9634#issuecomment-791813964 @msummersgill did you want to add mention of this option in any other parts of the docs?

[GitHub] [arrow] nealrichardson commented on a change in pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9579: URL: https://github.com/apache/arrow/pull/9579#discussion_r588794433 ## File path: r/configure ## @@ -130,22 +130,43 @@ else if [ "${LIBARROW_MINIMAL}" = "" ] && [ "${NOT_CRAN}" = "true" ]; then

[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791794711 Revision: 71fbe74c3aaa06c68aee82cb2c8f1b8b1939afea Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791794482 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588782921 ## File path: dev/tasks/conda-recipes/r-arrow/configure.win ## @@ -2,7 +2,7 @@ set -euxo pipefail -echo "PKG_CPPFLAGS=-DNDEBUG

[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791783688 > 1. configure.win should determine Arrow C++ library capabilities dynamically (like we do in configure) ARROW-11884 > 2. `LIBARROW_MINIMAL=true` should entail that

[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791773084 Revision: 90ddad904393bb50dc6f6f823b75dc9bf906aea4 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791772862 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] github-actions[bot] commented on pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9643: URL: https://github.com/apache/arrow/pull/9643#issuecomment-791770741 https://issues.apache.org/jira/browse/ARROW-11883

[GitHub] [arrow] westonpace opened a new pull request #9643: ARROW-11883: [C++] Add ConcatMap, MergeMap, and an async-reentrant version of Map

2021-03-05 Thread GitBox
westonpace opened a new pull request #9643: URL: https://github.com/apache/arrow/pull/9643 These items can all stand on their own and they are used by the async datasets conversion. MergeMap - Given AsyncGenerator<AsyncGenerator<T>> return AsyncGenerator<T>. This method flattens a generator of
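The flattening idea mentioned in the PR description (a generator of generators becomes a single generator of items) can be illustrated outside of Arrow. The sketch below uses Python's `asyncio` and is only an analogy, not the C++ API from the PR. The names `flatten`, `numbers`, `outer`, and `collect` are hypothetical, and the sketch drains the inner generators one after another, which may differ from how the PR's MergeMap schedules them (the excerpt does not say).

```python
import asyncio
from typing import AsyncIterator, List, TypeVar

T = TypeVar("T")


async def flatten(outer: AsyncIterator[AsyncIterator[T]]) -> AsyncIterator[T]:
    """Yield every item of every inner generator, in order of arrival."""
    async for inner in outer:
        async for item in inner:
            yield item


async def numbers(start: int, count: int) -> AsyncIterator[int]:
    """Hypothetical inner generator producing `count` consecutive integers."""
    for i in range(start, start + count):
        await asyncio.sleep(0)  # hand control back to the event loop
        yield i


async def outer() -> AsyncIterator[AsyncIterator[int]]:
    """Hypothetical outer generator that produces two inner generators."""
    yield numbers(0, 2)
    yield numbers(10, 3)


async def collect(gen: AsyncIterator[T]) -> List[T]:
    """Drain an async generator into a list."""
    return [item async for item in gen]


flat = asyncio.run(collect(flatten(outer())))
```

A consumer of `flatten(outer())` sees a single stream of integers and does not need to know how many inner generators produced them.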

[GitHub] [arrow] nealrichardson closed pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson closed pull request #9586: URL: https://github.com/apache/arrow/pull/9586

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791754118 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] alamb commented on pull request #9600: ARROW-11822: [Rust][Datafusion] Support case sensitive for function

2021-03-05 Thread GitBox
alamb commented on pull request #9600: URL: https://github.com/apache/arrow/pull/9600#issuecomment-791751613 I have some thoughts on this PR -- I plan to write them up coherently tomorrow

[GitHub] [arrow] nealrichardson commented on a change in pull request #9641: Arrow-11507: [R] Bindings for GetRuntimeInfo

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9641: URL: https://github.com/apache/arrow/pull/9641#discussion_r588759057 ## File path: r/R/runtime-info.R ## @@ -0,0 +1,36 @@ +# Licensed to the Apache Software Foundation (ASF) under one Review comment: You can

[GitHub] [arrow] alamb commented on pull request #9639: ARROW-11879 [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan

2021-03-05 Thread GitBox
alamb commented on pull request #9639: URL: https://github.com/apache/arrow/pull/9639#issuecomment-791749787 > Someone still might want to add some filter / aggregate on the dataframe, so maybe it makes sense the optimization pass only works on collect? Ideally in my mind we would

[GitHub] [arrow] nealrichardson commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791749746 Good catch about the Windows builds. Could you make followup JIRAs for: 1. configure.win should determine Arrow C++ library capabilities dynamically (like we do in

[GitHub] [arrow] nealrichardson commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588753808 ## File path: dev/tasks/conda-recipes/r-arrow/configure.win ## @@ -2,7 +2,7 @@ set -euxo pipefail -echo "PKG_CPPFLAGS=-DNDEBUG

[GitHub] [arrow] Dandandan commented on pull request #9639: ARROW-11879 [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan

2021-03-05 Thread GitBox
Dandandan commented on pull request #9639: URL: https://github.com/apache/arrow/pull/9639#issuecomment-791743351 Keeping this as a draft for now; I think what to do here is still open for discussion. Do we want the dataframe from `ExecutionContext::sql` to return an optimized

[GitHub] [arrow] Dandandan closed pull request #9639: ARROW-11879 [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan

2021-03-05 Thread GitBox
Dandandan closed pull request #9639: URL: https://github.com/apache/arrow/pull/9639

[GitHub] [arrow] Dandandan commented on pull request #9639: ARROW-11879 [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan

2021-03-05 Thread GitBox
Dandandan commented on pull request #9639: URL: https://github.com/apache/arrow/pull/9639#issuecomment-791742280 hm it seems it's slightly more complicated * `DataFrame::collect` currently also runs `optimize` (makes sense, as this is a kind of a last "build" function) * But not

[GitHub] [arrow] codecov-io commented on pull request #9642: ARROW-11881: [Rust][DataFusion] Fix clippy lint

2021-03-05 Thread GitBox
codecov-io commented on pull request #9642: URL: https://github.com/apache/arrow/pull/9642#issuecomment-791739813 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9642?src=pr=h1) Report > Merging [#9642](https://codecov.io/gh/apache/arrow/pull/9642?src=pr=desc) (c9bb438) into

[GitHub] [arrow] seddonm1 commented on pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

2021-03-05 Thread GitBox
seddonm1 commented on pull request #9428: URL: https://github.com/apache/arrow/pull/9428#issuecomment-791739578 @sweb I can help on Monday. I'm planning to raise the PR for those other regexp functions then can help work through this?

[GitHub] [arrow] kou commented on a change in pull request #9637: ARROW-11870: [Dev] Automatically run merge script in virtual environment

2021-03-05 Thread GitBox
kou commented on a change in pull request #9637: URL: https://github.com/apache/arrow/pull/9637#discussion_r588738321 ## File path: dev/merge_arrow_pr.sh ## @@ -0,0 +1,56 @@ +#!/bin/sh + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

[GitHub] [arrow] alamb closed pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-05 Thread GitBox
alamb closed pull request #9624: URL: https://github.com/apache/arrow/pull/9624

[GitHub] [arrow] alamb commented on pull request #9624: ARROW-11845: [Rust] Implement to_isize() for ArrowNativeTypes

2021-03-05 Thread GitBox
alamb commented on pull request #9624: URL: https://github.com/apache/arrow/pull/9624#issuecomment-791735382 > fwiw, I think that there is a different way of approaching this. Thank you @jorgecarleitao -- I like that proposal a lot and I have filed it as

[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791733889 Revision: afcd4535bb7b14d54c7c3f39281f6bd0ae3ac859 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791733341 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] alamb closed pull request #9625: ARROW-11653: [Rust][DataFusion] Postgres String Functions: ascii, chr, initcap, repeat, reverse, to_hex

2021-03-05 Thread GitBox
alamb closed pull request #9625: URL: https://github.com/apache/arrow/pull/9625

[GitHub] [arrow] github-actions[bot] commented on pull request #9642: ARROW-11881: [Rust][DataFusion] Fix clippy lint

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9642: URL: https://github.com/apache/arrow/pull/9642#issuecomment-791727188 https://issues.apache.org/jira/browse/ARROW-11881

[GitHub] [arrow] alamb opened a new pull request #9642: ARROW-11881: [Rust][DataFusion] Fix clippy lint

2021-03-05 Thread GitBox
alamb opened a new pull request #9642: URL: https://github.com/apache/arrow/pull/9642 ARROW-11881: [Rust][DataFusion] Fix clippy lint A linter error has appeared on master somehow: ``` error: unnecessary parentheses around `for` iterator expression -->

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588715419 ## File path: dev/tasks/conda-recipes/r-arrow/configure.win ## @@ -2,7 +2,7 @@ set -euxo pipefail -echo "PKG_CPPFLAGS=-DNDEBUG

[GitHub] [arrow] alamb commented on pull request #9612: ARROW-11824: [Rust] [Parquet] Use logical types in Arrow schema conversion

2021-03-05 Thread GitBox
alamb commented on pull request #9612: URL: https://github.com/apache/arrow/pull/9612#issuecomment-791725587 Sorry I did not mean to close this PR

[GitHub] [arrow] alamb commented on pull request #9612: ARROW-11824: [Rust] [Parquet] Use logical types in Arrow schema conversion

2021-03-05 Thread GitBox
alamb commented on pull request #9612: URL: https://github.com/apache/arrow/pull/9612#issuecomment-791724823 The clippy error seems unrelated to this PR: ``` error: unnecessary parentheses around `for` iterator expression --> datafusion/src/physical_plan/merge.rs:124:31

[GitHub] [arrow] alamb closed pull request #9612: ARROW-11824: [Rust] [Parquet] Use logical types in Arrow schema conversion

2021-03-05 Thread GitBox
alamb closed pull request #9612: URL: https://github.com/apache/arrow/pull/9612

[GitHub] [arrow] github-actions[bot] commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791722491 Revision: e46ce97d204950e7f9088abd710c0f009522a816 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] ianmcook commented on pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on pull request #9610: URL: https://github.com/apache/arrow/pull/9610#issuecomment-791722178 @github-actions crossbow submit -g r

[GitHub] [arrow] github-actions[bot] commented on pull request #9641: Arrow-11507: [R] Bindings for GetRuntimeInfo

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9641: URL: https://github.com/apache/arrow/pull/9641#issuecomment-791719281 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then

[GitHub] [arrow] pachamaltese opened a new pull request #9641: Arrow-11507: [R] Bindings for GetRuntimeInfo

2021-03-05 Thread GitBox
pachamaltese opened a new pull request #9641: URL: https://github.com/apache/arrow/pull/9641

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588696628 ## File path: dev/tasks/tasks.yml ## @@ -1789,6 +1789,18 @@ tasks: r_image: r-base r_tag: 3.6-opensuse42 not_cran: "TRUE" + +

[GitHub] [arrow] ianmcook commented on a change in pull request #9610: ARROW-11735: [R] Allow Parquet and Arrow Dataset to be optional components

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9610: URL: https://github.com/apache/arrow/pull/9610#discussion_r588692513 ## File path: dev/tasks/tasks.yml ## @@ -1789,6 +1789,18 @@ tasks: r_image: r-base r_tag: 3.6-opensuse42 not_cran: "TRUE" + +

[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791671603 Revision: b0e5391d68bedc580cfbe56ea5446ea942e62e98 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791670805 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] github-actions[bot] commented on pull request #9640: ARROW-11872 [C++]: Fix Array validation when Array contains non-CPU buffers

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9640: URL: https://github.com/apache/arrow/pull/9640#issuecomment-791669390 https://issues.apache.org/jira/browse/ARROW-11872

[GitHub] [arrow] trxcllnt opened a new pull request #9640: ARROW-11872 [C++]: Fix Array validation when Array contains non-CPU buffers

2021-03-05 Thread GitBox
trxcllnt opened a new pull request #9640: URL: https://github.com/apache/arrow/pull/9640 Constructing an Arrow Table from Arrays that contain `CudaBuffer` presently fails. `ValidateArrayImpl::IsBufferValid` is checking the buffers of each Array, but when an Array's buffers aren't

[GitHub] [arrow] westonpace commented on a change in pull request #9528: ARROW-8732: [C++] Add basic cancellation API

2021-03-05 Thread GitBox
westonpace commented on a change in pull request #9528: URL: https://github.com/apache/arrow/pull/9528#discussion_r588547359 ## File path: cpp/src/arrow/csv/reader.cc ## @@ -934,22 +946,34 @@ class AsyncThreadedTableReader Result> MakeTableReader( MemoryPool* pool,

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588638823 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] Dandandan commented on a change in pull request #9639: ARROW-11879 [Rust][DataFusion] Make ExecutionContext::sql return dataframe with optimized plan

2021-03-05 Thread GitBox
Dandandan commented on a change in pull request #9639: URL: https://github.com/apache/arrow/pull/9639#discussion_r588625713 ## File path: rust/datafusion/src/execution/context.rs ## @@ -1702,6 +1702,23 @@ mod tests { } Ok(()) } +#[test] +fn

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588630280 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] nealrichardson commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588628931 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588623822 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588621424 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] github-actions[bot] commented on pull request #9639: ARROW-11879 [Rust][DataFusion] ExecutionContext::sql should optimize plan

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9639: URL: https://github.com/apache/arrow/pull/9639#issuecomment-791633831 https://issues.apache.org/jira/browse/ARROW-11879

[GitHub] [arrow] Dandandan opened a new pull request #9639: ARROW-11879 [Rust][DataFusion] ExecutionContext::sql should optimize plan

2021-03-05 Thread GitBox
Dandandan opened a new pull request #9639: URL: https://github.com/apache/arrow/pull/9639 I believe we should expect `ExecutionContext::sql` to return an optimized logical plan (with the currently applied config) rather than a DataFrame with an unoptimized plan.

[GitHub] [arrow] nealrichardson commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588607812 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588604176 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] nealrichardson commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588604008 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] nealrichardson commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791624414 > So now we have scalar recycling in `mutate()` for Datasets but not yet for Tables and RecordBatches, correct? Correct, that's ARROW-11705

[GitHub] [arrow] ianmcook commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791623959 So now we have scalar recycling in `mutate()` for Datasets but not yet for Tables and RecordBatches, correct?

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588596519 ## File path: r/R/dplyr.R ## @@ -539,6 +537,13 @@ mutate.arrow_dplyr_query <- function(.data, if (inherits(results[[new_var]], "try-error")) {

[GitHub] [arrow] ianmcook edited a comment on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook edited a comment on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791618188 `transmute()` with no arguments should return no columns. Currently it returns _all_ columns. This is true for Tables and RecordBatches too.

[GitHub] [arrow] ianmcook commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791618188 I noticed that `transmute()` with no arguments should return no columns. Currently it returns _all_ columns. This is true for Tables and RecordBatches too.

[GitHub] [arrow] github-actions[bot] commented on pull request #9638: ARROW-11877: [C++] Add microbenchmark for SimplifyWithGuarantee

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9638: URL: https://github.com/apache/arrow/pull/9638#issuecomment-791618041 https://issues.apache.org/jira/browse/ARROW-11877

[GitHub] [arrow] lidavidm commented on pull request #9638: ARROW-11877: [C++] Add microbenchmark for SimplifyWithGuarantee

2021-03-05 Thread GitBox
lidavidm commented on pull request #9638: URL: https://github.com/apache/arrow/pull/9638#issuecomment-791617925 Results ``` -- Benchmark

[GitHub] [arrow] lidavidm opened a new pull request #9638: ARROW-11877: [C++] Add microbenchmark for SimplifyWithGuarantee

2021-03-05 Thread GitBox
lidavidm opened a new pull request #9638: URL: https://github.com/apache/arrow/pull/9638 This adds a microbenchmark for SimplifyWithGuarantee which, especially for a large dataset, can contribute a significant amount of time to reading a dataset, as it's used to evaluate partition

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588587073 ## File path: r/R/dataset-scan.R ## @@ -157,13 +157,19 @@ ScannerBuilder <- R6Class("ScannerBuilder", inherit = ArrowObject, public = list(

[GitHub] [arrow] nealrichardson commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588578476 ## File path: r/R/dataset-scan.R ## @@ -157,13 +157,19 @@ ScannerBuilder <- R6Class("ScannerBuilder", inherit = ArrowObject, public = list(

[GitHub] [arrow] github-actions[bot] commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791610757 Revision: c1557d4c6e6daf9f6d4f5913775db43fee3c0c93 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] ianmcook commented on a change in pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on a change in pull request #9586: URL: https://github.com/apache/arrow/pull/9586#discussion_r588576224 ## File path: r/R/dataset-scan.R ## @@ -157,13 +157,19 @@ ScannerBuilder <- R6Class("ScannerBuilder", inherit = ArrowObject, public = list(

[GitHub] [arrow] jonkeane commented on pull request #9579: ARROW-11774: [R] macos one line install

2021-03-05 Thread GitBox
jonkeane commented on pull request #9579: URL: https://github.com/apache/arrow/pull/9579#issuecomment-791607897 @github-actions crossbow submit test-r-install-macos

[GitHub] [arrow] nealrichardson commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
nealrichardson commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791602869 Thanks @ianmcook, fixed in https://github.com/apache/arrow/pull/9586/commits/3ac4086cd651411bb17409e726b7261c13347bf3, PTAL.

[GitHub] [arrow] ianmcook commented on pull request #9586: ARROW-11704: [R] Wire up dplyr::mutate() for datasets

2021-03-05 Thread GitBox
ianmcook commented on pull request #9586: URL: https://github.com/apache/arrow/pull/9586#issuecomment-791567699 Looks like `mutate()` on datasets errors when expressions are literals:
```r
ds %>% transmute(x=42) %>% head(1)
## Error in dataset___ScannerBuilder__ProjectExprs(self,
```

[GitHub] [arrow] pitrou commented on pull request #9637: ARROW-11870: [Dev] Automatically run merge script in virtual environment

2021-03-05 Thread GitBox
pitrou commented on pull request #9637: URL: https://github.com/apache/arrow/pull/9637#issuecomment-791557710 @jorgecarleitao Could you try this out? It should be usable instead of the Python script (same CLI syntax). Perhaps someone can also contribute a Windows (.bat) version.

[GitHub] [arrow] github-actions[bot] commented on pull request #9637: ARROW-11870: [Dev] Automatically run merge script in virtual environment

2021-03-05 Thread GitBox
github-actions[bot] commented on pull request #9637: URL: https://github.com/apache/arrow/pull/9637#issuecomment-791557328 https://issues.apache.org/jira/browse/ARROW-11870

[GitHub] [arrow] pitrou opened a new pull request #9637: ARROW-11870: [Dev] Automatically run merge script in virtual environment

2021-03-05 Thread GitBox
pitrou opened a new pull request #9637: URL: https://github.com/apache/arrow/pull/9637

[GitHub] [arrow] sweb commented on a change in pull request #9428: ARROW-10354: [Rust][DataFusion] regexp_extract function to select regex groups from strings

2021-03-05 Thread GitBox
sweb commented on a change in pull request #9428: URL: https://github.com/apache/arrow/pull/9428#discussion_r588468947 ## File path: rust/arrow/src/compute/kernels/regexp.rs ## @@ -0,0 +1,117 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more
