[GitHub] [arrow-datafusion] jon-chuang commented on issue #2361: Validate subquery expressions

2022-05-06 Thread GitBox
jon-chuang commented on issue #2361: URL: https://github.com/apache/arrow-datafusion/issues/2361#issuecomment-1120147996 @andygrove is there any reason why we can't support multiple columns in the IN subquery? As far as implementation goes, it's fairly trivial. -- This is an automated me

[GitHub] [arrow] ursabot commented on pull request #13033: ARROW-16413: [Python] Certain dataset APIs hang with a python filesystem

2022-05-06 Thread GitBox
ursabot commented on PR #13033: URL: https://github.com/apache/arrow/pull/13033#issuecomment-1120143655 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/40e29d083fe44db58242b29c69a1a836...0222e551cf1f4ca39de2783eb4a80be2/)

[GitHub] [arrow] ursabot commented on pull request #13033: ARROW-16413: [Python] Certain dataset APIs hang with a python filesystem

2022-05-06 Thread GitBox
ursabot commented on PR #13033: URL: https://github.com/apache/arrow/pull/13033#issuecomment-1120143631 Benchmark runs are scheduled for baseline = 26f2d877b5d08d2a9c3c7289efcd4a56ba068e26 and contender = d8977165d610d3b828eea0923d733cc5a1cf2c4e. d8977165d610d3b828eea0923d733cc5a1cf2c4e is

[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

2022-05-06 Thread GitBox
WinkerDu commented on PR #2364: URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1120132022 Thank you all @alamb @andygrove

[GitHub] [arrow] ursabot commented on pull request #13053: MINOR: [Java][Flight] Add context of descriptor on integration tests when not found

2022-05-06 Thread GitBox
ursabot commented on PR #13053: URL: https://github.com/apache/arrow/pull/13053#issuecomment-1120123844 Benchmark runs are scheduled for baseline = 3c3e68c194ca6ac07086ddc1bb44fe153970213e and contender = 26f2d877b5d08d2a9c3c7289efcd4a56ba068e26. 26f2d877b5d08d2a9c3c7289efcd4a56ba068e26 is

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1120114668 The most flexible signature of that method would be: /// Calls `list_file` with a suffix filter async fn list_file_with_suffix( &self, prefix: &str,

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1120114181 In summary, I agree with the ObjectStore semantics being sufficient. I also do want to point out that globbing is nothing more than making the suffix filter more powerf
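
The point that globbing is just a more expressive suffix filter can be illustrated with a small sketch. It assumes a flat key space filtered first by prefix; the helper names (`list_keys`, `list_with_suffix`, `list_with_glob`) and the sample keys are hypothetical and are not the DataFusion `ObjectStore` API.

```python
from fnmatch import fnmatchcase

# Hypothetical flat key space, as an object store would expose it.
KEYS = ["data/a.parquet", "data/b.csv", "data/2022/c.parquet", "logs/d.parquet"]


def list_keys(keys, prefix):
    """ListObjects-style call: return the keys that start with `prefix`."""
    return [k for k in keys if k.startswith(prefix)]


def list_with_suffix(keys, prefix, suffix):
    """Prefix listing followed by a plain suffix filter."""
    return [k for k in list_keys(keys, prefix) if k.endswith(suffix)]


def list_with_glob(keys, prefix, pattern):
    """Prefix listing followed by a glob applied to the part after the prefix.

    Note: in fnmatch, `*` also matches `/`, so `*.parquet` behaves like a
    suffix filter across nested "directories".
    """
    return [k for k in list_keys(keys, prefix)
            if fnmatchcase(k[len(prefix):], pattern)]
```

With these definitions, `list_with_glob(KEYS, "data/", "*.parquet")` selects the same keys as `list_with_suffix(KEYS, "data/", ".parquet")`, while a pattern such as `2022/*.parquet` expresses a filter that a suffix alone cannot.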

[GitHub] [arrow] niyue commented on pull request #11588: ARROW-14548: [C++] Add madvise random support for memory mapped file

2022-05-06 Thread GitBox
niyue commented on PR #11588: URL: https://github.com/apache/arrow/pull/11588#issuecomment-1120107357 I happened to revisit this PR a week ago. > So I think the work I'm doing on [ARROW-14577](https://issues.apache.org/jira/browse/ARROW-14577) / https://github.com/apache/arrow/pull/11616

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1664: Replace `fn is_large` with `const IS_LARGE`

2022-05-06 Thread GitBox
viirya commented on code in PR #1664: URL: https://github.com/apache/arrow-rs/pull/1664#discussion_r867286279 ## arrow/src/array/array_binary.rs: ## @@ -44,8 +44,11 @@ pub struct GenericBinaryArray { } impl GenericBinaryArray { +/// Get the data type of the array. +/

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1664: Replace `fn is_large` with `const IS_LARGE`

2022-05-06 Thread GitBox
viirya commented on code in PR #1664: URL: https://github.com/apache/arrow-rs/pull/1664#discussion_r867286149 ## arrow/src/array/array_binary.rs: ## @@ -44,8 +44,11 @@ pub struct GenericBinaryArray { } impl GenericBinaryArray { +/// Get the data type of the array. +/

[GitHub] [arrow] niyue commented on a diff in pull request #13041: ARROW-16430: [Python] Add support for reading record batch custom metadata API

2022-05-06 Thread GitBox
niyue commented on code in PR #13041: URL: https://github.com/apache/arrow/pull/13041#discussion_r867284969 ## python/pyarrow/includes/libarrow.pxd: ## @@ -794,6 +794,10 @@ cdef extern from "arrow/api.h" namespace "arrow" nogil: shared_ptr[CRecordBatch] Slice(int64_t of

[GitHub] [arrow] niyue commented on a diff in pull request #13041: ARROW-16430: [Python] Add support for reading record batch custom metadata API

2022-05-06 Thread GitBox
niyue commented on code in PR #13041: URL: https://github.com/apache/arrow/pull/13041#discussion_r867283915 ## cpp/src/arrow/ipc/writer.h: ## @@ -104,8 +104,7 @@ class ARROW_EXPORT RecordBatchWriter { virtual Status WriteRecordBatch( const RecordBatch& batch, co

[GitHub] [arrow] ursabot commented on pull request #13050: ARROW-16434: [R][CI] Revert devdocs to setup-r@v1 for now

2022-05-06 Thread GitBox
ursabot commented on PR #13050: URL: https://github.com/apache/arrow/pull/13050#issuecomment-1120094604 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a2b3665f960e461e9cbf366e9516840c...461595eec5d04497a9e7b776fd0b793d/)

[GitHub] [arrow] ursabot commented on pull request #13050: ARROW-16434: [R][CI] Revert devdocs to setup-r@v1 for now

2022-05-06 Thread GitBox
ursabot commented on PR #13050: URL: https://github.com/apache/arrow/pull/13050#issuecomment-1120094569 Benchmark runs are scheduled for baseline = 2025673f0c49b8b8ccd41d5494b29653b57366e4 and contender = 3c3e68c194ca6ac07086ddc1bb44fe153970213e. 3c3e68c194ca6ac07086ddc1bb44fe153970213e is

[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1664: Replace `fn is_large` with `const IS_LARGE`

2022-05-06 Thread GitBox
HaoYang670 commented on code in PR #1664: URL: https://github.com/apache/arrow-rs/pull/1664#discussion_r867279175 ## arrow/src/array/array_binary.rs: ## @@ -44,8 +44,11 @@ pub struct GenericBinaryArray { } impl GenericBinaryArray { +/// Get the data type of the array. +

[GitHub] [arrow] westonpace commented on pull request #13091: ARROW-16498: [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-06 Thread GitBox
westonpace commented on PR #13091: URL: https://github.com/apache/arrow/pull/13091#issuecomment-1120093156 CC @michalursa `TaskScheduler, StressTwo` should reproduce this but it can still be a bit of a pain on a fast system. Added a print statement between: ``` const auto& t

[GitHub] [arrow] github-actions[bot] commented on pull request #13091: ARROW-16498: [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13091: URL: https://github.com/apache/arrow/pull/13091#issuecomment-1120092486 https://issues.apache.org/jira/browse/ARROW-16498

[GitHub] [arrow] github-actions[bot] commented on pull request #13091: ARROW-16498: [C++] Fix potential deadlock in arrow::compute::TaskScheduler

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13091: URL: https://github.com/apache/arrow/pull/13091#issuecomment-1120092489 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
sunchao commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867275202 ## arrow/src/compute/kernels/substring.rs: ## @@ -954,6 +992,56 @@ mod tests { without_nulls_generic_string::() } +#[test] +fn dictionary() -> R

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
sunchao commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867274835 ## arrow/src/compute/kernels/substring.rs: ## @@ -18,13 +18,135 @@ //! Defines kernel to extract a substring of an Array //! Supported array types: \[Large\]StringArr

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
codecov-commenter commented on PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#issuecomment-1120081463 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1665?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
viirya commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867268461 ## arrow/src/compute/kernels/substring.rs: ## @@ -954,6 +992,56 @@ mod tests { without_nulls_generic_string::() } +#[test] +fn dictionary() -> Re

[GitHub] [arrow] westonpace commented on pull request #13028: ARROW-16083: [WIP][C++] Implement AsofJoin execution node

2022-05-06 Thread GitBox
westonpace commented on PR #13028: URL: https://github.com/apache/arrow/pull/13028#issuecomment-1120079201 The second thing you will often see mentioned is the "morsel / batch" model. When reading data in you often want to read it in largish blocks of data (counter-intuitively, these large
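
As a rough illustration of the "morsel / batch" idea, the sketch below reads rows in large blocks and then cuts each block into smaller units for processing. It assumes the common meaning of the term; the function names and the sizes are hypothetical and are not taken from the Arrow C++ execution engine.

```python
def read_morsels(rows, morsel_size):
    """Yield large blocks ("morsels"), as a scan node might read them from storage."""
    for start in range(0, len(rows), morsel_size):
        yield rows[start:start + morsel_size]


def split_into_batches(morsel, batch_size):
    """Cut one morsel into smaller batches for downstream compute."""
    for start in range(0, len(morsel), batch_size):
        yield morsel[start:start + batch_size]


def batch_lengths(rows, morsel_size, batch_size):
    """Return the batch lengths produced per morsel, for inspection."""
    return [[len(b) for b in split_into_batches(m, batch_size)]
            for m in read_morsels(rows, morsel_size)]
```

Calling `batch_lengths(list(range(10)), 4, 3)` shows three morsels (4, 4 and 2 rows), each cut into batches of at most 3 rows.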

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
viirya commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867268156 ## arrow/src/compute/kernels/substring.rs: ## @@ -18,13 +18,135 @@ //! Defines kernel to extract a substring of an Array //! Supported array types: \[Large\]StringArra

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
viirya commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867267938 ## arrow/src/compute/kernels/substring.rs: ## @@ -18,13 +18,135 @@ //! Defines kernel to extract a substring of an Array //! Supported array types: \[Large\]StringArra

[GitHub] [arrow] westonpace commented on pull request #13028: ARROW-16083: [WIP][C++] Implement AsofJoin execution node

2022-05-06 Thread GitBox
westonpace commented on PR #13028: URL: https://github.com/apache/arrow/pull/13028#issuecomment-1120077381 Sure. "Thread per core" is probably a bit of a misnomer too, but I haven't found a nicer term yet. The default thread pool size is std::hardware_concurrency which is the maximum numb
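
A Python analogue of sizing a CPU thread pool to the hardware concurrency might look like the sketch below. It uses `os.cpu_count()` as a stand-in for the C++ hardware concurrency query; the helper names are hypothetical and this is not Arrow's thread pool.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_pool_size():
    """Number of logical CPUs, falling back to 1 when it cannot be determined."""
    return os.cpu_count() or 1


def run_cpu_tasks(tasks):
    """Run zero-argument callables on a pool with one worker per logical CPU.

    Results come back in submission order. A task that blocks (for example on
    I/O) occupies one of these workers for as long as it waits.
    """
    with ThreadPoolExecutor(max_workers=default_pool_size()) as pool:
        return list(pool.map(lambda task: task(), tasks))
```

Because the pool has no spare workers beyond the core count, any task that blocks reduces the capacity available for compute, which is why blocking calls are usually kept off such a pool.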

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
sunchao commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867265840 ## arrow/src/compute/kernels/substring.rs: ## @@ -18,13 +18,135 @@ //! Defines kernel to extract a substring of an Array //! Supported array types: \[Large\]StringArr

[GitHub] [arrow-cookbook] toddfarmer commented on issue #190: [Java] Clarify build requirements

2022-05-06 Thread GitBox
toddfarmer commented on issue #190: URL: https://github.com/apache/arrow-cookbook/issues/190#issuecomment-1120075460 Thanks @westonpace - I'll constrain my proposed changes to the Java CONTRIBUTING file, then.

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
sunchao commented on code in PR #1665: URL: https://github.com/apache/arrow-rs/pull/1665#discussion_r867265638 ## arrow/src/compute/kernels/substring.rs: ## @@ -18,13 +18,135 @@ //! Defines kernel to extract a substring of an Array //! Supported array types: \[Large\]StringArr

[GitHub] [arrow-rs] sunchao opened a new pull request, #1665: Add dictionary array support for substring function

2022-05-06 Thread GitBox
sunchao opened a new pull request, #1665: URL: https://github.com/apache/arrow-rs/pull/1665 # Which issue does this PR close? Closes #1656. # Rationale for this change Currently the `substring` kernel only supports "plain" arrays but not dictionary encoded ones.

[GitHub] [arrow-cookbook] westonpace commented on issue #190: [Java] Clarify build requirements

2022-05-06 Thread GitBox
westonpace commented on issue #190: URL: https://github.com/apache/arrow-cookbook/issues/190#issuecomment-1120073287 That looks accurate to me, the Java build documents should probably be updated. Though that is a Java cookbook decision and not a general decision (which is probably why the

[GitHub] [arrow] github-actions[bot] commented on pull request #13090: ARROW-15622: Implement union_all and union for arrow_dplyr_query

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13090: URL: https://github.com/apache/arrow/pull/13090#issuecomment-1120069730 https://issues.apache.org/jira/browse/ARROW-15622

[GitHub] [arrow] jonkeane commented on a diff in pull request #13086: ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and arrow_available()

2022-05-06 Thread GitBox
jonkeane commented on code in PR #13086: URL: https://github.com/apache/arrow/pull/13086#discussion_r867257196 ## r/R/arrow-package.R: ## @@ -80,90 +78,29 @@ } } - if (arrow_available()) { -# register extension types that we use internally -reregister_extensio

[GitHub] [arrow] github-actions[bot] commented on pull request #13089: ARROW-16497: [R] Update version in NEWS.md

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13089: URL: https://github.com/apache/arrow/pull/13089#issuecomment-1120059786 https://issues.apache.org/jira/browse/ARROW-16497

[GitHub] [arrow] github-actions[bot] commented on pull request #13089: ARROW-16497: [R] Update version in NEWS.md

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13089: URL: https://github.com/apache/arrow/pull/13089#issuecomment-1120059800 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] kou opened a new pull request, #13089: ARROW-16497: [R] Update version in NEWS.md

2022-05-06 Thread GitBox
kou opened a new pull request, #13089: URL: https://github.com/apache/arrow/pull/13089 This solves the following CI failure: https://github.com/apache/arrow/runs/6329293119?check_suite_focus=true Failure: test_version_pre_tag(PrepareTest) /home/runner/work/arrow/arrow

[GitHub] [arrow] github-actions[bot] commented on pull request #13079: ARROW-16474: [C++][Packaging] Require Python 3.7 or later

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13079: URL: https://github.com/apache/arrow/pull/13079#issuecomment-1120056469 Revision: 52331d595fff924c29eae9383e6343dfe226dda3 Submitted crossbow builds: [ursacomputing/crossbow @ actions-2032](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] kou closed pull request #12918: ARROW-16228: [CI][Packaging][Conan] Add a job to test minimum build

2022-05-06 Thread GitBox
kou closed pull request #12918: ARROW-16228: [CI][Packaging][Conan] Add a job to test minimum build URL: https://github.com/apache/arrow/pull/12918

[GitHub] [arrow] kou commented on pull request #13079: ARROW-16474: [C++][Packaging] Require Python 3.7 or later

2022-05-06 Thread GitBox
kou commented on PR #13079: URL: https://github.com/apache/arrow/pull/13079#issuecomment-1120055951 @github-actions crossbow submit -g linux

[GitHub] [arrow] ursabot commented on pull request #13049: ARROW-16035: [Java] Handling empty JDBC ResultSet

2022-05-06 Thread GitBox
ursabot commented on PR #13049: URL: https://github.com/apache/arrow/pull/13049#issuecomment-1120041556 Benchmark runs are scheduled for baseline = cf2a35c0826e4e96cf5d8e2321a4e3a0a243 and contender = 2025673f0c49b8b8ccd41d5494b29653b57366e4. 2025673f0c49b8b8ccd41d5494b29653b57366e4 is

[GitHub] [arrow-datafusion] ovr commented on a diff in pull request #2196: feat: Support GetIndexedFieldExpr for ScalarValue

2022-05-06 Thread GitBox
ovr commented on code in PR #2196: URL: https://github.com/apache/arrow-datafusion/pull/2196#discussion_r867235433 ## datafusion/physical-expr/src/expressions/get_indexed_field.rs: ## @@ -105,9 +105,51 @@ impl PhysicalExpr for GetIndexedFieldExpr { }

[GitHub] [arrow] github-actions[bot] commented on pull request #13088: ARROW-16085: [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13088: URL: https://github.com/apache/arrow/pull/13088#issuecomment-1120017512 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #13088: ARROW-16085: [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13088: URL: https://github.com/apache/arrow/pull/13088#issuecomment-1120017501 https://issues.apache.org/jira/browse/ARROW-16085

[GitHub] [arrow] vibhatha commented on a diff in pull request #13069: ARROW-15901: [C++] Support flat custom output field names in Substrait

2022-05-06 Thread GitBox
vibhatha commented on code in PR #13069: URL: https://github.com/apache/arrow/pull/13069#discussion_r867223442 ## cpp/src/arrow/compute/exec/options.h: ## @@ -232,10 +232,13 @@ class ARROW_EXPORT SinkNodeConsumer { /// \brief Add a sink node which consumes data within the exec

[GitHub] [arrow] zagto commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

2022-05-06 Thread GitBox
zagto commented on PR #12957: URL: https://github.com/apache/arrow/pull/12957#issuecomment-1120009112 I now ran one of the most-affected benchmarks - [tpch, arrow, parquet, memory_map=False, TPCH-15, scale_factor=1, R](https://conbench.ursa.dev/compare/benchmarks/3c980569f8be4d85a8d54e0

[GitHub] [arrow] github-actions[bot] commented on pull request #13079: ARROW-16474: [C++][Packaging] Require Python 3.7 or later

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13079: URL: https://github.com/apache/arrow/pull/13079#issuecomment-111264 Revision: 3641dec28909531785c3f90275c1ece37c7736ab Submitted crossbow builds: [ursacomputing/crossbow @ actions-2031](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] kou commented on pull request #13079: ARROW-16474: [C++][Packaging] Require Python 3.7 or later

2022-05-06 Thread GitBox
kou commented on PR #13079: URL: https://github.com/apache/arrow/pull/13079#issuecomment-1119998633 @github-actions crossbow submit debian-* ubuntu-*

[GitHub] [arrow] github-actions[bot] commented on pull request #13087: ARROW-16487: [C++][Parquet] Fix parquet::Statistics::Equals() with minmax

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13087: URL: https://github.com/apache/arrow/pull/13087#issuecomment-1119988291 https://issues.apache.org/jira/browse/ARROW-16487

[GitHub] [arrow] kou opened a new pull request, #13087: ARROW-16487: [C++][Parquet] Fix parquet::Statistics::Equals() with minmax

2022-05-06 Thread GitBox
kou opened a new pull request, #13087: URL: https://github.com/apache/arrow/pull/13087 The following cases return the wrong result: * statistics_no_minmax.Equals(statistics_no_minmax): This must return true but false is returned. * statistics_minmax.Equals(statistics_minmax):
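
The property at stake is that equality should be reflexive whether or not min/max values are present. A minimal sketch of such a comparison follows; the `Stats` class and its fields are hypothetical stand-ins, not the `parquet::Statistics` implementation or the actual fix in this PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stats:
    """Hypothetical column statistics with optional min/max values."""
    null_count: int
    min: Optional[int] = None
    max: Optional[int] = None

    @property
    def has_min_max(self) -> bool:
        return self.min is not None and self.max is not None

    def equals(self, other: "Stats") -> bool:
        if self.null_count != other.null_count:
            return False
        # Both sides must agree on whether min/max are set at all.
        if self.has_min_max != other.has_min_max:
            return False
        # Without min/max there is nothing further to compare.
        if not self.has_min_max:
            return True
        return self.min == other.min and self.max == other.max
```

Comparing the presence flag before the values keeps the no-min/max case from falling through to a value comparison on unset fields.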

[GitHub] [arrow-cookbook] toddfarmer commented on issue #190: [Java] Clarify build requirements

2022-05-06 Thread GitBox
toddfarmer commented on issue #190: URL: https://github.com/apache/arrow-cookbook/issues/190#issuecomment-1119988004 After installing pip, the build script successfully installed required dependencies, but failed due to missing sphinx-build: ``` todd@pop-os:~/arrow-cookbook$ make java

[GitHub] [arrow-cookbook] toddfarmer closed issue #191: Clarify prerequisites for building cookbooks

2022-05-06 Thread GitBox
toddfarmer closed issue #191: Clarify prerequisites for building cookbooks URL: https://github.com/apache/arrow-cookbook/issues/191

[GitHub] [arrow-cookbook] toddfarmer commented on issue #191: Clarify prerequisites for building cookbooks

2022-05-06 Thread GitBox
toddfarmer commented on issue #191: URL: https://github.com/apache/arrow-cookbook/issues/191#issuecomment-1119987401 Will consolidate this with 190.

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119987284 Another thing to consider is that we probably need the capability to ignore some keys (eg: some apps such as spark, when badly configured, generate files such as /xxx/
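
One way to ignore non-data keys during listing is sketched below. The ignore rule used here (skip objects whose base name starts with `_` or `.`, as Hadoop-style readers commonly do) is an assumption chosen for illustration; the comment above is truncated and does not say which keys it has in mind. The helper names and sample keys are hypothetical.

```python
import posixpath

# Assumption: marker and checksum files are recognisable by these name prefixes.
IGNORED_NAME_PREFIXES = ("_", ".")


def is_data_key(key):
    """True when the key's base name is non-empty and not an ignored marker."""
    name = posixpath.basename(key)
    return bool(name) and not name.startswith(IGNORED_NAME_PREFIXES)


def list_data_keys(keys, prefix):
    """Prefix listing that drops marker files and directory placeholder keys."""
    return [k for k in keys if k.startswith(prefix) and is_data_key(k)]
```

Keeping the rule in one predicate means it can be swapped for a caller-supplied filter without changing the listing code.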

[GitHub] [arrow] kou closed pull request #13083: ARROW-16490: [C++][Windows] Don't force to use bundled GoogleTest

2022-05-06 Thread GitBox
kou closed pull request #13083: ARROW-16490: [C++][Windows] Don't force to use bundled GoogleTest URL: https://github.com/apache/arrow/pull/13083

[GitHub] [arrow-datafusion] andygrove commented on issue #1980: Ballista crates cannot be released from DafaFusion 7.0.0 source release

2022-05-06 Thread GitBox
andygrove commented on issue #1980: URL: https://github.com/apache/arrow-datafusion/issues/1980#issuecomment-1119962752 Closing this and will fix as part of 8.0.0 release

[GitHub] [arrow-datafusion] andygrove closed issue #1980: Ballista crates cannot be released from DafaFusion 7.0.0 source release

2022-05-06 Thread GitBox
andygrove closed issue #1980: Ballista crates cannot be released from DafaFusion 7.0.0 source release URL: https://github.com/apache/arrow-datafusion/issues/1980

[GitHub] [arrow-datafusion] andygrove merged pull request #2473: Minor: Move test code from `context.rs` into `sql_integration`

2022-05-06 Thread GitBox
andygrove merged PR #2473: URL: https://github.com/apache/arrow-datafusion/pull/2473

[GitHub] [arrow-datafusion] andygrove commented on pull request #2472: Fix stage key extraction

2022-05-06 Thread GitBox
andygrove commented on PR #2472: URL: https://github.com/apache/arrow-datafusion/pull/2472#issuecomment-1119961098 I also ran the integration tests with this change and they passed

[GitHub] [arrow-datafusion] andygrove merged pull request #2472: Fix stage key extraction

2022-05-06 Thread GitBox
andygrove merged PR #2472: URL: https://github.com/apache/arrow-datafusion/pull/2472

[GitHub] [arrow] kou closed pull request #13084: ARROW-16494: [C++] Add missing include that is making some packaging jobs fail

2022-05-06 Thread GitBox
kou closed pull request #13084: ARROW-16494: [C++] Add missing include that is making some packaging jobs fail URL: https://github.com/apache/arrow/pull/13084

[GitHub] [arrow-datafusion] jon-chuang commented on a diff in pull request #2451: Add `EXISTS` and `IN` subquery rewriting for correlated filters at filter depth 1

2022-05-06 Thread GitBox
jon-chuang commented on code in PR #2451: URL: https://github.com/apache/arrow-datafusion/pull/2451#discussion_r865692147 ## datafusion/core/src/optimizer/subquery_filter_to_join.rs: ## @@ -46,6 +47,227 @@ impl SubqueryFilterToJoin { pub fn new() -> Self { Self {}

[GitHub] [arrow-datafusion] andygrove commented on pull request #2446: Add SQL planner support for `ROLLUP` and `CUBE` grouping set expressions

2022-05-06 Thread GitBox
andygrove commented on PR #2446: URL: https://github.com/apache/arrow-datafusion/pull/2446#issuecomment-1119955111 @alamb Yet another SQL planner PR for review when you have time. Also @Jimexist I think you may be interested in this one.

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2446: Add SQL planner support for `ROLLUP` and `CUBE` grouping set expressions

2022-05-06 Thread GitBox
andygrove commented on code in PR #2446: URL: https://github.com/apache/arrow-datafusion/pull/2446#discussion_r867166355 ## datafusion/core/src/logical_plan/expr.rs: ## @@ -138,9 +139,33 @@ pub fn create_udaf( /// Create field meta-data from an expression, for use in a result s

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1588: Add support for nested list arrays from parquet to arrow arrays (#993)

2022-05-06 Thread GitBox
tustvold commented on code in PR #1588: URL: https://github.com/apache/arrow-rs/pull/1588#discussion_r867163842 ## parquet/src/arrow/array_reader/list_array.rs: ## @@ -193,122 +245,267 @@ mod tests { use crate::file::reader::{FileReader, SerializedFileReader}; use crat

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119946378 ListObjects - list the keys with a given prefix => Other key value store supporting glob (and way more complex filters) https://redis.io/commands/keys/ https://docs.a

[GitHub] [arrow] icexelloss commented on pull request #13028: ARROW-16083: [WIP][C++] Implement AsofJoin execution node

2022-05-06 Thread GitBox
icexelloss commented on PR #13028: URL: https://github.com/apache/arrow/pull/13028#issuecomment-1119944070 @westonpace reading back your comments - I wonder if you can explain a bit more "thread-per-core" model here? >The execution engine follows a thread-per-core model so any blockin

[GitHub] [arrow] ursabot commented on pull request #13052: ARROW-16442: [Python][Dataset] Fix fragments of ORC Dataset to use FileFragment class

2022-05-06 Thread GitBox
ursabot commented on PR #13052: URL: https://github.com/apache/arrow/pull/13052#issuecomment-1119943211 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/eac67089489f4d89b2a1c374688af571...c6f2512059c24f9ebe5e3acda68992c0/)

[GitHub] [arrow] ursabot commented on pull request #13052: ARROW-16442: [Python][Dataset] Fix fragments of ORC Dataset to use FileFragment class

2022-05-06 Thread GitBox
ursabot commented on PR #13052: URL: https://github.com/apache/arrow/pull/13052#issuecomment-1119943011 Benchmark runs are scheduled for baseline = 3cf43433125a195f3ab1e1e681340edc83a76dc4 and contender = cf2a35c0826e4e96cf5d8e2321a4e3a0a243. cf2a35c0826e4e96cf5d8e2321a4e3a0a243 is

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867157279 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867156658 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867157111 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867156433 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119940868 In case ETL is less important, we should update the project description: DataFusion is used to create modern, fast and efficient data pipelines, **ETL processes**, and

[GitHub] [arrow] nealrichardson closed pull request #13085: MINOR: [R] Don't use warning() in .onLoad()

2022-05-06 Thread GitBox
nealrichardson closed pull request #13085: MINOR: [R] Don't use warning() in .onLoad() URL: https://github.com/apache/arrow/pull/13085

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867155626 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow] github-actions[bot] commented on pull request #13086: ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and arrow_available()

2022-05-06 Thread GitBox
github-actions[bot] commented on PR #13086: URL: https://github.com/apache/arrow/pull/13086#issuecomment-1119940068 https://issues.apache.org/jira/browse/ARROW-16414

[GitHub] [arrow] nealrichardson opened a new pull request, #13086: ARROW-16414: [R] Remove ARROW_R_WITH_ARROW and arrow_available()

2022-05-06 Thread GitBox
nealrichardson opened a new pull request, #13086: URL: https://github.com/apache/arrow/pull/13086 The diff looks bigger than that because * Sometimes those changes just resulted in reducing indentation * I moved arrow_info() and related functions to their own file, and did the same

[GitHub] [arrow] rok commented on a diff in pull request #12528: ARROW-15251: [C++] Temporal floor/ceil/round handle ambiguous/nonexistent local time

2022-05-06 Thread GitBox
rok commented on code in PR #12528: URL: https://github.com/apache/arrow/pull/12528#discussion_r867154702 ## cpp/src/arrow/compute/kernels/scalar_temporal_test.cc: ## @@ -2611,6 +2611,114 @@ TEST_F(ScalarTemporalTest, TestRoundTemporal) { CheckScalarUnary(op, unit, times, uni

[GitHub] [arrow-rs] tustvold commented on pull request #1662: Remove parquet dictionary converters (#1661)

2022-05-06 Thread GitBox
tustvold commented on PR #1662: URL: https://github.com/apache/arrow-rs/pull/1662#issuecomment-1119939464 No, this module is marked experimental so should be fine

[GitHub] [arrow-datafusion] timvw commented on issue #2445: ObjectStore Directory Semantics

2022-05-06 Thread GitBox
timvw commented on issue #2445: URL: https://github.com/apache/arrow-datafusion/issues/2445#issuecomment-1119938026 I do agree that the capabilities we actually need are rather limited (compared to a full filesystem spec) and it makes sense to not name those FileSystem then. Should we also

[GitHub] [arrow-rs] viirya commented on pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

2022-05-06 Thread GitBox
viirya commented on PR #1636: URL: https://github.com/apache/arrow-rs/pull/1636#issuecomment-1119935916 Thank you @alamb. Renamed these names too.

[GitHub] [arrow-cookbook] toddfarmer opened a new issue, #191: Clarify prerequisites for building cookbooks

2022-05-06 Thread GitBox
toddfarmer opened a new issue, #191: URL: https://github.com/apache/arrow-cookbook/issues/191 This documents my experience attempting to build the java cookbook for the first time on a new laptop. I suspect the problems encountered apply to other languages as well. ## Python and pip

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1636: Fix generate_nested_dictionary_case integration test failure for Rust cases

2022-05-06 Thread GitBox
alamb commented on code in PR #1636: URL: https://github.com/apache/arrow-rs/pull/1636#discussion_r867140238 ## arrow/src/ipc/reader.rs: ## @@ -457,7 +468,7 @@ pub fn read_record_batch( buf: &[u8], batch: ipc::RecordBatch, schema: SchemaRef, -dictionaries: &[O

[GitHub] [arrow-datafusion] andygrove commented on issue #2360: Cannot have `order by` expression that references complex `group by` expression

2022-05-06 Thread GitBox
andygrove commented on issue #2360: URL: https://github.com/apache/arrow-datafusion/issues/2360#issuecomment-1119923765 I updated this issue. Some of the original bugs are now fixed but this still fails and I am looking into it now.

[GitHub] [arrow-rs] alamb commented on pull request #1647: Exclude `dict_id` and `dict_is_ordered` from equality comparison of `Field`

2022-05-06 Thread GitBox
alamb commented on PR #1647: URL: https://github.com/apache/arrow-rs/pull/1647#issuecomment-1119921265 cc @jhorstmann

[GitHub] [arrow-rs] alamb merged pull request #1659: Add `DecimalType` support in `new_null_array `

2022-05-06 Thread GitBox
alamb merged PR #1659: URL: https://github.com/apache/arrow-rs/pull/1659

[GitHub] [arrow-rs] alamb commented on pull request #1659: Add `DecimalType` support in `new_null_array `

2022-05-06 Thread GitBox
alamb commented on PR #1659: URL: https://github.com/apache/arrow-rs/pull/1659#issuecomment-1119919766 Thanks @yjshen

[GitHub] [arrow-rs] alamb commented on pull request #1662: Remove parquet dictionary converters (#1661)

2022-05-06 Thread GitBox
alamb commented on PR #1662: URL: https://github.com/apache/arrow-rs/pull/1662#issuecomment-1119917411 @tustvold can you confirm that this is (technically) an API change (as in it would be possible for someone to have code that relied on the `pub` struct that this removed)?

[GitHub] [arrow-rs] alamb merged pull request #1662: Remove parquet dictionary converters (#1661)

2022-05-06 Thread GitBox
alamb merged PR #1662: URL: https://github.com/apache/arrow-rs/pull/1662

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1588: Add support for nested list arrays from parquet to arrow arrays (#993)

2022-05-06 Thread GitBox
alamb commented on code in PR #1588: URL: https://github.com/apache/arrow-rs/pull/1588#discussion_r867065942 ## parquet/src/arrow/array_reader/list_array.rs: ## @@ -35,12 +35,12 @@ pub struct ListArrayReader { item_reader: Box, data_type: ArrowType, item_type: Arr

[GitHub] [arrow] icexelloss commented on pull request #13028: ARROW-16083: [WIP][C++] Implement AsofJoin execution node

2022-05-06 Thread GitBox
icexelloss commented on PR #13028: URL: https://github.com/apache/arrow/pull/13028#issuecomment-1119876133 I took a stab at using `arrow::compute::Hashing64::HashMultiColumn` but it seems a lot of what I need is added in this PR: https://github.com/apache/arrow/pull/12326 I will prob

[GitHub] [arrow] JabariBooker commented on a diff in pull request #13051: ARROW-14185: [C++] HashJoinNode should validate HashJoinNodeOptions

2022-05-06 Thread GitBox
JabariBooker commented on code in PR #13051: URL: https://github.com/apache/arrow/pull/13051#discussion_r867069250 ## cpp/src/arrow/compute/exec/hash_join_node.cc: ## @@ -453,6 +453,20 @@ Status HashJoinSchema::CollectFilterColumns(std::vector& left_filter, return Status::OK

[GitHub] [arrow] kszucs closed pull request #13081: ARROW-16488: [Archery][Dev] Allow extra message to be sent on chat report

2022-05-06 Thread GitBox
kszucs closed pull request #13081: ARROW-16488: [Archery][Dev] Allow extra message to be sent on chat report URL: https://github.com/apache/arrow/pull/13081

[GitHub] [arrow] kszucs closed pull request #13074: ARROW-16448: [CI][Archery] Refactor EmailReport to be a JinjaReport

2022-05-06 Thread GitBox
kszucs closed pull request #13074: ARROW-16448: [CI][Archery] Refactor EmailReport to be a JinjaReport URL: https://github.com/apache/arrow/pull/13074

[GitHub] [arrow] zeroshade commented on a diff in pull request #13068: ARROW-16473: [Go] fixing memory leak in serializedPageReader

2022-05-06 Thread GitBox
zeroshade commented on code in PR #13068: URL: https://github.com/apache/arrow/pull/13068#discussion_r867062398 ## go/parquet/file/page_reader.go: ## @@ -611,8 +610,6 @@ func (p *serializedPageReader) Next() bool { // we don't know this page type, we're a

[GitHub] [arrow-datafusion] alamb commented on pull request #2470: minor: remove expr dependency from the row crate, update crate-deps.dot/svg

2022-05-06 Thread GitBox
alamb commented on PR #2470: URL: https://github.com/apache/arrow-datafusion/pull/2470#issuecomment-1119863988 🚀

[GitHub] [arrow-datafusion] alamb merged pull request #2470: minor: remove expr dependency from the row crate, update crate-deps.dot/svg

2022-05-06 Thread GitBox
alamb merged PR #2470: URL: https://github.com/apache/arrow-datafusion/pull/2470

[GitHub] [arrow-datafusion] alamb merged pull request #2471: Minor: Use ExprVisitor to find columns referenced by expr

2022-05-06 Thread GitBox
alamb merged PR #2471: URL: https://github.com/apache/arrow-datafusion/pull/2471

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2454: feat: Support CompoundIdentifier as GetIndexedField access

2022-05-06 Thread GitBox
alamb commented on code in PR #2454: URL: https://github.com/apache/arrow-datafusion/pull/2454#discussion_r867058066 ## datafusion/core/src/sql/planner.rs: ## @@ -1608,11 +1608,19 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { } else { ma
