[GitHub] [arrow-rs] HaoYang670 commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

2022-05-11 Thread GitBox
HaoYang670 commented on issue #1620: URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1123263695 Could we close this issue, or open it for more discussions? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1684: simplify offsets checking

2022-05-11 Thread GitBox
viirya commented on code in PR #1684: URL: https://github.com/apache/arrow-rs/pull/1684#discussion_r869952421 ## arrow/src/array/data.rs: ## @@ -1033,76 +1035,64 @@ impl ArrayData { } /// Calls the `validate(item_index, range)` function for each of -/// the range

[GitHub] [arrow-rs] tustvold commented on pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
tustvold commented on PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#issuecomment-1123274440 Ok I've backed out the changes related to #1666 from this PR, so this should preserve the existing schema inference behaviour. I'm confident that this PR will lay the groundwork

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1684: simplify offsets checking

2022-05-11 Thread GitBox
viirya commented on code in PR #1684: URL: https://github.com/apache/arrow-rs/pull/1684#discussion_r869953424 ## arrow/src/array/data.rs: ## @@ -1033,76 +1035,64 @@ impl ArrayData { } /// Calls the `validate(item_index, range)` function for each of -/// the range

[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1684: simplify offsets checking

2022-05-11 Thread GitBox
HaoYang670 commented on code in PR #1684: URL: https://github.com/apache/arrow-rs/pull/1684#discussion_r869955221 ## arrow/src/array/data.rs: ## @@ -1033,76 +1035,64 @@ impl ArrayData { } /// Calls the `validate(item_index, range)` function for each of -/// the r

[GitHub] [arrow-rs] viirya commented on pull request #1684: simplify offsets checking

2022-05-11 Thread GitBox
viirya commented on PR #1684: URL: https://github.com/apache/arrow-rs/pull/1684#issuecomment-1123279715 Thanks @HaoYang670. The performance numbers look good!

[GitHub] [arrow-rs] HaoYang670 commented on pull request #1684: simplify offsets checking

2022-05-11 Thread GitBox
HaoYang670 commented on PR #1684: URL: https://github.com/apache/arrow-rs/pull/1684#issuecomment-1123283421 Actually, I think we could do more. Because there is some redundant checking (such as comparing each offset with `offset_limit`). However, as a number of tests rely on that checking,
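
A minimal Python sketch of why the per-offset comparison with `offset_limit` mentioned in this comment can be redundant, assuming offsets must be non-negative and non-decreasing: once monotonicity holds, bounding only the last offset bounds all of them. The function name `validate_offsets` and its error messages are hypothetical and are not the arrow-rs API; `offset_limit` is taken here to mean the length of the child values.

```python
def validate_offsets(offsets, offset_limit):
    """Validate list-array style offsets against the child values length.

    Hypothetical helper for illustration only; not the arrow-rs implementation.
    """
    if not offsets:
        # Assumption: an empty offsets list is treated as an empty array.
        return
    if offsets[0] < 0:
        raise ValueError(f"first offset {offsets[0]} is negative")
    # Each offset must be greater than or equal to its predecessor.
    for i in range(1, len(offsets)):
        if offsets[i] < offsets[i - 1]:
            raise ValueError(
                f"offset {offsets[i]} at index {i} is smaller than "
                f"the previous offset {offsets[i - 1]}"
            )
    # Offsets are non-decreasing, so checking the last one covers them all.
    if offsets[-1] > offset_limit:
        raise ValueError(
            f"last offset {offsets[-1]} exceeds the limit {offset_limit}"
        )


# Three lists: [0, 2), an empty list [2, 2), and [2, 5) over 5 child values.
validate_offsets([0, 2, 2, 5], 5)
```

The single final bound check is only valid because the monotonicity loop runs first; dropping that loop would require comparing every offset with the limit again.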

[GitHub] [arrow] AlvinJ15 commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
AlvinJ15 commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r869971878 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using va

[GitHub] [arrow] pitrou commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
pitrou commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r869972999 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using valu

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
tustvold commented on code in PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#discussion_r869953255 ## parquet/src/arrow/schema.rs: ## @@ -544,502 +525,6 @@ fn arrow_to_parquet_type(field: &Field) -> Result { } } } -/// This struct is used to group met

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
codecov-commenter commented on PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#issuecomment-1123301035 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1682?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow] pitrou commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
pitrou commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r869975583 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using valu

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
tustvold commented on code in PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#discussion_r869976905 ## parquet/src/arrow/array_reader/builder.rs: ## @@ -52,657 +50,278 @@ pub fn build_array_reader( where T: IntoIterator, { -let mut leaves = HashMap::<*cons

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
tustvold commented on code in PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#discussion_r869957607 ## parquet/src/arrow/schema/complex.rs: ## @@ -0,0 +1,563 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2499: Add metrics for ParquetExec

2022-05-11 Thread GitBox
Ted-Jiang commented on code in PR #2499: URL: https://github.com/apache/arrow-datafusion/pull/2499#discussion_r87794 ## datafusion/core/src/physical_plan/file_format/parquet.rs: ## @@ -227,6 +228,7 @@ impl ExecutionPlan for ParquetExec { files: self.base_config

[GitHub] [arrow] ursabot commented on pull request #13088: ARROW-16085: [C++][R] InMemoryDataset::ReplaceSchema does not alter scan output

2022-05-11 Thread GitBox
ursabot commented on PR #13088: URL: https://github.com/apache/arrow/pull/13088#issuecomment-1123328016 Benchmark runs are scheduled for baseline = 35119f29b0e0de68b1ccc5f2066e0cc7d27fddd0 and contender = 5b653ee27b13c99af08d8a24fdce1ceece0ab91a. 5b653ee27b13c99af08d8a24fdce1ceece0ab91a is

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2499: Add metrics for ParquetExec

2022-05-11 Thread GitBox
Ted-Jiang commented on code in PR #2499: URL: https://github.com/apache/arrow-datafusion/pull/2499#discussion_r870003789 ## datafusion/core/src/physical_plan/file_format/parquet.rs: ## @@ -425,7 +437,8 @@ impl Stream for ParquetExecStream { mut self: Pin<&mut Self>,

[GitHub] [arrow-datafusion] yjshen opened a new issue, #2509: Support optional filter in Join

2022-05-11 Thread GitBox
yjshen opened a new issue, #2509: URL: https://github.com/apache/arrow-datafusion/issues/2509 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** It would be necessary to support filters in the join operator, instead of a join ope

[GitHub] [arrow] AlvinJ15 commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
AlvinJ15 commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r870017986 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using va

[GitHub] [arrow] amol- commented on pull request #13099: ARROW-16468: [Python] Test Table filter feature with complex exprs and add Expression.apply method

2022-05-11 Thread GitBox
amol- commented on PR #13099: URL: https://github.com/apache/arrow/pull/13099#issuecomment-1123364801 > But we can probably improve that, instead of adding the `apply` method? Allowing "non expressions" would be risky by the way, as the current check to decide to return back an expres

[GitHub] [arrow] amol- commented on a diff in pull request #13075: ARROW-16467: [Python] Add helper function _exec_plan._filter_table to filter tables based on Expression

2022-05-11 Thread GitBox
amol- commented on code in PR #13075: URL: https://github.com/apache/arrow/pull/13075#discussion_r870030999 ## python/pyarrow/tests/test_exec_plan.py: ## @@ -190,3 +191,35 @@ def test_table_join_keys_order(): "colVals_l": ["a", "b", "f", None], "colVals_r": ["A

[GitHub] [arrow-datafusion] yjshen commented on issue #2509: Support optional filter in Join

2022-05-11 Thread GitBox
yjshen commented on issue #2509: URL: https://github.com/apache/arrow-datafusion/issues/2509#issuecomment-1123417465 A related issue #2496.

[GitHub] [arrow] github-actions[bot] commented on pull request #13118: ARROW-16394: [R] Implement lubridate's parsers with year, month and date components

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13118: URL: https://github.com/apache/arrow/pull/13118#issuecomment-112344 https://issues.apache.org/jira/browse/ARROW-16394

[GitHub] [arrow] github-actions[bot] commented on pull request #13118: ARROW-16394: [R] Implement lubridate's parsers with year, month and date components

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13118: URL: https://github.com/apache/arrow/pull/13118#issuecomment-1123433366 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] dragosmg commented on pull request #13055: ARROW-16253: [R] Helper function for casting from float to duration via int64()

2022-05-11 Thread GitBox
dragosmg commented on PR #13055: URL: https://github.com/apache/arrow/pull/13055#issuecomment-1123448373 I haven't found more places in which to use this.

[GitHub] [arrow] thisisnic closed pull request #13055: ARROW-16253: [R] Helper function for casting from float to duration via int64()

2022-05-11 Thread GitBox
thisisnic closed pull request #13055: ARROW-16253: [R] Helper function for casting from float to duration via int64() URL: https://github.com/apache/arrow/pull/13055

[GitHub] [arrow] thisisnic closed pull request #13106: MINOR: [R] correct NEWS heading

2022-05-11 Thread GitBox
thisisnic closed pull request #13106: MINOR: [R] correct NEWS heading URL: https://github.com/apache/arrow/pull/13106

[GitHub] [arrow] rtpsw commented on pull request #13117: ARROW-16525: [C++] Tee node not properly marking node finished

2022-05-11 Thread GitBox
rtpsw commented on PR #13117: URL: https://github.com/apache/arrow/pull/13117#issuecomment-1123461116 LGTM. Just a few small suggestions.

[GitHub] [arrow] raulcd opened a new pull request, #13119: MINOR: Add feedback information output when building in case of skipping pyarrow build

2022-05-11 Thread GitBox
raulcd opened a new pull request, #13119: URL: https://github.com/apache/arrow/pull/13119 This minor PR tries to give the user a better hint on what is happening in case of their build being skipped because `cachedir != build_temp`. I faced the issue and had to ask and debug to understand t

[GitHub] [arrow] amol- commented on a diff in pull request #10162: ARROW-12506: [Python] Improve modularity of pyarrow codebase: _ipc module

2022-05-11 Thread GitBox
amol- commented on code in PR #10162: URL: https://github.com/apache/arrow/pull/10162#discussion_r870114751 ## python/pyarrow/_ipc.pyx: ## @@ -781,6 +805,25 @@ cdef class RecordBatchReader(_Weakrefable): self.reader = c_reader return self +@staticmethod +

[GitHub] [arrow] pravindra closed pull request #13015: ARROW-13052: [Gandiva][C++] Add regexp_extract function

2022-05-11 Thread GitBox
pravindra closed pull request #13015: ARROW-13052: [Gandiva][C++] Add regexp_extract function URL: https://github.com/apache/arrow/pull/13015

[GitHub] [arrow-rs] alamb commented on pull request #1683: Use bytes in parquet rather than custom Buffer implementation (#1474)

2022-05-11 Thread GitBox
alamb commented on PR #1683: URL: https://github.com/apache/arrow-rs/pull/1683#issuecomment-1123500810 > My reasoning for not worrying about removing Memtracker is I made the API experimental a while back, and there haven't been any complaints. I am all for deleting it, especially wit

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1677: Fix generate_unions_case for Rust case

2022-05-11 Thread GitBox
alamb commented on code in PR #1677: URL: https://github.com/apache/arrow-rs/pull/1677#discussion_r870122518 ## arrow/src/datatypes/datatype.rs: ## @@ -499,6 +499,52 @@ impl DataType { )) } } +Some(s)

[GitHub] [arrow] dragosmg commented on pull request #12980: ARROW-16281: [R] [CI] Bump versions with the release of 4.2

2022-05-11 Thread GitBox
dragosmg commented on PR #12980: URL: https://github.com/apache/arrow/pull/12980#issuecomment-1123523658 @github-actions crossbow submit -g r

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2481: Numeric, String, Boolean comparisons with literal `NULL`

2022-05-11 Thread GitBox
alamb commented on code in PR #2481: URL: https://github.com/apache/arrow-datafusion/pull/2481#discussion_r870145839 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -657,17 +657,15 @@ macro_rules! compute_utf8_op_scalar { /// Invoke a compute kernel on a data arr

[GitHub] [arrow] raulcd commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
raulcd commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123573201 @github-actions crossbow submit example-python-minimal-build*

[GitHub] [arrow] ursabot commented on pull request #13104: MINOR: [R] Move tzdb loading out of .onLoad() to avoid a check NOTE

2022-05-11 Thread GitBox
ursabot commented on PR #13104: URL: https://github.com/apache/arrow/pull/13104#issuecomment-1123579724 Benchmark runs are scheduled for baseline = 5b653ee27b13c99af08d8a24fdce1ceece0ab91a and contender = b264dca5a00cb889e1caf24f41a5f018c96cec4d. b264dca5a00cb889e1caf24f41a5f018c96cec4d is

[GitHub] [arrow] github-actions[bot] commented on pull request #12980: ARROW-16281: [R] [CI] Bump versions with the release of 4.2

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #12980: URL: https://github.com/apache/arrow/pull/12980#issuecomment-1123586415 Revision: 5e2fc5642beb79c99c5e672ce5d2a0d77369200b Submitted crossbow builds: [ursacomputing/crossbow @ actions-2062](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] projjal commented on a diff in pull request #13073: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
projjal commented on code in PR #13073: URL: https://github.com/apache/arrow/pull/13073#discussion_r870171291 ## cpp/src/gandiva/precompiled/string_ops.cc: ## @@ -705,6 +705,25 @@ CAST_VARCHAR_FROM_VARLEN_TYPE(binary) CAST_VARBINARY_FROM_STRING_AND_BINARY(utf8) CAST_VARBINARY_

[GitHub] [arrow] github-actions[bot] commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123591837 Revision: 3a5ff6c9b84706bc629626be8a7cf69432464e68 Submitted crossbow builds: [ursacomputing/crossbow @ actions-2063](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2510: fix `NULL column` evaluation, tests for same

2022-05-11 Thread GitBox
alamb opened a new pull request, #2510: URL: https://github.com/apache/arrow-datafusion/pull/2510 # Which issue does this PR close? re https://github.com/apache/arrow-datafusion/issues/1179 and https://github.com/apache/arrow-datafusion/issues/2482 . # Rationale for this chang

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2510: fix `NULL column` evaluation, tests for same

2022-05-11 Thread GitBox
alamb commented on code in PR #2510: URL: https://github.com/apache/arrow-datafusion/pull/2510#discussion_r870180841 ## datafusion/core/tests/sql/expr.rs: ## @@ -1203,121 +1203,54 @@ async fn nested_subquery() -> Result<()> { } #[tokio::test] -async fn comparisons_with_null(

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2481: Numeric, String, Boolean comparisons with literal `NULL`

2022-05-11 Thread GitBox
alamb commented on code in PR #2481: URL: https://github.com/apache/arrow-datafusion/pull/2481#discussion_r870182921 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -657,17 +657,15 @@ macro_rules! compute_utf8_op_scalar { /// Invoke a compute kernel on a data arr

[GitHub] [arrow] Johnnathanalmeida commented on a diff in pull request #13073: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
Johnnathanalmeida commented on code in PR #13073: URL: https://github.com/apache/arrow/pull/13073#discussion_r870195336 ## cpp/src/gandiva/precompiled/string_ops.cc: ## @@ -705,6 +705,25 @@ CAST_VARCHAR_FROM_VARLEN_TYPE(binary) CAST_VARBINARY_FROM_STRING_AND_BINARY(utf8) CAST_

[GitHub] [arrow] jorisvandenbossche commented on pull request #13119: MINOR: Add feedback information output when building in case of skipping pyarrow build

2022-05-11 Thread GitBox
jorisvandenbossche commented on PR #13119: URL: https://github.com/apache/arrow/pull/13119#issuecomment-1123640611 Thanks! While it's certainly useful to get some message printed instead of just the "running build_ext" without anything else, I am wondering if we can give an additional hint?

[GitHub] [arrow] github-actions[bot] commented on pull request #13073: ARROW-16527: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13073: URL: https://github.com/apache/arrow/pull/13073#issuecomment-1123647346 https://issues.apache.org/jira/browse/ARROW-16527

[GitHub] [arrow] jorisvandenbossche commented on pull request #13075: ARROW-16467: [Python] Add helper function _exec_plan._filter_table to filter tables based on Expression

2022-05-11 Thread GitBox
jorisvandenbossche commented on PR #13075: URL: https://github.com/apache/arrow/pull/13075#issuecomment-1123647353 Yes, that will indeed require some non-trivial work in the engine. Leaving it as is for now is good for me!

[GitHub] [arrow] github-actions[bot] commented on pull request #13073: ARROW-16527: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13073: URL: https://github.com/apache/arrow/pull/13073#issuecomment-1123647366 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] jorisvandenbossche closed pull request #13075: ARROW-16467: [Python] Add helper function _exec_plan._filter_table to filter tables based on Expression

2022-05-11 Thread GitBox
jorisvandenbossche closed pull request #13075: ARROW-16467: [Python] Add helper function _exec_plan._filter_table to filter tables based on Expression URL: https://github.com/apache/arrow/pull/13075

[GitHub] [arrow] projjal commented on a diff in pull request #13073: ARROW-16527: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
projjal commented on code in PR #13073: URL: https://github.com/apache/arrow/pull/13073#discussion_r870212747 ## cpp/src/gandiva/function_registry_string.cc: ## @@ -439,6 +439,14 @@ std::vector GetStringFunctionRegistry() { kResultNullIfNull, "right_utf8_in

[GitHub] [arrow-datafusion] matthewmturner commented on a diff in pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
matthewmturner commented on code in PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#discussion_r870230271 ## datafusion/core/src/datasource/view.rs: ## @@ -0,0 +1,364 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[GitHub] [arrow] lidavidm commented on pull request #13116: MINOR: [Python] Python dataset test might try to run without datasets module

2022-05-11 Thread GitBox
lidavidm commented on PR #13116: URL: https://github.com/apache/arrow/pull/13116#issuecomment-1123682354 Looks like this got reported elsewhere: https://issues.apache.org/jira/browse/ARROW-16526

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2499: Add metrics for ParquetExec

2022-05-11 Thread GitBox
alamb commented on code in PR #2499: URL: https://github.com/apache/arrow-datafusion/pull/2499#discussion_r870249201 ## datafusion/core/src/physical_plan/file_format/parquet.rs: ## @@ -425,7 +437,8 @@ impl Stream for ParquetExecStream { mut self: Pin<&mut Self>,

[GitHub] [arrow-datafusion] alamb merged pull request #2499: Add metrics for ParquetExec

2022-05-11 Thread GitBox
alamb merged PR #2499: URL: https://github.com/apache/arrow-datafusion/pull/2499

[GitHub] [arrow-datafusion] alamb closed issue #2497: Add metrics for ParquetExec.

2022-05-11 Thread GitBox
alamb closed issue #2497: Add metrics for ParquetExec. URL: https://github.com/apache/arrow-datafusion/issues/2497

[GitHub] [arrow] lidavidm commented on a diff in pull request #13114: MINOR: [Java] Indicate absolute path is required in docs

2022-05-11 Thread GitBox
lidavidm commented on code in PR #13114: URL: https://github.com/apache/arrow/pull/13114#discussion_r870249432 ## docs/source/developers/java/building.rst: ## @@ -78,6 +78,7 @@ We can build these manually or we can use `Archery`_ to build them using a Docke Building JNI Libr

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
alamb commented on code in PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#discussion_r870249996 ## datafusion/core/src/datasource/view.rs: ## @@ -0,0 +1,364 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

[GitHub] [arrow-datafusion] matthewmturner commented on pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
matthewmturner commented on PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#issuecomment-1123711004 Ah - i believe the tpch tests use a published version of ballista. so i dont think we can run Q15 until our next release. let me know if im misunderstanding, i didnt hav

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2511: Minor: remove code that is now in arrow

2022-05-11 Thread GitBox
alamb opened a new pull request, #2511: URL: https://github.com/apache/arrow-datafusion/pull/2511 This code https://github.com/apache/arrow-rs/issues/1312 was moved to arrow, so use the copy there # Rationale for this change Less code == better # What changes are includ

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2511: Minor: remove code that is now in arrow

2022-05-11 Thread GitBox
alamb commented on code in PR #2511: URL: https://github.com/apache/arrow-datafusion/pull/2511#discussion_r870255936 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -61,14 +61,6 @@ use datafusion_common::{DataFusionError, Result}; use datafusion_expr::binary_rule::

[GitHub] [arrow-datafusion] alamb commented on pull request #2495: Add new `ballista-cli` crate

2022-05-11 Thread GitBox
alamb commented on PR #2495: URL: https://github.com/apache/arrow-datafusion/pull/2495#issuecomment-1123715854 Thanks @andygrove looks much better to me

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2495: Add new `ballista-cli` crate

2022-05-11 Thread GitBox
alamb commented on code in PR #2495: URL: https://github.com/apache/arrow-datafusion/pull/2495#discussion_r870257940 ## ballista-cli/src/context.rs: ## @@ -0,0 +1,90 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. Se

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2490: WIP: Prepare for 8.0.0 release

2022-05-11 Thread GitBox
alamb commented on code in PR #2490: URL: https://github.com/apache/arrow-datafusion/pull/2490#discussion_r870259245 ## datafusion/CHANGELOG.md: ## @@ -19,6 +19,328 @@ # Changelog +## [8.0.0](https://github.com/apache/arrow-datafusion/tree/8.0.0) (2022-05-08) + +[Full Chan

[GitHub] [arrow-datafusion] alamb commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

2022-05-11 Thread GitBox
alamb commented on issue #2502: URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123720755 I agree this would be a good step -- and help Ballista and DataFusion both to mature. I am fully supportive. Thank you for the offer @thinkharderdev

[GitHub] [arrow-rs] alamb commented on issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

2022-05-11 Thread GitBox
alamb commented on issue #1620: URL: https://github.com/apache/arrow-rs/issues/1620#issuecomment-1123722384 I think the conclusion is "we should keep the current implementation, as imperfect as it may be, for backwards compatibility reasons" Agree with closing -- anyone who disagrees

[GitHub] [arrow-rs] alamb closed issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray.

2022-05-11 Thread GitBox
alamb closed issue #1620: Ensure there is a single zero in the offsets buffer for an empty ListArray. URL: https://github.com/apache/arrow-rs/issues/1620

[GitHub] [arrow] raulcd commented on pull request #13119: MINOR: Add feedback information output when building in case of skipping pyarrow build

2022-05-11 Thread GitBox
raulcd commented on PR #13119: URL: https://github.com/apache/arrow/pull/13119#issuecomment-1123723199 > Thanks! While it's certainly useful to get some message printed instead of just the "running build_ext" without anything else, I am wondering if we can give an additional hint? Like sugg

[GitHub] [arrow-rs] alamb commented on pull request #1682: Fix Parquet Arrow Schema Inference

2022-05-11 Thread GitBox
alamb commented on PR #1682: URL: https://github.com/apache/arrow-rs/pull/1682#issuecomment-1123723375 I will review this later today

[GitHub] [arrow-datafusion] andygrove commented on pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
andygrove commented on PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#issuecomment-1123737439 > Ah - i believe the tpch tests use a published version of ballista. so i dont think we can run Q15 until our next release. let me know if im misunderstanding, i didnt have ti

[GitHub] [arrow] raulcd commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
raulcd commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123738823 @github-actions crossbow submit example-python-minimal-build*

[GitHub] [arrow-datafusion] andygrove commented on pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
andygrove commented on PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#issuecomment-1123739280 RAT is failing due to an empty file @ datafusion/core/src/physical_plan/view.rs

[GitHub] [arrow] github-actions[bot] commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123740346 Revision: 46455a76fddf402232e1e0bd111b19ecd5d867ea Submitted crossbow builds: [ursacomputing/crossbow @ actions-2064](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] edponce commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
edponce commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r870292161 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using val

[GitHub] [arrow] edponce commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
edponce commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r869635511 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using val

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2512: Remove remaining uses of `binary_array_op_scalar!` in binary.rs

2022-05-11 Thread GitBox
alamb opened a new pull request, #2512: URL: https://github.com/apache/arrow-datafusion/pull/2512 Draft until https://github.com/apache/arrow-datafusion/pull/2510 is merged # Which issue does this PR close? re https://github.com/apache/arrow-datafusion/issues/1179 and https://g

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2512: Remove remaining uses of `binary_array_op_scalar!` in binary.rs

2022-05-11 Thread GitBox
alamb commented on code in PR #2512: URL: https://github.com/apache/arrow-datafusion/pull/2512#discussion_r870302946 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -1142,6 +1079,20 @@ macro_rules! binary_array_op_dyn_scalar { }} } +/// Compares the array wi

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2512: Remove remaining uses of `binary_array_op_scalar!` in binary.rs

2022-05-11 Thread GitBox
alamb commented on code in PR #2512: URL: https://github.com/apache/arrow-datafusion/pull/2512#discussion_r870303487 ## datafusion/physical-expr/src/expressions/nullif.rs: ## @@ -82,18 +80,18 @@ pub fn nullif_func(args: &[ColumnarValue]) -> Result { match (lhs, rhs) {

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2511: Minor: remove code that is not included in arrow-rs

2022-05-11 Thread GitBox
alamb commented on code in PR #2511: URL: https://github.com/apache/arrow-datafusion/pull/2511#discussion_r870255936 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -61,14 +61,6 @@ use datafusion_common::{DataFusionError, Result}; use datafusion_expr::binary_rule::

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2513: Change java package names in protobuf files

2022-05-11 Thread GitBox
andygrove opened a new issue, #2513: URL: https://github.com/apache/arrow-datafusion/issues/2513 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** We are using non-standard Java package names `org.datafusioncompute` and `org.ballist

[GitHub] [arrow-datafusion] alamb commented on issue #172: TPC-H Query 21

2022-05-11 Thread GitBox
alamb commented on issue #172: URL: https://github.com/apache/arrow-datafusion/issues/172#issuecomment-1123770525 Possibly fixed in https://github.com/apache/arrow-datafusion/pull/2451 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2514: There are duplicate and inconsistent copies of `datafusion.proto`

2022-05-11 Thread GitBox
andygrove opened a new issue, #2514: URL: https://github.com/apache/arrow-datafusion/issues/2514 **Describe the bug** Ballista has a separate copy of `datafusion.proto` that is not the same as the version in DataFusion. **To Reproduce** diff the files **Expected behavior*

[GitHub] [arrow-datafusion] alamb commented on pull request #2492: Optimize MergeJoin by storing joined indices instead of creating small record batches for each match

2022-05-11 Thread GitBox
alamb commented on PR #2492: URL: https://github.com/apache/arrow-datafusion/pull/2492#issuecomment-1123772840 Nice -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[GitHub] [arrow] raulcd commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
raulcd commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123776016 @github-actions crossbow submit example-python-minimal-build* -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [arrow] github-actions[bot] commented on pull request #13113: ARROW-15893: [CI][Python] Add python minimal builds to nightly builds

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #13113: URL: https://github.com/apache/arrow/pull/13113#issuecomment-1123780482 Revision: 783bdf68526e43fa720bc5f6874d8725f21178bd Submitted crossbow builds: [ursacomputing/crossbow @ actions-2065](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow-datafusion] matthewmturner commented on pull request #2279: Add `CREATE VIEW`

2022-05-11 Thread GitBox
matthewmturner commented on PR #2279: URL: https://github.com/apache/arrow-datafusion/pull/2279#issuecomment-1123781227 @andygrove thanks - i removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

2022-05-11 Thread GitBox
andygrove commented on issue #2502: URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1123782492 Thanks for the encouraging feedback. I started a design doc where we can discuss the finer details. https://docs.google.com/document/d/1jNRbadyStSrV5kifwn0khufAwq6O

[GitHub] [arrow] Johnnathanalmeida commented on a diff in pull request #13073: ARROW-16527: [Gandiva][C++] Add binary functions

2022-05-11 Thread GitBox
Johnnathanalmeida commented on code in PR #13073: URL: https://github.com/apache/arrow/pull/13073#discussion_r870327545 ## cpp/src/gandiva/function_registry_string.cc: ## @@ -439,6 +439,14 @@ std::vector GetStringFunctionRegistry() { kResultNullIfNull, "rig

[GitHub] [arrow] ursabot commented on pull request #12589: ARROW-14848: [R] Implement bindings for lubridate's parse_date_time

2022-05-11 Thread GitBox
ursabot commented on PR #12589: URL: https://github.com/apache/arrow/pull/12589#issuecomment-1123790360 Benchmark runs are scheduled for baseline = b264dca5a00cb889e1caf24f41a5f018c96cec4d and contender = 214135d8ceb86bc16f487d08351f66c8fdf76638. 214135d8ceb86bc16f487d08351f66c8fdf76638 is

[GitHub] [arrow] ursabot commented on pull request #12589: ARROW-14848: [R] Implement bindings for lubridate's parse_date_time

2022-05-11 Thread GitBox
ursabot commented on PR #12589: URL: https://github.com/apache/arrow/pull/12589#issuecomment-1123790541 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c38e57356d88469cb06b6c9675069e7c...3ac9c5b7b624409ca76405dc253a9958/)

[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #193: Toddfarmer/contributing java dependencies

2022-05-11 Thread GitBox
lidavidm commented on code in PR #193: URL: https://github.com/apache/arrow-cookbook/pull/193#discussion_r870332568 ## java/CONTRIBUTING.rst: ## @@ -1,13 +1,27 @@ Building the Java Cookbook = - The Java cookbook uses the Sphinx documentation system.

[GitHub] [arrow] jonkeane commented on a diff in pull request #13070: ARROW-15804: [R] Update as.Date() to support several tryFormats

2022-05-11 Thread GitBox
jonkeane commented on code in PR #13070: URL: https://github.com/apache/arrow/pull/13070#discussion_r870332856 ## r/tests/testthat/test-dplyr-funcs-datetime.R: ## @@ -1490,8 +1490,10 @@ test_that("`as.Date()` and `as_date()`", { dt_utc = ymd_hms("2010-08-03 00:50:50"),

[GitHub] [arrow] icexelloss commented on pull request #12672: ARROW-15779: [Python] Create python bindings for Substrait consumer

2022-05-11 Thread GitBox
icexelloss commented on PR #12672: URL: https://github.com/apache/arrow/pull/12672#issuecomment-1123801306 cc @rtpsw we should test this with ibis-substrait once this is merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2515: Protobuf dedupe

2022-05-11 Thread GitBox
andygrove opened a new pull request, #2515: URL: https://github.com/apache/arrow-datafusion/pull/2515 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/2513 and https://github.com/apache/arrow-datafusion/issues/2514 # Rationale

[GitHub] [arrow] nealrichardson commented on pull request #12980: ARROW-16281: [R] [CI] Bump versions with the release of 4.2

2022-05-11 Thread GitBox
nealrichardson commented on PR #12980: URL: https://github.com/apache/arrow/pull/12980#issuecomment-1123829866 @github-actions crossbow submit test-r-install-local -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [arrow] github-actions[bot] commented on pull request #12980: ARROW-16281: [R] [CI] Bump versions with the release of 4.2

2022-05-11 Thread GitBox
github-actions[bot] commented on PR #12980: URL: https://github.com/apache/arrow/pull/12980#issuecomment-1123832382 Revision: 5e2fc5642beb79c99c5e672ce5d2a0d77369200b Submitted crossbow builds: [ursacomputing/crossbow @ actions-2066](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow-rs] alamb opened a new pull request, #1686: Add gRPC service reflection support to arrow flight

2022-05-11 Thread GitBox
alamb opened a new pull request, #1686: URL: https://github.com/apache/arrow-rs/pull/1686 # Which issue does this PR close? TBF Closes #. # Rationale for this change We like to use the gRPC service reflection, via `grpcurl --plaintext localhost:8082 list`, to list gRPC s

[GitHub] [arrow] bkietz commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-05-11 Thread GitBox
bkietz commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r870385124 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,162 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using valu
