[GitHub] [arrow] westonpace commented on pull request #14131: ARROW-17517: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
westonpace commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247651610 I'm a little stumped by this Windows error. MapNode is an abstract base class. It is used by project_node.cc, filter_node.cc (both in arrow compute) and also by file_base.cc (in data

[GitHub] [arrow] ursabot commented on pull request #14120: MINOR: [Doc] Fix minor typo on conda-recipes README

2022-09-14 Thread GitBox
ursabot commented on PR #14120: URL: https://github.com/apache/arrow/pull/14120#issuecomment-1247650420 Benchmark runs are scheduled for baseline = bda98aabe6b91a36e97f508a00d7580f9eb0a009 and contender = 29225accecb74b4974920a8acaca55578a44254e. 29225accecb74b4974920a8acaca55578a44254e is

[GitHub] [arrow-ballista] r4ntix opened a new issue, #214: Run example fails via PushStaged mode

2022-09-14 Thread GitBox
r4ntix opened a new issue, #214: URL: https://github.com/apache/arrow-ballista/issues/214 **Describe the bug** I start the scheduler and executor service in localhost: ```shell ./target/debug/ballista-scheduler -s push-staged --log-level-setting INFO,ballista_scheduler=DEBUG

[GitHub] [arrow] westonpace commented on a diff in pull request #14123: ARROW-17061: [Python][Substrait] Acero consumer is unable to consume count function from substrait query plan

2022-09-14 Thread GitBox
westonpace commented on code in PR #14123: URL: https://github.com/apache/arrow/pull/14123#discussion_r971568991 ## python/pyarrow/tests/test_substrait.py: ## @@ -165,3 +169,108 @@ def test_get_supported_functions(): 'functions_arithmetic.yaml', 'add')

[GitHub] [arrow] AlenkaF commented on pull request #14044: ARROW-16652: [Python] Cast compute kernel segfaults when called with a Table

2022-09-14 Thread GitBox
AlenkaF commented on PR #14044: URL: https://github.com/apache/arrow/pull/14044#issuecomment-1247630374 Thank you @westonpace for the review! @kshitij12345 do you want to add a check for a record batch or table as Weston suggested or you want to leave it as is? I would maybe try re

[GitHub] [arrow] westonpace commented on a diff in pull request #14118: ARROW-16857: [C++] Adding Project Relation ToProto

2022-09-14 Thread GitBox
westonpace commented on code in PR #14118: URL: https://github.com/apache/arrow/pull/14118#discussion_r971563214 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -97,6 +97,78 @@ Result> GetTableFromPlan( return arrow::Table::FromRecordBatchReader(sink_reader.get()); }

[GitHub] [arrow] westonpace commented on a diff in pull request #14118: ARROW-16857: [C++] Adding Project Relation ToProto

2022-09-14 Thread GitBox
westonpace commented on code in PR #14118: URL: https://github.com/apache/arrow/pull/14118#discussion_r971562543 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -97,6 +97,78 @@ Result> GetTableFromPlan( return arrow::Table::FromRecordBatchReader(sink_reader.get()); }

[GitHub] [arrow] westonpace commented on a diff in pull request #14118: ARROW-16857: [C++] Adding Project Relation ToProto

2022-09-14 Thread GitBox
westonpace commented on code in PR #14118: URL: https://github.com/apache/arrow/pull/14118#discussion_r971562107 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -97,6 +97,78 @@ Result> GetTableFromPlan( return arrow::Table::FromRecordBatchReader(sink_reader.get()); }

[GitHub] [arrow] westonpace commented on a diff in pull request #14118: ARROW-16857: [C++] Adding Project Relation ToProto

2022-09-14 Thread GitBox
westonpace commented on code in PR #14118: URL: https://github.com/apache/arrow/pull/14118#discussion_r971560947 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -97,6 +97,78 @@ Result> GetTableFromPlan( return arrow::Table::FromRecordBatchReader(sink_reader.get()); }

[GitHub] [arrow-datafusion] yjshen commented on pull request #3493: Update quarterly_roadmap.md

2022-09-14 Thread GitBox
yjshen commented on PR #3493: URL: https://github.com/apache/arrow-datafusion/pull/3493#issuecomment-1247587017 We have implemented file-based shuffle with memory management in Blaze. I could have that PRed if it aligns with the goal of Ballista. -- This is an automated message from the

[GitHub] [arrow] h-vetinari commented on pull request #14102: ARROW-17635: [Python][CI] Sync conda recipe with the arrow-cpp feedstock

2022-09-14 Thread GitBox
h-vetinari commented on PR #14102: URL: https://github.com/apache/arrow/pull/14102#issuecomment-1247579632 > We need LLVM headers to use Gandiva. (Some public Gandiva headers use LLVM headers.) If we build PyArrow with Gandiva support, we need LLVM as a build-time dependency. Yes, an

[GitHub] [arrow] ursabot commented on pull request #14038: ARROW-6772: [C++] Add operator== for interfaces with an Equals() method

2022-09-14 Thread GitBox
ursabot commented on PR #14038: URL: https://github.com/apache/arrow/pull/14038#issuecomment-1247569341 Benchmark runs are scheduled for baseline = bc8a608237424fa0a575e5226dd2dc4508fd8333 and contender = bda98aabe6b91a36e97f508a00d7580f9eb0a009. bda98aabe6b91a36e97f508a00d7580f9eb0a009 is

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3493: Update quarterly_roadmap.md

2022-09-14 Thread GitBox
codecov-commenter commented on PR #3493: URL: https://github.com/apache/arrow-datafusion/pull/3493#issuecomment-1247562745 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3493?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] yahoNanJing opened a new pull request, #3493: Update quarterly_roadmap.md

2022-09-14 Thread GitBox
yahoNanJing opened a new pull request, #3493: URL: https://github.com/apache/arrow-datafusion/pull/3493 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-ballista] andygrove opened a new issue, #213: Config settings in BallistaContext do not get passed to DataFusion context

2022-09-14 Thread GitBox
andygrove opened a new issue, #213: URL: https://github.com/apache/arrow-ballista/issues/213 **Describe the bug** I would like to set `datafusion.execution.coalesce_batches=false` in BallistaContext and have it passed through to the DataFusion context so that it affects the plan that get

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #3477: [feat] Support `columns_sorted` in row_filters with pageIndex

2022-09-14 Thread GitBox
Ted-Jiang commented on code in PR #3477: URL: https://github.com/apache/arrow-datafusion/pull/3477#discussion_r971474454 ## datafusion/core/src/physical_plan/file_format/row_filter.rs: ## @@ -228,11 +230,32 @@ fn size_of_columns(columns: &[usize], metadata: &ParquetMetaData) ->

[GitHub] [arrow-ballista] yahoNanJing commented on issue #209: Add LaunchMultiTask rpc interface for executor

2022-09-14 Thread GitBox
yahoNanJing commented on issue #209: URL: https://github.com/apache/arrow-ballista/issues/209#issuecomment-1247508351 Hi @askoa, yes currently we can launch a bunch of tasks in a single grpc request. However, for the same stage, they don't share the information, like execution plan, etc.
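This minimal Rust sketch illustrates the point that tasks from the same stage do not share information such as the execution plan. `TaskId`, `MultiTaskRequest` and `build_request` are hypothetical names, not Ballista's actual gRPC types. The sketch groups tasks by stage so that each stage's serialized plan is carried once per request rather than once per task.

```rust
use std::collections::HashMap;

/// Hypothetical task identifier: which stage and which partition to run.
#[derive(Debug, Clone, PartialEq)]
pub struct TaskId {
    pub stage_id: usize,
    pub partition_id: usize,
}

/// Hypothetical batched request: for each stage id, the serialized plan
/// (stored once) plus the list of partitions to execute with it.
#[derive(Debug, Default)]
pub struct MultiTaskRequest {
    pub stages: HashMap<usize, (Vec<u8>, Vec<usize>)>,
}

/// Groups tasks by stage so each stage's plan is copied into the request
/// only once. Panics if `plans` has no entry for a task's stage.
pub fn build_request(tasks: &[TaskId], plans: &HashMap<usize, Vec<u8>>) -> MultiTaskRequest {
    let mut req = MultiTaskRequest::default();
    for task in tasks {
        let (_, partitions) = req
            .stages
            .entry(task.stage_id)
            // The plan is cloned the first time a stage is seen, never again.
            .or_insert_with(|| (plans[&task.stage_id].clone(), Vec::new()));
        partitions.push(task.partition_id);
    }
    req
}
```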

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #3477: [feat] Support `columns_sorted` in row_filters with pageIndex

2022-09-14 Thread GitBox
Ted-Jiang commented on code in PR #3477: URL: https://github.com/apache/arrow-datafusion/pull/3477#discussion_r971462528 ## datafusion/core/Cargo.toml: ## @@ -79,6 +79,7 @@ object_store = "0.5.0" ordered-float = "3.0" parking_lot = "0.12" parquet = { version = "22.0.0", featu

[GitHub] [arrow] westonpace commented on a diff in pull request #13965: ARROW-17517: [C++] Avoid exposing internal headers from engine/api.h

2022-09-14 Thread GitBox
westonpace commented on code in PR #13965: URL: https://github.com/apache/arrow/pull/13965#discussion_r971458333 ## cpp/src/arrow/engine/api.h: ## @@ -19,6 +19,4 @@ #pragma once -#include "arrow/engine/substrait/extension_set.h" #include "arrow/engine/substrait/extension_t

[GitHub] [arrow] richtia commented on a diff in pull request #14123: ARROW-17061: [Python][Substrait] Acero consumer is unable to consume count function from substrait query plan

2022-09-14 Thread GitBox
richtia commented on code in PR #14123: URL: https://github.com/apache/arrow/pull/14123#discussion_r971456421 ## python/pyarrow/tests/test_substrait.py: ## @@ -165,3 +169,108 @@ def test_get_supported_functions(): 'functions_arithmetic.yaml', 'add')

[GitHub] [arrow-ballista] andygrove commented on pull request #212: Make shuffle reader exec less verbose

2022-09-14 Thread GitBox
andygrove commented on PR #212: URL: https://github.com/apache/arrow-ballista/pull/212#issuecomment-1247488090 @yahoNanJing @thinkharderdev would be good to get your opinions on this

[GitHub] [arrow-ballista] andygrove opened a new pull request, #212: Make shuffle reader exec less verbose

2022-09-14 Thread GitBox
andygrove opened a new pull request, #212: URL: https://github.com/apache/arrow-ballista/pull/212 # Which issue does this PR close? N/A # Rationale for this change I am starting to do some performance profiling and tuning and I cannot read the plans because

[GitHub] [arrow] ursabot commented on pull request #14119: ARROW-17725: [CI][Python] Fix test collection in case of Arrow built without parquet

2022-09-14 Thread GitBox
ursabot commented on PR #14119: URL: https://github.com/apache/arrow/pull/14119#issuecomment-1247487001 Benchmark runs are scheduled for baseline = 03cf0dd37287d776ae2ef3724038e0b70baa2eed and contender = bc8a608237424fa0a575e5226dd2dc4508fd8333. bc8a608237424fa0a575e5226dd2dc4508fd8333 is

[GitHub] [arrow-ballista] andygrove opened a new issue, #211: Make ShuffleReaderExec/ShuffleWriterExec output less verbose

2022-09-14 Thread GitBox
andygrove opened a new issue, #211: URL: https://github.com/apache/arrow-ballista/issues/211 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** It is hard to review metrics in the scheduler output when there are many shuffle partiti
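This minimal Rust sketch shows one way to keep such output short. It assumes a hypothetical `ShuffleReader` type that stores one location string per partition; this is not Ballista's real `ShuffleReaderExec`. Its `Display` implementation reports the partition count instead of listing every location.

```rust
use std::fmt;

/// Hypothetical stand-in for a shuffle reader plan node: one location
/// string (e.g. executor address plus file path) per input partition.
pub struct ShuffleReader {
    pub partition_locations: Vec<String>,
}

impl fmt::Display for ShuffleReader {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print a summary instead of every location; with hundreds of
        // partitions the full list would swamp the plan output.
        write!(f, "ShuffleReaderExec: partitions={}", self.partition_locations.len())
    }
}
```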

[GitHub] [arrow-datafusion] ursabot commented on pull request #3490: Make the function from_proto_binary_op public

2022-09-14 Thread GitBox
ursabot commented on PR #3490: URL: https://github.com/apache/arrow-datafusion/pull/3490#issuecomment-1247476363 Benchmark runs are scheduled for baseline = f52d8aff2e7975403699268225bca0a9efbe7076 and contender = 84bee899958aaf70372ef84811c6787f53fa25eb. 84bee899958aaf70372ef84811c6787f5

[GitHub] [arrow-datafusion] ursabot commented on pull request #3486: minor: fix bug in `downcast_value!` macro (`T` --> `$T`)

2022-09-14 Thread GitBox
ursabot commented on PR #3486: URL: https://github.com/apache/arrow-datafusion/pull/3486#issuecomment-1247476350 Benchmark runs are scheduled for baseline = 06f01eaf65b78d73ffc7fb0bf95619e1e884e850 and contender = f52d8aff2e7975403699268225bca0a9efbe7076. f52d8aff2e7975403699268225bca0a9e
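The following hypothetical macro, written in the spirit of `downcast_value!` but not DataFusion's actual definition, illustrates why `T` versus `$T` matters in a `macro_rules!` body. Inside the body, `$T` is the type the caller passed in. A bare `T` is an ordinary path resolved where the macro is expanded. It can therefore silently pick up a generic parameter named `T` at the call site, or fail to compile if no such name exists.

```rust
use std::any::Any;

/// Hypothetical macro: downcast a `&dyn Any` to `&$T`, returning an error
/// string on a type mismatch. The body must use the metavariable `$T`;
/// a bare `T` would resolve at the expansion site instead.
macro_rules! downcast_value {
    ($value:expr, $T:ty) => {
        $value
            .downcast_ref::<$T>()
            .ok_or_else(|| format!("could not downcast to {}", stringify!($T)))
    };
}

/// Example caller: expects the erased value to be a `Vec<i32>`.
pub fn first_i32(values: &dyn Any) -> Result<i32, String> {
    let v: &Vec<i32> = downcast_value!(values, Vec<i32>)?;
    v.first().copied().ok_or_else(|| "empty vector".to_string())
}
```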

[GitHub] [arrow-datafusion] yjshen merged pull request #3490: Make the function from_proto_binary_op public

2022-09-14 Thread GitBox
yjshen merged PR #3490: URL: https://github.com/apache/arrow-datafusion/pull/3490

[GitHub] [arrow-datafusion] yjshen closed issue #3489: Make `from_proto_binary_op` public

2022-09-14 Thread GitBox
yjshen closed issue #3489: Make `from_proto_binary_op` public URL: https://github.com/apache/arrow-datafusion/issues/3489

[GitHub] [arrow-datafusion] yjshen merged pull request #3486: minor: fix bug in `downcast_value!` macro (`T` --> `$T`)

2022-09-14 Thread GitBox
yjshen merged PR #3486: URL: https://github.com/apache/arrow-datafusion/pull/3486

[GitHub] [arrow] github-actions[bot] commented on pull request #14131: ARROW-17517: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247464958 https://issues.apache.org/jira/browse/ARROW-17517

[GitHub] [arrow] github-actions[bot] commented on pull request #14131: ARROW-17517: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247464973 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] westonpace commented on pull request #14131: ARROW-17517: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
westonpace commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247464907 > It seems that you refer wrong Jira issue. Oops, thanks :)

[GitHub] [arrow] paleolimbot commented on a diff in pull request #13706: ARROW-17178: [R] Support head() in arrow_dplyr_query with user-defined function

2022-09-14 Thread GitBox
paleolimbot commented on code in PR #13706: URL: https://github.com/apache/arrow/pull/13706#discussion_r971430832 ## r/src/compute-exec.cpp: ## @@ -56,118 +56,151 @@ std::shared_ptr MakeExecNodeOrStop( }); } -std::pair, std::shared_ptr> -ExecPlan_prepare(const std::sh

[GitHub] [arrow] kou commented on pull request #14131: ARROW-17157: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
kou commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247459284 It seems that you refer wrong Jira issue.

[GitHub] [arrow] cyb70289 commented on pull request #14101: ARROW-17691: [Go] Implement Take for Primitive Types

2022-09-14 Thread GitBox
cyb70289 commented on PR #14101: URL: https://github.com/apache/arrow/pull/14101#issuecomment-1247455193 > Thanks for taking a look @cyb70289! BTW: when you get a chance, if you could describe the process you went through for implementing the ARM assembly versions of the SIMD stuff since c2

[GitHub] [arrow] westonpace commented on pull request #14131: ARROW-17157: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
westonpace commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247454615 I figured I should tackle this to brush up on the rules for includes myself to try and help keep things clean going forwards. While I was at it I also did a bit of general header clea

[GitHub] [arrow] github-actions[bot] commented on pull request #14131: ARROW-17157: [C++] Remove internal headers from substrait API

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14131: URL: https://github.com/apache/arrow/pull/14131#issuecomment-1247452213 https://issues.apache.org/jira/browse/ARROW-17157

[GitHub] [arrow-datafusion] ursabot commented on pull request #3470: Use the column data type as the NULL data type in the row filter

2022-09-14 Thread GitBox
ursabot commented on PR #3470: URL: https://github.com/apache/arrow-datafusion/pull/3470#issuecomment-1247450891 Benchmark runs are scheduled for baseline = 5f029cc73755b6800217e370ebe0a7b5e8a6a224 and contender = 06f01eaf65b78d73ffc7fb0bf95619e1e884e850. 06f01eaf65b78d73ffc7fb0bf95619e1e

[GitHub] [arrow] kou commented on a diff in pull request #13556: ARROW-17021:[C++][R][CI] Enable use of sccache in crossbow

2022-09-14 Thread GitBox
kou commented on code in PR #13556: URL: https://github.com/apache/arrow/pull/13556#discussion_r971414553 ## ci/docker/centos-7-cpp.dockerfile: ## @@ -20,6 +20,7 @@ FROM centos:centos7 RUN yum install -y \ diffutils \ gcc-c++ \ +curl \ Review Comment:

[GitHub] [arrow-datafusion] liukun4515 commented on issue #3469: The data type of predicate in the row filter should be same in the binary expr

2022-09-14 Thread GitBox
liukun4515 commented on issue #3469: URL: https://github.com/apache/arrow-datafusion/issues/3469#issuecomment-1247449656 > If the table schema is c1,c2,c3, the file schema is c1,c3 with filter c2 = lit. > > Why not push the `predicate = false` to the parquet file directly to replace
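This minimal Rust sketch illustrates the idea suggested by the title of PR #3470. It uses hypothetical `DataType`, `Field` and `Expr` types rather than DataFusion's. When a filter references a column that exists in the table schema but not in the file schema (`c2` above), the column is rewritten to a NULL literal. That literal carries the table schema's type for the column, so both sides of `c2 = lit` keep the same data type.

```rust
/// Hypothetical, heavily simplified stand-ins for schema and expression types.
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Null,
    Int64,
    Utf8,
}

pub struct Field {
    pub name: String,
    pub data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    /// A NULL literal that still records which data type it stands for.
    Null(DataType),
    Int64(i64),
    Eq(Box<Expr>, Box<Expr>),
}

/// Replaces references to columns missing from `file` with a NULL typed
/// after the column's type in `table`, so `c2 = 5` becomes `NULL(Int64) = 5`
/// rather than comparing an untyped NULL with an Int64.
pub fn rewrite_missing(expr: Expr, table: &[Field], file: &[Field]) -> Expr {
    match expr {
        Expr::Column(name) if !file.iter().any(|f| f.name == name) => {
            let data_type = table
                .iter()
                .find(|f| f.name == name)
                .map(|f| f.data_type.clone())
                // Unknown to both schemas: fall back to the untyped NULL.
                .unwrap_or(DataType::Null);
            Expr::Null(data_type)
        }
        Expr::Eq(left, right) => Expr::Eq(
            Box::new(rewrite_missing(*left, table, file)),
            Box::new(rewrite_missing(*right, table, file)),
        ),
        other => other,
    }
}
```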

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #3470: Use the column data type as the NULL data type in the row filter

2022-09-14 Thread GitBox
liukun4515 commented on PR #3470: URL: https://github.com/apache/arrow-datafusion/pull/3470#issuecomment-1247448285 > Thanks for all the good work @liukun4515 and @thinkharderdev > > I would be fine with merging this PR if we change it according to @thinkharderdev 's suggestion here

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3470: Use the column data type as the NULL data type in the row filter

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3470: URL: https://github.com/apache/arrow-datafusion/pull/3470#discussion_r971419180 ## datafusion/core/src/physical_plan/file_format/row_filter.rs: ## @@ -202,7 +202,18 @@ impl<'a> ExprRewriter for FilterCandidateBuilder<'a> { fn mutate

[GitHub] [arrow-datafusion] liukun4515 merged pull request #3470: Use the column data type as the NULL data type in the row filter

2022-09-14 Thread GitBox
liukun4515 merged PR #3470: URL: https://github.com/apache/arrow-datafusion/pull/3470

[GitHub] [arrow-datafusion] liukun4515 closed issue #3469: The data type of predicate in the row filter should be same in the binary expr

2022-09-14 Thread GitBox
liukun4515 closed issue #3469: The data type of predicate in the row filter should be same in the binary expr URL: https://github.com/apache/arrow-datafusion/issues/3469

[GitHub] [arrow-datafusion] liukun4515 opened a new issue, #3492: Don't throw the error when projected columns are not in the table schema

2022-09-14 Thread GitBox
liukun4515 opened a new issue, #3492: URL: https://github.com/apache/arrow-datafusion/issues/3492 In the case where the table has projected columns (like in `parquet_multiple_partitions` where we project columns from the hive partition directory structure) then the projected columns are not

[GitHub] [arrow] jrbourbeau commented on pull request #14080: ARROW-16838: [Python] Improve schema inference for pandas indexes with extension dtypes

2022-09-14 Thread GitBox
jrbourbeau commented on PR #14080: URL: https://github.com/apache/arrow/pull/14080#issuecomment-1247447396 The Travis CI failure is, I think, unrelated to the changes in this PR. @jorisvandenbossche let me know if there are any other changes you'd like to see

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3472: inlist: move type coercion to logical phase

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3472: URL: https://github.com/apache/arrow-datafusion/pull/3472#discussion_r971417133 ## datafusion/optimizer/src/type_coercion.rs: ## @@ -164,11 +163,61 @@ impl ExprRewriter for TypeCoercionRewriter<'_> { };

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3472: inlist: move type coercion to logical phase

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3472: URL: https://github.com/apache/arrow-datafusion/pull/3472#discussion_r971416948 ## datafusion/core/src/physical_plan/planner.rs: ## @@ -1712,12 +1714,12 @@ mod tests { .limit(3, Some(10))? .build()?; -

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3472: inlist: move type coercion to logical phase

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3472: URL: https://github.com/apache/arrow-datafusion/pull/3472#discussion_r971415800 ## datafusion/core/src/physical_plan/planner.rs: ## @@ -1994,7 +1994,10 @@ mod tests { .project(vec![col("c1").in_list(list, false)])?

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3490: Make the function from_proto_binary_op public

2022-09-14 Thread GitBox
codecov-commenter commented on PR #3490: URL: https://github.com/apache/arrow-datafusion/pull/3490#issuecomment-1247443214 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3490?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] liukun4515 opened a new issue, #3491: centralize the type coercion code for InList rule

2022-09-14 Thread GitBox
liukun4515 opened a new issue, #3491: URL: https://github.com/apache/arrow-datafusion/issues/3491 Do we need a common function in a new file like `in_list_rule.rs` to do the `coerce_types`? cc @alamb @andygrove _Originally posted by @liukun4515 in https://github.com/apache/arrow-d
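This hypothetical sketch shows what a shared coercion helper for `IN` lists could look like. The `DataType` enum and its widening order (Int32 < Int64 < Float64, with Utf8 compatible only with itself) are simplified assumptions, not DataFusion's actual coercion rules. The helper folds the expression's type with every list element's type and gives up on the first incompatible pair.

```rust
/// Hypothetical, simplified type set for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DataType {
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Widening rank for numeric types; `None` for non-numeric types.
fn numeric_rank(t: DataType) -> Option<u8> {
    match t {
        DataType::Int32 => Some(0),
        DataType::Int64 => Some(1),
        DataType::Float64 => Some(2),
        DataType::Utf8 => None,
    }
}

/// Common type of two types, or `None` if they cannot be coerced.
fn coerce_pair(a: DataType, b: DataType) -> Option<DataType> {
    if a == b {
        return Some(a);
    }
    match (numeric_rank(a), numeric_rank(b)) {
        // Pick the wider numeric type (note: Int64 -> Float64 can lose precision).
        (Some(ra), Some(rb)) => Some(if ra >= rb { a } else { b }),
        _ => None,
    }
}

/// Common type for `expr IN (v1, v2, ...)`, computed in one place so that
/// every caller applies the same rule.
pub fn in_list_coerce_types(expr_type: DataType, list_types: &[DataType]) -> Option<DataType> {
    list_types.iter().try_fold(expr_type, |acc, &t| coerce_pair(acc, t))
}
```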

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3485: add time_zone into ConfigOptions

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3485: URL: https://github.com/apache/arrow-datafusion/pull/3485#discussion_r971414223 ## datafusion/core/src/config.rs: ## @@ -102,6 +105,20 @@ impl ConfigDefinition { ScalarValue::UInt64(Some(default_value)), ) } +

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3485: add time_zone into ConfigOptions

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3485: URL: https://github.com/apache/arrow-datafusion/pull/3485#discussion_r971413944 ## datafusion/core/src/config.rs: ## @@ -102,6 +105,20 @@ impl ConfigDefinition { ScalarValue::UInt64(Some(default_value)), ) } +

[GitHub] [arrow-datafusion] liukun4515 commented on a diff in pull request #3485: add time_zone into ConfigOptions

2022-09-14 Thread GitBox
liukun4515 commented on code in PR #3485: URL: https://github.com/apache/arrow-datafusion/pull/3485#discussion_r971412997 ## datafusion/core/src/config.rs: ## @@ -102,6 +105,20 @@ impl ConfigDefinition { ScalarValue::UInt64(Some(default_value)), ) } +

[GitHub] [arrow] vibhatha commented on pull request #14071: ARROW-16424: [C++] Use Uri to parse substrait ReadRel file path

2022-09-14 Thread GitBox
vibhatha commented on PR #14071: URL: https://github.com/apache/arrow/pull/14071#issuecomment-1247436161 One more thing, regarding https://github.com/apache/arrow/actions/runs/3055769050/jobs/4929199537#step:5:916 Here the linter fails because of an unused import. Let's remove that. A

[GitHub] [arrow] kou commented on pull request #14102: ARROW-17635: [Python][CI] Sync conda recipe with the arrow-cpp feedstock

2022-09-14 Thread GitBox
kou commented on PR #14102: URL: https://github.com/apache/arrow/pull/14102#issuecomment-1247435937 We need LLVM headers to use Gandiva. (Some public Gandiva headers use LLVM headers.) If we build PyArrow with Gandiva support, we need LLVM as a build-time dependency.

[GitHub] [arrow] vibhatha commented on pull request #14071: ARROW-16424: [C++] Use Uri to parse substrait ReadRel file path

2022-09-14 Thread GitBox
vibhatha commented on PR #14071: URL: https://github.com/apache/arrow/pull/14071#issuecomment-1247434494 > I can enable those tests too... though, I guess I don't know if it's better to keep C++ updates and python updates separate? I can just add the [Python] tag to the name.. Some o

[GitHub] [arrow] vibhatha commented on a diff in pull request #14123: ARROW-17061: [Python][Substrait] Acero consumer is unable to consume count function from substrait query plan

2022-09-14 Thread GitBox
vibhatha commented on code in PR #14123: URL: https://github.com/apache/arrow/pull/14123#discussion_r971406616 ## python/pyarrow/tests/test_substrait.py: ## @@ -165,3 +169,108 @@ def test_get_supported_functions(): 'functions_arithmetic.yaml', 'add')

[GitHub] [arrow-datafusion] liukun4515 commented on issue #3479: coercion between decimal and other types lacking, compared to other numeric types

2022-09-14 Thread GitBox
liukun4515 commented on issue #3479: URL: https://github.com/apache/arrow-datafusion/issues/3479#issuecomment-1247428363 For `select 'a' || 1.1::decimal;` I think this should be supported through casting from decimal to utf8 https://github.com/apache/arrow-datafusion/issues/3478
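An Arrow Decimal128 value is stored as an unscaled 128-bit integer together with a scale. Casting it to `utf8` therefore amounts to formatting that integer with the scale's number of fractional digits. Here is a minimal Rust sketch; `decimal_to_string` is a hypothetical helper, not an arrow-rs cast kernel.

```rust
/// Formats an unscaled decimal value `v` with `scale` fractional digits,
/// e.g. v = 110, scale = 2 -> "1.10". Negative scales are not handled.
pub fn decimal_to_string(v: i128, scale: u32) -> String {
    if scale == 0 {
        return v.to_string();
    }
    let negative = v < 0;
    // unsigned_abs avoids overflow for i128::MIN.
    let digits = v.unsigned_abs().to_string();
    let scale = scale as usize;
    // Left-pad with zeros so at least one digit precedes the decimal point.
    let padded = if digits.len() <= scale {
        format!("{}{}", "0".repeat(scale + 1 - digits.len()), digits)
    } else {
        digits
    };
    let (int_part, frac_part) = padded.split_at(padded.len() - scale);
    format!("{}{}.{}", if negative { "-" } else { "" }, int_part, frac_part)
}
```

With a cast like this in place, `'a' || 1.1::decimal` could be evaluated by concatenating `a` with the formatted text. The exact output depends on the scale chosen for the cast; at scale 1, for example, the result would be `a1.1`.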

[GitHub] [arrow] kou commented on a diff in pull request #14105: ARROW-17694: [C++] Remove std::optional backport

2022-09-14 Thread GitBox
kou commented on code in PR #14105: URL: https://github.com/apache/arrow/pull/14105#discussion_r971405623 ## dev/tasks/homebrew-formulae/autobrew/apache-arrow.rb: ## @@ -42,6 +42,7 @@ class ApacheArrow < Formula def install ENV.cxx11 args = %W[ + -DARROW_CXXFLA

[GitHub] [arrow-datafusion] liukun4515 commented on issue #3479: coercion between decimal and other types lacking, compared to other numeric types

2022-09-14 Thread GitBox
liukun4515 commented on issue #3479: URL: https://github.com/apache/arrow-datafusion/issues/3479#issuecomment-1247427294 The `NULL` in the SQL syntax is associated with the `DataType::NULL`. Now we don't support NULL data type with decimal data type yet. But I think it is easy to s

[GitHub] [arrow-datafusion] askoa opened a new pull request, #3490: Make the function from_proto_binary_op public

2022-09-14 Thread GitBox
askoa opened a new pull request, #3490: URL: https://github.com/apache/arrow-datafusion/pull/3490 # Which issue does this PR close? Closes #3489 # Are there any user-facing changes? No

[GitHub] [arrow-datafusion] liukun4515 commented on issue #3478: add support for casting string to decimal

2022-09-14 Thread GitBox
liukun4515 commented on issue #3478: URL: https://github.com/apache/arrow-datafusion/issues/3478#issuecomment-1247424923 About the cast, you can refer to https://github.com/apache/arrow-rs/issues/1043 https://github.com/apache/arrow-datafusion/issues/1443 we don't support the cas
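The string-to-decimal direction can be sketched in the same way: parse the digits into an unscaled integer and pad it out to the target scale. `parse_decimal` is a hypothetical helper with two simplifying assumptions. It rejects inputs with more fractional digits than the scale instead of rounding them, and it does not accept exponents.

```rust
/// Parses a plain decimal literal such as "-12.5" into an unscaled i128
/// at `scale` fractional digits ("12.5" at scale 2 -> 1250).
/// A leading '+' or '-' is allowed; exponents and rounding are not supported.
pub fn parse_decimal(input: &str, scale: u32) -> Result<i128, String> {
    let s = input.trim();
    let (negative, body) = match s.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, s.strip_prefix('+').unwrap_or(s)),
    };
    let (int_part, frac_part) = body.split_once('.').unwrap_or((body, ""));
    let all_digits = int_part.chars().chain(frac_part.chars()).all(|c| c.is_ascii_digit());
    if (int_part.is_empty() && frac_part.is_empty()) || !all_digits {
        return Err(format!("invalid decimal: {:?}", input));
    }
    if frac_part.len() > scale as usize {
        return Err(format!("{:?} has more than {} fractional digits", input, scale));
    }
    let mut value: i128 = 0;
    for c in int_part.chars().chain(frac_part.chars()) {
        // Safe to unwrap: every character was checked to be an ASCII digit.
        let digit = i128::from(c.to_digit(10).unwrap());
        value = value
            .checked_mul(10)
            .and_then(|v| v.checked_add(digit))
            .ok_or("decimal overflow")?;
    }
    // Pad with zeros for fractional digits the input did not spell out.
    for _ in frac_part.len()..scale as usize {
        value = value.checked_mul(10).ok_or("decimal overflow")?;
    }
    Ok(if negative { -value } else { value })
}
```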

[GitHub] [arrow-rs] ursabot commented on pull request #2713: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
ursabot commented on PR #2713: URL: https://github.com/apache/arrow-rs/pull/2713#issuecomment-1247417574 Benchmark runs are scheduled for baseline = 2a0fc7703420f99d28141516cabdd0408a583dfc and contender = 7594db6367515473efdb130e7de91060079a4d88. 7594db6367515473efdb130e7de91060079a4d88 i

[GitHub] [arrow] ursabot commented on pull request #14112: ARROW-17716: [Docs] Remove IR documentation page

2022-09-14 Thread GitBox
ursabot commented on PR #14112: URL: https://github.com/apache/arrow/pull/14112#issuecomment-1247417636 Benchmark runs are scheduled for baseline = 9d33df19d9a98df5caf134f2792e5c81bca90ae3 and contender = 03cf0dd37287d776ae2ef3724038e0b70baa2eed. 03cf0dd37287d776ae2ef3724038e0b70baa2eed is

[GitHub] [arrow] kou commented on pull request #14125: ARROW-17728: [C++][Gandiva] Accept LLVM 15.0

2022-09-14 Thread GitBox
kou commented on PR #14125: URL: https://github.com/apache/arrow/pull/14125#issuecomment-1247415127 It seems that Zstd's official CMake package uses `zstd::libzstd_shared` and `zstd::libzstd_static`. So our CMake configurations should follow it. (And our `Findzstd.cmake` should suppo

[GitHub] [arrow-rs] viirya closed issue #2712: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
viirya closed issue #2712: Add overflow-checking variants of arithmetic scalar dyn kernels URL: https://github.com/apache/arrow-rs/issues/2712

[GitHub] [arrow-rs] viirya merged pull request #2713: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
viirya merged PR #2713: URL: https://github.com/apache/arrow-rs/pull/2713

[GitHub] [arrow-rs] viirya commented on pull request #2713: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
viirya commented on PR #2713: URL: https://github.com/apache/arrow-rs/pull/2713#issuecomment-1247412914 Thanks.

[GitHub] [arrow] kou merged pull request #14124: MINOR: [CI][Conan] Fix a typo

2022-09-14 Thread GitBox
kou merged PR #14124: URL: https://github.com/apache/arrow/pull/14124

[GitHub] [arrow-ballista] thinkharderdev commented on a diff in pull request #202: MINOR: Add tuning guide to user guide

2022-09-14 Thread GitBox
thinkharderdev commented on code in PR #202: URL: https://github.com/apache/arrow-ballista/pull/202#discussion_r971375987 ## docs/source/user-guide/tuning-guide.md: ## @@ -0,0 +1,61 @@ + + +# Tuning Guide + +## Partitions and Parallelism + +The goal of any distributed compute en

[GitHub] [arrow-rs] viirya commented on a diff in pull request #2713: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
viirya commented on code in PR #2713: URL: https://github.com/apache/arrow-rs/pull/2713#discussion_r971363755 ## arrow/src/compute/kernels/arity.rs: ## @@ -162,6 +177,30 @@ where } } +/// Applies a fallible unary function to an array with primitive values. +pub fn try_un
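This slice-based Rust sketch shows the fallible-unary pattern. `try_unary` here is a hypothetical analogue, not the arrow-rs signature, and null handling is omitted. The function is applied to each value, and the first error aborts the whole kernel. That is how an overflow-checking add-scalar variant can report overflow instead of silently wrapping.

```rust
/// Hypothetical slice-based analogue of a fallible unary kernel: applies
/// `op` to every value and stops at the first error. Real Arrow kernels
/// also track validity (null) bitmaps, which this sketch leaves out.
pub fn try_unary<T, O, E, F>(values: &[T], op: F) -> Result<Vec<O>, E>
where
    T: Copy,
    F: Fn(T) -> Result<O, E>,
{
    values.iter().map(|&v| op(v)).collect()
}

/// Overflow-checking "add scalar": returns an error instead of wrapping.
pub fn add_scalar_checked(values: &[i32], scalar: i32) -> Result<Vec<i32>, String> {
    try_unary(values, |v| {
        v.checked_add(scalar)
            .ok_or_else(|| format!("overflow computing {} + {}", v, scalar))
    })
}
```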

[GitHub] [arrow-datafusion] askoa opened a new issue, #3489: Make `from_proto_binary_op` public

2022-09-14 Thread GitBox
askoa opened a new issue, #3489: URL: https://github.com/apache/arrow-datafusion/issues/3489 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The function `from_proto_binary_op` converts `string` to `Operator`. This function is req

[GitHub] [arrow-rs] askoa opened a new pull request, #2729: include builder for RecordBatchOptions

2022-09-14 Thread GitBox
askoa opened a new pull request, #2729: URL: https://github.com/apache/arrow-rs/pull/2729 # Which issue does this PR close? Closes #2728 # What changes are included in this PR? Removed the current option of initializing using `Default` as the approach will work only within
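This hypothetical Rust sketch shows a builder-style options struct. `BatchOptions` and its fields are illustrative, not copied from arrow-rs. The PR's own explanation is truncated above. One Rust rule that does make a builder necessary, offered here as an assumption, is that a `#[non_exhaustive]` struct cannot be built with a struct literal outside its defining crate, even with `..Default::default()`. Callers in other crates then need `new()` plus chained setters.

```rust
/// Hypothetical options struct. Outside this crate it can only be created
/// through `new()`/`default()` and configured through the `with_*` methods.
#[non_exhaustive]
#[derive(Debug, Clone, PartialEq)]
pub struct BatchOptions {
    pub match_field_names: bool,
    pub row_count: Option<usize>,
}

impl Default for BatchOptions {
    fn default() -> Self {
        Self { match_field_names: true, row_count: None }
    }
}

impl BatchOptions {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style setter: consumes and returns `self` so calls chain.
    pub fn with_match_field_names(mut self, match_field_names: bool) -> Self {
        self.match_field_names = match_field_names;
        self
    }

    pub fn with_row_count(mut self, row_count: Option<usize>) -> Self {
        self.row_count = row_count;
        self
    }
}
```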

[GitHub] [arrow-ballista] askoa commented on issue #201: Unsupported binary operator `StringConcat`

2022-09-14 Thread GitBox
askoa commented on issue #201: URL: https://github.com/apache/arrow-ballista/issues/201#issuecomment-1247356299 @andygrove Thanks for the response. I'll pick this one up.

[GitHub] [arrow-ballista] andygrove commented on issue #201: Unsupported binary operator `StringConcat`

2022-09-14 Thread GitBox
andygrove commented on issue #201: URL: https://github.com/apache/arrow-ballista/issues/201#issuecomment-1247354664 @askoa, I created a PR to add the missing operators, but I agree with your comments that we could leverage DataFusion rather than have duplicate code here. DataFusion has a `f

[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #210: Add serde support for all binary operators in physical plan, including `StringConcat`

2022-09-14 Thread GitBox
andygrove commented on code in PR #210: URL: https://github.com/apache/arrow-ballista/pull/210#discussion_r971341022 ## ballista/rust/core/src/serde/mod.rs: ## @@ -213,13 +213,23 @@ pub(crate) fn from_proto_binary_op(op: &str) -> Result "Modulo" => Ok(Operator::Modulo)

[GitHub] [arrow-ballista] andygrove opened a new pull request, #210: MINOR: Add all binary ops in serde

2022-09-14 Thread GitBox
andygrove opened a new pull request, #210: URL: https://github.com/apache/arrow-ballista/pull/210 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow-ballista] andygrove commented on issue #201: Unsupported binary operator `StringConcat`

2022-09-14 Thread GitBox
andygrove commented on issue #201: URL: https://github.com/apache/arrow-ballista/issues/201#issuecomment-1247346993 Hi @askoa. The bug that I fixed was related to logical plan serialization. This is part of the `datafusion-proto` crate that Ballista uses. You are right though .. we also nee

[GitHub] [arrow] github-actions[bot] commented on pull request #14130: ARROW-17734: [Go] Implement Take for Lists and Dense Union

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14130: URL: https://github.com/apache/arrow/pull/14130#issuecomment-1247343931 :warning: Ticket **has no components in JIRA**, make sure you assign one.

[GitHub] [arrow] github-actions[bot] commented on pull request #14130: ARROW-17734: [Go] Implement Take for Lists and Dense Union

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14130: URL: https://github.com/apache/arrow/pull/14130#issuecomment-1247343907 https://issues.apache.org/jira/browse/ARROW-17734

[GitHub] [arrow] zeroshade commented on pull request #14130: ARROW-17734: [Go] Implement Take for Lists and Dense Union

2022-09-14 Thread GitBox
zeroshade commented on PR #14130: URL: https://github.com/apache/arrow/pull/14130#issuecomment-1247343845 This relies on #14127 which must be merged first

[GitHub] [arrow] oleksandr-yatsuk commented on issue #14116: Python dict to map

2022-09-14 Thread GitBox
oleksandr-yatsuk commented on issue #14116: URL: https://github.com/apache/arrow/issues/14116#issuecomment-1247341574 @drin thank you for the response. Our input is JSON which is deserialized to python `dict` as I posted, it is not an array of tuples. Changing the `pyarrow_schema` to '

[GitHub] [arrow] rasnjo commented on a diff in pull request #14129: ARROW-17733: [C++] Take index_width into account when filling nulls in index buffer

2022-09-14 Thread GitBox
rasnjo commented on code in PR #14129: URL: https://github.com/apache/arrow/pull/14129#discussion_r971330457 ## cpp/src/arrow/array/concatenate_test.cc: ## @@ -539,4 +539,13 @@ TEST_F(ConcatenateTest, OffsetOverflow) { ASSERT_RAISES(Invalid, Concatenate({fake_long, fake_long}

[GitHub] [arrow] ursabot commented on pull request #14081: ARROW-17631: [Java] Propagate table/columns comments into Arrow Schema

2022-09-14 Thread GitBox
ursabot commented on PR #14081: URL: https://github.com/apache/arrow/pull/14081#issuecomment-1247338976 Benchmark runs are scheduled for baseline = fac08404aa7018e6bcab515125ca99856d624d89 and contender = 9d33df19d9a98df5caf134f2792e5c81bca90ae3. 9d33df19d9a98df5caf134f2792e5c81bca90ae3 is

[GitHub] [arrow-rs] tustvold commented on pull request #2693: Split out arrow-buffer crate (#2594)

2022-09-14 Thread GitBox
tustvold commented on PR #2693: URL: https://github.com/apache/arrow-rs/pull/2693#issuecomment-1247334083 At least in theory LTO is only needed for cross-crate inlining of non-inline annotated functions, and even then generic functions should be inlined anyway as a consequence of monomorphi

[GitHub] [arrow] tschaub commented on pull request #14026: ARROW-17584: [Go] Use unsafe.Slice from Go 1.17

2022-09-14 Thread GitBox
tschaub commented on PR #14026: URL: https://github.com/apache/arrow/pull/14026#issuecomment-1247332219 @zeroshade - Yeah, I'll stick with the `go` build for now. Hoping at some point to get a more web-friendly build out of TinyGo.

[GitHub] [arrow] lidavidm commented on a diff in pull request #14129: ARROW-17733: [C++] Take index_width into account when filling nulls in index buffer

2022-09-14 Thread GitBox
lidavidm commented on code in PR #14129: URL: https://github.com/apache/arrow/pull/14129#discussion_r971323100 ## cpp/src/arrow/array/concatenate_test.cc: ## @@ -539,4 +539,13 @@ TEST_F(ConcatenateTest, OffsetOverflow) { ASSERT_RAISES(Invalid, Concatenate({fake_long, fake_lon

[GitHub] [arrow-rs] sunchao commented on a diff in pull request #2713: Add overflow-checking variants of arithmetic scalar dyn kernels

2022-09-14 Thread GitBox
sunchao commented on code in PR #2713: URL: https://github.com/apache/arrow-rs/pull/2713#discussion_r971322647 ## arrow/src/compute/kernels/arity.rs: ## @@ -162,6 +177,30 @@ where } } +/// Applies a fallible unary function to an array with primitive values. +pub fn try_u

[GitHub] [arrow] github-actions[bot] commented on pull request #14129: ARROW-17733: [C++] Take index_width into account when filling nulls in index buffer

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14129: URL: https://github.com/apache/arrow/pull/14129#issuecomment-1247323212 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #14129: ARROW-17733: [C++] Take index_width into account when filling nulls in index buffer

2022-09-14 Thread GitBox
github-actions[bot] commented on PR #14129: URL: https://github.com/apache/arrow/pull/14129#issuecomment-1247323195 https://issues.apache.org/jira/browse/ARROW-17733

[GitHub] [arrow] rasnjo opened a new pull request, #14129: ARROW-17733: [C++] Take index_width into account when filling nulls in index buffer

2022-09-14 Thread GitBox
rasnjo opened a new pull request, #14129: URL: https://github.com/apache/arrow/pull/14129 Take into account index_width when offsetting by position into out_data. Otherwise we offset position bytes into the array, but we want to offset position places into the array.
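A minimal sketch of the offset arithmetic the PR description refers to, written in Python for brevity rather than taken from Arrow's C++ code. The helper names `fill_nulls_buggy` and `fill_nulls_fixed` are hypothetical, and zero is assumed as the fill value purely for illustration. For an index buffer whose elements are `index_width` bytes wide, element `position` begins at byte `position * index_width`, not at byte `position`:

```python
import struct


def fill_nulls_buggy(buf: bytearray, index_width: int, position: int, length: int) -> None:
    # Hypothetical helper: treats `position` as a BYTE offset, so for
    # index_width > 1 the write starts inside an earlier element.
    start = position
    buf[start:start + length * index_width] = bytes(length * index_width)


def fill_nulls_fixed(buf: bytearray, index_width: int, position: int, length: int) -> None:
    # Hypothetical helper: scales the element position by the element
    # width to obtain the correct byte offset.
    start = position * index_width
    buf[start:start + length * index_width] = bytes(length * index_width)


# Four little-endian int32 indices, all equal to 7; null out elements 2 and 3.
buggy = bytearray(struct.pack("<4i", 7, 7, 7, 7))
fixed = bytearray(buggy)
fill_nulls_buggy(buggy, 4, 2, 2)
fill_nulls_fixed(fixed, 4, 2, 2)
print(struct.unpack("<4i", buggy))  # (7, 0, 0, 7)
print(struct.unpack("<4i", fixed))  # (7, 7, 0, 0)
```

With the unscaled offset, the write clobbers element 1 and part of element 2 while leaving element 3 untouched; scaling by `index_width` hits exactly the intended slots.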

[GitHub] [arrow] richtia commented on a diff in pull request #14123: ARROW-17061: [Python][Substrait] Acero consumer is unable to consume count function from substrait query plan

2022-09-14 Thread GitBox
richtia commented on code in PR #14123: URL: https://github.com/apache/arrow/pull/14123#discussion_r971308357 ## python/pyarrow/tests/test_substrait.py: ## @@ -165,3 +169,108 @@ def test_get_supported_functions(): 'functions_arithmetic.yaml', 'add')

[GitHub] [arrow-rs] askoa commented on issue #2728: API for more ergonomic construction of `RecordBatchOptions`

2022-09-14 Thread GitBox
askoa commented on issue #2728: URL: https://github.com/apache/arrow-rs/issues/2728#issuecomment-1247316193 `let options = RecordBatchOptions{ row_count:Some(row_count), ..Default::default() };` I don't think the above option is possible due to non-exhaustive c

[GitHub] [arrow-rs] andygrove closed issue #1966: Add GPU support

2022-09-14 Thread GitBox
andygrove closed issue #1966: Add GPU support URL: https://github.com/apache/arrow-rs/issues/1966

[GitHub] [arrow-rs] andygrove commented on issue #1966: Add GPU support

2022-09-14 Thread GitBox
andygrove commented on issue #1966: URL: https://github.com/apache/arrow-rs/issues/1966#issuecomment-1247312743 I am closing this issue for now because I don't have the bandwidth to work on this. I'm also no longer convinced that this is the best strategy for achieving GPU-accelerated queri

[GitHub] [arrow-rs] ursabot commented on pull request #2701: Add support of sorting dictionary of other primitive arrays

2022-09-14 Thread GitBox
ursabot commented on PR #2701: URL: https://github.com/apache/arrow-rs/pull/2701#issuecomment-1247308125 Benchmark runs are scheduled for baseline = 51466634f11b7d965ca3c912835c91e0f84a6c92 and contender = 2a0fc7703420f99d28141516cabdd0408a583dfc. 2a0fc7703420f99d28141516cabdd0408a583dfc i

[GitHub] [arrow-datafusion] kmitchener commented on pull request #3482: Address performance/execution plan of TPCH query 19

2022-09-14 Thread GitBox
kmitchener commented on PR #3482: URL: https://github.com/apache/arrow-datafusion/pull/3482#issuecomment-1247307994 @Dandandan adding verification of TPCH results to the CI would be nice. I was thinking of suggesting that once all the tests pass and we can verify results against the answer

[GitHub] [arrow-datafusion] avantgardnerio commented on pull request #3482: Address performance/execution plan of TPCH query 19

2022-09-14 Thread GitBox
avantgardnerio commented on PR #3482: URL: https://github.com/apache/arrow-datafusion/pull/3482#issuecomment-1247305961 > I think we should think of ways of adding the tpch queries to the CI? There are tests for several of them here: https://github.com/apache/arrow-datafusion/blob/5f

[GitHub] [arrow-rs] viirya commented on pull request #2701: Add support of sorting dictionary of other primitive arrays

2022-09-14 Thread GitBox
viirya commented on PR #2701: URL: https://github.com/apache/arrow-rs/pull/2701#issuecomment-1247302243 Thanks.
