[GitHub] [arrow] palak-9202 commented on a diff in pull request #13428: ARROW-16918: [Gandiva][C++] Adding UTC-local timezone conversion functions

2022-07-12 Thread GitBox
palak-9202 commented on code in PR #13428: URL: https://github.com/apache/arrow/pull/13428#discussion_r919698275 ## cpp/src/gandiva/gdv_function_stubs.cc: ## @@ -611,6 +611,51 @@ int32_t gdv_fn_cast_intervalyear_utf8_int32(int64_t context_ptr, int64_t holder_ auto* holder =

[GitHub] [arrow] palak-9202 commented on a diff in pull request #13428: ARROW-16918: [Gandiva][C++] Adding UTC-local timezone conversion functions

2022-07-12 Thread GitBox
palak-9202 commented on code in PR #13428: URL: https://github.com/apache/arrow/pull/13428#discussion_r919694197 ## cpp/src/gandiva/gdv_function_stubs_test.cc: ## @@ -993,4 +993,61 @@ TEST(TestGdvFnStubs, TestTranslate) { EXPECT_EQ(expected, std::string(result, out_len)); }

[GitHub] [arrow-datafusion] liurenjie1024 commented on issue #2633: Introducing a new optimizer framework for datafusion.

2022-07-12 Thread GitBox
liurenjie1024 commented on issue #2633: URL: https://github.com/apache/arrow-datafusion/issues/2633#issuecomment-1182809782 @andygrove @alamb PTAL when you are available. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [arrow] djnavarro commented on pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-07-12 Thread GitBox
djnavarro commented on PR #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1182809818 Thanks @rok! Yes, help with resolving conflicts on the C++ code would be appreciated if you can spare the time. I'm hoping to take care of the remaining issues on the R side later this

[GitHub] [arrow] cyb70289 commented on a diff in pull request #13583: ARROW-16807: [C++][R] count distinct incorrectly merges state

2022-07-12 Thread GitBox
cyb70289 commented on code in PR #13583: URL: https://github.com/apache/arrow/pull/13583#discussion_r919682027 ## cpp/src/arrow/util/hashing.h: ## @@ -428,6 +428,22 @@ class ScalarMemoTable : public MemoTable { value, [](int32_t i) {}, [](int32_t i) {}, out_memo_index);

[GitHub] [arrow] kou merged pull request #13579: ARROW-17050: [CI] Use -y flag on mamba install to not ask for confirmation

2022-07-12 Thread GitBox
kou merged PR #13579: URL: https://github.com/apache/arrow/pull/13579 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

[GitHub] [arrow] kou commented on a diff in pull request #13428: ARROW-16918: [Gandiva][C++] Adding UTC-local timezone conversion functions

2022-07-12 Thread GitBox
kou commented on code in PR #13428: URL: https://github.com/apache/arrow/pull/13428#discussion_r919659653 ## cpp/src/gandiva/gdv_function_stubs_test.cc: ## @@ -993,4 +993,61 @@ TEST(TestGdvFnStubs, TestTranslate) { EXPECT_EQ(expected, std::string(result, out_len)); } +TEST

[GitHub] [arrow] github-actions[bot] commented on pull request #13591: ARROW-17046 [Python] improve documentation of pyarrow.parquet.write_to_dataset function

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13591: URL: https://github.com/apache/arrow/pull/13591#issuecomment-1182773191 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] github-actions[bot] commented on pull request #13591: ARROW-17046 [Python] improve documentation of pyarrow.parquet.write_to_dataset function

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13591: URL: https://github.com/apache/arrow/pull/13591#issuecomment-1182773179 https://issues.apache.org/jira/browse/ARROW-17046 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] mirkhosro opened a new pull request, #13591: ARROW-17046 [Python] improve documentation of pyarrow.parquet.write_to_dataset function

2022-07-12 Thread GitBox
mirkhosro opened a new pull request, #13591: URL: https://github.com/apache/arrow/pull/13591 This patch is an attempt to make the documentation of `pyarrow.parquet.write_to_dataset` function clearer so that the user can easily learn - Which parameters are used by the new code path and wh

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2893: Scalar list preserve element name

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2893: URL: https://github.com/apache/arrow-datafusion/pull/2893#issuecomment-1182771886 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2893?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] kou merged pull request #13590: ARROW-17063: [GLib] Add examples to send/receive record batches via network

2022-07-12 Thread GitBox
kou merged PR #13590: URL: https://github.com/apache/arrow/pull/13590 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

[GitHub] [arrow] kou commented on pull request #13590: ARROW-17063: [GLib] Add examples to send/receive record batches via network

2022-07-12 Thread GitBox
kou commented on PR #13590: URL: https://github.com/apache/arrow/pull/13590#issuecomment-1182764966 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mai

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919644656 ## parquet/src/file/serialized_reader.rs: ## @@ -481,6 +505,22 @@ pub struct SerializedPageReader { // Column chunk type. physical_type: Type, + +// t

[GitHub] [arrow-datafusion] comphead commented on pull request #2893: Scalar list preserve element name

2022-07-12 Thread GitBox
comphead commented on PR #2893: URL: https://github.com/apache/arrow-datafusion/pull/2893#issuecomment-1182756614 @alamb you mentioned in #2840 the encoding/decoding issue still exists, please give more details, I'll try to fix it in this PR -- This is an automated message from the Apach

[GitHub] [arrow-datafusion] comphead closed pull request #2840: Preserve list element name

2022-07-12 Thread GitBox
comphead closed pull request #2840: Preserve list element name URL: https://github.com/apache/arrow-datafusion/pull/2840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[GitHub] [arrow-datafusion] comphead opened a new pull request, #2893: Scalar list preserve element name

2022-07-12 Thread GitBox
comphead opened a new pull request, #2893: URL: https://github.com/apache/arrow-datafusion/pull/2893 # Which issue does this PR close? Closes #2450 . # Rationale for this change # What changes are included in this PR? Preserve list element name # Ar

[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

2022-07-12 Thread GitBox
projjal commented on code in PR #13446: URL: https://github.com/apache/arrow/pull/13446#discussion_r919630065 ## cpp/src/gandiva/projector.cc: ## @@ -75,6 +92,17 @@ Status Projector::Make(SchemaPtr schema, const ExpressionVector& exprs, // Verify if previous projector obj co

[GitHub] [arrow-ballista] liukun4515 opened a new issue, #85: Support trace id for each query

2022-07-12 Thread GitBox
liukun4515 opened a new issue, #85: URL: https://github.com/apache/arrow-ballista/issues/85 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** As a distributed query engine, many SQL queries will be submitted to the scheduler, and the schedu

[GitHub] [arrow] projjal commented on a diff in pull request #13446: ARROW-16917: [C++][Gandiva] Add a Secondary Cache to cache gandiva object code

2022-07-12 Thread GitBox
projjal commented on code in PR #13446: URL: https://github.com/apache/arrow/pull/13446#discussion_r919624157 ## cpp/src/gandiva/filter.cc: ## @@ -37,8 +37,15 @@ Filter::Filter(std::unique_ptr llvm_generator, SchemaPtr schema, Filter::~Filter() {} +Status Filter::Make(Sche

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2892: Adds optional serde support to datafusion-proto

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2892: URL: https://github.com/apache/arrow-datafusion/pull/2892#issuecomment-1182733897 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2892?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919620826 ## parquet/src/file/serialized_reader.rs: ## @@ -526,6 +543,34 @@ impl SerializedPageReader { page_offset_index: None, seen_num_data_pages:

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919620637 ## parquet/src/file/serialized_reader.rs: ## @@ -481,6 +505,22 @@ pub struct SerializedPageReader { // Column chunk type. physical_type: Type, + +// t

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919618461 ## parquet/src/file/serialized_reader.rs: ## @@ -526,6 +543,34 @@ impl SerializedPageReader { page_offset_index: None, seen_num_data_pages:

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919617920 ## parquet/src/file/serialized_reader.rs: ## @@ -526,6 +543,34 @@ impl SerializedPageReader { page_offset_index: None, seen_num_data_pages:

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919616909 ## parquet/src/file/serialized_reader.rs: ## @@ -498,9 +566,23 @@ impl SerializedPageReader { seen_num_values: 0, decompressor,

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2892: Adds optional serde support to datafusion-proto

2022-07-12 Thread GitBox
tustvold commented on code in PR #2892: URL: https://github.com/apache/arrow-datafusion/pull/2892#discussion_r919610004 ## datafusion/proto/src/lib.rs: ## @@ -75,19 +78,32 @@ mod roundtrip_tests { use std::fmt::Formatter; use std::sync::Arc; +#[cfg(feature = "ser

[GitHub] [arrow-datafusion] tustvold opened a new pull request, #2892: Adds optional serde support to datafusion-proto

2022-07-12 Thread GitBox
tustvold opened a new pull request, #2892: URL: https://github.com/apache/arrow-datafusion/pull/2892 # Which issue does this PR close? Closes #2889 # Rationale for this change This provides a way to serialize the datafusion-proto messages to any serde serializer, e.g. J

[GitHub] [arrow-rs] codecov-commenter commented on pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#issuecomment-1182711168 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2044?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2881: Add h2o bench groupby queries

2022-07-12 Thread GitBox
andygrove commented on code in PR #2881: URL: https://github.com/apache/arrow-datafusion/pull/2881#discussion_r919606784 ## benchmarks/src/bin/h2o.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow] alexdesiqueira commented on pull request #13144: ARROW-16356: [Python] Expose RandomAccessFile::GetStream

2022-07-12 Thread GitBox
alexdesiqueira commented on PR #13144: URL: https://github.com/apache/arrow/pull/13144#issuecomment-1182707497 @pitrou sorry about the delay as well, life got in the way :slightly_smiling_face: -- This is an automated message from the Apache Git Service. To respond to the message, please

[GitHub] [arrow] alexdesiqueira commented on pull request #13144: ARROW-16356: [Python] Expose RandomAccessFile::GetStream

2022-07-12 Thread GitBox
alexdesiqueira commented on PR #13144: URL: https://github.com/apache/arrow/pull/13144#issuecomment-1182706778 @pitrou sorry about the delay also; I'll try to fix it now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [arrow-datafusion] ursabot commented on pull request #2887: Add an ID generator in preparation for PR 2885

2022-07-12 Thread GitBox
ursabot commented on PR #2887: URL: https://github.com/apache/arrow-datafusion/pull/2887#issuecomment-1182705306 Benchmark runs are scheduled for baseline = eed77a286cac497683ca38fd75b6a134455cb1c2 and contender = 6a5de4fe08597896ab6375e3e4b76c5744dcfba7. 6a5de4fe08597896ab6375e3e4b76c574

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2881: Add h2o bench groupby queries

2022-07-12 Thread GitBox
andygrove commented on code in PR #2881: URL: https://github.com/apache/arrow-datafusion/pull/2881#discussion_r919602051 ## benchmarks/src/bin/h2o.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-datafusion] andygrove closed issue #2886: It would be nice to have a way to generate unique IDs in optimizer rules

2022-07-12 Thread GitBox
andygrove closed issue #2886: It would be nice to have a way to generate unique IDs in optimizer rules URL: https://github.com/apache/arrow-datafusion/issues/2886 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

[GitHub] [arrow-datafusion] andygrove merged pull request #2887: Add an ID generator in preparation for PR 2885

2022-07-12 Thread GitBox
andygrove merged PR #2887: URL: https://github.com/apache/arrow-datafusion/pull/2887 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[GitHub] [arrow] github-actions[bot] commented on pull request #13590: ARROW-17063: [GLib] Add examples to send/receive record batches via network

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13590: URL: https://github.com/apache/arrow/pull/13590#issuecomment-1182704153 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] github-actions[bot] commented on pull request #13590: ARROW-17063: [GLib] Add examples to send/receive record batches via network

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13590: URL: https://github.com/apache/arrow/pull/13590#issuecomment-1182704138 https://issues.apache.org/jira/browse/ARROW-17063 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #2044: Support peek_next_page() and skip_next_page in serialized_reader.

2022-07-12 Thread GitBox
Ted-Jiang commented on code in PR #2044: URL: https://github.com/apache/arrow-rs/pull/2044#discussion_r919599314 ## parquet/src/file/serialized_reader.rs: ## @@ -481,6 +505,22 @@ pub struct SerializedPageReader { // Column chunk type. physical_type: Type, + +// t

[GitHub] [arrow-rs] codecov-commenter commented on pull request #2041: Add support of converting `FixedSizeBinaryArray` to `DecimalArray`

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2041: URL: https://github.com/apache/arrow-rs/pull/2041#issuecomment-1182690086 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2041?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] HaoYang670 commented on pull request #2041: Add support of converting `FixedSizeBinaryArray` to `DecimalArray`

2022-07-12 Thread GitBox
HaoYang670 commented on PR #2041: URL: https://github.com/apache/arrow-rs/pull/2041#issuecomment-1182684896 Hmm, the failure seems related to the CI env. https://github.com/apache/arrow-rs/runs/7313219137?check_suite_focus=true#step:3:12 Also find the same failure in https://github

[GitHub] [arrow-rs] tustvold commented on pull request #2041: Add support of converting `FixedSizeBinaryArray` to `DecimalArray`

2022-07-12 Thread GitBox
tustvold commented on PR #2041: URL: https://github.com/apache/arrow-rs/pull/2041#issuecomment-1182674582 > if the features=force_validate is set. This is very much a testing only feature, this enables validation **everywhere** and will absolutely tank performance. It is a MIRI-light

[GitHub] [arrow-rs] HaoYang670 commented on pull request #2041: Add support of converting `FixedSizeBinaryArray` to `DecimalArray`

2022-07-12 Thread GitBox
HaoYang670 commented on PR #2041: URL: https://github.com/apache/arrow-rs/pull/2041#issuecomment-1182673187 > Looks good to me, although I am a little bit confused where we have ended up w.r.t validating decimals in ArrayData Thank you @tustvold. The decimal values will be validated i

[GitHub] [arrow] westonpace commented on pull request #13426: ARROW-16894: [C++] Add Benchmarks for Asof Join Node

2022-07-12 Thread GitBox
westonpace commented on PR #13426: URL: https://github.com/apache/arrow/pull/13426#issuecomment-1182655048 I'll take a closer look at this tomorrow. Another possibility for the python/c++ discrepancy would be to move the benchmarks themselves into python. I don't think we have any python

[GitHub] [arrow] westonpace merged pull request #13585: ARROW-17060: [C++] Change AsOfJoinNode to use ExecContext's Memory Pool

2022-07-12 Thread GitBox
westonpace merged PR #13585: URL: https://github.com/apache/arrow/pull/13585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.ap

[GitHub] [arrow] kaoutherab commented on pull request #12706: ARROW-15961: [C++] Check nullability when validating fields on batches or struct arrays

2022-07-12 Thread GitBox
kaoutherab commented on PR #12706: URL: https://github.com/apache/arrow/pull/12706#issuecomment-1182635329 Sure, I'll follow up on this. Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow-datafusion] andygrove commented on issue #2890: Inconsistent type coercion rules with comparison expressions

2022-07-12 Thread GitBox
andygrove commented on issue #2890: URL: https://github.com/apache/arrow-datafusion/issues/2890#issuecomment-1182632382 @alamb DataFusion is inconsistent with itself as to when it adds implicit conversions or not. I think it would be better to not have implicit conversions at all and keep

[GitHub] [arrow-rs] ursabot commented on pull request #2047: Remove null count from write_batch_with_statistics

2022-07-12 Thread GitBox
ursabot commented on PR #2047: URL: https://github.com/apache/arrow-rs/pull/2047#issuecomment-1182596785 Benchmark runs are scheduled for baseline = a2f223c7d9ca525192550ff8e044a3e1b5dabeb0 and contender = 88e0de5d661def7d7a45e4bc51314a366d017dda. 88e0de5d661def7d7a45e4bc51314a366d017dda i

[GitHub] [arrow] kou merged pull request #13580: ARROW-16733: [C++] Bump vendored version of opentelemetry-cpp and opentelemetry-proto

2022-07-12 Thread GitBox
kou merged PR #13580: URL: https://github.com/apache/arrow/pull/13580 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.or

[GitHub] [arrow-rs] tustvold closed issue #2046: Don't double-count nulls in write_batch_with_statistics

2022-07-12 Thread GitBox
tustvold closed issue #2046: Don't double-count nulls in write_batch_with_statistics URL: https://github.com/apache/arrow-rs/issues/2046 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [arrow-rs] tustvold merged pull request #2047: Remove null count from write_batch_with_statistics

2022-07-12 Thread GitBox
tustvold merged PR #2047: URL: https://github.com/apache/arrow-rs/pull/2047 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apa

[GitHub] [arrow-datafusion] andygrove commented on issue #2890: Inconsistent type coercion rules with comparison expressions

2022-07-12 Thread GitBox
andygrove commented on issue #2890: URL: https://github.com/apache/arrow-datafusion/issues/2890#issuecomment-1182585938 Postgres does not add implicit casts in any of these cases: ``` andy=# select * from type_test where c0 = c1; ERROR: operator does not exist: integer = charact

[GitHub] [arrow] drin commented on pull request #13583: ARROW-16807: count distinct incorrectly merges state

2022-07-12 Thread GitBox
drin commented on PR #13583: URL: https://github.com/apache/arrow/pull/13583#issuecomment-1182562069 I just realized that this introduces (or maybe just exposes) a bug when calling this function on scalar inputs. If the input is a scalar, `non_nulls` is incremented without changing state. T

[GitHub] [arrow] lidavidm commented on a diff in pull request #13520: ARROW-14889: [C++] GCS tests hang if testbench not installed

2022-07-12 Thread GitBox
lidavidm commented on code in PR #13520: URL: https://github.com/apache/arrow/pull/13520#discussion_r919474247 ## cpp/src/arrow/filesystem/gcsfs_test.cc: ## @@ -106,8 +110,27 @@ class GcsTestbench : public ::testing::Environment { continue; } - server_proc

[GitHub] [arrow] michalursa commented on a diff in pull request #13493: ARROW-14182: [C++][Compute] Hash Join performance improvement v2

2022-07-12 Thread GitBox
michalursa commented on code in PR #13493: URL: https://github.com/apache/arrow/pull/13493#discussion_r919465384 ## cpp/src/arrow/compute/exec/hash_join_benchmark.cc: ## @@ -132,7 +134,7 @@ class JoinBenchmark { left_keys, *r_batches_with_schema.

[GitHub] [arrow-datafusion] alamb commented on pull request #2875: Fix casts of `ScalarValue::Utf8` to `DictionaryArray`

2022-07-12 Thread GitBox
alamb commented on PR #2875: URL: https://github.com/apache/arrow-datafusion/pull/2875#issuecomment-1182543735 Update: Doing it properly in https://github.com/apache/arrow-datafusion/pull/2891 is showing good promise -- This is an automated message from the Apache Git Service. To respond

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2891: Implement `ScalarValue::Dictionary` and preserve type through conversion back/forth to Array

2022-07-12 Thread GitBox
alamb opened a new pull request, #2891: URL: https://github.com/apache/arrow-datafusion/pull/2891 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/2874 Closes https://github.com/apache/arrow-datafusion/pull/2875 Closes https://github.com/

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2890: Inconsistent type coercion rules with comparison expressions

2022-07-12 Thread GitBox
andygrove opened a new issue, #2890: URL: https://github.com/apache/arrow-datafusion/issues/2890 **Describe the bug** I have a table with an int32 and a string column. ``` ❯ select c3, c4 from data limit 2; ++--+ | c3 | c4 | +---

[GitHub] [arrow] michalursa commented on a diff in pull request #13493: ARROW-14182: [C++][Compute] Hash Join performance improvement v2

2022-07-12 Thread GitBox
michalursa commented on code in PR #13493: URL: https://github.com/apache/arrow/pull/13493#discussion_r919453444 ## cpp/src/arrow/compute/exec/swiss_join.h: ## @@ -0,0 +1,758 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

[GitHub] [arrow] wjones127 commented on a diff in pull request #13520: ARROW-14889: [C++] GCS tests hang if testbench not installed

2022-07-12 Thread GitBox
wjones127 commented on code in PR #13520: URL: https://github.com/apache/arrow/pull/13520#discussion_r919452500 ## cpp/src/arrow/filesystem/gcsfs_test.cc: ## @@ -106,8 +110,27 @@ class GcsTestbench : public ::testing::Environment { continue; } - server_pro

[GitHub] [arrow] wjones127 commented on a diff in pull request #13520: ARROW-14889: [C++] GCS tests hang if testbench not installed

2022-07-12 Thread GitBox
wjones127 commented on code in PR #13520: URL: https://github.com/apache/arrow/pull/13520#discussion_r919451986 ## cpp/src/arrow/filesystem/gcsfs_test.cc: ## @@ -106,8 +110,27 @@ class GcsTestbench : public ::testing::Environment { continue; } - server_pro

[GitHub] [arrow-rs] codecov-commenter commented on pull request #2056: Avoid creating null buffer for BooleanArray if null count is zero

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2056: URL: https://github.com/apache/arrow-rs/pull/2056#issuecomment-1182528686 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2056?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] jhorstmann commented on issue #2054: Inconsistent Builder Constructors

2022-07-12 Thread GitBox
jhorstmann commented on issue #2054: URL: https://github.com/apache/arrow-rs/issues/2054#issuecomment-1182527781 Sounds good, especially forcing users to specify both item and nested capacities for List/String/Binary arrays. My gut feeling also says that 1024 could be a good default, I'm as

[GitHub] [arrow] michalursa commented on a diff in pull request #13493: ARROW-14182: [C++][Compute] Hash Join performance improvement v2

2022-07-12 Thread GitBox
michalursa commented on code in PR #13493: URL: https://github.com/apache/arrow/pull/13493#discussion_r919450010 ## cpp/src/arrow/compute/exec/swiss_join.cc: ## @@ -0,0 +1,2545 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[GitHub] [arrow] lidavidm commented on a diff in pull request #13520: ARROW-14889: [C++] GCS tests hang if testbench not installed

2022-07-12 Thread GitBox
lidavidm commented on code in PR #13520: URL: https://github.com/apache/arrow/pull/13520#discussion_r919446134 ## cpp/src/arrow/filesystem/gcsfs_test.cc: ## @@ -106,8 +110,27 @@ class GcsTestbench : public ::testing::Environment { continue; } - server_proc

[GitHub] [arrow] toddfarmer commented on a diff in pull request #13543: ARROW-17003: [Java][Docs] Document arrow-jdbc adapter

2022-07-12 Thread GitBox
toddfarmer commented on code in PR #13543: URL: https://github.com/apache/arrow/pull/13543#discussion_r919442634 ## docs/source/java/jdbc.rst: ## @@ -0,0 +1,169 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the

[GitHub] [arrow] wjones127 commented on a diff in pull request #13520: ARROW-14889: [C++] GCS tests hang if testbench not installed

2022-07-12 Thread GitBox
wjones127 commented on code in PR #13520: URL: https://github.com/apache/arrow/pull/13520#discussion_r919441252 ## cpp/src/arrow/filesystem/gcsfs_test.cc: ## @@ -106,8 +110,27 @@ class GcsTestbench : public ::testing::Environment { continue; } - server_pro

[GitHub] [arrow-rs] jhorstmann opened a new pull request, #2056: Avoid creating null buffer for BooleanArray if null count is zero

2022-07-12 Thread GitBox
jhorstmann opened a new pull request, #2056: URL: https://github.com/apache/arrow-rs/pull/2056 # Which issue does this PR close? Closes #2055. # Rationale for this change # What changes are included in this PR? # Are there any user-facing c

[GitHub] [arrow-rs] jhorstmann opened a new issue, #2055: BooleanArray::from_iter should omit validity buffer if all values are valid

2022-07-12 Thread GitBox
jhorstmann opened a new issue, #2055: URL: https://github.com/apache/arrow-rs/issues/2055 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Similar to #1856, `BooleanArray::from_iter` currently always initializes and returns a n

[GitHub] [arrow] michalursa commented on a diff in pull request #13493: ARROW-14182: [C++][Compute] Hash Join performance improvement v2

2022-07-12 Thread GitBox
michalursa commented on code in PR #13493: URL: https://github.com/apache/arrow/pull/13493#discussion_r919430595 ## cpp/src/arrow/compute/exec/hash_join_node.cc: ## @@ -708,8 +736,26 @@ class HashJoinNode : public ExecNode { // Generate output schema std::shared_ptr ou

[GitHub] [arrow-datafusion] Dandandan commented on a diff in pull request #2881: Add h2o bench groupby queries

2022-07-12 Thread GitBox
Dandandan commented on code in PR #2881: URL: https://github.com/apache/arrow-datafusion/pull/2881#discussion_r919429668 ## benchmarks/src/bin/h2o.rs: ## @@ -0,0 +1,92 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] viirya commented on a diff in pull request #2040: Truncate IPC record batch

2022-07-12 Thread GitBox
viirya commented on code in PR #2040: URL: https://github.com/apache/arrow-rs/pull/2040#discussion_r919426839 ## arrow/src/ipc/writer.rs: ## @@ -894,12 +1031,66 @@ fn write_array_data( Some(buffer) => buffer.clone(), }; -offset = write_buffer(&nul

[GitHub] [arrow] lidavidm commented on a diff in pull request #13543: ARROW-17003: [Java][Docs] Document arrow-jdbc adapter

2022-07-12 Thread GitBox
lidavidm commented on code in PR #13543: URL: https://github.com/apache/arrow/pull/13543#discussion_r919426886 ## docs/source/java/jdbc.rst: ## @@ -0,0 +1,169 @@ +.. Licensed to the Apache Software Foundation (ASF) under one +.. or more contributor license agreements. See the N

[GitHub] [arrow] github-actions[bot] commented on pull request #13584: ARROW-17059: [C++] Fix expression benchmark

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13584: URL: https://github.com/apache/arrow/pull/13584#issuecomment-1182498910 https://issues.apache.org/jira/browse/ARROW-17059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] westonpace commented on pull request #13584: ARROW-17059: [C++] Fix expression benchmark

2022-07-12 Thread GitBox
westonpace commented on PR #13584: URL: https://github.com/apache/arrow/pull/13584#issuecomment-1182497958 > Am I reading this correctly that it's basically reverting the change to cpp/src/arrow/compute/exec/expression_benchmark.cc in https://github.com/apache/arrow/pull/13179 There

[GitHub] [arrow] lidavidm merged pull request #13558: ARROW-17005: [Java] Allow overriding column nullability in arrow-jdbc

2022-07-12 Thread GitBox
lidavidm merged PR #13558: URL: https://github.com/apache/arrow/pull/13558 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apac

[GitHub] [arrow-datafusion] andygrove commented on issue #2879: Add h2o benchmark

2022-07-12 Thread GitBox
andygrove commented on issue #2879: URL: https://github.com/apache/arrow-datafusion/issues/2879#issuecomment-1182496621 Related to https://github.com/apache/arrow-datafusion/issues/147 fyi @Dandandan @matthewmturner -- This is an automated message from the Apache Git Service. To r

[GitHub] [arrow] github-actions[bot] commented on pull request #13588: ARROW-16759: [Go] backport gopkg.in/yaml.v3 security patch to v8

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13588: URL: https://github.com/apache/arrow/pull/13588#issuecomment-1182495438 https://issues.apache.org/jira/browse/ARROW-16759 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow] jonkeane merged pull request #13584: ARROW-17059: [C++] Fix expression benchmark

2022-07-12 Thread GitBox
jonkeane merged PR #13584: URL: https://github.com/apache/arrow/pull/13584 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apac

[GitHub] [arrow] jonkeane commented on pull request #13584: [C++] ARROW-17059: Fix expression benchmark

2022-07-12 Thread GitBox
jonkeane commented on PR #13584: URL: https://github.com/apache/arrow/pull/13584#issuecomment-1182492924 Are these failures because the baseline is failing? > [Failed ⬇️0.0% ⬆️0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/6dc38255446041b4ba0089e4c1c211f5...7d9cf2453f7049dc

[GitHub] [arrow-rs] viirya commented on a diff in pull request #2040: Truncate IPC record batch

2022-07-12 Thread GitBox
viirya commented on code in PR #2040: URL: https://github.com/apache/arrow-rs/pull/2040#discussion_r919419568 ## arrow/src/ipc/writer.rs: ## @@ -894,12 +1031,66 @@ fn write_array_data( Some(buffer) => buffer.clone(), }; -offset = write_buffer(&nul

[GitHub] [arrow] github-actions[bot] commented on pull request #13587: ARROW-16759: [Go] backport gopkg.in/yaml.v3 security patch to v7

2022-07-12 Thread GitBox
github-actions[bot] commented on PR #13587: URL: https://github.com/apache/arrow/pull/13587#issuecomment-1182492784 https://issues.apache.org/jira/browse/ARROW-16759 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2887: Add an ID generator in preparation for PR 2885

2022-07-12 Thread GitBox
codecov-commenter commented on PR #2887: URL: https://github.com/apache/arrow-datafusion/pull/2887#issuecomment-1182491941 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2887?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] michalursa commented on a diff in pull request #13493: ARROW-14182: [C++][Compute] Hash Join performance improvement v2

2022-07-12 Thread GitBox
michalursa commented on code in PR #13493: URL: https://github.com/apache/arrow/pull/13493#discussion_r919418542 ## cpp/src/arrow/compute/exec/partition_util.h: ## @@ -118,6 +118,43 @@ class PartitionLocks { /// \brief Release a partition so that other threads can work on it

[GitHub] [arrow] nealrichardson commented on a diff in pull request #13577: ARROW-17045: [C++] Reject trailing slashes on file path

2022-07-12 Thread GitBox
nealrichardson commented on code in PR #13577: URL: https://github.com/apache/arrow/pull/13577#discussion_r919416810 ## r/R/io.R: ## @@ -241,7 +241,7 @@ mmap_open <- function(path, mode = c("read", "write", "readwrite")) { make_readable_file <- function(file, mmap = TRUE, comp

[GitHub] [arrow] icexelloss commented on a diff in pull request #13426: ARROW-16894: [C++] Add Benchmarks for Asof Join Node

2022-07-12 Thread GitBox
icexelloss commented on code in PR #13426: URL: https://github.com/apache/arrow/pull/13426#discussion_r919412017 ## cpp/src/arrow/compute/exec/asof_join_benchmark.cc: ## @@ -0,0 +1,1023 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[GitHub] [arrow] nealrichardson commented on a diff in pull request #13577: ARROW-17045: [C++] Reject trailing slashes on file path

2022-07-12 Thread GitBox
nealrichardson commented on code in PR #13577: URL: https://github.com/apache/arrow/pull/13577#discussion_r919416391 ## r/R/io.R: ## @@ -241,7 +241,7 @@ mmap_open <- function(path, mode = c("read", "write", "readwrite")) { make_readable_file <- function(file, mmap = TRUE, comp

[GitHub] [arrow-datafusion] avantgardnerio commented on issue #2889: JSON version of `display_indent()`

2022-07-12 Thread GitBox
avantgardnerio commented on issue #2889: URL: https://github.com/apache/arrow-datafusion/issues/2889#issuecomment-1182484987 As a potential implementation option, we could use something like https://docs.serde.rs/serde_json/ with our existing visitor pattern. I know some folks are reticent

[GitHub] [arrow] lidavidm commented on pull request #13589: ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters

2022-07-12 Thread GitBox
lidavidm commented on PR #13589: URL: https://github.com/apache/arrow/pull/13589#issuecomment-1182484408 TODOs: - [ ] Add documentation - [ ] Are we handling time/timestamp types properly when time zones come into play? - [ ] Add date support as well -- This is an automated messa

[GitHub] [arrow] lidavidm opened a new pull request, #13589: ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters

2022-07-12 Thread GitBox
lidavidm opened a new pull request, #13589: URL: https://github.com/apache/arrow/pull/13589 This extends the arrow-jdbc adapter to also allow taking Arrow data and using it to bind JDBC PreparedStatement parameters, allowing you to "round trip" data to a certain extent. This was factored ou

[GitHub] [arrow-datafusion] avantgardnerio opened a new issue, #2889: JSON version of `display_indent()`

2022-07-12 Thread GitBox
avantgardnerio opened a new issue, #2889: URL: https://github.com/apache/arrow-datafusion/issues/2889 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A significant portion of my time on #2885 was spent manually formatting subq
