[GitHub] [arrow] ManManson commented on a diff in pull request #13932: MINOR: [C++] Fix StringFormatter type error in localfs_benchmark

2022-08-21 Thread GitBox
ManManson commented on code in PR #13932: URL: https://github.com/apache/arrow/pull/13932#discussion_r951076256 ## cpp/src/arrow/filesystem/localfs_benchmark.cc: ## @@ -66,7 +66,7 @@ class LocalFSFixture : public benchmark::Fixture { arrow::int

[GitHub] [arrow] thisisnic commented on pull request #13937: ARROW-17489: [R] Nightly builds failing due to test referencing unrelease stringr functions

2022-08-21 Thread GitBox
thisisnic commented on PR #13937: URL: https://github.com/apache/arrow/pull/13937#issuecomment-1221926356 Not sure whether the best solution here is to bump the stringr version, or remove this test entirely for the moment? -- This is an automated message from the Apache Git Service. To re

[GitHub] [arrow] kou commented on pull request #13892: ARROW-12175: [C++] Fix CMake packages

2022-08-21 Thread GitBox
kou commented on PR #13892: URL: https://github.com/apache/arrow/pull/13892#issuecomment-1221923932 @github-actions crossbow submit -g nightly-tests -g nightly-packaging -g nightly-release -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [arrow] ManManson commented on pull request #13804: ARROW-17318: [C++][Dataset] Support async streaming interface for getting fragments in Dataset

2022-08-21 Thread GitBox
ManManson commented on PR #13804: URL: https://github.com/apache/arrow/pull/13804#issuecomment-1221921233 @westonpace @pitrou Polite review ping. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3218: [minor] fix bench aggregate_query_sql meta

2022-08-21 Thread GitBox
codecov-commenter commented on PR #3218: URL: https://github.com/apache/arrow-datafusion/pull/3218#issuecomment-1221918706 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3218?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] thisisnic commented on a diff in pull request #13748: MINOR: [R] Add note about the deprecated `type()` function

2022-08-21 Thread GitBox
thisisnic commented on code in PR #13748: URL: https://github.com/apache/arrow/pull/13748#discussion_r951055052 ## r/R/type.R: ## @@ -58,6 +58,10 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double" #' Infer the arrow Array type from an R objec

[GitHub] [arrow] thisisnic commented on a diff in pull request #13748: MINOR: [R] Add note about the deprecated `type()` function

2022-08-21 Thread GitBox
thisisnic commented on code in PR #13748: URL: https://github.com/apache/arrow/pull/13748#discussion_r951053826 ## r/R/type.R: ## @@ -58,6 +58,10 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double" #' Infer the arrow Array type from an R objec

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #3218: [minor] fix bench aggregate_query_sql meta

2022-08-21 Thread GitBox
Ted-Jiang commented on PR #3218: URL: https://github.com/apache/arrow-datafusion/pull/3218#issuecomment-1221889826 @alamb PTAL😊 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #3218: [minor] fix bench aggregate_query_sql meta

2022-08-21 Thread GitBox
Ted-Jiang opened a new pull request, #3218: URL: https://github.com/apache/arrow-datafusion/pull/3218 # Which issue does this PR close? related #3217 # Rationale for this change when i run `cargo bench --bench aggregate_query_sql` got ``` thread 'main' panicked at

[GitHub] [arrow-rs] liukun4515 commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
liukun4515 commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221888759 Why can't the data in the IPC be trusted, according to this issue https://github.com/apache/arrow-rs/issues/2541 ? @HaoYang670 @tustvold -- This is an automated message from the Apache Git

[GitHub] [arrow-rs] liukun4515 commented on pull request #2525: Clean the `create_array` in IPC reader.

2022-08-21 Thread GitBox
liukun4515 commented on PR #2525: URL: https://github.com/apache/arrow-rs/pull/2525#issuecomment-1221887697 Why can't the data in the IPC be trusted, according to this issue https://github.com/apache/arrow-rs/issues/2541 ? -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [arrow] eitsupi commented on a diff in pull request #13748: MINOR: [R] Add note about the deprecated `type()` function

2022-08-21 Thread GitBox
eitsupi commented on code in PR #13748: URL: https://github.com/apache/arrow/pull/13748#discussion_r951040925 ## r/R/type.R: ## @@ -58,6 +58,10 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double" #' Infer the arrow Array type from an R object

[GitHub] [arrow] AlenkaF commented on pull request #13311: ARROW-16340: [Python] Move all Python related code into PyArrow

2022-08-21 Thread GitBox
AlenkaF commented on PR #13311: URL: https://github.com/apache/arrow/pull/13311#issuecomment-1221865295 None of the failures seem related. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

[GitHub] [arrow] AlenkaF commented on a diff in pull request #13311: ARROW-16340: [Python] Move all Python related code into PyArrow

2022-08-21 Thread GitBox
AlenkaF commented on code in PR #13311: URL: https://github.com/apache/arrow/pull/13311#discussion_r951032044 ## python/setup.py: ## @@ -227,6 +228,126 @@ def initialize_options(self): '_hdfsio', 'gandiva'] +def _run_cmake_pyarrow_cpp(self): +# ch

[GitHub] [arrow] cyb70289 commented on a diff in pull request #13924: ARROW-17475: [Go] Function interface and Registry impl

2022-08-21 Thread GitBox
cyb70289 commented on code in PR #13924: URL: https://github.com/apache/arrow/pull/13924#discussion_r951027061 ## go/arrow/compute/registry.go: ## @@ -0,0 +1,194 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See th

[GitHub] [arrow-rs] viirya opened a new pull request, #2549: Compare dictionary array with string array

2022-08-21 Thread GitBox
viirya opened a new pull request, #2549: URL: https://github.com/apache/arrow-rs/pull/2549 # Which issue does this PR close? Closes #2548. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

[GitHub] [arrow-rs] viirya opened a new issue, #2548: Compare dictionary with string array

2022-08-21 Thread GitBox
viirya opened a new issue, #2548: URL: https://github.com/apache/arrow-rs/issues/2548 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Part of #2534. Support the comparison between dictionary array and string array.

[GitHub] [arrow-cookbook] thisisnic commented on pull request #249: Added License Headers to R module

2022-08-21 Thread GitBox
thisisnic commented on PR #249: URL: https://github.com/apache/arrow-cookbook/pull/249#issuecomment-1221826969 > @thisisnic Just curious - how to re-run the CI job? I assume I won't have the privileges That's correct, you don't, but you could send an empty commit or commit then rever

[GitHub] [arrow-ballista] andygrove commented on pull request #150: Move ExecutionGraph encoding and decoding logic into execution_graph for better encapsulation

2022-08-21 Thread GitBox
andygrove commented on PR #150: URL: https://github.com/apache/arrow-ballista/pull/150#issuecomment-1221805206 @thinkharderdev @avantgardnerio @mingmwang any objections to me merging this one? -- This is an automated message from the Apache Git Service. To respond to the message, please l

[GitHub] [arrow] github-actions[bot] commented on pull request #13892: ARROW-12175: [C++] Fix CMake packages

2022-08-21 Thread GitBox
github-actions[bot] commented on PR #13892: URL: https://github.com/apache/arrow/pull/13892#issuecomment-1221804373 Revision: 764deeeb9b335e0f2405c20e06cad3d0bada0ff0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-9d41b7d4d9](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow-ballista] andygrove merged pull request #151: Stop Executor Impl, Executor Graceful Shutdown

2022-08-21 Thread GitBox
andygrove merged PR #151: URL: https://github.com/apache/arrow-ballista/pull/151 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arro

[GitHub] [arrow-ballista] andygrove commented on pull request #151: Stop Executor Impl, Executor Graceful Shutdown

2022-08-21 Thread GitBox
andygrove commented on PR #151: URL: https://github.com/apache/arrow-ballista/pull/151#issuecomment-1221803279 Thanks @mingmwang for the work and thank you, @thinkharderdev @yahoNanJing and @avantgardnerio for reviewing. I will merge this now. -- This is an automated message from the Apac

[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #151: Stop Executor Impl, Executor Graceful Shutdown

2022-08-21 Thread GitBox
andygrove commented on code in PR #151: URL: https://github.com/apache/arrow-ballista/pull/151#discussion_r951004236 ## ballista/rust/executor/src/main.rs: ## @@ -154,57 +162,156 @@ async fn main() -> Result<()> { let scheduler_policy = opt.task_scheduling_policy; let

[GitHub] [arrow] kou commented on pull request #13892: ARROW-12175: [C++] Fix CMake packages

2022-08-21 Thread GitBox
kou commented on PR #13892: URL: https://github.com/apache/arrow/pull/13892#issuecomment-1221802464 @github-actions crossbow submit -g nightly-tests -g nightly-packaging -g nightly-release -- This is an automated message from the Apache Git Service. To respond to the message, please log o

[GitHub] [arrow] kou commented on pull request #13661: ARROW-17436: [C++] Use -O2 instead of -O3 for RELEASE builds

2022-08-21 Thread GitBox
kou commented on PR #13661: URL: https://github.com/apache/arrow/pull/13661#issuecomment-1221801725 I don't know why, but it seems that the "AMD64 Windows R 3.6 RTools 35" CI job has been failing since this change was merged into master: https://github.com/apache/arrow/runs/7866259947?check_s

[GitHub] [arrow-datafusion] comphead commented on pull request #3168: Fix hash join non compat issues

2022-08-21 Thread GitBox
comphead commented on PR #3168: URL: https://github.com/apache/arrow-datafusion/pull/3168#issuecomment-1221773989 Thanks @alamb @liukun4515. IMHO it's preferable to coerce in the logical plan, and we probably have to avoid any kind of `downcast` before coercion. That looks like a beefy change

[GitHub] [arrow-rs] kastolars commented on issue #2523: UnionBuilder Create Children With Capacity

2022-08-21 Thread GitBox
kastolars commented on issue #2523: URL: https://github.com/apache/arrow-rs/issues/2523#issuecomment-1221765269 Hi, has anyone started on this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #3217: Create bench for approx_percentile_cont aggregate

2022-08-21 Thread GitBox
Ted-Jiang opened a new issue, #3217: URL: https://github.com/apache/arrow-datafusion/issues/3217 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated whe

[GitHub] [arrow-cookbook] mystic-lama commented on pull request #249: Added License Headers to R module

2022-08-21 Thread GitBox
mystic-lama commented on PR #249: URL: https://github.com/apache/arrow-cookbook/pull/249#issuecomment-1221748469 @thisisnic Just curious - how to re-run the CI job? I assume I won't have the privileges -- This is an automated message from the Apache Git Service. To respond to the message

[GitHub] [arrow-rs] viirya commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
viirya commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221740061 > > A known issue. > > Is there any ticket to track this? I'd opened a ticket before to know why there are these values: https://issues.apache.org/jira/browse/ARROW-16696

[GitHub] [arrow-rs] HaoYang670 commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
HaoYang670 commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221735665 > A known issue. Is there any ticket to track this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-rs] viirya commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
viirya commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221734819 For the following tests, there are some out of range decimal values for the given precision in older version of generated files. A known issue. ``` ipc::reader::tests::read_
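To make "out of range decimal values for the given precision" concrete, here is a minimal sketch in plain Rust (standard library only). It is not the arrow-rs validation code; the function names `max_for_precision` and `fits_precision` are hypothetical, and the only assumption taken from Arrow is that a 128-bit decimal has a precision between 1 and 38 digits. A value fits a precision p when its magnitude is at most 10^p - 1.

```rust
/// Largest magnitude that fits in `precision` decimal digits: 10^precision - 1.
/// Returns None outside 1..=38, the range assumed here for a 128-bit decimal.
/// (Hypothetical helper for illustration, not an arrow-rs function.)
fn max_for_precision(precision: u32) -> Option<i128> {
    if precision == 0 || precision > 38 {
        return None;
    }
    Some(10_i128.pow(precision) - 1)
}

/// True when the unscaled `value` can be represented with `precision` digits.
/// The scale does not matter for this check, only the digit count.
fn fits_precision(value: i128, precision: u32) -> bool {
    match max_for_precision(precision) {
        Some(max) => value >= -max && value <= max,
        None => false,
    }
}
```

With this definition, an unscaled value of 1000 stored in a column declared with precision 3 is out of range, which is the kind of data a checked build would reject and an unchecked build would let through.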

[GitHub] [arrow-rs] viirya commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
viirya commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221731623 `test_row_type_validation` passes in current master. You can retry after `cargo clean`. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow-ballista] mingmwang commented on pull request #151: Stop Executor Impl, Executor Graceful Shutdown

2022-08-21 Thread GitBox
mingmwang commented on PR #151: URL: https://github.com/apache/arrow-ballista/pull/151#issuecomment-1221727706 @alamb @andygrove Please help to approve and merge the PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow-rs] HaoYang670 commented on pull request #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
HaoYang670 commented on PR #2547: URL: https://github.com/apache/arrow-rs/pull/2547#issuecomment-1221721908 5 tests failures so far ``` failures: ipc::reader::tests::read_generated_streams_014 stdout thread 'ipc::reader::tests::read_generated_streams_014' panicked at

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #2547: Validate array data when creating array in ipc reader

2022-08-21 Thread GitBox
HaoYang670 opened a new pull request, #2547: URL: https://github.com/apache/arrow-rs/pull/2547 Signed-off-by: remzi <1371656737...@gmail.com> # Which issue does this PR close? Closes #2541. # Rationale for this change # What changes are included in

[GitHub] [arrow-rs] HaoYang670 commented on issue #2541: Always validate the array data when creating array in IPC reader

2022-08-21 Thread GitBox
HaoYang670 commented on issue #2541: URL: https://github.com/apache/arrow-rs/issues/2541#issuecomment-1221717603 After replacing `build_unchecked` with `build().unwrap()` in `create_primitive_array` in the ipc reader, 5 tests fail: ``` failures: ipc::reader::tests::read_generate

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #3185: optimizer: add framework for the rule of pre-add cast to the literal in comparison binary

2022-08-21 Thread GitBox
liukun4515 commented on PR #3185: URL: https://github.com/apache/arrow-datafusion/pull/3185#issuecomment-1221674043 @alamb @andygrove are there any comments on this PR? It has gone too long without updates. -- This is an automated message from the Apache Git Service. To respond to the m

[GitHub] [arrow-datafusion] liukun4515 commented on pull request #3168: Fix hash join non compat issues

2022-08-21 Thread GitBox
liukun4515 commented on PR #3168: URL: https://github.com/apache/arrow-datafusion/pull/3168#issuecomment-1221672087 > > @Dandandan @alamb is there documentation on the planner anywhere? > > I don't know of any real documentation on the planner > > Note there is a discussion about the cur

[GitHub] [arrow] github-actions[bot] commented on pull request #13936: ARROW-8567: [Python] Fix a bug that pa.array() may ignore "safe=False"

2022-08-21 Thread GitBox
github-actions[bot] commented on PR #13936: URL: https://github.com/apache/arrow/pull/13936#issuecomment-1221671179 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'. -- This is an automated message from the Apache Git Service. To respond to the message, ple

[GitHub] [arrow] github-actions[bot] commented on pull request #13936: ARROW-8567: [Python] Fix a bug that pa.array() may ignore "safe=False"

2022-08-21 Thread GitBox
github-actions[bot] commented on PR #13936: URL: https://github.com/apache/arrow/pull/13936#issuecomment-1221671160 https://issues.apache.org/jira/browse/ARROW-8567 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [arrow-rs] ursabot commented on pull request #2546: MINOR: Fix test_row_type_validation test

2022-08-21 Thread GitBox
ursabot commented on PR #2546: URL: https://github.com/apache/arrow-rs/pull/2546#issuecomment-1221669398 Benchmark runs are scheduled for baseline = 34216d57aba1739c866267825f54371d75d6c004 and contender = dc1448eee2018a3254b22fbffe18bb792e361a37. dc1448eee2018a3254b22fbffe18bb792e361a37 i

[GitHub] [arrow-rs] viirya merged pull request #2546: MINOR: Fix test_row_type_validation test

2022-08-21 Thread GitBox
viirya merged PR #2546: URL: https://github.com/apache/arrow-rs/pull/2546 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apach

[GitHub] [arrow-rs] viirya opened a new pull request, #2546: MINOR: Fix test_row_type_validation test

2022-08-21 Thread GitBox
viirya opened a new pull request, #2546: URL: https://github.com/apache/arrow-rs/pull/2546 # Which issue does this PR close? Closes #. # Rationale for this change Since `serde_json` 1.0.84+, `test_row_type_validation` fails. ``` json::reade

[GitHub] [arrow] marsupialtail commented on a diff in pull request #13931: ARROW-17481: [C++] [Python] Major performance improvements to CSV reading from S3

2022-08-21 Thread GitBox
marsupialtail commented on code in PR #13931: URL: https://github.com/apache/arrow/pull/13931#discussion_r950904697 ## cpp/src/arrow/dataset/file_csv.cc: ## @@ -184,16 +186,45 @@ static inline Future> OpenReaderAsync( auto span = tracer->StartSpan("arrow::dataset::CsvFileFo

[GitHub] [arrow] marsupialtail commented on a diff in pull request #13931: ARROW-17481: [C++] [Python] Major performance improvements to CSV reading from S3

2022-08-21 Thread GitBox
marsupialtail commented on code in PR #13931: URL: https://github.com/apache/arrow/pull/13931#discussion_r950904667 ## cpp/src/arrow/io/interfaces.h: ## @@ -343,5 +344,8 @@ ARROW_EXPORT Result>> MakeInputStreamIterator( std::shared_ptr stream, int64_t block_size); +ARROW

[GitHub] [arrow-rs] jhorstmann commented on a diff in pull request #2521: Improve performance of `%pat%` (>3x speedup)

2022-08-21 Thread GitBox
jhorstmann commented on code in PR #2521: URL: https://github.com/apache/arrow-rs/pull/2521#discussion_r950902502 ## arrow/src/compute/kernels/comparison.rs: ## @@ -263,11 +263,23 @@ pub fn like_utf8_scalar( } else if right.starts_with('%') && !right[1..].contains(is_like_p
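The diff fragment above branches on the shape of the LIKE pattern so that simple patterns avoid the general matcher. A rough sketch of that idea in plain Rust follows; it is an illustration under assumptions, not the code from PR #2521. The names `is_like_wildcard`, `LikePlan`, `plan_like` and `eval_fast` are hypothetical, and any pattern containing a backslash is conservatively sent to the general path because escape handling is not modelled here.

```rust
/// The two SQL LIKE wildcards: `%` (any run of characters) and `_` (one character).
fn is_like_wildcard(c: char) -> bool {
    c == '%' || c == '_'
}

/// Which cheap string operation, if any, can evaluate a pattern.
/// (Hypothetical type for illustration only.)
#[derive(Debug, PartialEq)]
enum LikePlan<'a> {
    Equals(&'a str),
    StartsWith(&'a str),
    EndsWith(&'a str),
    Contains(&'a str),
    General,
}

/// Classify a LIKE pattern into a fast path or the general matcher.
fn plan_like(pattern: &str) -> LikePlan<'_> {
    // Escapes are out of scope for this sketch: fall back to the general path.
    if pattern.contains('\\') {
        return LikePlan::General;
    }
    let n = pattern.len();
    if !pattern.contains(is_like_wildcard) {
        LikePlan::Equals(pattern)
    } else if pattern.ends_with('%') && !pattern[..n - 1].contains(is_like_wildcard) {
        // `pat%`: prefix test
        LikePlan::StartsWith(&pattern[..n - 1])
    } else if pattern.starts_with('%') && !pattern[1..].contains(is_like_wildcard) {
        // `%pat`: suffix test
        LikePlan::EndsWith(&pattern[1..])
    } else if n >= 2
        && pattern.starts_with('%')
        && pattern.ends_with('%')
        && !pattern[1..n - 1].contains(is_like_wildcard)
    {
        // `%pat%`: substring test
        LikePlan::Contains(&pattern[1..n - 1])
    } else {
        LikePlan::General
    }
}

/// Evaluate a fast-path plan; returns None when the general matcher is needed.
fn eval_fast(plan: &LikePlan<'_>, value: &str) -> Option<bool> {
    match plan {
        LikePlan::Equals(s) => Some(value == *s),
        LikePlan::StartsWith(s) => Some(value.starts_with(*s)),
        LikePlan::EndsWith(s) => Some(value.ends_with(*s)),
        LikePlan::Contains(s) => Some(value.contains(*s)),
        LikePlan::General => None,
    }
}
```

Classifying once per scalar pattern and then applying a plain `contains` to every row is the kind of change that could plausibly account for a speedup on `%pat%`, since no regular expression needs to be built or run.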

[GitHub] [arrow-cookbook] thisisnic commented on pull request #249: Added License Headers to R module

2022-08-21 Thread GitBox
thisisnic commented on PR #249: URL: https://github.com/apache/arrow-cookbook/pull/249#issuecomment-1221609332 Thanks for the updates to this! Looking good now, so happy to merge once the CI passes. Looks like the previous CI failure was due to the download of the arrow R package fai

[GitHub] [arrow-rs] Dandandan commented on a diff in pull request #2545: Fix ilike_utf8_scalar kernals

2022-08-21 Thread GitBox
Dandandan commented on code in PR #2545: URL: https://github.com/apache/arrow-rs/pull/2545#discussion_r950891139 ## arrow/src/compute/kernels/comparison.rs: ## @@ -468,7 +468,7 @@ pub fn ilike_utf8_scalar( if !right.contains(is_like_pattern) { // fast path, can use

[GitHub] [arrow] thatstatsguy commented on pull request #13267: ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method

2022-08-21 Thread GitBox
thatstatsguy commented on PR #13267: URL: https://github.com/apache/arrow/pull/13267#issuecomment-1221601884 @paleolimbot thanks for your patience! All updated, assuming there are no build issues we should be good to go! -- This is an automated message from the Apache Git Service. To resp

[GitHub] [arrow] thatstatsguy commented on a diff in pull request #13267: ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method

2022-08-21 Thread GitBox
thatstatsguy commented on code in PR #13267: URL: https://github.com/apache/arrow/pull/13267#discussion_r950886554 ## r/tests/testthat/test-python-flight.R: ## @@ -37,6 +37,16 @@ if (process_is_running("demo_flight_server")) { regexp = 'data must be a "data.frame", "Table

[GitHub] [arrow] thatstatsguy commented on a diff in pull request #13267: ARROW-16690: [R][FlightRPC] Additional max_chunksize parameter in do_put method

2022-08-21 Thread GitBox
thatstatsguy commented on code in PR #13267: URL: https://github.com/apache/arrow/pull/13267#discussion_r950886498 ## r/R/flight.R: ## @@ -72,6 +74,8 @@ flight_put <- function(client, data, path, overwrite = TRUE) { writer <- client$do_put(descriptor_for_path(path), py_data$s

[GitHub] [arrow] thisisnic commented on a diff in pull request #13748: MINOR: [R] Add note about the deprecated `type()` function

2022-08-21 Thread GitBox
thisisnic commented on code in PR #13748: URL: https://github.com/apache/arrow/pull/13748#discussion_r950878764 ## r/R/type.R: ## @@ -58,6 +58,10 @@ FLOAT_TYPES <- c("float16", "float32", "float64", "halffloat", "float", "double" #' Infer the arrow Array type from an R objec

[GitHub] [arrow] marsupialtail commented on pull request #13640: ARROW-14635: [Python][C++] add O_DIRECT support to writes

2022-08-21 Thread GitBox
marsupialtail commented on PR #13640: URL: https://github.com/apache/arrow/pull/13640#issuecomment-1221580853 Note that the current implementation relies on a memcpy to ensure alignment. In general, if each write starts off from a potentially different buffer with an arbitrary length this m

[GitHub] [arrow-datafusion] kmitchener commented on issue #3174: Bug with csv type inference

2022-08-21 Thread GitBox
kmitchener commented on issue #3174: URL: https://github.com/apache/arrow-datafusion/issues/3174#issuecomment-1221579911 There's also a TODO directly related to this that could be cleaned up as part of this PR -> https://github.com/apache/arrow-datafusion/blob/master/datafusion/core/tests/

[GitHub] [arrow-rs] psvri commented on a diff in pull request #2545: Fix ilike_utf8_scalar kernals

2022-08-21 Thread GitBox
psvri commented on code in PR #2545: URL: https://github.com/apache/arrow-rs/pull/2545#discussion_r950871108 ## arrow/src/compute/kernels/comparison.rs: ## @@ -468,7 +468,7 @@ pub fn ilike_utf8_scalar( if !right.contains(is_like_pattern) { // fast path, can use equ

[GitHub] [arrow-rs] Dandandan commented on a diff in pull request #2545: Fix ilike_utf8_scalar kernals

2022-08-21 Thread GitBox
Dandandan commented on code in PR #2545: URL: https://github.com/apache/arrow-rs/pull/2545#discussion_r950870349 ## arrow/src/compute/kernels/comparison.rs: ## @@ -468,7 +468,7 @@ pub fn ilike_utf8_scalar( if !right.contains(is_like_pattern) { // fast path, can use

[GitHub] [arrow-rs] sunchao commented on issue #2502: Inline Generated Thift Code Into Parquet Crate

2022-08-21 Thread GitBox
sunchao commented on issue #2502: URL: https://github.com/apache/arrow-rs/issues/2502#issuecomment-1221578254 +1. Let me know if you need any help. > I'd be in favour of deprecating and eventually archiving parquet-format-rs, then generating the code here. I'd prefer to do th

[GitHub] [arrow-rs] psvri opened a new pull request, #2545: Fix ilike_utf8_scalar kernals

2022-08-21 Thread GitBox
psvri opened a new pull request, #2545: URL: https://github.com/apache/arrow-rs/pull/2545 # Which issue does this PR close? Closes #2544. # Rationale for this change Incorrect logic in the equals path of the ilike_utf8_scalar kernels # What changes are included in this P
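For reference, the behaviour one would expect from the "equals path" of a case-insensitive LIKE can be sketched in plain Rust as below. This is not the arrow-rs kernel and does not reproduce the bug, whose details are cut off in this listing; the function name `ilike_equals` is hypothetical, and lowercasing both sides is only one possible way to compare without regard to case.

```rust
/// Case-insensitive equality of each value against a wildcard-free pattern.
/// Only meaningful when `pattern` contains neither `%` nor `_`.
/// (Hypothetical helper for illustration, not an arrow-rs function.)
fn ilike_equals(values: &[&str], pattern: &str) -> Vec<bool> {
    // Lowercase the pattern once, then compare each lowercased value to it.
    let needle = pattern.to_lowercase();
    values.iter().map(|v| v.to_lowercase() == needle).collect()
}
```

Applied to the four strings from the reproduction in issue #2544 with an assumed pattern of "ARROW", only the first element should match; "parrow", "arrows" and "arr" must not, because equality is stricter than a substring or prefix test.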

[GitHub] [arrow-rs] psvri opened a new issue, #2544: Ilike_ut8_scalar kernals have incorrect logic

2022-08-21 Thread GitBox
psvri opened a new issue, #2544: URL: https://github.com/apache/arrow-rs/issues/2544 **Describe the bug** Incorrect implementation of ilike_utf8_scalars **To Reproduce** The code below fails ``` let left = StringArray::from(vec!["arrow", "parrow", "arrows", "arr"]); let r

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2509: Replace azure sdk with custom implementation

2022-08-21 Thread GitBox
tustvold commented on code in PR #2509: URL: https://github.com/apache/arrow-rs/pull/2509#discussion_r950868068 ## object_store/src/client/oauth.rs: ## @@ -219,3 +223,82 @@ fn b64_encode_obj(obj: &T) -> Result { let string = serde_json::to_string(obj).context(EncodeSnafu)?

[GitHub] [arrow-datafusion] jackwener closed issue #3216: Consider to categorize Operator

2022-08-21 Thread GitBox
jackwener closed issue #3216: Consider to categorize Operator URL: https://github.com/apache/arrow-datafusion/issues/3216 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[GitHub] [arrow-rs] thinkharderdev opened a new issue, #2543: Panic when first data page is skipped using ColumnChunkData::Sparse

2022-08-21 Thread GitBox
thinkharderdev opened a new issue, #2543: URL: https://github.com/apache/arrow-rs/issues/2543 **Describe the bug** If you have a row selection which skips the first data page, `SerializedPageReader` will error incorrectly. **To Reproduce** Create a `ParquetRecor

[GitHub] [arrow-datafusion] jackwener commented on issue #3216: Consider to categorize Operator

2022-08-21 Thread GitBox
jackwener commented on issue #3216: URL: https://github.com/apache/arrow-datafusion/issues/3216#issuecomment-1221558178 This issue is inspired by my job. I have researched many projects' planners while working on a new planner and cascades-optimizer for Apache Doris. I think it's a points

[GitHub] [arrow-datafusion] jackwener commented on issue #3216: Consider to categorize Operator

2022-08-21 Thread GitBox
jackwener commented on issue #3216: URL: https://github.com/apache/arrow-datafusion/issues/3216#issuecomment-1221557101 @alamb @liukun4515 @andygrove. What do you think about this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow-datafusion] jackwener opened a new issue, #3216: Consider to categorize Operator

2022-08-21 Thread GitBox
jackwener opened a new issue, #3216: URL: https://github.com/apache/arrow-datafusion/issues/3216 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated whe

[GitHub] [arrow-rs] alamb commented on issue #2542: Release Arrow `XXXX` (next release after `21.0.0`)

2022-08-21 Thread GitBox
alamb commented on issue #2542: URL: https://github.com/apache/arrow-rs/issues/2542#issuecomment-1221527794 @iajoiner here is the ticket. The instructions are here: https://github.com/apache/arrow-rs/blob/master/dev/release/README.md Basically a PMC member such as myself, @nevi-me etc ne

[GitHub] [arrow-rs] alamb opened a new issue, #2542: Release Arrow `XXXX` (next release after `21.0.0`)

2022-08-21 Thread GitBox
alamb opened a new issue, #2542: URL: https://github.com/apache/arrow-rs/issues/2542 Follow on from https://github.com/apache/arrow-rs/issues/2382 * Planned Release Candidate: 2022-09-02 * Planned Release and Publish to crates.io: 2022-09-05 Items: - [ ] PR to update versi

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #3215: Fix MIRI CI run + and update nightly version used

2022-08-21 Thread GitBox
codecov-commenter commented on PR #3215: URL: https://github.com/apache/arrow-datafusion/pull/3215#issuecomment-1221527210 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/3215?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3215: Fix MIRI CI run + and update nightly version used

2022-08-21 Thread GitBox
alamb commented on code in PR #3215: URL: https://github.com/apache/arrow-datafusion/pull/3215#discussion_r950826897 ## .github/workflows/rust.yml: ## @@ -374,9 +374,7 @@ jobs: MIRIFLAGS: "-Zmiri-disable-isolation" run: | cargo miri setup -

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3189: Support "IS TRUE/FALSE" syntax

2022-08-21 Thread GitBox
alamb commented on code in PR #3189: URL: https://github.com/apache/arrow-datafusion/pull/3189#discussion_r950828871 ## datafusion/physical-expr/src/expressions/is_false.rs: ## @@ -0,0 +1,127 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3189: Support "IS TRUE/FALSE" syntax

2022-08-21 Thread GitBox
alamb commented on code in PR #3189: URL: https://github.com/apache/arrow-datafusion/pull/3189#discussion_r950827446 ## datafusion/expr/src/expr_schema.rs: ## @@ -183,7 +185,11 @@ impl ExprSchemable for Expr { | Expr::WindowFunction { .. } | Expr::Aggre

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3196: Clean up CI workflows by removing "matrix" strategy, simplifying names

2022-08-21 Thread GitBox
alamb commented on code in PR #3196: URL: https://github.com/apache/arrow-datafusion/pull/3196#discussion_r950826777 ## .github/workflows/rust.yml: ## @@ -395,8 +364,8 @@ jobs: key: ${{ runner.os }}-cargo-miri-${{ hashFiles('**/Cargo.lock') }} - name: Setup Rus

[GitHub] [arrow-datafusion] ShaoDaTao commented on issue #3203: datafusion cannot recognize chinese charactors.

2022-08-21 Thread GitBox
ShaoDaTao commented on issue #3203: URL: https://github.com/apache/arrow-datafusion/issues/3203#issuecomment-1221522984 Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

[GitHub] [arrow-datafusion] ShaoDaTao closed issue #3203: datafusion cannot recognize chinese charactors.

2022-08-21 Thread GitBox
ShaoDaTao closed issue #3203: datafusion cannot recognize chinese charactors. URL: https://github.com/apache/arrow-datafusion/issues/3203 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [arrow-datafusion] alamb opened a new pull request, #3215: Fix MIRI CI run + and update nightly version used

2022-08-21 Thread GitBox
alamb opened a new pull request, #3215: URL: https://github.com/apache/arrow-datafusion/pull/3215 # Which issue does this PR close? re #3045 # Rationale for this change As suggested by @jackwener 👍 https://github.com/apache/arrow-datafusion/pull/3196#discussion_r

[GitHub] [arrow-datafusion] ursabot commented on pull request #3196: Clean up CI workflows by removing "matrix" strategy, simplifying names

2022-08-21 Thread GitBox
ursabot commented on PR #3196: URL: https://github.com/apache/arrow-datafusion/pull/3196#issuecomment-1221522681 Benchmark runs are scheduled for baseline = 72487cfb0305f1d3c40d179ba963d73d57a79d48 and contender = 4838564d868243567df6b4ed19751a3da287e777. 4838564d868243567df6b4ed19751a3da

[GitHub] [arrow-datafusion] alamb merged pull request #3196: Clean up CI workflows by removing "matrix" strategy, simplifying names

2022-08-21 Thread GitBox
alamb merged PR #3196: URL: https://github.com/apache/arrow-datafusion/pull/3196

[GitHub] [arrow-datafusion] alamb commented on issue #3214: Don't scan first column on empty projection

2022-08-21 Thread GitBox
alamb commented on issue #3214: URL: https://github.com/apache/arrow-datafusion/issues/3214#issuecomment-1221521892 👍 this is an important optimization as `select count(*)` type queries are so common

[GitHub] [arrow-datafusion] alamb commented on pull request #3168: Fix hash join non compat issues

2022-08-21 Thread GitBox
alamb commented on PR #3168: URL: https://github.com/apache/arrow-datafusion/pull/3168#issuecomment-1221521672 > @Dandandan @alamb is anywhere documentation on planner? I don't know of any real documentation on the planner Note there is a discussion about the current location

[GitHub] [arrow-datafusion] ursabot commented on pull request #3213: minor: refactor simplify negate

2022-08-21 Thread GitBox
ursabot commented on PR #3213: URL: https://github.com/apache/arrow-datafusion/pull/3213#issuecomment-1221521365 Benchmark runs are scheduled for baseline = c8d61d8e3ce48120cc7074cfa2143cd79d33f22f and contender = 72487cfb0305f1d3c40d179ba963d73d57a79d48. 72487cfb0305f1d3c40d179ba963d73d5

[GitHub] [arrow-rs] alamb commented on pull request #2476: display NULL instead of empty string

2022-08-21 Thread GitBox
alamb commented on PR #2476: URL: https://github.com/apache/arrow-rs/pull/2476#issuecomment-1221521240 > Similarly we could add an option to quote strings, this would then provide an unambiguous NULL representation. I agree this would be great. Maybe we could take the opportun

[GitHub] [arrow-datafusion] alamb merged pull request #3213: minor: refactor simplify negate

2022-08-21 Thread GitBox
alamb merged PR #3213: URL: https://github.com/apache/arrow-datafusion/pull/3213

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3213: minor: refactor simplify negate

2022-08-21 Thread GitBox
alamb commented on code in PR #3213: URL: https://github.com/apache/arrow-datafusion/pull/3213#discussion_r950824271 ## datafusion/expr/src/operator.rs: ## @@ -79,6 +79,41 @@ pub enum Operator { StringConcat, } +impl Operator { +/// If the operator can be negated, re

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #3201: Use .get() to avoid panic

2022-08-21 Thread GitBox
alamb commented on code in PR #3201: URL: https://github.com/apache/arrow-datafusion/pull/3201#discussion_r950823050 ## datafusion/common/src/parsing.rs: ## @@ -0,0 +1,17 @@ +pub fn is_system_variables(variable_names: &[String]) -> bool { Review Comment: FYI the reason the "

[GitHub] [arrow-datafusion] alamb commented on issue #3174: Bug with csv type inference

2022-08-21 Thread GitBox
alamb commented on issue #3174: URL: https://github.com/apache/arrow-datafusion/issues/3174#issuecomment-1221518248 👋 @bezbac I think looking into improving the inference in the arrow crate would be a good step Another potential alternative to using UInt64 might be Decimal12

[GitHub] [arrow-datafusion] Dandandan opened a new issue, #3214: Don't scan first column on empty projection

2022-08-21 Thread GitBox
Dandandan opened a new issue, #3214: URL: https://github.com/apache/arrow-datafusion/issues/3214 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** When we perform without needing the like `SELECT COUNT(1) FROM table`, the plan alwa

[GitHub] [arrow-cookbook] eitsupi commented on a diff in pull request #233: [R] Recipe for nested json

2022-08-21 Thread GitBox
eitsupi commented on code in PR #233: URL: https://github.com/apache/arrow-cookbook/pull/233#discussion_r950814293 ## r/content/reading_and_writing_data.Rmd: ## @@ -304,6 +304,39 @@ test_that("read_json_arrow chunk works as expected", { unlink(tf) ``` +### Discussion Review

[GitHub] [arrow-cookbook] eitsupi closed pull request #233: [R] Recipe for nested json

2022-08-21 Thread GitBox
eitsupi closed pull request #233: [R] Recipe for nested json URL: https://github.com/apache/arrow-cookbook/pull/233

[GitHub] [arrow-rs] roeap commented on a diff in pull request #2509: Replace azure sdk with custom implementation

2022-08-21 Thread GitBox
roeap commented on code in PR #2509: URL: https://github.com/apache/arrow-rs/pull/2509#discussion_r950810883 ## object_store/src/azure/credential.rs: ## @@ -0,0 +1,254 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] ursabot commented on pull request #2522: Remove DecimalByteArrayConvert (#2480)

2022-08-21 Thread GitBox
ursabot commented on PR #2522: URL: https://github.com/apache/arrow-rs/pull/2522#issuecomment-1221501371 Benchmark runs are scheduled for baseline = de7ad624234e139381bf79d2497e60beeb9949f8 and contender = 34216d57aba1739c866267825f54371d75d6c004. 34216d57aba1739c866267825f54371d75d6c004 i

[GitHub] [arrow-rs] tustvold merged pull request #2522: Remove DecimalByteArrayConvert (#2480)

2022-08-21 Thread GitBox
tustvold merged PR #2522: URL: https://github.com/apache/arrow-rs/pull/2522

[GitHub] [arrow] eitsupi closed pull request #12541: WIP ARROW-15734: [R][DOCS] Enable searching R docs

2022-08-21 Thread GitBox
eitsupi closed pull request #12541: WIP ARROW-15734: [R][DOCS] Enable searching R docs URL: https://github.com/apache/arrow/pull/12541

[GitHub] [arrow-rs] roeap commented on a diff in pull request #2509: Replace azure sdk with custom implementation

2022-08-21 Thread GitBox
roeap commented on code in PR #2509: URL: https://github.com/apache/arrow-rs/pull/2509#discussion_r950808326 ## object_store/src/client/oauth.rs: ## @@ -219,3 +223,82 @@ fn b64_encode_obj(obj: &T) -> Result { let string = serde_json::to_string(obj).context(EncodeSnafu)?;

[GitHub] [arrow] ursabot commented on pull request #13923: ARROW-17476: [Release][Packaging] Make binary uploader reusable from datafusion-c

2022-08-21 Thread GitBox
ursabot commented on PR #13923: URL: https://github.com/apache/arrow/pull/13923#issuecomment-1221498440 Benchmark runs are scheduled for baseline = 94fc25757864977ddfd3e47b1ef63c020df343ec and contender = fa33ca400759d4e9912b06df6680bf0199e28fd5. fa33ca400759d4e9912b06df6680bf0199e28fd5 is

[GitHub] [arrow-rs] JasonLi-cn commented on pull request #2476: display NULL instead of empty string

2022-08-21 Thread GitBox
JasonLi-cn commented on PR #2476: URL: https://github.com/apache/arrow-rs/pull/2476#issuecomment-1221494879 > Perhaps we could add an option to specify the NULL string, allowing people to opt-in to the new behaviour. This would likely involve creating a `FormatOptions` struct and passing it

[GitHub] [arrow-rs] tustvold commented on pull request #2529: add bench: decimal with byte array and fixed length byte array

2022-08-21 Thread GitBox
tustvold commented on PR #2529: URL: https://github.com/apache/arrow-rs/pull/2529#issuecomment-1221494757 > I can't find the point where to improve the performance in #2522 refactor to remove the DecimalByteArrayConvert? ComplexObjectArrayReader reads to the row format, i.e. separatel

[GitHub] [arrow-rs] liukun4515 commented on pull request #2529: add bench: decimal with byte array and fixed length byte array

2022-08-21 Thread GitBox
liukun4515 commented on PR #2529: URL: https://github.com/apache/arrow-rs/pull/2529#issuecomment-1221493313 It's better to remove the complexreader from the array reader @tustvold After your refactor, the type of parquet array reader will be matched with the parquet physical type `BOOL,

[GitHub] [arrow-rs] roeap commented on a diff in pull request #2509: Replace azure sdk with custom implementation

2022-08-21 Thread GitBox
roeap commented on code in PR #2509: URL: https://github.com/apache/arrow-rs/pull/2509#discussion_r950803790 ## object_store/src/azure/client.rs: ## @@ -0,0 +1,740 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See
