[GitHub] [arrow-rs] stevenliebregt closed issue #1627: Written Parquet file way bigger than input files

2022-04-29 Thread GitBox
stevenliebregt closed issue #1627: Written Parquet file way bigger than input files URL: https://github.com/apache/arrow-rs/issues/1627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

[GitHub] [arrow-rs] stevenliebregt commented on issue #1627: Written Parquet file way bigger than input files

2022-04-29 Thread GitBox
stevenliebregt commented on issue #1627: URL: https://github.com/apache/arrow-rs/issues/1627#issuecomment-1113935499 Thanks for the answer, I'll give those ideas a try. If I find it's a problem specific to Rust, I'll create an issue.

[GitHub] [arrow] eitsupi commented on a diff in pull request #13005: ARROW-16276: [R] Arrow 8.0 News

2022-04-29 Thread GitBox
eitsupi commented on code in PR #13005: URL: https://github.com/apache/arrow/pull/13005#discussion_r862313809 ## r/NEWS.md: ## @@ -19,19 +19,123 @@ # arrow 7.0.0.9000 -* `read_csv_arrow()`'s readr-style type `T` is now mapped to `timestamp(unit = "ns")` instead of `timesta

[GitHub] [arrow] ursabot commented on pull request #12978: ARROW-16303: [C++] Check EINTR in file IO

2022-04-29 Thread GitBox
ursabot commented on PR #12978: URL: https://github.com/apache/arrow/pull/12978#issuecomment-1113919493 Benchmark runs are scheduled for baseline = eb4c4d6f6c94b58a89a8db3b5a98f94915c18d9b and contender = 468427c5cd89e8fdbeadc93e1044fc5b15cc1a80. 468427c5cd89e8fdbeadc93e1044fc5b15cc1a80 is

[GitHub] [arrow] westonpace commented on pull request #13028: ARROW-16083]: [WIP][C++] Implement AsofJoin execution node

2022-04-29 Thread GitBox
westonpace commented on PR #13028: URL: https://github.com/apache/arrow/pull/13028#issuecomment-1113910387 Ok, managed to look at it a bit more today. Your sidecar processing thread is probably fine for a first approach. Eventually we will probably want to get rid of it with somethin
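
For readers unfamiliar with the operator, here is a minimal single-threaded sketch of what an as-of join computes: for each left row, the most recent right row with the same key whose timestamp is not later than the left row's. It is not the execution node from this PR, whose threading, tolerance handling, and API are not shown here. The `Row` type and `asof_join` function are hypothetical, and both inputs are assumed to be sorted by time.

```rust
use std::collections::HashMap;

/// Hypothetical input row: event time, join key, and a payload value.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Row {
    pub time: i64,
    pub key: u32,
    pub value: f64,
}

/// As-of join sketch: pairs each left row with the latest right row that has
/// the same key and `right.time <= left.time`, or `None` if there is none.
/// Assumes both `left` and `right` are sorted by `time`; no tolerance window.
pub fn asof_join(left: &[Row], right: &[Row]) -> Vec<(Row, Option<Row>)> {
    // Most recent right row seen so far, per key.
    let mut latest: HashMap<u32, Row> = HashMap::new();
    let mut j = 0;
    let mut out = Vec::with_capacity(left.len());
    for l in left {
        // Consume every right row at or before this left timestamp.
        while j < right.len() && right[j].time <= l.time {
            latest.insert(right[j].key, right[j]);
            j += 1;
        }
        out.push((*l, latest.get(&l.key).copied()));
    }
    out
}
```

Because both inputs are walked in time order, each row is visited once; a streaming node additionally has to decide when enough right-side data has arrived to emit a given left row.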

[GitHub] [arrow-datafusion] dbr commented on issue #2374: Identifiers are made lower-case in SQL query

2022-04-29 Thread GitBox
dbr commented on issue #2374: URL: https://github.com/apache/arrow-datafusion/issues/2374#issuecomment-1113905938 Ah interesting! Quoting the identifiers works, thanks. Just to be clear, is it still a bug that the original example fails? I would expect: - Given the field named

[GitHub] [arrow] vibhatha commented on a diff in pull request #12672: ARROW-15779: [Python] Create python bindings for Substrait consumer

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12672: URL: https://github.com/apache/arrow/pull/12672#discussion_r862282740 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -724,5 +728,103 @@ TEST(Substrait, ExtensionSetFromPlan) { EXPECT_EQ(decoded_add_func.name, "add"); } +TEST

[GitHub] [arrow] ursabot commented on pull request #12981: ARROW-16306: [CI] Fix Nightly verify rc on ubuntu

2022-04-29 Thread GitBox
ursabot commented on PR #12981: URL: https://github.com/apache/arrow/pull/12981#issuecomment-1113895816 Benchmark runs are scheduled for baseline = dcde920f24673e917b2893129a0bf3304c470047 and contender = eb4c4d6f6c94b58a89a8db3b5a98f94915c18d9b. eb4c4d6f6c94b58a89a8db3b5a98f94915c18d9b is

[GitHub] [arrow] westonpace commented on pull request #12601: ARROW-15901: [C++] Support Substrait projection with custom output field names

2022-04-29 Thread GitBox
westonpace commented on PR #12601: URL: https://github.com/apache/arrow/pull/12601#issuecomment-1113895574 Sorry for not noticing this originally. Feel free to ping someone in the future after a few days. As for the PR, I think this is part of the solution but I think we also want t

[GitHub] [arrow] vibhatha commented on a diff in pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12590: URL: https://github.com/apache/arrow/pull/12590#discussion_r862281448 ## cpp/src/arrow/compute/kernels/scalar_arithmetic.cc: ## @@ -2075,7 +2075,7 @@ Status ExecRound(KernelContext* ctx, const ExecBatch& batch, Datum* out) { // kernel d

[GitHub] [arrow-datafusion] yjshen commented on a diff in pull request #2388: Re-organize and rename aggregates physical plan

2022-04-29 Thread GitBox
yjshen commented on code in PR #2388: URL: https://github.com/apache/arrow-datafusion/pull/2388#discussion_r862273656 ## datafusion/core/src/physical_plan/aggregates/no_grouping.rs: ## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

[GitHub] [arrow] vibhatha commented on a diff in pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12590: URL: https://github.com/apache/arrow/pull/12590#discussion_r862279818 ## python/pyarrow/tests/test_udf.py: ## @@ -0,0 +1,498 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

[GitHub] [arrow] vibhatha commented on a diff in pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12590: URL: https://github.com/apache/arrow/pull/12590#discussion_r862279757 ## cpp/src/arrow/python/udf.cc: ## @@ -0,0 +1,131 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the

[GitHub] [arrow] westonpace closed pull request #13032: ARROW-16416: [C++] Support cast-function in Substrait

2022-04-29 Thread GitBox
westonpace closed pull request #13032: ARROW-16416: [C++] Support cast-function in Substrait URL: https://github.com/apache/arrow/pull/13032

[GitHub] [arrow] westonpace commented on pull request #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
westonpace commented on PR #13036: URL: https://github.com/apache/arrow/pull/13036#issuecomment-1113885656 Technically, we aren't quite there yet. The async generators in the scanner are allowed to run some cleanup after the exec plan runs. However, they strongly capture all of their stat

[GitHub] [arrow] lidavidm commented on pull request #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
lidavidm commented on PR #13036: URL: https://github.com/apache/arrow/pull/13036#issuecomment-1113885234 Aha. Hmm, I wonder if it'd be useful to somehow have Pytest also assert that Arrow's thread pools are idle in between tests. (And frankly, Googletest as well.)
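
The idea of asserting that thread pools are idle between tests can be sketched with a shared counter of in-flight tasks. `TaskTracker` below is a hypothetical stand-in, not Arrow's thread pool API; a test fixture would call `is_idle()` after each test and fail if background work (for example an exec plan node still running) had outlived the test.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical tracker counting tasks that were spawned but have not finished.
#[derive(Clone, Default)]
pub struct TaskTracker {
    running: Arc<AtomicUsize>,
}

/// Decrements the counter when dropped, so a panicking task still counts as finished.
struct Done(Arc<AtomicUsize>);

impl Drop for Done {
    fn drop(&mut self) {
        self.0.fetch_sub(1, Ordering::SeqCst);
    }
}

impl TaskTracker {
    /// Spawn `f` on a new OS thread and track it until it returns (or panics).
    pub fn spawn<F>(&self, f: F) -> thread::JoinHandle<()>
    where
        F: FnOnce() + Send + 'static,
    {
        // Count the task before it starts, so `is_idle` can never miss it.
        self.running.fetch_add(1, Ordering::SeqCst);
        let done = Done(Arc::clone(&self.running));
        thread::spawn(move || {
            let _done = done; // dropped when the closure exits
            f();
        })
    }

    /// True when no tracked task is still running.
    pub fn is_idle(&self) -> bool {
        self.running.load(Ordering::SeqCst) == 0
    }
}
```

In a real harness the check would live in a per-test teardown hook, and it only sees work submitted through the tracker.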

[GitHub] [arrow] westonpace commented on pull request #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
westonpace commented on PR #13036: URL: https://github.com/apache/arrow/pull/13036#issuecomment-1113884973 I was able to get it to reproduce in debug mode with `stress -c 32`. Then I got reasonable stack traces that showed the hash join node still working while the python was getting ready

[GitHub] [arrow] github-actions[bot] commented on pull request #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
github-actions[bot] commented on PR #13036: URL: https://github.com/apache/arrow/pull/13036#issuecomment-1113883931 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
github-actions[bot] commented on PR #13036: URL: https://github.com/apache/arrow/pull/13036#issuecomment-1113883925 https://issues.apache.org/jira/browse/ARROW-16417

[GitHub] [arrow-datafusion] yjshen opened a new pull request, #2388: Re-organize and rename aggregates physical plan

2022-04-29 Thread GitBox
yjshen opened a new pull request, #2388: URL: https://github.com/apache/arrow-datafusion/pull/2388 # Which issue does this PR close? Closes #2387. # Rationale for this change - We currently have a hash-based implementation, `GroupedHashAggregateStream` for a

[GitHub] [arrow] westonpace opened a new pull request, #13036: ARROW-16417: [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread GitBox
westonpace opened a new pull request, #13036: URL: https://github.com/apache/arrow/pull/13036 This builds on top of #13035 which is also important for avoiding segmentation faults. On top of that there were a few more problems: * The python was using `SourceNodeOptions::FromTable` w

[GitHub] [arrow-datafusion] yjshen opened a new issue, #2387: Re-organize and rename aggregates physical plan

2022-04-29 Thread GitBox
yjshen opened a new issue, #2387: URL: https://github.com/apache/arrow-datafusion/issues/2387 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** - We currently have a hash-based implementation, `GroupedHashAggregateStream` for ag
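
The distinction drawn here, between hash-based grouped aggregation (`GroupedHashAggregateStream`) and the no-grouping case, can be illustrated with a toy `SUM`. Both functions below are hypothetical and operate on plain slices; DataFusion's operators process Arrow record batches instead.

```rust
use std::collections::HashMap;

/// No grouping (e.g. `SELECT SUM(v) FROM t`): a single accumulator for all rows.
pub fn sum_no_grouping(values: &[i64]) -> i64 {
    values.iter().sum()
}

/// Hash-based grouping (e.g. `SELECT k, SUM(v) FROM t GROUP BY k`):
/// a hash table maps each distinct key to its own accumulator.
pub fn sum_grouped_hash(keys: &[&str], values: &[i64]) -> HashMap<String, i64> {
    assert_eq!(keys.len(), values.len(), "keys and values must have equal length");
    let mut groups: HashMap<String, i64> = HashMap::new();
    for (key, value) in keys.iter().zip(values) {
        *groups.entry(key.to_string()).or_insert(0) += *value;
    }
    groups
}
```

The no-grouping path needs no hash table at all, which is presumably one reason to keep it as a separate, simpler operator.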

[GitHub] [arrow] vibhatha commented on a diff in pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12590: URL: https://github.com/apache/arrow/pull/12590#discussion_r862270048 ## python/pyarrow/tests/test_udf.py: ## @@ -0,0 +1,498 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

[GitHub] [arrow-datafusion] yjshen commented on pull request #2375: WIP: Use row format for aggregate

2022-04-29 Thread GitBox
yjshen commented on PR #2375: URL: https://github.com/apache/arrow-datafusion/pull/2375#issuecomment-1113877450 Sorry to mix two things into one PR. I would divide this as separate PRs. One for each of these ideas: 1. Promote `physical-plan/hash_aggregates.rs` to a directory, and ren

[GitHub] [arrow] vibhatha commented on a diff in pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12590: URL: https://github.com/apache/arrow/pull/12590#discussion_r862268470 ## python/pyarrow/tests/test_udf.py: ## @@ -0,0 +1,498 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2386: refactor `distinct_expressions.rs` and split into `count_distinct.rs` and `array_agg_distinct.rs`

2022-04-29 Thread GitBox
WinkerDu commented on PR #2386: URL: https://github.com/apache/arrow-datafusion/pull/2386#issuecomment-1113874277 cc @andygrove @alamb @yjshen @xudong963 Please have a view, thank you

[GitHub] [arrow] ursabot commented on pull request #12979: ARROW-16305: [C++] Missed reference to ARROW_ENGINE during the rename

2022-04-29 Thread GitBox
ursabot commented on PR #12979: URL: https://github.com/apache/arrow/pull/12979#issuecomment-1113865690 Benchmark runs are scheduled for baseline = 4a2668346cf55e01fb97add2bd561392c9e55068 and contender = dcde920f24673e917b2893129a0bf3304c470047. dcde920f24673e917b2893129a0bf3304c470047 is

[GitHub] [arrow] ArianaVillegas commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
ArianaVillegas commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862248137 ## cpp/src/arrow/engine/substrait/relation_internal.cc: ## @@ -88,35 +142,41 @@ Result FromProto(const substrait::Rel& rel, std::shared_ptr format;

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
sanjibansg commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862234806 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -847,7 +847,8 @@ ParquetDatasetFactory::CollectParquetFragments(const Partitioning& partitioning) auto row_group

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
sanjibansg commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862231668 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -873,7 +874,8 @@ Result>> ParquetDatasetFactory::InspectSchem size_t i = 0; for (const auto& e : paths_wi

[GitHub] [arrow] jonkeane closed pull request #12883: ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows

2022-04-29 Thread GitBox
jonkeane closed pull request #12883: ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows URL: https://github.com/apache/arrow/pull/12883

[GitHub] [arrow] vibhatha commented on a diff in pull request #12672: ARROW-15779: [Python] Create python bindings for Substrait consumer

2022-04-29 Thread GitBox
vibhatha commented on code in PR #12672: URL: https://github.com/apache/arrow/pull/12672#discussion_r862228958 ## cpp/src/arrow/engine/substrait/serde_test.cc: ## @@ -724,5 +728,103 @@ TEST(Substrait, ExtensionSetFromPlan) { EXPECT_EQ(decoded_add_func.name, "add"); } +TEST

[GitHub] [arrow-julia] pcjentsch opened a new pull request, #321: fix version mismatch by changing footer to V5

2022-04-29 Thread GitBox
pcjentsch opened a new pull request, #321: URL: https://github.com/apache/arrow-julia/pull/321 Fixes #320. Hopefully V5 is the correct version for both of these. With apache/arrow-rs#1631, files written by Arrow.jl are readable by `arrow-rs`.

[GitHub] [arrow-rs] pcjentsch opened a new pull request, #1631: do not assume footer exists, fixes issue #1335

2022-04-29 Thread GitBox
pcjentsch opened a new pull request, #1631: URL: https://github.com/apache/arrow-rs/pull/1631 This is my first Rust PR, please let me know if I could do this in a more idiomatic way! # Which issue does this PR close? Closes #1335, on the `arrow-rs` side at least, Julia's Ar
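
In the Arrow IPC file format, the data is followed by a flatbuffer footer, a 4-byte little-endian footer length, and a closing `ARROW1` magic (the file also opens with `ARROW1` padded to 8 bytes). The sketch below shows a defensive way to locate that footer, returning `None` instead of assuming it exists. `footer_len` is a hypothetical helper, not the arrow-rs reader API, and it does not describe what #1631 actually changes.

```rust
/// Magic bytes at the start and end of an Arrow IPC file.
const MAGIC: &[u8] = b"ARROW1";

/// Returns the footer length if `bytes` has a plausible Arrow IPC file trailer:
/// <8-byte header> ... <footer> <footer length: i32 LE> <"ARROW1">.
/// Returns `None` rather than panicking when the trailer is missing or inconsistent.
pub fn footer_len(bytes: &[u8]) -> Option<usize> {
    // Smallest possible file: 8-byte header (magic + padding), length field, trailing magic.
    const MIN_LEN: usize = 8 + 4 + 6;
    let n = bytes.len();
    if n < MIN_LEN || !bytes.starts_with(MAGIC) || !bytes.ends_with(MAGIC) {
        return None;
    }
    // The 4 bytes just before the trailing magic hold the footer length.
    let len = i32::from_le_bytes([bytes[n - 10], bytes[n - 9], bytes[n - 8], bytes[n - 7]]);
    // A non-positive length, or one that does not fit in the file, cannot be trusted.
    if len <= 0 || (len as usize) > n - MIN_LEN {
        return None;
    }
    Some(len as usize)
}
```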

[GitHub] [arrow-rs] tustvold commented on issue #1626: Expose ArrowWriter row group flush in public API

2022-04-29 Thread GitBox
tustvold commented on issue #1626: URL: https://github.com/apache/arrow-rs/issues/1626#issuecomment-1113803611 I don't see any issue with exposing this, more power to the user, however, some thoughts: - I wonder if you could just set the max row group size smaller if you want greater
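
Assuming a writer that flushes a row group whenever its buffered rows reach the configured maximum, the row-group layout follows directly from that one setting, which is why lowering it is a possible alternative to an explicit flush call. The helper below is hypothetical and only models that arithmetic; it is not part of the arrow-rs API.

```rust
/// Row counts per row group for `total_rows` rows, assuming the writer
/// flushes every time `max_row_group_size` rows have been buffered.
pub fn row_group_sizes(total_rows: usize, max_row_group_size: usize) -> Vec<usize> {
    assert!(max_row_group_size > 0, "max_row_group_size must be positive");
    let full_groups = total_rows / max_row_group_size;
    let remainder = total_rows % max_row_group_size;
    let mut sizes = vec![max_row_group_size; full_groups];
    if remainder != 0 {
        // The final, partially filled row group is flushed when the writer closes.
        sizes.push(remainder);
    }
    sizes
}
```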

[GitHub] [arrow] ArianaVillegas commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
ArianaVillegas commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862203115 ## cpp/src/arrow/filesystem/path_util.cc: ## @@ -287,6 +288,38 @@ bool IsLikelyUri(util::string_view v) { return ::arrow::internal::IsValidUriScheme(v.substr(0

[GitHub] [arrow-rs] tustvold commented on issue #1627: Written Parquet file way bigger than input files

2022-04-29 Thread GitBox
tustvold commented on issue #1627: URL: https://github.com/apache/arrow-rs/issues/1627#issuecomment-1113791088 Some ideas to try: - Disable dictionary compression for columns that don't have repeated values - Use writer version 2, which has better string encoding - Represent the
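
To see why dictionary encoding can inflate a column with few repeated values, the rough estimate below compares PLAIN encoding (a 4-byte length prefix per string) against a dictionary page plus one bit-packed index per value. It is a simplified model: it ignores compression, page headers, RLE runs, and any fallback a writer may apply, and the helper names are hypothetical rather than part of the `parquet` crate.

```rust
use std::collections::HashSet;

/// Approximate PLAIN size of a BYTE_ARRAY column: a 4-byte length prefix per value.
pub fn plain_size(values: &[&str]) -> usize {
    values.iter().map(|v| 4 + v.len()).sum()
}

/// Approximate dictionary-encoded size: each distinct value stored once in the
/// dictionary page, plus one bit-packed index per value.
pub fn dictionary_size(values: &[&str]) -> usize {
    let mut seen: HashSet<&str> = HashSet::new();
    let mut dict_bytes = 0;
    for v in values {
        if seen.insert(*v) {
            dict_bytes += 4 + v.len();
        }
    }
    // Bits needed for the largest index (distinct - 1), with a floor of 1.
    let distinct = seen.len();
    let bit_width = if distinct <= 1 {
        1
    } else {
        (usize::BITS - (distinct - 1).leading_zeros()) as usize
    };
    dict_bytes + (values.len() * bit_width + 7) / 8
}

/// Heuristic: keep dictionary encoding only if it is estimated to be smaller.
pub fn dictionary_pays_off(values: &[&str]) -> bool {
    dictionary_size(values) < plain_size(values)
}
```

For 1,000 identical strings the dictionary estimate is far smaller; for 1,000 unique IDs it is larger than PLAIN, because every value is stored once in the dictionary and an index is paid on top.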

[GitHub] [arrow] westonpace commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
westonpace commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862196990 ## cpp/src/arrow/filesystem/path_util.cc: ## @@ -287,6 +288,38 @@ bool IsLikelyUri(util::string_view v) { return ::arrow::internal::IsValidUriScheme(v.substr(0, po

[GitHub] [arrow] westonpace commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
westonpace commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862196550 ## cpp/src/arrow/filesystem/path_util.cc: ## @@ -287,6 +288,38 @@ bool IsLikelyUri(util::string_view v) { return ::arrow::internal::IsValidUriScheme(v.substr(0, po

[GitHub] [arrow] ArianaVillegas commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
ArianaVillegas commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862178024 ## cpp/src/arrow/filesystem/path_util.cc: ## @@ -287,6 +288,38 @@ bool IsLikelyUri(util::string_view v) { return ::arrow::internal::IsValidUriScheme(v.substr(0

[GitHub] [arrow] edponce commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
edponce commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r862160969 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,148 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using val
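
A chunked iterator has to translate a logical element index into a (chunk, offset) pair. One common approach, sketched below under the assumption that chunks are plain `Vec<i64>` buffers, is to precompute each chunk's starting offset and binary-search it. `ChunkedI64` is hypothetical and does not reproduce the `ChunkedArrayIterator` in this PR.

```rust
/// Hypothetical chunked column of i64 values.
pub struct ChunkedI64 {
    chunks: Vec<Vec<i64>>,
    /// offsets[i] is the logical index of the first element of chunk i.
    offsets: Vec<usize>,
    total: usize,
}

impl ChunkedI64 {
    pub fn new(chunks: Vec<Vec<i64>>) -> Self {
        let mut offsets = Vec::with_capacity(chunks.len());
        let mut total = 0;
        for c in &chunks {
            offsets.push(total);
            total += c.len();
        }
        ChunkedI64 { chunks, offsets, total }
    }

    /// Resolve a logical index to (chunk index, index within chunk).
    pub fn locate(&self, index: usize) -> Option<(usize, usize)> {
        if index >= self.total {
            return None;
        }
        // The last chunk starting at or before `index` holds it; empty chunks
        // share a start offset with the next chunk and are skipped this way.
        let chunk = self.offsets.partition_point(|&start| start <= index) - 1;
        Some((chunk, index - self.offsets[chunk]))
    }

    pub fn get(&self, index: usize) -> Option<i64> {
        self.locate(index).map(|(c, i)| self.chunks[c][i])
    }
}
```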

[GitHub] [arrow] wjones127 commented on pull request #12775: ARROW-16006: [C++] Row conversion helpers and example

2022-04-29 Thread GitBox
wjones127 commented on PR #12775: URL: https://github.com/apache/arrow/pull/12775#issuecomment-1113764393 Not sure if this is related to the appveyor segfault, but some batch sizes produce a segfault: ``` # segfault for me rapidjson-row-converter 1000 500 # runs fine r

[GitHub] [arrow] edponce commented on pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
edponce commented on PR #13009: URL: https://github.com/apache/arrow/pull/13009#issuecomment-1113763772 @pitrou @lidavidm I apologize for all the unnecessary comments/discussion I made, I thought this was a new JIRA issue and had no idea of all the discussions that had happened in the JIRA

[GitHub] [arrow-datafusion] WinkerDu opened a new pull request, #2386: refactor `distinct_expressions.rs` and split into `count_distinct.rs` and `array_agg_distinct.rs`

2022-04-29 Thread GitBox
WinkerDu opened a new pull request, #2386: URL: https://github.com/apache/arrow-datafusion/pull/2386 # Which issue does this PR close? Closes #2385 . # Rationale for this change `distinct_expression.rs` can be split into `count_distinct.rs` and `array_agg_distin

[GitHub] [arrow-datafusion] WinkerDu opened a new issue, #2385: split `distinct_expression.rs` into `count_distinct.rs` and `array_agg_distinct.rs`

2022-04-29 Thread GitBox
WinkerDu opened a new issue, #2385: URL: https://github.com/apache/arrow-datafusion/issues/2385 `distinct_expression.rs` can be split into `count_distinct.rs` and `array_agg_distinct.rs` to provide better code organization.

[GitHub] [arrow-datafusion] andygrove commented on issue #2278: LogicalPlanBuilder::scan_csv creates scans with invalid table names

2022-04-29 Thread GitBox
andygrove commented on issue #2278: URL: https://github.com/apache/arrow-datafusion/issues/2278#issuecomment-1113753014 Sorry, just catching up here. This issue came up when I was attempting to refactor DF to have the plan just refer to table sources by name, and this would require names t

[GitHub] [arrow] WillAyd commented on a diff in pull request #12963: ARROW-16234: [C++] Vector Kernel for Rank

2022-04-29 Thread GitBox
WillAyd commented on code in PR #12963: URL: https://github.com/apache/arrow/pull/12963#discussion_r862162489 ## cpp/src/arrow/compute/kernels/vector_sort.cc: ## @@ -1909,6 +1909,110 @@ class SelectKUnstableMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow-datafusion] andygrove closed issue #2356: simply_expressions panics on unsupported evaluations

2022-04-29 Thread GitBox
andygrove closed issue #2356: simply_expressions panics on unsupported evaluations URL: https://github.com/apache/arrow-datafusion/issues/2356

[GitHub] [arrow-datafusion] andygrove commented on issue #2356: simply_expressions panics on unsupported evaluations

2022-04-29 Thread GitBox
andygrove commented on issue #2356: URL: https://github.com/apache/arrow-datafusion/issues/2356#issuecomment-1113741540 This issue was not valid.

[GitHub] [arrow-datafusion] andygrove commented on issue #2350: Enable discussion tab?

2022-04-29 Thread GitBox
andygrove commented on issue #2350: URL: https://github.com/apache/arrow-datafusion/issues/2350#issuecomment-1113740669 I have been thinking about this as well and I think it would be a good idea so that we can keep discussions and issues separate.

[GitHub] [arrow-datafusion] andygrove commented on issue #2374: Identifiers are made lower-case in SQL query

2022-04-29 Thread GitBox
andygrove commented on issue #2374: URL: https://github.com/apache/arrow-datafusion/issues/2374#issuecomment-1113738686 @dbr Unquoted SQL identifiers are case-insensitive by design (to match ANSI and Postgres). SQL identifiers in double quotes are case-sensitive. If you put double quotes a
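
A minimal sketch of the rule described here, assuming the identifier has already been tokenized (no splitting of compound names such as `a.b`): unquoted identifiers fold to lower case, as in Postgres, while double-quoted identifiers keep their exact case, which also lets a quoted name like `"employee.csv"` contain a dot. `normalize_ident` is a hypothetical helper, not DataFusion's planner code.

```rust
/// Normalize one already-tokenized SQL identifier.
/// Unquoted: folded to lower case. Double-quoted: quotes stripped, case kept,
/// and an escaped `""` inside the quotes becomes a single `"`.
pub fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        raw.to_lowercase()
    }
}
```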

[GitHub] [arrow-datafusion] comphead commented on issue #2278: LogicalPlanBuilder::scan_csv creates scans with invalid table names

2022-04-29 Thread GitBox
comphead commented on issue #2278: URL: https://github.com/apache/arrow-datafusion/issues/2278#issuecomment-1113738338 > I haven't tried the scenario described in this report, but I would expect to be able to refer to a table named `employee.csv` using `"employee.csv"` (aka put it in singl

[GitHub] [arrow] edponce commented on pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
edponce commented on PR #13009: URL: https://github.com/apache/arrow/pull/13009#issuecomment-1113736711 A solution is to have the factory `MakeIterator` not be templated and check the type of the `chunked_array` at runtime, and use a switch-case to dispatch the correct `ChunkedArrayIterator
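
The runtime-dispatch idea can be sketched in Rust, where a `match` on a type tag plays the role of the C++ switch-case and a boxed trait object plays the role of a type-erased iterator. `Column`, `Value`, and `make_iterator` are hypothetical; the sketch supports only two element types and stores chunks as plain vectors.

```rust
/// Hypothetical type-erased chunked column: the element type is only known at runtime.
pub enum Column {
    Int64(Vec<Vec<i64>>),
    Float64(Vec<Vec<f64>>),
}

/// Common item type so one non-generic factory can return any column's iterator.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Value {
    Int64(i64),
    Float64(f64),
}

/// Non-generic factory: inspect the column's type at runtime and dispatch to a
/// concretely typed iterator that walks every chunk in order.
pub fn make_iterator(col: &Column) -> Box<dyn Iterator<Item = Value> + '_> {
    match col {
        Column::Int64(chunks) => Box::new(chunks.iter().flatten().map(|v| Value::Int64(*v))),
        Column::Float64(chunks) => Box::new(chunks.iter().flatten().map(|v| Value::Float64(*v))),
    }
}
```

The cost of this design is a dynamic call per element and a shared item type; a templated (generic) factory avoids both when the element type is known at compile time.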

[GitHub] [arrow] ursabot commented on pull request #12976: ARROW-16301: [C#][CI] Fix docker configuration for .NET 6

2022-04-29 Thread GitBox
ursabot commented on PR #12976: URL: https://github.com/apache/arrow/pull/12976#issuecomment-1113726770 Benchmark runs are scheduled for baseline = 95c8984f3c5050476d6db7c786eb9c81de42944c and contender = 4a2668346cf55e01fb97add2bd561392c9e55068. 4a2668346cf55e01fb97add2bd561392c9e55068 is

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2384: Allow CTEs to be referenced from subquery expressions

2022-04-29 Thread GitBox
andygrove opened a new pull request, #2384: URL: https://github.com/apache/arrow-datafusion/pull/2384 # Which issue does this PR close? Closes https://github.com/apache/arrow-datafusion/issues/2379 # Rationale for this change Allows CTEs to be referenced from

[GitHub] [arrow] westonpace commented on a diff in pull request #13035: ARROW-16419: [Python] Properly wait for ExecPlan to finish

2022-04-29 Thread GitBox
westonpace commented on code in PR #13035: URL: https://github.com/apache/arrow/pull/13035#discussion_r862135947 ## python/pyarrow/includes/libarrow.pxd: ## @@ -81,6 +81,11 @@ cdef extern from "arrow/config.h" namespace "arrow" nogil: CRuntimeInfo GetRuntimeInfo() +cdef

[GitHub] [arrow] github-actions[bot] commented on pull request #13035: ARROW-16419: [Python] Properly wait for ExecPlan to finish

2022-04-29 Thread GitBox
github-actions[bot] commented on PR #13035: URL: https://github.com/apache/arrow/pull/13035#issuecomment-1113702769 https://issues.apache.org/jira/browse/ARROW-16419

[GitHub] [arrow] lidavidm commented on a diff in pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
lidavidm commented on code in PR #12941: URL: https://github.com/apache/arrow/pull/12941#discussion_r862131278 ## docs/source/developers/java/building.rst: ## @@ -32,7 +32,7 @@ Arrow Java uses the `Maven `_ build system. Building requires: -* JDK

[GitHub] [arrow] westonpace commented on issue #13030: [JAVA] Is any way reading partial parquet file into arrow

2022-04-29 Thread GitBox
westonpace commented on issue #13030: URL: https://github.com/apache/arrow/issues/13030#issuecomment-1113692280 A parquet file is made up of row groups, columns, and pages. A page is indivisible as it represents a compressed buffer. There is no way to read a part of a page and so it canno
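
The consequence for partial reads can be illustrated at row-group granularity: given the row count of each row group (recorded in the file metadata), a reader can work out which row groups overlap a requested row range and skip the rest, but it must still decode whole units even when only some of their rows are wanted. The function below is a hypothetical sketch, not part of the Java Parquet or Arrow APIs, and it ignores finer-grained page-level skipping.

```rust
/// Given the row count of each row group, return the indices of the row groups
/// that overlap the half-open row range `start..end`.
pub fn row_groups_for_range(row_counts: &[u64], start: u64, end: u64) -> Vec<usize> {
    let mut selected = Vec::new();
    let mut first_row = 0u64;
    for (i, &n) in row_counts.iter().enumerate() {
        let last_row = first_row + n; // exclusive end of this row group
        if first_row < end && start < last_row {
            selected.push(i);
        }
        first_row = last_row;
    }
    selected
}
```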

[GitHub] [arrow] westonpace commented on a diff in pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
westonpace commented on code in PR #12625: URL: https://github.com/apache/arrow/pull/12625#discussion_r862119369 ## cpp/src/arrow/filesystem/path_util.cc: ## @@ -287,6 +288,38 @@ bool IsLikelyUri(util::string_view v) { return ::arrow::internal::IsValidUriScheme(v.substr(0, po

[GitHub] [arrow] davisusanibar commented on pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
davisusanibar commented on PR #12941: URL: https://github.com/apache/arrow/pull/12941#issuecomment-1113676772 > Good point. @davisusanibar can we include the docs changes here? Thanks @toddfarmer. Just updated.

[GitHub] [arrow] westonpace commented on a diff in pull request #13032: ARROW-16416: [C++] Support cast-function in Substrait

2022-04-29 Thread GitBox
westonpace commented on code in PR #13032: URL: https://github.com/apache/arrow/pull/13032#discussion_r862108441 ## cpp/src/arrow/engine/substrait/expression_internal.cc: ## @@ -165,7 +165,15 @@ Result FromProto(const substrait::Expression& expr, ARROW_ASSIGN_OR_RAISE(

[GitHub] [arrow] lidavidm commented on pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
lidavidm commented on PR #12941: URL: https://github.com/apache/arrow/pull/12941#issuecomment-1113663753 Good point. @davisusanibar can we include the docs changes here?

[GitHub] [arrow] toddfarmer commented on pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
toddfarmer commented on PR #12941: URL: https://github.com/apache/arrow/pull/12941#issuecomment-1113663251 Just a note here as a reminder that https://arrow.apache.org/docs/dev/developers/java/building.html will need updates once support for additional JDK versions is added.

[GitHub] [arrow] wjones127 commented on a diff in pull request #13005: ARROW-16276: [R] Arrow 8.0 News

2022-04-29 Thread GitBox
wjones127 commented on code in PR #13005: URL: https://github.com/apache/arrow/pull/13005#discussion_r862093726 ## r/NEWS.md: ## @@ -19,19 +19,111 @@ # arrow 7.0.0.9000 -* `read_csv_arrow()`'s readr-style type `T` is now mapped to `timestamp(unit = "ns")` instead of `times

[GitHub] [arrow-datafusion] andygrove closed issue #2376: SQL planner is inconsistent in normalizing identifiers

2022-04-29 Thread GitBox
andygrove closed issue #2376: SQL planner is inconsistent in normalizing identifiers URL: https://github.com/apache/arrow-datafusion/issues/2376

[GitHub] [arrow-datafusion] andygrove merged pull request #2373: Fix bugs with CTE aliasing and normalize all identifiers in the SQL planner

2022-04-29 Thread GitBox
andygrove merged PR #2373: URL: https://github.com/apache/arrow-datafusion/pull/2373

[GitHub] [arrow-datafusion] andygrove closed issue #2372: Duplicate qualified field in complex SQL with CTE, alias, and join

2022-04-29 Thread GitBox
andygrove closed issue #2372: Duplicate qualified field in complex SQL with CTE, alias, and join URL: https://github.com/apache/arrow-datafusion/issues/2372

[GitHub] [arrow] AlvinJ15 commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
AlvinJ15 commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r862089851 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,148 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using va

[GitHub] [arrow] dragosmg commented on pull request #12883: ARROW-16073: [R] clean-up date time unit testing once tzdb is available on Windows

2022-04-29 Thread GitBox
dragosmg commented on PR #12883: URL: https://github.com/apache/arrow/pull/12883#issuecomment-1113650908 @jonkeane I believe I addressed all comments/suggestions. Would you have time for another look?

[GitHub] [arrow-datafusion] timvw commented on issue #2383: Build fails

2022-04-29 Thread GitBox
timvw commented on issue #2383: URL: https://github.com/apache/arrow-datafusion/issues/2383#issuecomment-1113644624 PEBKAC: was still working on an outdated fork.

[GitHub] [arrow] lidavidm commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
lidavidm commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862081385 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -873,7 +874,8 @@ Result>> ParquetDatasetFactory::InspectSchem size_t i = 0; for (const auto& e : paths_with

[GitHub] [arrow-datafusion] timvw closed issue #2383: Build fails

2022-04-29 Thread GitBox
timvw closed issue #2383: Build fails URL: https://github.com/apache/arrow-datafusion/issues/2383

[GitHub] [arrow-datafusion] timvw opened a new issue, #2383: Build fails

2022-04-29 Thread GitBox
timvw opened a new issue, #2383: URL: https://github.com/apache/arrow-datafusion/issues/2383 **Describe the bug** Currently a "cargo build" fails with the following error: error[E0599]: no function or associated item named `from_str_unchecked` found for struct `proc_macro2::Literal` in

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
sanjibansg commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862080350 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -873,7 +874,8 @@ Result>> ParquetDatasetFactory::InspectSchem size_t i = 0; for (const auto& e : paths_wi

[GitHub] [arrow] ArianaVillegas commented on pull request #12625: ARROW-15587: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread GitBox
ArianaVillegas commented on PR #12625: URL: https://github.com/apache/arrow/pull/12625#issuecomment-1113634696 I think I addressed most of your comments. Let me know if something else is missing :)

[GitHub] [arrow-datafusion] andygrove commented on pull request #2375: WIP: Use row format for aggregate

2022-04-29 Thread GitBox
andygrove commented on PR #2375: URL: https://github.com/apache/arrow-datafusion/pull/2375#issuecomment-1113633390 > The current PR seems scary in size, maybe I should move the physical_plan folder re-org as a separate PR first. I think that would help. Are we replacing HashAg

[GitHub] [arrow] edponce commented on a diff in pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
edponce commented on code in PR #13009: URL: https://github.com/apache/arrow/pull/13009#discussion_r862073194 ## cpp/src/arrow/stl_iterator.h: ## @@ -128,6 +131,148 @@ class ArrayIterator { int64_t index_; }; +template > +class ChunkedArrayIterator { + public: + using val

[GitHub] [arrow] github-actions[bot] commented on pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
github-actions[bot] commented on PR #12941: URL: https://github.com/apache/arrow/pull/12941#issuecomment-1113629728 Revision: 439403bdfb15f578dfe9dc29fc8d285dbc719c10 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1988](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] lidavidm commented on pull request #12941: ARROW-15755: [Java] Support Java 17

2022-04-29 Thread GitBox
lidavidm commented on PR #12941: URL: https://github.com/apache/arrow/pull/12941#issuecomment-1113628654 @github-actions crossbow submit *java*

[GitHub] [arrow-rs] alamb commented on issue #1623: Release next version of arrow-rs after 12.0.0

2022-04-29 Thread GitBox
alamb commented on issue #1623: URL: https://github.com/apache/arrow-rs/issues/1623#issuecomment-1113626501 And we have an arrow-rs 13.0.0 release candidate: https://lists.apache.org/thread/zqfx9v1j8onwtfv98xqbn0zy9x5pvc04

[GitHub] [arrow-datafusion] WinkerDu commented on pull request #2364: Add proper support for `null` literal by introducing `ScalarValue::Null`

2022-04-29 Thread GitBox
WinkerDu commented on PR #2364: URL: https://github.com/apache/arrow-datafusion/pull/2364#issuecomment-1113626308 cc @alamb PTAL, thank you ❤️

[GitHub] [arrow] lidavidm commented on a diff in pull request #13033: ARROW-16413: [Python] Certain dataset APIs hang with a python filesystem

2022-04-29 Thread GitBox
lidavidm commented on code in PR #13033: URL: https://github.com/apache/arrow/pull/13033#discussion_r862066616 ## python/pyarrow/tests/test_dataset.py: ## @@ -2562,6 +2562,43 @@ def test_open_dataset_from_fsspec(tempdir): assert dataset.schema.equals(table.schema) +@pyt

[GitHub] [arrow] lidavidm commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
lidavidm commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862060024 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -847,7 +847,8 @@ ParquetDatasetFactory::CollectParquetFragments(const Partitioning& partitioning) auto row_groups

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
sanjibansg commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862058051 ## cpp/src/arrow/dataset/file_parquet.cc: ## @@ -847,7 +847,8 @@ ParquetDatasetFactory::CollectParquetFragments(const Partitioning& partitioning) auto row_group

[GitHub] [arrow] lidavidm commented on pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
lidavidm commented on PR #12977: URL: https://github.com/apache/arrow/pull/12977#issuecomment-1113616149 Thanks. @westonpace any comments here?

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2357: Implement physical planner support for DATE +/- INTERVAL

2022-04-29 Thread GitBox
alamb commented on code in PR #2357: URL: https://github.com/apache/arrow-datafusion/pull/2357#discussion_r862053645 ## datafusion/core/src/physical_plan/planner.rs: ## @@ -964,7 +965,27 @@ pub fn create_physical_expr( input_schema, execution_pr

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2373: Fix bugs with CTE aliasing and normalize all identifiers in the SQL planner

2022-04-29 Thread GitBox
andygrove commented on code in PR #2373: URL: https://github.com/apache/arrow-datafusion/pull/2373#discussion_r862053422 ## datafusion/core/tests/sql/information_schema.rs: ## @@ -307,14 +307,8 @@ async fn information_schema_show_columns() { let result = plan_and_collect(&c

[GitHub] [arrow] lidavidm commented on pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
lidavidm commented on PR #13009: URL: https://github.com/apache/arrow/pull/13009#issuecomment-1113609517 > If we could add a `TypeClass` attribute to base `Array` then I think we are good, but what value would it have? How would this work? The value would vary at runtime, but we need

[GitHub] [arrow-datafusion] alamb commented on pull request #2177: User Defined Table Function (udtf) support

2022-04-29 Thread GitBox
alamb commented on PR #2177: URL: https://github.com/apache/arrow-datafusion/pull/2177#issuecomment-1113608594 @gandronchik thank you for the explanation in this PR's description. It helps, though I will admit I still don't fully understand what is going on. I agree with @doki23

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2357: Implement physical planner support for DATE +/- INTERVAL

2022-04-29 Thread GitBox
andygrove commented on code in PR #2357: URL: https://github.com/apache/arrow-datafusion/pull/2357#discussion_r862049831 ## datafusion/core/src/optimizer/simplify_expressions.rs: ## @@ -400,9 +402,9 @@ impl<'a> ConstEvaluator<'a> { } /// Internal helper to evaluates

[GitHub] [arrow] edponce commented on pull request #13009: ARROW-602: [C++] Provide iterator access to primitive elements inside an Array

2022-04-29 Thread GitBox
edponce commented on PR #13009: URL: https://github.com/apache/arrow/pull/13009#issuecomment-1113603762 @pitrou @lidavidm I see now the issue with typing and `ChunkedArray`. Related to this, @AlvinJ15 attempted to add `ChunkedArrayIterator` as a friend class to `ChunkedArray`, but `ChunkedA

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12977: ARROW-16302: [C++] Null values in partitioning field for FilenamePartitioning

2022-04-29 Thread GitBox
sanjibansg commented on code in PR #12977: URL: https://github.com/apache/arrow/pull/12977#discussion_r862046361 ## python/pyarrow/_dataset.pyx: ## @@ -1313,9 +1313,12 @@ cdef class Partitioning(_Weakrefable): cdef inline shared_ptr[CPartitioning] unwrap(self): ret
