[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2791: Add config option for coalesce_batches physical optimization rule, make optional

2022-06-27 Thread GitBox
andygrove commented on code in PR #2791: URL: https://github.com/apache/arrow-datafusion/pull/2791#discussion_r908103294 ## datafusion/core/src/execution/context.rs: ## @@ -1247,16 +1250,26 @@ impl SessionState { rules.push(Arc::new(LimitPushDown::new())); rule
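The diff is truncated, but the title describes gating one physical optimizer rule behind a config option. Below is a minimal sketch of that pattern. `PhysicalOptimizerRule`, the unit rule structs and the `coalesce_batches` field are simplified, hypothetical stand-ins, not DataFusion's actual types or config key.

```rust
use std::sync::Arc;

/// Hypothetical, simplified stand-in for a physical optimizer rule.
trait PhysicalOptimizerRule {
    fn name(&self) -> &str;
}

struct LimitPushDown;
struct CoalesceBatches;

impl PhysicalOptimizerRule for LimitPushDown {
    fn name(&self) -> &str {
        "limit_push_down"
    }
}

impl PhysicalOptimizerRule for CoalesceBatches {
    fn name(&self) -> &str {
        "coalesce_batches"
    }
}

/// Hypothetical session config carrying the on/off flag for the rule.
struct SessionConfig {
    coalesce_batches: bool,
}

/// Builds the rule list; `CoalesceBatches` is only added when enabled.
fn physical_optimizers(config: &SessionConfig) -> Vec<Arc<dyn PhysicalOptimizerRule>> {
    let mut rules: Vec<Arc<dyn PhysicalOptimizerRule>> = Vec::new();
    rules.push(Arc::new(LimitPushDown));
    if config.coalesce_batches {
        rules.push(Arc::new(CoalesceBatches));
    }
    rules
}
```

Keeping the rule in the list by default and skipping it only when the flag is off means existing callers see no change in behavior.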

[GitHub] [arrow-datafusion] andygrove commented on pull request #2792: Add LogicalPlan::Distinct

2022-06-27 Thread GitBox
andygrove commented on PR #2792: URL: https://github.com/apache/arrow-datafusion/pull/2792#issuecomment-1168305149 Thanks @mrob95. I plan on reviewing this later this week. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow-datafusion] andygrove merged pull request #2802: Correct schema nullability declaration in tests

2022-06-27 Thread GitBox
andygrove merged PR #2802: URL: https://github.com/apache/arrow-datafusion/pull/2802

[GitHub] [arrow-datafusion] andygrove merged pull request #2804: fix schema nullability for `information_schema` schema

2022-06-27 Thread GitBox
andygrove merged PR #2804: URL: https://github.com/apache/arrow-datafusion/pull/2804

[GitHub] [arrow-rs] HaoYang670 commented on pull request #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
HaoYang670 commented on PR #1951: URL: https://github.com/apache/arrow-rs/pull/1951#issuecomment-1168295794 > I am surprised at the amount of code required for this but I tried and couldn't really figure out any better way. There are lots of places in our code where we map `DataType`

[GitHub] [arrow-rs] tustvold merged pull request #1947: write columnmetadata to the behind of the column chunk data, not the ColumnChunk

2022-06-27 Thread GitBox
tustvold merged PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947

[GitHub] [arrow-rs] tustvold closed issue #1946: Write error ColumnChunk to the Parquet File instead of ColumnMetaData

2022-06-27 Thread GitBox
tustvold closed issue #1946: Write error ColumnChunk to the Parquet File instead of ColumnMetaData URL: https://github.com/apache/arrow-rs/issues/1946

[GitHub] [arrow-rs] liukun4515 commented on a diff in pull request #1935: [WIP] add column index writer for parquet

2022-06-27 Thread GitBox
liukun4515 commented on code in PR #1935: URL: https://github.com/apache/arrow-rs/pull/1935#discussion_r908068776 ## parquet/src/file/writer.rs: ## @@ -339,11 +400,11 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> { .set_num_rows(self.total_rows_written.u

[GitHub] [arrow-rs] liukun4515 commented on a diff in pull request #1935: [WIP] add column index writer for parquet

2022-06-27 Thread GitBox
liukun4515 commented on code in PR #1935: URL: https://github.com/apache/arrow-rs/pull/1935#discussion_r908068424 ## parquet/src/file/writer.rs: ## @@ -339,11 +400,11 @@ impl<'a, W: Write> SerializedRowGroupWriter<'a, W> { .set_num_rows(self.total_rows_written.u

[GitHub] [arrow-rs] liukun4515 commented on a diff in pull request #1935: [WIP] add column index writer for parquet

2022-06-27 Thread GitBox
liukun4515 commented on code in PR #1935: URL: https://github.com/apache/arrow-rs/pull/1935#discussion_r908066758 ## parquet/src/file/writer.rs: ## @@ -85,7 +86,7 @@ pub type OnCloseColumnChunk<'a> = /// Callback invoked on closing a row group, arguments are: /// /// - the ro

[GitHub] [arrow-rs] Ted-Jiang closed issue #1834: Sperate get_next_page_header from get_next_page in PageReader

2022-06-27 Thread GitBox
Ted-Jiang closed issue #1834: Sperate get_next_page_header from get_next_page in PageReader URL: https://github.com/apache/arrow-rs/issues/1834

[GitHub] [arrow-rs] Ted-Jiang commented on issue #1834: Sperate get_next_page_header from get_next_page in PageReader

2022-06-27 Thread GitBox
Ted-Jiang commented on issue #1834: URL: https://github.com/apache/arrow-rs/issues/1834#issuecomment-1168260387 @tustvold Thanks a lot, i miss the dictionary pate`MUST be the first one in the column chunk` part. Last two weeks, i busy about my personal things. I will go back and work

[GitHub] [arrow] kou commented on pull request #12914: ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage

2022-06-27 Thread GitBox
kou commented on PR #12914: URL: https://github.com/apache/arrow/pull/12914#issuecomment-1168235450 > Keeping `ARROW_AZURE` : `OFF` in `windows-mingw` build for now OK. But could you report this problem to upstream to enable on MinGW in the future?

[GitHub] [arrow] kou commented on a diff in pull request #12914: ARROW-2034: [C++] Filesystem implementation for Azure Blob Storage

2022-06-27 Thread GitBox
kou commented on code in PR #12914: URL: https://github.com/apache/arrow/pull/12914#discussion_r908040345 ## ci/scripts/install_azurite.sh: ## @@ -0,0 +1,45 @@ +#!/usr/bin/env bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license ag

[GitHub] [arrow] github-actions[bot] commented on pull request #13444: ARROW-16906: [CI][C++] Enable ARROW_GCS on MinGW workflows

2022-06-27 Thread GitBox
github-actions[bot] commented on PR #13444: URL: https://github.com/apache/arrow/pull/13444#issuecomment-1168224090 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #13444: ARROW-16906: [CI][C++] Enable ARROW_GCS on MinGW workflows

2022-06-27 Thread GitBox
github-actions[bot] commented on PR #13444: URL: https://github.com/apache/arrow/pull/13444#issuecomment-1168224074 https://issues.apache.org/jira/browse/ARROW-16906

[GitHub] [arrow-ballista] dependabot[bot] opened a new pull request, #78: Update arrow requirement from 16.0.0 to 17.0.0

2022-06-27 Thread GitBox
dependabot[bot] opened a new pull request, #78: URL: https://github.com/apache/arrow-ballista/pull/78 Updates the requirements on [arrow](https://github.com/apache/arrow-rs) to permit the latest version. Changelog Sourced from https://github.com/apache/arrow-rs/blob/master/CHANGELO

[GitHub] [arrow-ballista] dependabot[bot] commented on pull request #78: Update arrow requirement from 16.0.0 to 17.0.0

2022-06-27 Thread GitBox
dependabot[bot] commented on PR #78: URL: https://github.com/apache/arrow-ballista/pull/78#issuecomment-1168218761 The following labels could not be found: `auto-dependencies`.

[GitHub] [arrow-ballista] dependabot[bot] commented on pull request #77: Update arrow-flight requirement from 16.0.0 to 17.0.0

2022-06-27 Thread GitBox
dependabot[bot] commented on PR #77: URL: https://github.com/apache/arrow-ballista/pull/77#issuecomment-1168218697 The following labels could not be found: `auto-dependencies`.

[GitHub] [arrow-ballista] dependabot[bot] opened a new pull request, #77: Update arrow-flight requirement from 16.0.0 to 17.0.0

2022-06-27 Thread GitBox
dependabot[bot] opened a new pull request, #77: URL: https://github.com/apache/arrow-ballista/pull/77 Updates the requirements on [arrow-flight](https://github.com/apache/arrow-rs) to permit the latest version. Changelog Sourced from https://github.com/apache/arrow-rs/blob/master/C

[GitHub] [arrow] kou commented on a diff in pull request #13311: ARROW-16340: [Python] Move all Python related code into PyArrow

2022-06-27 Thread GitBox
kou commented on code in PR #13311: URL: https://github.com/apache/arrow/pull/13311#discussion_r908027466 ## python/pyarrow/src_arrow/CMakeLists.txt: ## @@ -19,8 +19,45 @@ # arrow_python # +cmake_minimum_required(VERSION 3.5) + +# RPATH settings on macOS do not affect instal

[GitHub] [arrow] djnavarro commented on a diff in pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-06-27 Thread GitBox
djnavarro commented on code in PR #12154: URL: https://github.com/apache/arrow/pull/12154#discussion_r908006731 ## r/tests/testthat/test-dplyr-funcs-datetime.R: ## @@ -1965,3 +1973,421 @@ test_that("lubridate's fast_strptime", { collect() ) }) + +test_that("round/floo

[GitHub] [arrow-rs] liukun4515 merged pull request #1942: Disallow cast from other datatypes to NullType

2022-06-27 Thread GitBox
liukun4515 merged PR #1942: URL: https://github.com/apache/arrow-rs/pull/1942

[GitHub] [arrow-rs] liukun4515 closed issue #1923: Disallow cast from other datatypes to NullType

2022-06-27 Thread GitBox
liukun4515 closed issue #1923: Disallow cast from other datatypes to NullType URL: https://github.com/apache/arrow-rs/issues/1923

[GitHub] [arrow-datafusion] liukun4515 commented on issue #2799: coercion rule about `eq` and InList between string type and numeric type

2022-06-27 Thread GitBox
liukun4515 commented on issue #2799: URL: https://github.com/apache/arrow-datafusion/issues/2799#issuecomment-1168189491 > I wonder if the question here is "should we automatically try and coerce numbers to strings (which is more general but slower) or coerce strings to numbers (which is l
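The trade-off in the quoted question can be illustrated at the value level. This is a sketch only: the two helper names are hypothetical and do not reflect DataFusion's actual coercion rules, which operate on types rather than individual values.

```rust
use std::num::ParseIntError;

/// Strategy 1: render the number as text, then compare strings.
/// Never fails, but `'01' = 1` is false because "01" != "1".
fn eq_as_strings(s: &str, n: i64) -> bool {
    s == n.to_string()
}

/// Strategy 2: parse the string as a number, then compare numbers.
/// Cheaper comparisons, but a non-numeric string becomes an error.
fn eq_as_numbers(s: &str, n: i64) -> Result<bool, ParseIntError> {
    Ok(s.parse::<i64>()? == n)
}
```

The two strategies can therefore disagree on the same input (`"01"` vs `1`), which is why the choice affects query results, not only performance.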

[GitHub] [arrow] djnavarro commented on a diff in pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-06-27 Thread GitBox
djnavarro commented on code in PR #12154: URL: https://github.com/apache/arrow/pull/12154#discussion_r908004741 ## r/src/compute.cpp: ## @@ -519,6 +519,36 @@ std::shared_ptr make_compute_options( return out; } + if (func_name == "round_temporal" || func_name == "floo

[GitHub] [arrow-rs] liukun4515 commented on pull request #1947: write columnmetadata to the behind of the column chunk data, not the ColumnChunk

2022-06-27 Thread GitBox
liukun4515 commented on PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947#issuecomment-1168186233 Many system or reader just read the footer and get the metadata, I think we should just follow the parquet-format. Maybe it's just historical issues or historical design

[GitHub] [arrow-rs] liukun4515 commented on a diff in pull request #1947: write columnmetadata to the behind of the column chunk data, not the ColumnChunk

2022-06-27 Thread GitBox
liukun4515 commented on code in PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947#discussion_r907999690 ## parquet/src/file/writer.rs: ## @@ -435,12 +435,15 @@ impl<'a, W: Write> SerializedPageWriter<'a, W> { Ok(self.sink.bytes_written() - start_pos) }

[GitHub] [arrow-rs] liukun4515 commented on a diff in pull request #1947: write columnmetadata to the behind of the column chunk data, not the ColumnChunk

2022-06-27 Thread GitBox
liukun4515 commented on code in PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947#discussion_r907997553 ## parquet/src/file/metadata.rs: ## @@ -611,6 +611,29 @@ impl ColumnChunkMetaData { encrypted_column_metadata: None, } } + +/// Method

[GitHub] [arrow] kou merged pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
kou merged PR #13386: URL: https://github.com/apache/arrow/pull/13386
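For context, the Parquet encoding spec describes DELTA_LENGTH_BYTE_ARRAY as all value lengths (themselves DELTA_BINARY_PACKED) followed by the concatenated bytes of every value. The sketch below covers only the second half: splitting the payload once the lengths are known. Decoding the lengths is assumed to have happened already, and the function is hypothetical, not the Arrow C++ decoder.

```rust
/// Splits the concatenated DELTA_LENGTH_BYTE_ARRAY payload into values,
/// given already-decoded lengths. Returns None if the data is too short.
fn split_by_lengths<'a>(lengths: &[usize], data: &'a [u8]) -> Option<Vec<&'a [u8]>> {
    let mut out = Vec::with_capacity(lengths.len());
    let mut pos: usize = 0;
    for &len in lengths {
        let end = pos.checked_add(len)?;
        // `get` returns None instead of panicking on a truncated page.
        out.push(data.get(pos..end)?);
        pos = end;
    }
    Some(out)
}
```

Because values are contiguous, the decoder can hand out borrowed slices without copying; only the lengths need real decoding work.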

[GitHub] [arrow] kou commented on pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
kou commented on PR #13386: URL: https://github.com/apache/arrow/pull/13386#issuecomment-1168179312 Green. I'll merge this. Thanks!

[GitHub] [arrow-rs] liukun4515 commented on pull request #1947: write columnmetadata to the behind of the column chunk data, not the ColumnChunk

2022-06-27 Thread GitBox
liukun4515 commented on PR #1947: URL: https://github.com/apache/arrow-rs/pull/1947#issuecomment-1168179198 > I can't help wondering if this was an oversight in the original parquet specification, not collocating column chunk metadata in the footer, that has since been papered over. All rea
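The layout question behind #1946/#1947 can be sketched with a plain byte buffer. This assumes, per the parquet-format Thrift definition, that `ColumnChunk.file_offset` is the byte offset of the ColumnMetaData; the function is hypothetical and skips real Thrift serialization (the metadata bytes are opaque here).

```rust
/// Appends a column chunk's page data, then a serialized copy of its
/// ColumnMetaData, and returns the offset where that metadata starts
/// (the value to record as `file_offset`).
fn write_column_chunk(file: &mut Vec<u8>, pages: &[u8], serialized_metadata: &[u8]) -> u64 {
    file.extend_from_slice(pages);
    let file_offset = file.len() as u64;
    file.extend_from_slice(serialized_metadata);
    file_offset
}
```

As noted in the thread, most readers take the metadata from the footer instead, so this inline copy mainly matters for spec conformance.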

[GitHub] [arrow] kou commented on pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
kou commented on PR #13386: URL: https://github.com/apache/arrow/pull/13386#issuecomment-1168164915 I'll re-run it. Please wait for a while.

[GitHub] [arrow] sfc-gh-mmuthuraman commented on pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
sfc-gh-mmuthuraman commented on PR #13386: URL: https://github.com/apache/arrow/pull/13386#issuecomment-1168149487 @kou / @pitrou I see a failure in "_C++ / AMD64 Windows 2019 C++17_" irrelevant to my changes. `[ RUN ] TestThreadPool.SubmitWithStopTokenCancelled D:/a/arrow/arrow/

[GitHub] [arrow-rs] HaoYang670 commented on pull request #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
HaoYang670 commented on PR #1951: URL: https://github.com/apache/arrow-rs/pull/1951#issuecomment-1168143352 > I am surprised at the amount of code required for this but I tried and couldn't really figure out any better way (as the macros are basically implementing all the type dispatch to t
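Conceptually, adding two dictionary arrays means resolving each row's key to its value on both sides and adding those values. Below is a stripped-down sketch using a hypothetical `DictArray` type that decodes eagerly. The PR itself works over arrow-rs `DictionaryArray` and uses macros to dispatch on key and value types, which this ignores. Overflow checking is also omitted.

```rust
/// Hypothetical dictionary-encoded array: each row holds a key into `values`.
struct DictArray {
    keys: Vec<Option<usize>>, // None marks a NULL row
    values: Vec<i32>,
}

/// Element-wise addition of two equal-length dictionary arrays, returning
/// plain decoded values. A NULL on either side yields NULL.
fn add_dict(left: &DictArray, right: &DictArray) -> Result<Vec<Option<i32>>, String> {
    if left.keys.len() != right.keys.len() {
        return Err("arrays must have the same length".to_string());
    }
    Ok(left
        .keys
        .iter()
        .zip(&right.keys)
        .map(|(l, r)| match (l, r) {
            (Some(l), Some(r)) => Some(left.values[*l] + right.values[*r]),
            _ => None,
        })
        .collect())
}
```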

[GitHub] [arrow] ursabot commented on pull request #13437: ARROW-16872: [C++] Fix csv parser edge case

2022-06-27 Thread GitBox
ursabot commented on PR #13437: URL: https://github.com/apache/arrow/pull/13437#issuecomment-1168142919 Benchmark runs are scheduled for baseline = 49f26962456e11902621e41574bc5890205eac7a and contender = ad15fe1a7087ab7dc50d63069d1ad828e138f539. Results will be available as each benchmark

[GitHub] [arrow] cyb70289 commented on pull request #13437: ARROW-16872: [C++] Fix csv parser edge case

2022-06-27 Thread GitBox
cyb70289 commented on PR #13437: URL: https://github.com/apache/arrow/pull/13437#issuecomment-1168142905 @ursabot please benchmark command=cpp-micro --suite-filter="arrow-csv-*"

[GitHub] [arrow-datafusion] ming535 commented on issue #2723: Consolidate GroupByHash implementations `row_hash.rs` and `hash.rs`

2022-06-27 Thread GitBox
ming535 commented on issue #2723: URL: https://github.com/apache/arrow-datafusion/issues/2723#issuecomment-1168142381 @yjshen Hi, do you know why `Distinct*`, for example `DistinctCount`'s `state_fields` is using `DataType::List`? I have ran a few examples and all of them actually using `L
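One way to see why a distinct aggregate's partial state is list-typed: each partial accumulator must carry the set of values it has seen, not just a count. Only then can partial states from different partitions be merged without double counting. Below is a minimal sketch with std types. It is an illustration, not DataFusion's `DistinctCount` implementation.

```rust
use std::collections::HashSet;

/// Partial state for COUNT(DISTINCT x): the distinct values seen so far.
#[derive(Default)]
struct DistinctCountAcc {
    values: HashSet<i64>,
}

impl DistinctCountAcc {
    fn update(&mut self, batch: &[i64]) {
        self.values.extend(batch.iter().copied());
    }

    /// Emits the state as a list of values (what a List-typed state field holds).
    fn state(&self) -> Vec<i64> {
        self.values.iter().copied().collect()
    }

    /// Merges another partition's list state into this one.
    fn merge(&mut self, state: &[i64]) {
        self.values.extend(state.iter().copied());
    }

    fn evaluate(&self) -> usize {
        self.values.len()
    }
}
```

With partitions `[1, 2, 2]` and `[2, 3]`, summing partial counts would give 2 + 2 = 4, while merging the list states gives the correct 3.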

[GitHub] [arrow] nealrichardson commented on a diff in pull request #13441: ARROW-16912: [R][CI] Fix nightly centos package without GCS

2022-06-27 Thread GitBox
nealrichardson commented on code in PR #13441: URL: https://github.com/apache/arrow/pull/13441#discussion_r907940234 ## docker-compose.yml: ## @@ -410,9 +410,9 @@ services: ARROW_MIMALLOC: "ON" command: > /bin/bash -c " -if grep -q -i -e 'centos.* 7' /

[GitHub] [arrow] nealrichardson commented on pull request #13441: ARROW-16912: [R][CI] Fix nightly centos package without GCS

2022-06-27 Thread GitBox
nealrichardson commented on PR #13441: URL: https://github.com/apache/arrow/pull/13441#issuecomment-1168109321 > 👎 In regards to adding `r-binary-packages` to the r-group. I think we should take a closer look at the PR CI `r.yml` and see if we want to change it to be more in-line with `

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
alamb commented on code in PR #1951: URL: https://github.com/apache/arrow-rs/pull/1951#discussion_r907917971 ## arrow/src/compute/kernels/arithmetic.rs: ## @@ -423,6 +429,247 @@ where Ok(PrimitiveArrayfrom(data)) } +/// Applies $OP to $LEFT and $RIGHT which are two d

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2804: fix schema nullability for `information_schema` schema

2022-06-27 Thread GitBox
alamb opened a new pull request, #2804: URL: https://github.com/apache/arrow-datafusion/pull/2804 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/pull/2778 # Rationale for this change https://github.com/apache/arrow-rs/issues/1888 (ad

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2803: fix: correctly calculate join output schema nullability

2022-06-27 Thread GitBox
codecov-commenter commented on PR #2803: URL: https://github.com/apache/arrow-datafusion/pull/2803#issuecomment-1168070199 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2803?src=pr&el=h1)

[GitHub] [arrow-datafusion] comphead commented on issue #2450: Preserve Element Name in ScalarValue::List

2022-06-27 Thread GitBox
comphead commented on issue #2450: URL: https://github.com/apache/arrow-datafusion/issues/2450#issuecomment-1168068480 > The latter, you will need to both specify the name of the lists element, and its nullability Thanks @tustvold for the quick reply. I was confused because we have t

[GitHub] [arrow] kou merged pull request #13443: MINOR: [Docs][Python] Remove outdated reference to libhdfs3 backend

2022-06-27 Thread GitBox
kou merged PR #13443: URL: https://github.com/apache/arrow/pull/13443

[GitHub] [arrow] kou commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

2022-06-27 Thread GitBox
kou commented on code in PR #13440: URL: https://github.com/apache/arrow/pull/13440#discussion_r907904202 ## r/NEWS.md: ## @@ -17,6 +17,12 @@ under the License. --> +# arrow 9.0.0 + +## Enhancements to date and time support + +* added `lubridate::qday()` (day of quarter)

[GitHub] [arrow] paleolimbot commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

2022-06-27 Thread GitBox
paleolimbot commented on PR #13440: URL: https://github.com/apache/arrow/pull/13440#issuecomment-1168049310 (I'm on vacation this week but look forward to taking a look on Monday!)

[GitHub] [arrow] kou merged pull request #13439: MINOR: [C++][Dev] Remove unused Hive flag

2022-06-27 Thread GitBox
kou merged PR #13439: URL: https://github.com/apache/arrow/pull/13439

[GitHub] [arrow] kou commented on pull request #13439: MINOR: [C++][Dev] Remove unused Hive flag

2022-06-27 Thread GitBox
kou commented on PR #13439: URL: https://github.com/apache/arrow/pull/13439#issuecomment-1168044085 Ah, it's already removed from docs. Sorry. I'll merge this.

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2803: fix: correctly calculate join output schema nullability

2022-06-27 Thread GitBox
alamb opened a new pull request, #2803: URL: https://github.com/apache/arrow-datafusion/pull/2803 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/pull/2778 # Rationale for this change https://github.com/apache/arrow-rs/issues/1888
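The rule behind a correct join output schema is that fields from any side that can produce NULL-padded (non-matching) rows must be declared nullable. A sketch using hypothetical stand-in types follows; `Field` and `JoinType` here are not the arrow-rs/DataFusion types, and semi/anti joins are omitted.

```rust
#[derive(Clone, Copy)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    nullable: bool,
}

/// Computes output fields: a side that may be NULL-padded becomes nullable.
fn join_output_fields(left: &[Field], right: &[Field], join_type: JoinType) -> Vec<Field> {
    let (left_padded, right_padded) = match join_type {
        JoinType::Inner => (false, false),
        JoinType::Left => (false, true),
        JoinType::Right => (true, false),
        JoinType::Full => (true, true),
    };
    let adjust = |f: &Field, padded: bool| Field {
        name: f.name.clone(),
        nullable: f.nullable || padded,
    };
    left.iter()
        .map(|f| adjust(f, left_padded))
        .chain(right.iter().map(|f| adjust(f, right_padded)))
        .collect()
}
```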

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2802: Correct schema nullability declaration in tests

2022-06-27 Thread GitBox
codecov-commenter commented on PR #2802: URL: https://github.com/apache/arrow-datafusion/pull/2802#issuecomment-1168039981 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2802?src=pr&el=h1)

[GitHub] [arrow-datafusion] alamb merged pull request #2750: Add optimizer pass to reduce `left`/`right`/`full` joins to `inner` join if possible

2022-06-27 Thread GitBox
alamb merged PR #2750: URL: https://github.com/apache/arrow-datafusion/pull/2750
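The idea behind reducing outer joins: take `t1 LEFT JOIN t2 ... WHERE t2.x > 0`. The filter can never be true when `t2.x` is NULL, so it discards every NULL-padded row that the outer join adds, and an inner join gives the same result. Below is a sketch of the join-type decision only. It assumes the "does the filter reject NULLs on this side" analysis has been done elsewhere, and it is not the code merged in #2750.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum JoinType {
    Inner,
    Left,
    Right,
    Full,
}

/// Narrows an outer join when the filter above it rejects rows whose
/// left-side (or right-side) columns are NULL.
fn reduce_outer_join(join_type: JoinType, rejects_null_left: bool, rejects_null_right: bool) -> JoinType {
    match join_type {
        JoinType::Left if rejects_null_right => JoinType::Inner,
        JoinType::Right if rejects_null_left => JoinType::Inner,
        JoinType::Full => match (rejects_null_left, rejects_null_right) {
            (true, true) => JoinType::Inner,
            // Unmatched right rows (left side NULL) are dropped: LEFT remains.
            (true, false) => JoinType::Left,
            (false, true) => JoinType::Right,
            (false, false) => JoinType::Full,
        },
        other => other,
    }
}
```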

[GitHub] [arrow-datafusion] alamb closed issue #2757: Reduce outer joins

2022-06-27 Thread GitBox
alamb closed issue #2757: Reduce outer joins URL: https://github.com/apache/arrow-datafusion/issues/2757

[GitHub] [arrow] sfc-gh-mmuthuraman commented on pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
sfc-gh-mmuthuraman commented on PR #13386: URL: https://github.com/apache/arrow/pull/13386#issuecomment-1168022627 @pitrou / @kou Could you please approve the pending workflows? I just rebased my changes. Thanks!

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2802: Correct schema nullability declaration in tests

2022-06-27 Thread GitBox
alamb commented on code in PR #2802: URL: https://github.com/apache/arrow-datafusion/pull/2802#discussion_r907870656 ## datafusion/core/src/physical_optimizer/aggregate_statistics.rs: ## @@ -276,8 +276,8 @@ mod tests { /// Mock data using a MemoryExec which has an exact cou

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2802: Correct schema nullability declaration in tests

2022-06-27 Thread GitBox
alamb opened a new pull request, #2802: URL: https://github.com/apache/arrow-datafusion/pull/2802 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/pull/2778 # Rationale for this change https://github.com/apache/arrow-rs/issues/1888 (ad

[GitHub] [arrow-datafusion] alamb commented on pull request #2750: try to reduce left/right/full join to inner join

2022-06-27 Thread GitBox
alamb commented on PR #2750: URL: https://github.com/apache/arrow-datafusion/pull/2750#issuecomment-1168005581 There is a logical conflict in this PR with https://github.com/apache/arrow-datafusion/pull/2789 I took the liberty of fixing the conflicts in d0f1f8365

[GitHub] [arrow-datafusion] alamb commented on pull request #2792: Add LogicalPlan::Distinct

2022-06-27 Thread GitBox
alamb commented on PR #2792: URL: https://github.com/apache/arrow-datafusion/pull/2792#issuecomment-1167990705 cc @andygrove

[GitHub] [arrow-datafusion] alamb merged pull request #2789: Improve readability of table scan projections in query plans (remove `Some` and `None`)

2022-06-27 Thread GitBox
alamb merged PR #2789: URL: https://github.com/apache/arrow-datafusion/pull/2789

[GitHub] [arrow-datafusion] alamb closed issue #2697: Improve readability of table scan projections in query plans

2022-06-27 Thread GitBox
alamb closed issue #2697: Improve readability of table scan projections in query plans URL: https://github.com/apache/arrow-datafusion/issues/2697

[GitHub] [arrow] jduo commented on pull request #13434: ARROW-16902: [C++][FlightRPC] Fix DLL linkage in Flight SQL

2022-06-27 Thread GitBox
jduo commented on PR #13434: URL: https://github.com/apache/arrow/pull/13434#issuecomment-1167986129 > Protobuf does not interact very well with dllexport declarations (seemingly on purpose/the team considers it a bad idea: https://groups.google.com/g/protobuf/c/PDR1bqRazts) so hopefully th

[GitHub] [arrow-rs] alamb opened a new issue, #1952: Release Arrow XXX (next release after 16.0.0)

2022-06-27 Thread GitBox
alamb opened a new issue, #1952: URL: https://github.com/apache/arrow-rs/issues/1952 * Planned Release Candidate: 2022-7-28 * Planned Release and Publish to crates.io: 2022-07-11 Items: - [ ] Update version and make CHANGELOG - [ ] Create release candidate - [ ] Release can

[GitHub] [arrow-rs] alamb commented on issue #1925: Release Arrow 17.0.0 (next release after 16.0.0)

2022-06-27 Thread GitBox
alamb commented on issue #1925: URL: https://github.com/apache/arrow-rs/issues/1925#issuecomment-1167975393 https://lists.apache.org/thread/c1rfx2hyfxrosv43ypg57s6bqxq4pj2d

[GitHub] [arrow-rs] alamb closed issue #1925: Release Arrow 17.0.0 (next release after 16.0.0)

2022-06-27 Thread GitBox
alamb closed issue #1925: Release Arrow 17.0.0 (next release after 16.0.0) URL: https://github.com/apache/arrow-rs/issues/1925

[GitHub] [arrow-rs] alamb commented on issue #1925: Release Arrow 17.0.0 (next release after 16.0.0)

2022-06-27 Thread GitBox
alamb commented on issue #1925: URL: https://github.com/apache/arrow-rs/issues/1925#issuecomment-1167975034 Release is complete

[GitHub] [arrow-rs] tustvold commented on pull request #1937: Set adjusted to UTC if UTC timezone (#1932)

2022-06-27 Thread GitBox
tustvold commented on PR #1937: URL: https://github.com/apache/arrow-rs/pull/1937#issuecomment-1167959064 I think you might be right, will re-read tomorrow and potentially file a PR. Thank you

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1861: Faster StringDictionaryBuilder (~60% faster) (#1851)

2022-06-27 Thread GitBox
tustvold commented on code in PR #1861: URL: https://github.com/apache/arrow-rs/pull/1861#discussion_r907830729 ## arrow/Cargo.toml: ## @@ -38,13 +38,15 @@ path = "src/lib.rs" bench = false [dependencies] +ahash = { version = "0.7", default-features = false } Review Comment

[GitHub] [arrow-rs] alamb commented on pull request #1929: Update indexmap dependency

2022-06-27 Thread GitBox
alamb commented on PR #1929: URL: https://github.com/apache/arrow-rs/pull/1929#issuecomment-1167950741 Apparently this caused some issue with `ahash` for @jhorstmann at https://lists.apache.org/thread/qxt3qv5pv8tfy4cj6jgp8gl90k3rr562

[GitHub] [arrow-rs] alamb commented on issue #1882: Remove `indexmap` dependency

2022-06-27 Thread GitBox
alamb commented on issue #1882: URL: https://github.com/apache/arrow-rs/issues/1882#issuecomment-1167950378 @jhorstmann notes he may try this issue -- more context on https://lists.apache.org/thread/qxt3qv5pv8tfy4cj6jgp8gl90k3rr562 / https://github.com/apache/arrow-rs/pull/1929

[GitHub] [arrow] saulpw commented on pull request #13442: ARROW-9612: [IO] increase default block_size from 1MB to 16MB

2022-06-27 Thread GitBox
saulpw commented on PR #13442: URL: https://github.com/apache/arrow/pull/13442#issuecomment-1167949244 Thanks Weston. I agree that handling large blobs without needing to muck with the block size would be ideal. We can/should still do that at some point, and this PR just makes the experie

[GitHub] [arrow-datafusion] tustvold commented on issue #2709: Updating arrow2 branch

2022-06-27 Thread GitBox
tustvold commented on issue #2709: URL: https://github.com/apache/arrow-datafusion/issues/2709#issuecomment-1167943585 > AFAIK governance has not been a factor when considering dependencies in DataFusion To conflate arrow which is core to both the in-memory layout and query computat

[GitHub] [arrow] sfc-gh-mmuthuraman commented on pull request #13386: ARROW-13388 [C++][Parquet] Enable DELTA_LENGTH_BYTE_ARRAY decoder

2022-06-27 Thread GitBox
sfc-gh-mmuthuraman commented on PR #13386: URL: https://github.com/apache/arrow/pull/13386#issuecomment-1167943293 @kou Could you please approve the workflows? Thanks!

[GitHub] [arrow-rs] jorgecarleitao commented on pull request #1937: Set adjusted to UTC if UTC timezone (#1932)

2022-06-27 Thread GitBox
jorgecarleitao commented on PR #1937: URL: https://github.com/apache/arrow-rs/pull/1937#issuecomment-1167941057 While applying this fix in arrow2, I broke some of our integration tests against pyarrow. Coming to the specs, when the tz string is set in Arrow, it means that 1. the valu
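The comment is truncated, but the Arrow columnar specification it refers to says that a timestamp with a non-empty timezone stores UTC-normalized instants, whatever zone is named. Under that reading, Parquet's `isAdjustedToUTC` would follow from whether a timezone is present at all, not from whether it equals `"UTC"` (the behavior in this PR's title). Below is a sketch of that alternative mapping. The function name is hypothetical, and this is neither the arrow-rs nor the arrow2 code.

```rust
/// Hypothetical mapping from an Arrow timestamp's timezone metadata to
/// Parquet's `isAdjustedToUTC`: any non-empty timezone means the values
/// are UTC instants; a missing or empty timezone means wall-clock values.
fn is_adjusted_to_utc(timezone: Option<&str>) -> bool {
    matches!(timezone, Some(tz) if !tz.is_empty())
}
```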

[GitHub] [arrow-rs] martin-g commented on a diff in pull request #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
martin-g commented on code in PR #1951: URL: https://github.com/apache/arrow-rs/pull/1951#discussion_r907817838 ## arrow/src/compute/kernels/arithmetic.rs: ## @@ -423,6 +429,245 @@ where Ok(PrimitiveArrayfrom(data)) } +/// Applies $OP to $LEFT and $RIGHT which are tw

[GitHub] [arrow-datafusion] tustvold commented on pull request #2677: Switch to object_store crate (#2489)

2022-06-27 Thread GitBox
tustvold commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1167926271 > Are we planning to donate it to apache arrow (to be included in the arrow-datafusion repo)? I personally would be fine with it being part of arrow-datafusion or arrow-

[GitHub] [arrow-datafusion] alamb commented on issue #957: Inconsistent cast behavior

2022-06-27 Thread GitBox
alamb commented on issue #957: URL: https://github.com/apache/arrow-datafusion/issues/957#issuecomment-1167925300 I defer to @andygrove -- the current behavior (that matches postgres) makes sense to me, for what it is worth

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
codecov-commenter commented on PR #1951: URL: https://github.com/apache/arrow-rs/pull/1951#issuecomment-1167922450 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1951?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow] westonpace commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
westonpace commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907807493 ## cpp/src/arrow/engine/substrait/serde.h: ## @@ -40,22 +41,81 @@ using ConsumerFactory = std::function /// \brief Deserializes a Substrait Plan message to a list

[GitHub] [arrow] westonpace commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
westonpace commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907806229 ## cpp/src/arrow/engine/substrait/serde.cc: ## @@ -58,12 +58,57 @@ Result DeserializeRelation(const Buffer& buf, return FromProto(rel, ext_set); } -Result> Des

[GitHub] [arrow-rs] Dandandan commented on a diff in pull request #1861: Faster StringDictionaryBuilder (~60% faster) (#1851)

2022-06-27 Thread GitBox
Dandandan commented on code in PR #1861: URL: https://github.com/apache/arrow-rs/pull/1861#discussion_r907804661 ## arrow/Cargo.toml: ## @@ -38,13 +38,15 @@ path = "src/lib.rs" bench = false [dependencies] +ahash = { version = "0.7", default-features = false } Review Commen

[GitHub] [arrow] westonpace commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
westonpace commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907801495 ## cpp/src/arrow/engine/substrait/extension_set.cc: ## @@ -315,6 +315,11 @@ struct ExtensionIdRegistryImpl : ExtensionIdRegistry { return Status::OK(); } +

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1861: Faster StringDictionaryBuilder (~60% faster) (#1851)

2022-06-27 Thread GitBox
tustvold commented on code in PR #1861: URL: https://github.com/apache/arrow-rs/pull/1861#discussion_r907799497 ## arrow/Cargo.toml: ## @@ -38,13 +38,15 @@ path = "src/lib.rs" bench = false [dependencies] +ahash = { version = "0.7", default-features = false } Review Comment

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1861: Faster StringDictionaryBuilder (~60% faster) (#1851)

2022-06-27 Thread GitBox
alamb commented on code in PR #1861: URL: https://github.com/apache/arrow-rs/pull/1861#discussion_r907798318 ## arrow/Cargo.toml: ## @@ -38,13 +38,15 @@ path = "src/lib.rs" bench = false [dependencies] +ahash = { version = "0.7", default-features = false } Review Comment:

[GitHub] [arrow-rs] viirya opened a new pull request, #1951: Add add_dyn for DictionaryArray support

2022-06-27 Thread GitBox
viirya opened a new pull request, #1951: URL: https://github.com/apache/arrow-rs/pull/1951 # Which issue does this PR close? Closes #1950. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-rs] viirya opened a new issue, #1950: Support DictionaryArray in add kernel

2022-06-27 Thread GitBox
viirya opened a new issue, #1950: URL: https://github.com/apache/arrow-rs/issues/1950 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Adding DictionaryArray support to `add` kernel. **Describe the solution you'd like**

[GitHub] [arrow-datafusion] alamb commented on pull request #2677: Switch to object_store crate (#2489)

2022-06-27 Thread GitBox
alamb commented on PR #2677: URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1167893169 Inspired by @jorgecarleitao 's comment on https://github.com/apache/arrow-datafusion/issues/2709#issuecomment-1167334326 Given how core the object store abstraction is to da

[GitHub] [arrow] westonpace commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
westonpace commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907788223 ## cpp/src/arrow/compute/exec/options.h: ## @@ -229,6 +229,20 @@ class ARROW_EXPORT SinkNodeConsumer { virtual Future<> Finish() = 0; }; +class ARROW_EXPORT Nul

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2797: Add support for month & year intervals

2022-06-27 Thread GitBox
alamb commented on code in PR #2797: URL: https://github.com/apache/arrow-datafusion/pull/2797#discussion_r907787029 ## datafusion/physical-expr/src/expressions/datetime.rs: ## @@ -86,76 +89,114 @@ impl PhysicalExpr for DateIntervalExpr { let dates = self.lhs.evaluate(b

[GitHub] [arrow] westonpace commented on pull request #13442: ARROW-9612: [IO] increase default block_size from 1MB to 16MB

2022-06-27 Thread GitBox
westonpace commented on PR #13442: URL: https://github.com/apache/arrow/pull/13442#issuecomment-1167878337 It's configurable so presumably users with large blocks could always configure it larger. This change is simple enough and I don't *think* this would have much impact on performance b

[GitHub] [arrow-ballista] yacineb closed issue #76: Unable to build master

2022-06-27 Thread GitBox
yacineb closed issue #76: Unable to build master URL: https://github.com/apache/arrow-ballista/issues/76

[GitHub] [arrow-ballista] yacineb commented on issue #76: Unable to build master

2022-06-27 Thread GitBox
yacineb commented on issue #76: URL: https://github.com/apache/arrow-ballista/issues/76#issuecomment-1167857790 done

[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2797: Add support for month & year intervals

2022-06-27 Thread GitBox
avantgardnerio commented on code in PR #2797: URL: https://github.com/apache/arrow-datafusion/pull/2797#discussion_r907767853 ## datafusion/physical-expr/src/expressions/datetime.rs: ## @@ -86,76 +89,114 @@ impl PhysicalExpr for DateIntervalExpr { let dates = self.lhs.e

[GitHub] [arrow] rtpsw commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
rtpsw commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907730758 ## cpp/src/arrow/engine/substrait/serde.h: ## @@ -40,22 +41,81 @@ using ConsumerFactory = std::function /// \brief Deserializes a Substrait Plan message to a list of E

[GitHub] [arrow] rtpsw commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
rtpsw commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907744244 ## cpp/src/arrow/compute/exec/options.h: ## @@ -229,6 +229,20 @@ class ARROW_EXPORT SinkNodeConsumer { virtual Future<> Finish() = 0; }; +class ARROW_EXPORT NullSink

[GitHub] [arrow] lidavidm opened a new pull request, #13443: MINOR: [Docs][Python] Remove outdated reference to libhdfs3 backend

2022-06-27 Thread GitBox
lidavidm opened a new pull request, #13443: URL: https://github.com/apache/arrow/pull/13443 While the (deprecated) docs give an example of using libhdfs3 instead of the JNI interface, this actually hasn't existed in a long time.

[GitHub] [arrow] rtpsw commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
rtpsw commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907736969 ## cpp/src/arrow/engine/substrait/serde.cc: ## @@ -58,12 +58,57 @@ Result DeserializeRelation(const Buffer& buf, return FromProto(rel, ext_set); } -Result> Deserial

[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2797: Add support for month & year intervals

2022-06-27 Thread GitBox
avantgardnerio commented on code in PR #2797: URL: https://github.com/apache/arrow-datafusion/pull/2797#discussion_r907722613 ## datafusion/physical-expr/src/expressions/datetime.rs: ## @@ -86,76 +89,114 @@ impl PhysicalExpr for DateIntervalExpr { let dates = self.lhs.e

[GitHub] [arrow] rtpsw commented on a diff in pull request #13375: ARROW-16823: [C++] Arrow Substrait enhancements for UDF

2022-06-27 Thread GitBox
rtpsw commented on code in PR #13375: URL: https://github.com/apache/arrow/pull/13375#discussion_r907719221 ## cpp/src/arrow/engine/substrait/extension_set.h: ## @@ -19,6 +19,7 @@ #pragma once +#include Review Comment: Probably a leftover; I'll check. -- This is an
