[GitHub] [arrow-rs] nevi-me closed issue #1606: Parquet schema should allow scale == precision for decimal type

2022-04-22 Thread GitBox
nevi-me closed issue #1606: Parquet schema should allow scale == precision for decimal type URL: https://github.com/apache/arrow-rs/issues/1606 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

[GitHub] [arrow-rs] nevi-me merged pull request #1607: Parquet: schema validation should allow scale == precision for decimal type

2022-04-22 Thread GitBox
nevi-me merged PR #1607: URL: https://github.com/apache/arrow-rs/pull/1607

[GitHub] [arrow] sanjibansg commented on a diff in pull request #12701: ARROW-15892: [C++] Dataset APIs require s3:ListBucket Permissions

2022-04-22 Thread GitBox
sanjibansg commented on code in PR #12701: URL: https://github.com/apache/arrow/pull/12701#discussion_r856858101 ## python/pyarrow/tests/test_dataset.py: ## @@ -4286,6 +4286,66 @@ def test_write_dataset_s3(s3_example_simple): assert result.equals(table) +_minio_put_only

[GitHub] [arrow] ursabot commented on pull request #12894: ARROW-14911: [C++] arrow-compute-hash-join-node-test failed

2022-04-22 Thread GitBox
ursabot commented on PR #12894: URL: https://github.com/apache/arrow/pull/12894#issuecomment-1107371867 Benchmark runs are scheduled for baseline = b9952840be6ff7234b416b5b80a48ecd7a5ecf60 and contender = 4f08a9b6d0f1249f3f3246167e18360da52a6f0d. 4f08a9b6d0f1249f3f3246167e18360da52a6f0d is

[GitHub] [arrow] emkornfield commented on a diff in pull request #12763: ARROW-14892: [Python][C++] GCS Bindings

2022-04-22 Thread GitBox
emkornfield commented on code in PR #12763: URL: https://github.com/apache/arrow/pull/12763#discussion_r856837612 ## ci/scripts/python_wheel_unix_test.sh: ## @@ -78,6 +83,7 @@ fi if [ "${CHECK_UNITTESTS}" == "ON" ]; then # Install testing dependencies pip install -U -r ${

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1610: Read/Write nested dictionaries under FixedSizeList in IPC

2022-04-22 Thread GitBox
codecov-commenter commented on PR #1610: URL: https://github.com/apache/arrow-rs/pull/1610#issuecomment-1107339857 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1610?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow] ursabot commented on pull request #12939: ARROW-15777: [Python][Flight] Allow passing IpcReadOptions to FlightCallOptions

2022-04-22 Thread GitBox
ursabot commented on PR #12939: URL: https://github.com/apache/arrow/pull/12939#issuecomment-1107338038 Benchmark runs are scheduled for baseline = 83cc5727571c6291dad513e5a2cab15e6182b569 and contender = b9952840be6ff7234b416b5b80a48ecd7a5ecf60. b9952840be6ff7234b416b5b80a48ecd7a5ecf60 is

[GitHub] [arrow] kou closed pull request #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode

2022-04-22 Thread GitBox
kou closed pull request #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode URL: https://github.com/apache/arrow/pull/12971

[GitHub] [arrow-rs] viirya opened a new pull request, #1610: Read/Write nested dictionaries under FixedSizeList in IPC

2022-04-22 Thread GitBox
viirya opened a new pull request, #1610: URL: https://github.com/apache/arrow-rs/pull/1610 # Which issue does this PR close? Closes #1609. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1608: Add `substring` support for binary

2022-04-22 Thread GitBox
codecov-commenter commented on PR #1608: URL: https://github.com/apache/arrow-rs/pull/1608#issuecomment-1107298299 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1608?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] viirya opened a new issue, #1609: Read/write nested dictionary under fixed size list in ipc stream reader/write

2022-04-22 Thread GitBox
viirya opened a new issue, #1609: URL: https://github.com/apache/arrow-rs/issues/1609 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This is for nested dictionaries under FixedSizeList in IPC stream reader/writer. **Descr

[GitHub] [arrow] kou commented on pull request #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode

2022-04-22 Thread GitBox
kou commented on PR #12971: URL: https://github.com/apache/arrow/pull/12971#issuecomment-1107298221 +1

[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary

2022-04-22 Thread GitBox
HaoYang670 commented on code in PR #1608: URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856785189 ## arrow/src/compute/kernels/substring.rs: ## @@ -25,7 +26,68 @@ use crate::{ }; use std::cmp::Ordering; -fn generic_substring( +fn binary_substring( Review Com

[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1608: Add `substring` support for binary

2022-04-22 Thread GitBox
HaoYang670 commented on code in PR #1608: URL: https://github.com/apache/arrow-rs/pull/1608#discussion_r856778546 ## arrow/src/compute/kernels/substring.rs: ## @@ -291,11 +575,14 @@ mod tests { cases.into_iter().try_for_each::<_, Result<()>>( |(array, sta

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #1608: Add `substring` support for binary

2022-04-22 Thread GitBox
HaoYang670 opened a new pull request, #1608: URL: https://github.com/apache/arrow-rs/pull/1608 # Which issue does this PR close? Closes #1593 . # Rationale for this change # What changes are included in this PR? 1. Add substring support for (Large)BinaryArr

[GitHub] [arrow] github-actions[bot] commented on pull request #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12971: URL: https://github.com/apache/arrow/pull/12971#issuecomment-1107197527 https://issues.apache.org/jira/browse/ARROW-16296

[GitHub] [arrow] github-actions[bot] commented on pull request #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12971: URL: https://github.com/apache/arrow/pull/12971#issuecomment-1107197632 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #12970: ARROW-16295: [CI][Release] Use windows-2019 for verify-rc-source-windows

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12970: URL: https://github.com/apache/arrow/pull/12970#issuecomment-1107197230 Revision: a4804584c07bf5b47078d5dad59f7f295676f2fa Submitted crossbow builds: [ursacomputing/crossbow @ actions-1927](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] kou opened a new pull request, #12971: ARROW-16296: [GLib] Add missing casts for GArrowRoundMode

2022-04-22 Thread GitBox
kou opened a new pull request, #12971: URL: https://github.com/apache/arrow/pull/12971 This fixes the following warning: GLib-GObject-WARNING **: value "((GArrowRoundMode) -765521144)" of type 'GArrowRoundMode' is invalid or out of range for property 'mode' of type 'G

[GitHub] [arrow] github-actions[bot] commented on pull request #12970: ARROW-16295: [CI][Release] Use windows-2019 for verify-rc-source-windows

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12970: URL: https://github.com/apache/arrow/pull/12970#issuecomment-1107193897 https://issues.apache.org/jira/browse/ARROW-16295

[GitHub] [arrow] github-actions[bot] commented on pull request #12970: ARROW-16295: [CI][Release] Use windows-2019 for verify-rc-source-windows

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12970: URL: https://github.com/apache/arrow/pull/12970#issuecomment-1107194001 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] kou commented on pull request #12970: ARROW-16295: [CI][Release] Use windows-2019 for verify-rc-source-windows

2022-04-22 Thread GitBox
kou commented on PR #12970: URL: https://github.com/apache/arrow/pull/12970#issuecomment-1107193562 @github-actions crossbow submit verify-rc-source-windows

[GitHub] [arrow] kou opened a new pull request, #12970: ARROW-16295: [CI][Release] Use windows-2019 for verify-rc-source-windows

2022-04-22 Thread GitBox
kou opened a new pull request, #12970: URL: https://github.com/apache/arrow/pull/12970 Because windows-2016 is deprecated: https://github.com/actions/virtual-environments/issues/4312

[GitHub] [arrow] github-actions[bot] commented on pull request #12969: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12969: URL: https://github.com/apache/arrow/pull/12969#issuecomment-1107175037 Revision: c5af348f4a9f416d7a76b0fcb9fce2600c712289 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1926](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] github-actions[bot] commented on pull request #12969: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12969: URL: https://github.com/apache/arrow/pull/12969#issuecomment-1107173482 https://issues.apache.org/jira/browse/ARROW-16278

[GitHub] [arrow] kou commented on pull request #12969: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
kou commented on PR #12969: URL: https://github.com/apache/arrow/pull/12969#issuecomment-1107173517 @github-actions crossbow submit java-jars

[GitHub] [arrow] kou opened a new pull request, #12969: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
kou opened a new pull request, #12969: URL: https://github.com/apache/arrow/pull/12969 This is a follow-up of #12958.

[GitHub] [arrow] lidavidm commented on a diff in pull request #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
lidavidm commented on code in PR #12967: URL: https://github.com/apache/arrow/pull/12967#discussion_r856744758 ## cpp/src/parquet/arrow/reader.cc: ## @@ -1051,33 +1051,66 @@ class RowGroupGenerator { using RecordBatchGenerator = ::arrow::AsyncGenerator>; + struct Re

[GitHub] [arrow] westonpace commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
westonpace commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1107138130 Sorry Kou. Thanks David!

[GitHub] [arrow] lidavidm commented on a diff in pull request #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
lidavidm commented on code in PR #12967: URL: https://github.com/apache/arrow/pull/12967#discussion_r856742557 ## cpp/src/parquet/arrow/reader.cc: ## @@ -1113,15 +1146,19 @@ class RowGroupGenerator { ::arrow::internal::Executor* cpu_executor_; std::vector row_groups_; s

[GitHub] [arrow] lidavidm commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
lidavidm commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1107119684 Sorry, we should've looked at CI more carefully. Thanks for catching this @kou and please see #12968

[GitHub] [arrow-rs] wangfenjin commented on pull request #1386: Implement basic FlightSQL Server

2022-04-22 Thread GitBox
wangfenjin commented on PR #1386: URL: https://github.com/apache/arrow-rs/pull/1386#issuecomment-1107073575 @timvw you could create a PR to #1413. I’m working on it but still have no time to finish it.

[GitHub] [arrow] ursabot commented on pull request #12933: ARROW-16250: [GLib][Parquet] Add GParquetColumnChunkMetadata

2022-04-22 Thread GitBox
ursabot commented on PR #12933: URL: https://github.com/apache/arrow/pull/12933#issuecomment-1107066612 Benchmark runs are scheduled for baseline = c73870acdc775c43b00c1816f9c17e78036c8025 and contender = 83cc5727571c6291dad513e5a2cab15e6182b569. 83cc5727571c6291dad513e5a2cab15e6182b569 is

[GitHub] [arrow] kou closed pull request #12964: ARROW-16293: [CI][GLib] Make tests stable

2022-04-22 Thread GitBox
kou closed pull request #12964: ARROW-16293: [CI][GLib] Make tests stable URL: https://github.com/apache/arrow/pull/12964

[GitHub] [arrow] kou commented on pull request #12964: ARROW-16293: [CI][GLib] Make tests stable

2022-04-22 Thread GitBox
kou commented on PR #12964: URL: https://github.com/apache/arrow/pull/12964#issuecomment-1107060815 +1

[GitHub] [arrow] kou commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
kou commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1107059684 @westonpace This breaks a GLib CI on macOS: https://github.com/apache/arrow/runs/6136118707?check_suite_focus=true#step:9:128 ```text FAILED: arrow-glib/libarrow-glib.800.dyli

[GitHub] [arrow] github-actions[bot] commented on pull request #12966: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12966: URL: https://github.com/apache/arrow/pull/12966#issuecomment-1107058438 Revision: 8ec260e00c65e4094ae59bb654e90a8012036b24 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1925](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] github-actions[bot] commented on pull request #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12967: URL: https://github.com/apache/arrow/pull/12967#issuecomment-1107058194 https://issues.apache.org/jira/browse/ARROW-16294

[GitHub] [arrow] github-actions[bot] commented on pull request #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12967: URL: https://github.com/apache/arrow/pull/12967#issuecomment-1107058208 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12967: URL: https://github.com/apache/arrow/pull/12967#issuecomment-1107058202 :warning: Ticket **has no components in JIRA**, make sure you assign one.

[GitHub] [arrow] westonpace opened a new pull request, #12967: ARROW-16294: [C++] Improve performance of parquet readahead

2022-04-22 Thread GitBox
westonpace opened a new pull request, #12967: URL: https://github.com/apache/arrow/pull/12967 * Turns out the batch size we were slicing internally for parquet (in the TableBatchReader) was not the batch size from the scanner. I added batch slicing in file_parquet to leave the parquet rea

[GitHub] [arrow] kou commented on pull request #12966: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
kou commented on PR #12966: URL: https://github.com/apache/arrow/pull/12966#issuecomment-1107058035 @github-actions crossbow submit verify-rc-source-*-macos-amd64

[GitHub] [arrow] github-actions[bot] commented on pull request #12966: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12966: URL: https://github.com/apache/arrow/pull/12966#issuecomment-1107057884 https://issues.apache.org/jira/browse/ARROW-16278

[GitHub] [arrow] kou opened a new pull request, #12966: ARROW-16278: [CI] Fix git installation failure on brew

2022-04-22 Thread GitBox
kou opened a new pull request, #12966: URL: https://github.com/apache/arrow/pull/12966 This is a follow-up of #12958.

[GitHub] [arrow] kou commented on a diff in pull request #12763: ARROW-14892: [Python][C++] GCS Bindings

2022-04-22 Thread GitBox
kou commented on code in PR #12763: URL: https://github.com/apache/arrow/pull/12763#discussion_r856674725 ## ci/scripts/python_wheel_unix_test.sh: ## @@ -78,6 +83,7 @@ fi if [ "${CHECK_UNITTESTS}" == "ON" ]; then # Install testing dependencies pip install -U -r ${source_d

[GitHub] [arrow-rs] sunchao commented on pull request #1607: Parquet: schema validation should allow scale == precision for decimal type

2022-04-22 Thread GitBox
sunchao commented on PR #1607: URL: https://github.com/apache/arrow-rs/pull/1607#issuecomment-1107054193 cc @viirya @alamb @nevi-me could you help to review this? thanks

[GitHub] [arrow-cookbook] davisusanibar opened a new pull request, #184: ARROW-16291: [Java] Support JSE17 for Java Cookbooks

2022-04-22 Thread GitBox
davisusanibar opened a new pull request, #184: URL: https://github.com/apache/arrow-cookbook/pull/184 Initial changes to support Arrow Java code with JSE17/18

[GitHub] [arrow] westonpace closed pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
westonpace closed pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet URL: https://github.com/apache/arrow/pull/12228

[GitHub] [arrow] westonpace commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
westonpace commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106968968 Ok. I'm going to merge this so we get some nightly tests, etc. run on it and then I'm going to see if I can get a PR up for the ideal behavior pretty quickly.

[GitHub] [arrow] lidavidm commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
lidavidm commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106965551 Hmm, okay. Well, low number of files + smallish row groups should hopefully not be too big a deal then. It might be worth calling this out in the release notes come time just so t

[GitHub] [arrow] westonpace commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
westonpace commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106964238 There's no config that we can easily change that would go back to the old behavior.

[GitHub] [arrow] lidavidm commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
lidavidm commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106962794 That said you should feel free to merge here in that case, we should prefer not crashing/freezing

[GitHub] [arrow] westonpace closed pull request #12944: ARROW-16264: [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test

2022-04-22 Thread GitBox
westonpace closed pull request #12944: ARROW-16264: [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test URL: https://github.com/apache/arrow/pull/12944

[GitHub] [arrow] lidavidm commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
lidavidm commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106961685 Just so that regressions can be easily fixed

[GitHub] [arrow] lidavidm commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
lidavidm commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106961501 Is there a quick config change we can document for S3?

[GitHub] [arrow] westonpace commented on pull request #12944: ARROW-16264: [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test

2022-04-22 Thread GitBox
westonpace commented on PR #12944: URL: https://github.com/apache/arrow/pull/12944#issuecomment-1106961615 I'm going to move forward with this to unblock valgrind. If we decide later we want to extend the test set on valgrind we can fix that in a follow-up.

[GitHub] [arrow] westonpace commented on pull request #12228: ARROW-15410: [C++][Datasets] Improve memory usage of datasets API when scanning parquet

2022-04-22 Thread GitBox
westonpace commented on PR #12228: URL: https://github.com/apache/arrow/pull/12228#issuecomment-1106960830 @lidavidm Hmm, doing some more testing on this I think this might be non-ideal in a few situations (S3, low number of files, smallish row groups). This is because we are alw

[GitHub] [arrow] jwijffels opened a new issue, #12965: looks like you

2022-04-22 Thread GitBox
jwijffels opened a new issue, #12965: URL: https://github.com/apache/arrow/issues/12965 It looks like you have an issue with the CRAN R package: it should Import `tidyselect >= 1.0.0`, as the function `all_of` has only been available in tidyselect since version 1.0.0. ``` > install.packa

[GitHub] [arrow-datafusion] jdye64 opened a new issue, #2321: [doc] Add code formatting instructions to CONTRIBUTING.md

2022-04-22 Thread GitBox
jdye64 opened a new issue, #2321: URL: https://github.com/apache/arrow-datafusion/issues/2321 A section in `CONTRIBUTING.md` for `Code Formatting` would be useful for new developers to understand patterns, standards, and also prevent unnecessary commits with linter errors. **Describe

[GitHub] [arrow] github-actions[bot] commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106916343 Revision: cb83048f143072012524e4179ca6180900cdc759 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1924](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow-datafusion] jdye64 opened a new pull request, #2320: deprecate `index_of` and make `index_of_column_by_name` public

2022-04-22 Thread GitBox
jdye64 opened a new pull request, #2320: URL: https://github.com/apache/arrow-datafusion/pull/2320 Closes #2319 # Are there any user-facing changes? Along with `index_of` being deprecated the user will be presented with a `DataFusion::Plan` error if they attempt to use `index_of`

[GitHub] [arrow] kszucs commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
kszucs commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106915142 @github-actions crossbow submit test-build-vcpkg-win

[GitHub] [arrow] github-actions[bot] commented on pull request #12964: ARROW-16293: [CI][GLib] Make tests stable

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12964: URL: https://github.com/apache/arrow/pull/12964#issuecomment-1106904813 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #12964: ARROW-16293: [CI][GLib] Make tests stable

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12964: URL: https://github.com/apache/arrow/pull/12964#issuecomment-1106904778 https://issues.apache.org/jira/browse/ARROW-16293

[GitHub] [arrow] kou opened a new pull request, #12964: ARROW-16293: [CI][GLib] Make tests stable

2022-04-22 Thread GitBox
kou opened a new pull request, #12964: URL: https://github.com/apache/arrow/pull/12964 1. Increase timeout to 60 from 40 for macOS job because 40 is too short to build without a ccache cache. 2. Don't build C++ utilities (ARROW_BUILD_UTILITIES=OFF) because they aren't used. 3. Omi

[GitHub] [arrow] kszucs closed pull request #12955: ARROW-16240: [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset with use_legacy_dataset=False

2022-04-22 Thread GitBox
kszucs closed pull request #12955: ARROW-16240: [Python] Support row_group_size/chunk_size keyword in pq.write_to_dataset with use_legacy_dataset=False URL: https://github.com/apache/arrow/pull/12955

[GitHub] [arrow] github-actions[bot] commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106886109 Revision: cb83048f143072012524e4179ca6180900cdc759 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1923](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] ursabot commented on pull request #12849: ARROW-15092: [R] Support create_package_with_all_dependencies() on non-linux systems

2022-04-22 Thread GitBox
ursabot commented on PR #12849: URL: https://github.com/apache/arrow/pull/12849#issuecomment-1106885261 Benchmark runs are scheduled for baseline = 20bc63a820e3d691f4484135108c73b5c2ef6746 and contender = c73870acdc775c43b00c1816f9c17e78036c8025. c73870acdc775c43b00c1816f9c17e78036c8025 is

[GitHub] [arrow] kszucs commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
kszucs commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106885125 @github-actions crossbow submit wheel-macos-high-sierra-cp310-amd64

[GitHub] [arrow] github-actions[bot] commented on pull request #12763: ARROW-14892: [Python][C++] GCS Bindings

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12763: URL: https://github.com/apache/arrow/pull/12763#issuecomment-1106875847 Revision: 023b4f8e146c6821d8c5b5f04f2560348a134bb1 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1922](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] kou closed pull request #12953: ARROW-16251: [GLib][Parquet] Add GParquetStatistics and its family

2022-04-22 Thread GitBox
kou closed pull request #12953: ARROW-16251: [GLib][Parquet] Add GParquetStatistics and its family URL: https://github.com/apache/arrow/pull/12953

[GitHub] [arrow] kou commented on pull request #12953: ARROW-16251: [GLib][Parquet] Add GParquetStatistics and its family

2022-04-22 Thread GitBox
kou commented on PR #12953: URL: https://github.com/apache/arrow/pull/12953#issuecomment-1106875314 +1 The macOS failure is unrelated. It's caused by timeout without ccache. I'll fix it in another pull request.

[GitHub] [arrow] jvanstraten commented on a diff in pull request #12852: ARROW-15583: [C++] The Substrait consumer could potentially use a massive amount of RAM if the producer uses large anchors

2022-04-22 Thread GitBox
jvanstraten commented on code in PR #12852: URL: https://github.com/apache/arrow/pull/12852#discussion_r856573840 ## cpp/src/arrow/engine/substrait/plan_internal.cc: ## @@ -108,13 +108,23 @@ void SetElement(size_t i, const Element& element, std::vector* vector) { } (*vect

[GitHub] [arrow] kou commented on pull request #12763: ARROW-14892: [Python][C++] GCS Bindings

2022-04-22 Thread GitBox
kou commented on PR #12763: URL: https://github.com/apache/arrow/pull/12763#issuecomment-1106874434 @github-actions crossbow submit -g nightly-tests -g nightly-packaging -g nightly-release

[GitHub] [arrow] github-actions[bot] commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106868826 Revision: cb83048f143072012524e4179ca6180900cdc759 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1921](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] kszucs commented on pull request #12893: ARROW-16198: [CI][Packaging][Python] Update VCPKG version

2022-04-22 Thread GitBox
kszucs commented on PR #12893: URL: https://github.com/apache/arrow/pull/12893#issuecomment-1106867381 @github-actions crossbow submit wheel-macos-big-sur-cp310-universal2

[GitHub] [arrow] github-actions[bot] commented on pull request #12944: ARROW-16264: [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12944: URL: https://github.com/apache/arrow/pull/12944#issuecomment-1106858111 Revision: 3080dab08aea954fb10209d3e61d89bbb025ba00 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1920](https://github.com/ursacomputing/crossbow/branches/

[GitHub] [arrow] westonpace commented on pull request #12944: ARROW-16264: [C++][CI] Valgrind timeout in arrow-compute-hash-join-node-test

2022-04-22 Thread GitBox
westonpace commented on PR #12944: URL: https://github.com/apache/arrow/pull/12944#issuecomment-1106857089 @github-actions crossbow submit test-conda-cpp-valgrind

[GitHub] [arrow] kszucs merged pull request #12951: MINOR: [Docs] Update parquet.rst to note support for zstd and lz4

2022-04-22 Thread GitBox
kszucs merged PR #12951: URL: https://github.com/apache/arrow/pull/12951

[GitHub] [arrow] kszucs commented on pull request #12701: ARROW-15892: [C++] Dataset APIs require s3:ListBucket Permissions

2022-04-22 Thread GitBox
kszucs commented on PR #12701: URL: https://github.com/apache/arrow/pull/12701#issuecomment-1106851091 @sanjibansg please rebase to make the builds pass

[GitHub] [arrow] kszucs closed pull request #12838: ARROW-15757: [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior

2022-04-22 Thread GitBox
kszucs closed pull request #12838: ARROW-15757: [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior URL: https://github.com/apache/arrow/pull/12838

[GitHub] [arrow-datafusion] andygrove commented on issue #2319: Improve UX of looking up a DFSchema column by name

2022-04-22 Thread GitBox
andygrove commented on issue #2319: URL: https://github.com/apache/arrow-datafusion/issues/2319#issuecomment-1106840883 Actually ignore my last comment - that is not an `index_of` method

[GitHub] [arrow-datafusion] andygrove commented on issue #2319: Improve UX of looking up a DFSchema column by name

2022-04-22 Thread GitBox
andygrove commented on issue #2319: URL: https://github.com/apache/arrow-datafusion/issues/2319#issuecomment-1106839519 I just noticed there is also `field_with_name` which would work fine, so I think all we need to do is deprecate `index_of` and the check in there for qualified names

[GitHub] [arrow] kou closed pull request #12961: MINOR: [Docs] Fix typo in parquet.rst

2022-04-22 Thread GitBox
kou closed pull request #12961: MINOR: [Docs] Fix typo in parquet.rst URL: https://github.com/apache/arrow/pull/12961

[GitHub] [arrow] kou commented on pull request #12962: ARROW-16282: [WIP][CI] [C#] Verifiy release on c-sharp has been failing since upgrading ubuntu to 22.04

2022-04-22 Thread GitBox
kou commented on PR #12962: URL: https://github.com/apache/arrow/pull/12962#issuecomment-1106837347 We have a pull request for this: https://github.com/apache/arrow/pull/12870

[GitHub] [arrow-rs] sunchao opened a new pull request, #1607: Parquet: schema validation should allow scale == precision for decimal type

2022-04-22 Thread GitBox
sunchao opened a new pull request, #1607: URL: https://github.com/apache/arrow-rs/pull/1607 # Which issue does this PR close? Closes #1606. # Rationale for this change For decimal type, it is a valid case for scale to be equal to precision. However

[GitHub] [arrow-datafusion] jdye64 commented on issue #2319: Improve UX of looking up a DFSchema column by name

2022-04-22 Thread GitBox
jdye64 commented on issue #2319: URL: https://github.com/apache/arrow-datafusion/issues/2319#issuecomment-1106835346 This would be helpful for something I am working on and am happy to work on this issue.

[GitHub] [arrow] westonpace commented on a diff in pull request #12852: ARROW-15583: [C++] The Substrait consumer could potentially use a massive amount of RAM if the producer uses large anchors

2022-04-22 Thread GitBox
westonpace commented on code in PR #12852: URL: https://github.com/apache/arrow/pull/12852#discussion_r856547712 ## cpp/src/arrow/engine/substrait/plan_internal.cc: ## @@ -108,13 +108,23 @@ void SetElement(size_t i, const Element& element, std::vector* vector) { } (*vecto

[GitHub] [arrow] lidavidm commented on pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-22 Thread GitBox
lidavidm commented on PR #12590: URL: https://github.com/apache/arrow/pull/12590#issuecomment-1106829143 That sounds reasonable to me. It would simplify things quite a bit.

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2319: Improve UX of looking up a DFSchema column by name

2022-04-22 Thread GitBox
andygrove opened a new issue, #2319: URL: https://github.com/apache/arrow-datafusion/issues/2319 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I have a simple schema containing a column with the qualified name `df.a`: ```

[GitHub] [arrow-rs] sunchao opened a new issue, #1606: Parquet schema should allow scale == precision for decimal type

2022-04-22 Thread GitBox
sunchao opened a new issue, #1606: URL: https://github.com/apache/arrow-rs/issues/1606 **Describe the bug** Currently when building a Parquet primitive type, the `PrimitiveTypeBuilder` won't allow cases such as `Decimal(1, 1)`: ```rust if self.scale >= self.precision

[GitHub] [arrow] lidavidm commented on a diff in pull request #12963: ARROW-16234: [C++] Vector Kernel for Rank

2022-04-22 Thread GitBox
lidavidm commented on code in PR #12963: URL: https://github.com/apache/arrow/pull/12963#discussion_r856541244 ## cpp/src/arrow/compute/kernels/vector_sort.cc: ## @@ -1909,6 +1909,68 @@ class SelectKUnstableMetaFunction : public MetaFunction { } }; +//

[GitHub] [arrow] westonpace commented on pull request #12590: ARROW-15639 [C++][Python] UDF Scalar Function Implementation

2022-04-22 Thread GitBox
westonpace commented on PR #12590: URL: https://github.com/apache/arrow/pull/12590#issuecomment-1106822179 I tried to create an example test case where batch_length mattered. This meant using these UDFs as a projection expression in a plan. I learned a lot about how things are currently i

[GitHub] [arrow] lidavidm commented on pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

2022-04-22 Thread GitBox
lidavidm commented on PR #12957: URL: https://github.com/apache/arrow/pull/12957#issuecomment-1106821023 Thanks for catching this!

[GitHub] [arrow] lidavidm closed pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type()

2022-04-22 Thread GitBox
lidavidm closed pull request #12957: ARROW-16280: [C++] Avoid copying shared_ptr in Expression::type() URL: https://github.com/apache/arrow/pull/12957

[GitHub] [arrow] lidavidm commented on pull request #12961: MINOR: [Docs] Fix typo in parquet.rst

2022-04-22 Thread GitBox
lidavidm commented on PR #12961: URL: https://github.com/apache/arrow/pull/12961#issuecomment-1106820427 Hmm, we could manually add an anchor if that's a concern. But within the docs itself the anchor isn't used so I don't think it's a big deal.

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2226: Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199)

2022-04-22 Thread GitBox
alamb commented on code in PR #2226: URL: https://github.com/apache/arrow-datafusion/pull/2226#discussion_r856535693 ## datafusion/scheduler/src/pipeline/execution.rs: ## @@ -0,0 +1,330 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2226: Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199)

2022-04-22 Thread GitBox
alamb commented on code in PR #2226: URL: https://github.com/apache/arrow-datafusion/pull/2226#discussion_r856535099 ## datafusion/scheduler/src/pipeline/execution.rs: ## @@ -0,0 +1,330 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2226: Introduce new optional scheduler, using Morsel-driven Parallelism + rayon (#2199)

2022-04-22 Thread GitBox
alamb commented on code in PR #2226: URL: https://github.com/apache/arrow-datafusion/pull/2226#discussion_r856534975 ## datafusion/scheduler/Cargo.toml: ## @@ -0,0 +1,57 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

[GitHub] [arrow] github-actions[bot] commented on pull request #12963: ARROW-16234: [C++] Vector Kernel for Rank

2022-04-22 Thread GitBox
github-actions[bot] commented on PR #12963: URL: https://github.com/apache/arrow/pull/12963#issuecomment-1106813254 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.
