[GitHub] [arrow] ursabot edited a comment on pull request #12044: ARROW-13294: [C#] Create Flight example server and client

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12044: URL: https://github.com/apache/arrow/pull/12044#issuecomment-1007001772 Benchmark runs are scheduled for baseline = 7929cc803b093d082a5b8e52edb593807693a6d5 and contender = 1e7bfa24c579887f324982a27c0e06f6f9f5a803. 1e7bfa24c579887f324982a27

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
liukun4515 commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780096756 ## File path: datafusion/src/physical_plan/expressions/stddev.rs ## @@ -0,0 +1,312 @@ +// Licensed to the Apache Software Foundation (ASF) und

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
liukun4515 commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780097748 ## File path: datafusion/src/physical_plan/expressions/mod.rs ## @@ -84,9 +86,13 @@ pub use nth_value::NthValue; pub use nullif::{nullif_func

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
liukun4515 commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780099721 ## File path: datafusion/src/physical_plan/expressions/stddev.rs ## @@ -0,0 +1,312 @@ +// Licensed to the Apache Software Foundation (ASF) und

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
liukun4515 commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780100249 ## File path: datafusion/src/physical_plan/expressions/stddev.rs ## @@ -0,0 +1,312 @@ +// Licensed to the Apache Software Foundation (ASF) und

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
liukun4515 commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780103988 ## File path: datafusion/src/physical_plan/expressions/variance.rs ## @@ -0,0 +1,376 @@ +// Licensed to the Apache Software Foundation (ASF) u

[GitHub] [arrow] andersonm-ibm commented on pull request #10450: ARROW-9947: [Python] High-level Python API for Parquet encryption of files.

2022-01-07 Thread GitBox
andersonm-ibm commented on pull request #10450: URL: https://github.com/apache/arrow/pull/10450#issuecomment-1007233380 > Right, can we get a functional PyArrow even without encryption enabled? @pitrou - I separated parquet encryption, so it's no longer mandatory to enable parquet en

[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12031: URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783 Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3e

[GitHub] [arrow] thisisnic commented on a change in pull request #12097: ARROW-14590: [R] Implement lubridate::week

2022-01-07 Thread GitBox
thisisnic commented on a change in pull request #12097: URL: https://github.com/apache/arrow/pull/12097#discussion_r780125202 ## File path: r/tests/testthat/test-dplyr-funcs-datetime.R ## @@ -382,6 +382,15 @@ test_that("extract epiweek from timestamp", { ) }) +test_that("

[GitHub] [arrow] thisisnic commented on a change in pull request #12097: ARROW-14590: [R] Implement lubridate::week

2022-01-07 Thread GitBox
thisisnic commented on a change in pull request #12097: URL: https://github.com/apache/arrow/pull/12097#discussion_r780126775 ## File path: r/R/dplyr-funcs-datetime.R ## @@ -101,6 +101,10 @@ register_bindings_datetime <- function() { Expression$create("day_of_week", x, opt

[GitHub] [arrow] jvanstraten opened a new pull request #12098: ARROW-14592: list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
jvanstraten opened a new pull request #12098: URL: https://github.com/apache/arrow/pull/12098 Changes the type returned by `list_parent_indices` to `int64` regardless of list index type (it used to return `int32` for `list`), as the output refers to row indices rather than list indices.

[GitHub] [arrow] github-actions[bot] commented on pull request #12098: ARROW-14592: list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #12098: URL: https://github.com/apache/arrow/pull/12098#issuecomment-1007258192 https://issues.apache.org/jira/browse/ARROW-14592 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

[GitHub] [arrow] AlenkaF commented on a change in pull request #12097: ARROW-14590: [R] Implement lubridate::week

2022-01-07 Thread GitBox
AlenkaF commented on a change in pull request #12097: URL: https://github.com/apache/arrow/pull/12097#discussion_r780134565 ## File path: r/R/dplyr-funcs-datetime.R ## @@ -101,6 +101,10 @@ register_bindings_datetime <- function() { Expression$create("day_of_week", x, optio

[GitHub] [arrow] ursabot edited a comment on pull request #12031: ARROW-15138: [C++] Make ExecPlan::ToString give some additional information

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12031: URL: https://github.com/apache/arrow/pull/12031#issuecomment-1007001783 Benchmark runs are scheduled for baseline = 1e7bfa24c579887f324982a27c0e06f6f9f5a803 and contender = e64480db51fc9622d02613f3ec60bac34d765092. e64480db51fc9622d02613f3e

[GitHub] [arrow-datafusion] yjshen opened a new pull request #1526: A simplified memory manager for query execution

2022-01-07 Thread GitBox
yjshen opened a new pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526 # Which issue does this PR close? Closes #587 . # Rationale for this change When DataFusion processes a single partition, it will keep allocating memory until the

[GitHub] [arrow-rs] alamb commented on a change in pull request #1141: Update version to 7.0.0 and update CHANGELOG

2022-01-07 Thread GitBox
alamb commented on a change in pull request #1141: URL: https://github.com/apache/arrow-rs/pull/1141#discussion_r780207803 ## File path: CHANGELOG.md ## @@ -19,8 +19,146 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/CHANG

[GitHub] [arrow-datafusion] alamb commented on pull request #1401: Fix bugs with nullability during rewrites: Combine `simplify` and `Simplifier`

2022-01-07 Thread GitBox
alamb commented on pull request #1401: URL: https://github.com/apache/arrow-datafusion/pull/1401#issuecomment-1007352996 @houqp or @Dandandan -- any concern if I merge this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #1401: Fix bugs with nullability during rewrites: Combine `simplify` and `Simplifier`

2022-01-07 Thread GitBox
Dandandan commented on a change in pull request #1401: URL: https://github.com/apache/arrow-datafusion/pull/1401#discussion_r780235612 ## File path: datafusion/src/optimizer/simplify_expressions.rs ## @@ -554,212 +416,250 @@ impl<'a> Simplifier<'a> { false } -

[GitHub] [arrow] rok commented on a change in pull request #12097: ARROW-14590: [R] Implement lubridate::week

2022-01-07 Thread GitBox
rok commented on a change in pull request #12097: URL: https://github.com/apache/arrow/pull/12097#discussion_r780237616 ## File path: r/R/dplyr-funcs-datetime.R ## @@ -101,6 +101,10 @@ register_bindings_datetime <- function() { Expression$create("day_of_week", x, options =

[GitHub] [arrow] lidavidm commented on a change in pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
lidavidm commented on a change in pull request #11991: URL: https://github.com/apache/arrow/pull/11991#discussion_r780263584 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -138,41 +133,46 @@ struct ARROW_DS_EXPORT ScanOptions { // This is used by Fragment implementation

[GitHub] [arrow-rs] paddyhoran commented on pull request #1140: feat(ipc): support for reading union arrays through IPC

2022-01-07 Thread GitBox
paddyhoran commented on pull request #1140: URL: https://github.com/apache/arrow-rs/pull/1140#issuecomment-1007431548 Thank you very much for this @helgikrs! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

[GitHub] [arrow] jonkeane closed pull request #11894: ARROW-14029: [R] Repair map_batches()

2022-01-07 Thread GitBox
jonkeane closed pull request #11894: URL: https://github.com/apache/arrow/pull/11894 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow] jonkeane closed pull request #12090: ARROW-15266: [R] [CI] Test reorganization triggering valgrind errors

2022-01-07 Thread GitBox
jonkeane closed pull request #12090: URL: https://github.com/apache/arrow/pull/12090 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow] ursabot commented on pull request #11894: ARROW-14029: [R] Repair map_batches()

2022-01-07 Thread GitBox
ursabot commented on pull request #11894: URL: https://github.com/apache/arrow/pull/11894#issuecomment-1007435606 Benchmark runs are scheduled for baseline = e64480db51fc9622d02613f3ec60bac34d765092 and contender = f0544403b36c1d994f01d37d7ee77c08a87a6d29. f0544403b36c1d994f01d37d7ee77c08

[GitHub] [arrow] ursabot commented on pull request #12090: ARROW-15266: [R] [CI] Test reorganization triggering valgrind errors

2022-01-07 Thread GitBox
ursabot commented on pull request #12090: URL: https://github.com/apache/arrow/pull/12090#issuecomment-1007435619 Benchmark runs are scheduled for baseline = f0544403b36c1d994f01d37d7ee77c08a87a6d29 and contender = 66832557006a39c356d8608ad9cbbdb773bed0c7. 66832557006a39c356d8608ad9cbbdb7

[GitHub] [arrow] ursabot edited a comment on pull request #11894: ARROW-14029: [R] Repair map_batches()

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #11894: URL: https://github.com/apache/arrow/pull/11894#issuecomment-1007435606 Benchmark runs are scheduled for baseline = e64480db51fc9622d02613f3ec60bac34d765092 and contender = f0544403b36c1d994f01d37d7ee77c08a87a6d29. f0544403b36c1d994f01d37d7

[GitHub] [arrow] ursabot edited a comment on pull request #12090: ARROW-15266: [R] [CI] Test reorganization triggering valgrind errors

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12090: URL: https://github.com/apache/arrow/pull/12090#issuecomment-1007435619 Benchmark runs are scheduled for baseline = f0544403b36c1d994f01d37d7ee77c08a87a6d29 and contender = 66832557006a39c356d8608ad9cbbdb773bed0c7. 66832557006a39c356d8608ad

[GitHub] [arrow] chriscasola commented on a change in pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-07 Thread GitBox
chriscasola commented on a change in pull request #11832: URL: https://github.com/apache/arrow/pull/11832#discussion_r780300956 ## File path: go/arrow/array.go ## @@ -0,0 +1,71 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

[GitHub] [arrow] zois-tasoulas commented on issue #11654: Linking error when building debug flavor on Windows

2022-01-07 Thread GitBox
zois-tasoulas commented on issue #11654: URL: https://github.com/apache/arrow/issues/11654#issuecomment-1007473071 Sure, I will do that this weekend. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow-datafusion] matthewmturner commented on pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
matthewmturner commented on pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#issuecomment-1007492724 Perhaps @Dandandan would be interested in this as he was involved in db-benchmark which this will help -- This is an automated message from the Apache Git Se

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
Dandandan commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780335960 ## File path: datafusion/src/physical_plan/aggregates.rs ## @@ -212,6 +222,26 @@ pub fn create_aggregate_expr( "AVG(DISTINCT)

[GitHub] [arrow] thisisnic commented on pull request #11921: ARROW-12743 [R] Add DESCRIPTION fields for dev dependencies

2022-01-07 Thread GitBox
thisisnic commented on pull request #11921: URL: https://github.com/apache/arrow/pull/11921#issuecomment-1007516733 > There is a [new(ish) GHA workflow for installing dependencies](https://github.com/r-lib/actions/tree/v2-branch/setup-r-dependencies). Do we want to make use of it or should

[GitHub] [arrow] ursabot edited a comment on pull request #11894: ARROW-14029: [R] Repair map_batches()

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #11894: URL: https://github.com/apache/arrow/pull/11894#issuecomment-1007435606 Benchmark runs are scheduled for baseline = e64480db51fc9622d02613f3ec60bac34d765092 and contender = f0544403b36c1d994f01d37d7ee77c08a87a6d29. f0544403b36c1d994f01d37d7

[GitHub] [arrow] eerhardt commented on a change in pull request #12068: ARROW-15037: [C#] A stream processing example of IoT sensor data

2022-01-07 Thread GitBox
eerhardt commented on a change in pull request #12068: URL: https://github.com/apache/arrow/pull/12068#discussion_r780371694 ## File path: csharp/examples/IoTDataPipelineExample/Program.cs ## @@ -0,0 +1,107 @@ +// Licensed to the Apache Software Foundation (ASF) under one or m

[GitHub] [arrow] lidavidm closed pull request #12098: ARROW-14592: [C++] list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
lidavidm closed pull request #12098: URL: https://github.com/apache/arrow/pull/12098 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow-cookbook] wjones127 commented on issue #83: [R] Recipe for random sampling

2022-01-07 Thread GitBox
wjones127 commented on issue #83: URL: https://github.com/apache/arrow-cookbook/issues/83#issuecomment-1007584263 As a part of apache/arrow#11894, I added an example to the datasets vignette to show how to create a sample from a dataset that doesn't fit into memory. This will be available

[GitHub] [arrow] ursabot commented on pull request #12098: ARROW-14592: [C++] list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
ursabot commented on pull request #12098: URL: https://github.com/apache/arrow/pull/12098#issuecomment-1007587829 Benchmark runs are scheduled for baseline = 66832557006a39c356d8608ad9cbbdb773bed0c7 and contender = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831. ddea0c9e5d50d0c147b1577e4aa0dc3c

[GitHub] [arrow] jonkeane closed pull request #12072: ARROW-15235: [R] drop support for R 3.3

2022-01-07 Thread GitBox
jonkeane closed pull request #12072: URL: https://github.com/apache/arrow/pull/12072 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubs

[GitHub] [arrow] ursabot edited a comment on pull request #12098: ARROW-14592: [C++] list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12098: URL: https://github.com/apache/arrow/pull/12098#issuecomment-1007587829 Benchmark runs are scheduled for baseline = 66832557006a39c356d8608ad9cbbdb773bed0c7 and contender = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831. ddea0c9e5d50d0c147b1577e4

[GitHub] [arrow] ursabot edited a comment on pull request #12090: ARROW-15266: [R] [CI] Test reorganization triggering valgrind errors

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12090: URL: https://github.com/apache/arrow/pull/12090#issuecomment-1007435619 Benchmark runs are scheduled for baseline = f0544403b36c1d994f01d37d7ee77c08a87a6d29 and contender = 66832557006a39c356d8608ad9cbbdb773bed0c7. 66832557006a39c356d8608ad

[GitHub] [arrow] ursabot commented on pull request #12072: ARROW-15235: [R] drop support for R 3.3

2022-01-07 Thread GitBox
ursabot commented on pull request #12072: URL: https://github.com/apache/arrow/pull/12072#issuecomment-1007600991 Benchmark runs are scheduled for baseline = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831 and contender = b325ef7f95f8348cc7b3230dd65a172bfd0ce650. b325ef7f95f8348cc7b3230dd65a172b

[GitHub] [arrow] wjones127 commented on pull request #11714: ARROW-8605: [R] Add brotli to Windows R build

2022-01-07 Thread GitBox
wjones127 commented on pull request #11714: URL: https://github.com/apache/arrow/pull/11714#issuecomment-1007601326 @github-actions crossbow submit -g r -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow] github-actions[bot] commented on pull request #11714: ARROW-8605: [R] Add brotli to Windows R build

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #11714: URL: https://github.com/apache/arrow/pull/11714#issuecomment-1007602079 Revision: a269a69b8f1fab130a5f0948179bacfdd9b0e178 Submitted crossbow builds: [ursacomputing/crossbow @ actions-1379](https://github.com/ursacomputing/crossbo

[GitHub] [arrow] zeroshade commented on a change in pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-07 Thread GitBox
zeroshade commented on a change in pull request #11832: URL: https://github.com/apache/arrow/pull/11832#discussion_r780427481 ## File path: go/arrow/record.go ## @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[GitHub] [arrow] ursabot edited a comment on pull request #12090: ARROW-15266: [R] [CI] Test reorganization triggering valgrind errors

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12090: URL: https://github.com/apache/arrow/pull/12090#issuecomment-1007435619 Benchmark runs are scheduled for baseline = f0544403b36c1d994f01d37d7ee77c08a87a6d29 and contender = 66832557006a39c356d8608ad9cbbdb773bed0c7. 66832557006a39c356d8608ad

[GitHub] [arrow] ursabot edited a comment on pull request #12072: ARROW-15235: [R] drop support for R 3.3

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12072: URL: https://github.com/apache/arrow/pull/12072#issuecomment-1007600991 Benchmark runs are scheduled for baseline = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831 and contender = b325ef7f95f8348cc7b3230dd65a172bfd0ce650. b325ef7f95f8348cc7b3230dd

[GitHub] [arrow] zeroshade commented on pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-07 Thread GitBox
zeroshade commented on pull request #11832: URL: https://github.com/apache/arrow/pull/11832#issuecomment-1007622829 @chriscasola comments added and updated. also rebased. thoughts? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [arrow] coryan commented on a change in pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-07 Thread GitBox
coryan commented on a change in pull request #11996: URL: https://github.com/apache/arrow/pull/11996#discussion_r780442988 ## File path: cpp/src/arrow/filesystem/gcsfs.cc ## @@ -505,20 +555,23 @@ class GcsFileSystem::Impl { } private: - static Result GetFileInfoDirector

[GitHub] [arrow-datafusion] realno commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
realno commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780443231 ## File path: datafusion/src/physical_plan/expressions/stddev.rs ## @@ -0,0 +1,312 @@ +// Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [arrow] coryan commented on a change in pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-07 Thread GitBox
coryan commented on a change in pull request #11996: URL: https://github.com/apache/arrow/pull/11996#discussion_r780443584 ## File path: cpp/src/arrow/filesystem/gcsfs.cc ## @@ -310,93 +318,107 @@ class GcsFileSystem::Impl { Result GetFileInfo(const GcsPath& path) { if

[GitHub] [arrow-datafusion] realno commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
realno commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780444768 ## File path: datafusion/src/physical_plan/expressions/variance.rs ## @@ -0,0 +1,376 @@ +// Licensed to the Apache Software Foundation (ASF) under

[GitHub] [arrow] westonpace commented on pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
westonpace commented on pull request #11991: URL: https://github.com/apache/arrow/pull/11991#issuecomment-1007638718 > Did we file a JIRA for removing the deprecated flags in 8.0.0? I just created ARROW-15283 -- This is an automated message from the Apache Git Service. To respond t

[GitHub] [arrow] lidavidm opened a new pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm opened a new pull request #12099: URL: https://github.com/apache/arrow/pull/12099 When the dataset writer is configured to delete existing data before writing, the target directory is on S3, the dataset is partitioned, and there are at least as many partitions as threads in the I/

[GitHub] [arrow] github-actions[bot] commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007645730 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007647894 Hmm, this sometimes hangs on >8 partitions, taking another look. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [arrow] jonkeane commented on pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
jonkeane commented on pull request #11360: URL: https://github.com/apache/arrow/pull/11360#issuecomment-1007649170 @github-actions crossbow submit -g r -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[GitHub] [arrow] github-actions[bot] commented on pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #11360: URL: https://github.com/apache/arrow/pull/11360#issuecomment-1007649766 Revision: 43bda65ad113d82079ebfb83241895212058464c Submitted crossbow builds: [ursacomputing/crossbow @ actions-1380](https://github.com/ursacomputing/crossbo

[GitHub] [arrow] westonpace commented on a change in pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
westonpace commented on a change in pull request #12099: URL: https://github.com/apache/arrow/pull/12099#discussion_r780457910 ## File path: cpp/src/arrow/dataset/dataset_writer.cc ## @@ -328,12 +328,12 @@ class DatasetWriterDirectoryQueue : public util::AsyncDestroyable {

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007654362 Ah, there's _another_ deadlock: finishing a FileWriter closes the underlying file. This is done as a continuation that runs on the I/O thread pool (I think). (On a side note,

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007657502 And there's definitely a race condition somewhere… (using the reproducer from JIRA) ``` 8 partitions Traceback (most recent call last): File "/home/lidavidm/C

[GitHub] [arrow] westonpace commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
westonpace commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007657899 I think finishing a file actually happens on the CPU thread pool at the moment. Although it's at the mercy of the writer. -- This is an automated message from the Apache

[GitHub] [arrow] wjones127 commented on a change in pull request #11714: ARROW-8605: [R] Add brotli to Windows R build

2022-01-07 Thread GitBox
wjones127 commented on a change in pull request #11714: URL: https://github.com/apache/arrow/pull/11714#discussion_r780466218 ## File path: ci/scripts/r_windows_build.sh ## @@ -97,15 +97,15 @@ if [ -d mingw32/lib/ ]; then mkdir -p $DST_DIR/lib/i386 mv mingw32/lib/*.a $DST

[GitHub] [arrow] westonpace commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
westonpace commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007659299 > On a side note, would anyone complain if I #ifdef'd in the pthread calls to name threads on Linux to make debugging easier? Please do. -- This is an automated mes

[GitHub] [arrow] westonpace commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
westonpace commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007661614 > I think finishing a file actually happens on the CPU thread pool at the moment. Although it's at the mercy of the writer. Ah, but the background Close/Wait also bloc

[GitHub] [arrow] nealrichardson commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
nealrichardson commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780469400 ## File path: .github/workflows/r.yml ## @@ -324,6 +324,10 @@ jobs: cd r/tests sed -i.bak -E -e 's/"arrow"/"arrow", reporter

[GitHub] [arrow] westonpace commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
westonpace commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007663490 Moving the Close/Wait to the I/O thread pool will probably be an easy fix. Then the rules we are building are... * If you are going to call a synchronous filesystem m

[GitHub] [arrow] nealrichardson commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
nealrichardson commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780470569 ## File path: r/DESCRIPTION ## @@ -59,6 +59,7 @@ Suggests: testthat (>= 3.1.0), tibble, withr +LinkingTo: cpp11 (>= 0.4.2) Review c

[GitHub] [arrow] nealrichardson commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
nealrichardson commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780471592 ## File path: .github/workflows/r.yml ## @@ -324,6 +324,10 @@ jobs: cd r/tests sed -i.bak -E -e 's/"arrow"/"arrow", reporter

[GitHub] [arrow] jonkeane commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
jonkeane commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780471562 ## File path: .github/workflows/r.yml ## @@ -324,6 +324,10 @@ jobs: cd r/tests sed -i.bak -E -e 's/"arrow"/"arrow", reporter = "loc

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007667064 Surprisingly it is the I/O thread pool: ``` (gdb) info thread Id Target Id Frame * 1Thread 0x7f8856a04740 (LWP 27248) "python" 0x7f88565f5ad

[GitHub] [arrow] nealrichardson commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
nealrichardson commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780472530 ## File path: .github/workflows/r.yml ## @@ -324,6 +324,10 @@ jobs: cd r/tests sed -i.bak -E -e 's/"arrow"/"arrow", reporter

[GitHub] [arrow] jonkeane commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
jonkeane commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780472665 ## File path: r/DESCRIPTION ## @@ -59,6 +59,7 @@ Suggests: testthat (>= 3.1.0), tibble, withr +LinkingTo: cpp11 (>= 0.4.2) Review comment

[GitHub] [arrow] jonkeane commented on a change in pull request #11360: ARROW-13610: [R] Unvendor cpp11

2022-01-07 Thread GitBox
jonkeane commented on a change in pull request #11360: URL: https://github.com/apache/arrow/pull/11360#discussion_r780473234 ## File path: .github/workflows/r.yml ## @@ -324,6 +324,10 @@ jobs: cd r/tests sed -i.bak -E -e 's/"arrow"/"arrow", reporter = "loc

[GitHub] [arrow] westonpace commented on a change in pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
westonpace commented on a change in pull request #11991: URL: https://github.com/apache/arrow/pull/11991#discussion_r780473351 ## File path: python/pyarrow/_dataset.pyx ## @@ -2239,10 +2233,6 @@ cdef class Scanner(_Weakrefable): use_threads : bool, default True If

[GitHub] [arrow] westonpace commented on a change in pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
westonpace commented on a change in pull request #11991: URL: https://github.com/apache/arrow/pull/11991#discussion_r780473574 ## File path: cpp/src/arrow/dataset/scanner.cc ## @@ -823,10 +584,77 @@ Result AsyncScanner::CountRows() { return total.load(); } +Result> AsyncS

[GitHub] [arrow] westonpace commented on a change in pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
westonpace commented on a change in pull request #11991: URL: https://github.com/apache/arrow/pull/11991#discussion_r780475401 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -138,41 +133,46 @@ struct ARROW_DS_EXPORT ScanOptions { // This is used by Fragment implementati

[GitHub] [arrow] lidavidm edited a comment on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm edited a comment on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007667064 Surprisingly it is the I/O thread pool: ``` (gdb) info thread Id Target Id Frame * 1Thread 0x7f8856a04740 (LWP 27248) "python" 0x7f88

[GitHub] [arrow-rs] alamb commented on pull request #1127: *_dyn_scalar kernels: Support Float32Array and Float64Array,

2022-01-07 Thread GitBox
alamb commented on pull request #1127: URL: https://github.com/apache/arrow-rs/pull/1127#issuecomment-1007678179 > I missed it 😿 No worries @liukun4515 -- I would be happy to make any changes / suggestions you may have. -- This is an automated message from the Apache Git Service.

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007681528 Thanks for the help Weston, this last commit should fix the deadlock…though I still occasionally see that `OSError`, but I think we can try to track that down separately.

[GitHub] [arrow] lidavidm commented on a change in pull request #11991: ARROW-13554: [C++] Remove deprecated Scanner::Scan

2022-01-07 Thread GitBox
lidavidm commented on a change in pull request #11991: URL: https://github.com/apache/arrow/pull/11991#discussion_r780484531 ## File path: cpp/src/arrow/dataset/scanner.h ## @@ -138,41 +133,46 @@ struct ARROW_DS_EXPORT ScanOptions { // This is used by Fragment implementation

[GitHub] [arrow] emkornfield commented on a change in pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-07 Thread GitBox
emkornfield commented on a change in pull request #11996: URL: https://github.com/apache/arrow/pull/11996#discussion_r780487058 ## File path: cpp/src/arrow/filesystem/gcsfs.cc ## @@ -310,93 +318,107 @@ class GcsFileSystem::Impl { Result GetFileInfo(const GcsPath& path) {

[GitHub] [arrow] ursabot edited a comment on pull request #12098: ARROW-14592: [C++] list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12098: URL: https://github.com/apache/arrow/pull/12098#issuecomment-1007587829 Benchmark runs are scheduled for baseline = 66832557006a39c356d8608ad9cbbdb773bed0c7 and contender = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831. ddea0c9e5d50d0c147b1577e4

[GitHub] [arrow] emkornfield commented on pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-07 Thread GitBox
emkornfield commented on pull request #11996: URL: https://github.com/apache/arrow/pull/11996#issuecomment-1007687049 Looks like all failures are unrelated. I'll merge this on Monday if there are no further comments.

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1401: Fix bugs with nullability during rewrites: Combine `simplify` and `Simplifier`

2022-01-07 Thread GitBox
alamb commented on a change in pull request #1401: URL: https://github.com/apache/arrow-datafusion/pull/1401#discussion_r780489370 ## File path: datafusion/src/optimizer/simplify_expressions.rs ## @@ -554,212 +416,250 @@ impl<'a> Simplifier<'a> { false } -fn

[GitHub] [arrow] ursabot edited a comment on pull request #12098: ARROW-14592: [C++] list_parent_indices output type should not depend on input type

2022-01-07 Thread GitBox
ursabot edited a comment on pull request #12098: URL: https://github.com/apache/arrow/pull/12098#issuecomment-1007587829 Benchmark runs are scheduled for baseline = 66832557006a39c356d8608ad9cbbdb773bed0c7 and contender = ddea0c9e5d50d0c147b1577e4aa0dc3cf9e64831. ddea0c9e5d50d0c147b1577e4

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007703162 1) Filed ARROW-15285 for the OSError, though it seems quite rare (2/200 runs) 2) Increasing partitions to 16 causes it to hang again…taking a look…

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007706164 It's the same hang with 16 partitions so I think we will need a CloseAsync().

[GitHub] [arrow] lidavidm commented on pull request #12099: ARROW-15265: [C++] Fix hang in dataset writer with kDeleteMatchingPartitions and #partitions >= 8

2022-01-07 Thread GitBox
lidavidm commented on pull request #12099: URL: https://github.com/apache/arrow/pull/12099#issuecomment-1007726131 Ah, the fundamental issue is S3FS implements writes asynchronously (unless background_writes=False), but our file interfaces are still mostly synchronous, and the dataset writ

[GitHub] [arrow] mbrobbel opened a new pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-07 Thread GitBox
mbrobbel opened a new pull request #12100: URL: https://github.com/apache/arrow/pull/12100 I'm still working on this but opening this for visibility. @lidavidm @westonpace

[GitHub] [arrow] github-actions[bot] commented on pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #12100: URL: https://github.com/apache/arrow/pull/12100#issuecomment-1007745646 https://issues.apache.org/jira/browse/ARROW-15061

[GitHub] [arrow] kou closed pull request #12093: ARROW-15273: [GLib] add garrow_function_get_options_type()

2022-01-07 Thread GitBox
kou closed pull request #12093: URL: https://github.com/apache/arrow/pull/12093

[GitHub] [arrow] kou opened a new pull request #12101: ARROW-15274: [Ruby] Improve Arrow::Function#execute usability

2022-01-07 Thread GitBox
kou opened a new pull request #12101: URL: https://github.com/apache/arrow/pull/12101 * Raw Hash is accepted as options * #call-able

[GitHub] [arrow] github-actions[bot] commented on pull request #12101: ARROW-15274: [Ruby] Improve Arrow::Function#execute usability

2022-01-07 Thread GitBox
github-actions[bot] commented on pull request #12101: URL: https://github.com/apache/arrow/pull/12101#issuecomment-1007747559

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1525: Add stddev operator

2022-01-07 Thread GitBox
alamb commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780516560 ## File path: datafusion/src/scalar.rs ## @@ -526,6 +526,282 @@ macro_rules! eq_array_primitive { } impl ScalarValue { +/// Return true if

[GitHub] [arrow] ursabot commented on pull request #12093: ARROW-15273: [GLib] add garrow_function_get_options_type()

2022-01-07 Thread GitBox
ursabot commented on pull request #12093: URL: https://github.com/apache/arrow/pull/12093#issuecomment-1007749487 Benchmark runs are scheduled for baseline = b325ef7f95f8348cc7b3230dd65a172bfd0ce650 and contender = 79436648baed0d5b26f7b10f362e6136efc3f4f4. 79436648baed0d5b26f7b10f362e6136

[GitHub] [arrow] lidavidm commented on pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-07 Thread GitBox
lidavidm commented on pull request #12100: URL: https://github.com/apache/arrow/pull/12100#issuecomment-1007754548 Cool! The macro definitions look fairly useful.

[GitHub] [arrow] lidavidm commented on pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-07 Thread GitBox
lidavidm commented on pull request #12100: URL: https://github.com/apache/arrow/pull/12100#issuecomment-1007754899 If you have a screenshot or any quick example of the output to share here it would also be useful, I think.
