[GitHub] [arrow-rs] viirya opened a new issue #1153: support more mathematics kernels for array and scalar value

2022-01-11 Thread GitBox
viirya opened a new issue #1153: URL: https://github.com/apache/arrow-rs/issues/1153 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** We only have `powf_scalar`, `modulus_scalar` and `divide_scalar`. We need more mathematics k

[GitHub] [arrow-datafusion] Igosuki commented on pull request #68: Experimenting with arrow2

2022-01-11 Thread GitBox
Igosuki commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1009708490 Ok I didn't see it because I was looking for a port of DecimalArray, my bad. On Tue, Jan 11, 2022 at 2:57 AM QP Hou ***@***.***> wrote: > @Igosuki

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781901435 ## File path: cpp/examples/arrow/execution_plan_documentation_examples.cc ## @@ -0,0 +1,1125 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [arrow] ursabot edited a comment on pull request #12009: ARROW-15172: [Go] Add Arm64 Neon implementation for Arrow-math

2022-01-11 Thread GitBox
ursabot edited a comment on pull request #12009: URL: https://github.com/apache/arrow/pull/12009#issuecomment-1009006161 Benchmark runs are scheduled for baseline = 4ddcb352dc49f7a91ffd160c8a708908cf003f33 and contender = da5b0360aac308e15dd058b594a17224c8eb7e93. da5b0360aac308e15dd058b59

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781909250 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] thisisnic closed pull request #12117: ARROW-15295: [R] Add 6.0.0 to our old versions to check

2022-01-11 Thread GitBox
thisisnic closed pull request #12117: URL: https://github.com/apache/arrow/pull/12117 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsub

[GitHub] [arrow] ursabot commented on pull request #12117: ARROW-15295: [R] Add 6.0.0 to our old versions to check

2022-01-11 Thread GitBox
ursabot commented on pull request #12117: URL: https://github.com/apache/arrow/pull/12117#issuecomment-1009733881 Benchmark runs are scheduled for baseline = 540dbf6d58c4c17d772583d2516f5847ef7d34fd and contender = 123a798288b59c080a2b624384313d390ceef9d7. 123a798288b59c080a2b624384313d39

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781936458 ## File path: cpp/examples/arrow/execution_plan_documentation_examples.cc ## @@ -0,0 +1,1125 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [arrow] AlenkaF commented on a change in pull request #11942: ARROW-14762: [Doc] Additional info and resources

2022-01-11 Thread GitBox
AlenkaF commented on a change in pull request #11942: URL: https://github.com/apache/arrow/pull/11942#discussion_r781943911 ## File path: docs/source/developers/guide/resources.rst ## @@ -27,3 +27,51 @@ Additional information and resources

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781945495 ## File path: cpp/examples/arrow/execution_plan_documentation_examples.cc ## @@ -0,0 +1,1125 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [arrow] jorisvandenbossche closed pull request #12007: ARROW-15087: [Python][Docs] Document MapArray and update parent class to ListArray

2022-01-11 Thread GitBox
jorisvandenbossche closed pull request #12007: URL: https://github.com/apache/arrow/pull/12007

[GitHub] [arrow] jorisvandenbossche commented on pull request #12007: ARROW-15087: [Python][Docs] Document MapArray and update parent class to ListArray

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #12007: URL: https://github.com/apache/arrow/pull/12007#issuecomment-1009756532 Thanks @wjones127 for the nice PR!

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781950741 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] jorisvandenbossche commented on pull request #12105: ARROW-14098: [C++] subtract(time, time) -> interval kernel

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #12105: URL: https://github.com/apache/arrow/pull/12105#issuecomment-1009760541 So if `"subtract"` for timestamp results in duration type, I think `subtract(time, time)` should also give duration? (and not interval) And it seems that for s

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781955582 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781958388 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781958914 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] jorisvandenbossche commented on pull request #12010: ARROW-6001 [Python]: Add from_pylist() and to_pylist() to pyarrow.Table to convert list of records

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #12010: URL: https://github.com/apache/arrow/pull/12010#issuecomment-1009767550 @AlenkaF there is also a linter error (probably due to my suggestions)

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781965392 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] AlenkaF commented on pull request #12010: ARROW-6001 [Python]: Add from_pylist() and to_pylist() to pyarrow.Table to convert list of records

2022-01-11 Thread GitBox
AlenkaF commented on pull request #12010: URL: https://github.com/apache/arrow/pull/12010#issuecomment-1009768805 Will pull the changes and correct, thanks for the ping!

[GitHub] [arrow] sanjibansg commented on a change in pull request #12104: ARROW-15269: [C++][Docs] Clarify that not all compute functions are invocable via CallFunction

2022-01-11 Thread GitBox
sanjibansg commented on a change in pull request #12104: URL: https://github.com/apache/arrow/pull/12104#discussion_r781968947 ## File path: docs/source/cpp/compute.rst ## @@ -98,6 +98,8 @@ exact semantics of the function:: min_value = min_max.scalar_as().value[0]; max_

[GitHub] [arrow] jorisvandenbossche closed pull request #12078: ARROW-14448: [Python] Update pyarrow.array() docstring note on timestamp (timezone) conversion

2022-01-11 Thread GitBox
jorisvandenbossche closed pull request #12078: URL: https://github.com/apache/arrow/pull/12078

[GitHub] [arrow] jorisvandenbossche commented on pull request #12078: ARROW-14448: [Python] Update pyarrow.array() docstring note on timestamp (timezone) conversion

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #12078: URL: https://github.com/apache/arrow/pull/12078#issuecomment-1009776167 Thanks @sanjibansg !

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11978: ARROW-15137: [Dev] Update archery crossbow latest-prefix to work with nightly dates

2022-01-11 Thread GitBox
jorisvandenbossche commented on a change in pull request #11978: URL: https://github.com/apache/arrow/pull/11978#discussion_r781982186 ## File path: dev/archery/archery/crossbow/core.py ## @@ -537,17 +537,36 @@ def _latest_prefix_id(self, prefix): latest = -1

[GitHub] [arrow] JasperSch commented on issue #11934: [R] errors when downloading parquet files from s3.

2022-01-11 Thread GitBox
JasperSch commented on issue #11934: URL: https://github.com/apache/arrow/issues/11934#issuecomment-1009778419 @paleolimbot Yes, that would be reasonable. Decided to open it here in the first place since I've got the feeling that the root cause of the issues lies in the way `arrow:

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781981956 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] jorisvandenbossche closed pull request #11978: ARROW-15137: [Dev] Update archery crossbow latest-prefix to work with nightly dates

2022-01-11 Thread GitBox
jorisvandenbossche closed pull request #11978: URL: https://github.com/apache/arrow/pull/11978

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781984653 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781988046 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r781989744 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -305,3 +305,451 @@ Datasets may be scanned multiple times; just make multiple scan nodes fr

[GitHub] [arrow] thisisnic commented on a change in pull request #11921: ARROW-12743 [R] Add DESCRIPTION fields for dev dependencies

2022-01-11 Thread GitBox
thisisnic commented on a change in pull request #11921: URL: https://github.com/apache/arrow/pull/11921#discussion_r781995880 ## File path: r/vignettes/developers/workflow.Rmd ## @@ -4,6 +4,22 @@ knitr::opts_chunk$set(error = TRUE, eval = FALSE) ``` +The Arrow R package use

[GitHub] [arrow-rs] tustvold opened a new pull request #1154: POC: Async parquet reader

2022-01-11 Thread GitBox
tustvold opened a new pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154 **Proof of concept, tests are currently extremely limited** # Which issue does this PR close? Closes #111. # Rationale for this change See ticket, in particular I wanted t

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1154: POC: Async parquet reader

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154#discussion_r782005696 ## File path: parquet/src/file/footer.rs ## @@ -78,7 +78,6 @@ pub fn parse_metadata(chunk_reader: &R) -> Result; Review comment: Drive by clean

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1154: POC: Async parquet reader

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154#discussion_r782006587 ## File path: parquet/src/arrow/array_reader.rs ## @@ -100,6 +100,26 @@ pub trait ArrayReader { fn get_rep_levels(&self) -> Option<&[i16]>; } +/

[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-11 Thread GitBox
dhruv9vats commented on a change in pull request #11946: URL: https://github.com/apache/arrow/pull/11946#discussion_r782012491 ## File path: cpp/src/arrow/record_batch.h ## @@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader { return batch; } + class RecordBatc

[GitHub] [arrow] ursabot edited a comment on pull request #12117: ARROW-15295: [R] Add 6.0.0 to our old versions to check

2022-01-11 Thread GitBox
ursabot edited a comment on pull request #12117: URL: https://github.com/apache/arrow/pull/12117#issuecomment-1009733881 Benchmark runs are scheduled for baseline = 540dbf6d58c4c17d772583d2516f5847ef7d34fd and contender = 123a798288b59c080a2b624384313d390ceef9d7. 123a798288b59c080a2b62438

[GitHub] [arrow] ursabot commented on pull request #12078: ARROW-14448: [Python] Update pyarrow.array() docstring note on timestamp (timezone) conversion

2022-01-11 Thread GitBox
ursabot commented on pull request #12078: URL: https://github.com/apache/arrow/pull/12078#issuecomment-1009832533 Benchmark runs are scheduled for baseline = 0363df1b44274707228af7274102bbe50cdb68be and contender = 488f084280fa5e2acea76dcb02dd0c3ee655f55b. 488f084280fa5e2acea76dcb02dd0c3e

[GitHub] [arrow] ursabot commented on pull request #12007: ARROW-15087: [Python][Docs] Document MapArray and update parent class to ListArray

2022-01-11 Thread GitBox
ursabot commented on pull request #12007: URL: https://github.com/apache/arrow/pull/12007#issuecomment-1009832519 Benchmark runs are scheduled for baseline = 123a798288b59c080a2b624384313d390ceef9d7 and contender = 0363df1b44274707228af7274102bbe50cdb68be. 0363df1b44274707228af7274102bbe5

[GitHub] [arrow] ursabot commented on pull request #11978: ARROW-15137: [Dev] Update archery crossbow latest-prefix to work with nightly dates

2022-01-11 Thread GitBox
ursabot commented on pull request #11978: URL: https://github.com/apache/arrow/pull/11978#issuecomment-1009832550 Benchmark runs are scheduled for baseline = 488f084280fa5e2acea76dcb02dd0c3ee655f55b and contender = d88e23273fd4eb7945a5fb94cfdb6315f412ea83. d88e23273fd4eb7945a5fb94cfdb6315

[GitHub] [arrow] everron opened a new issue #12118: [R] Error when reading parquet file using FileSystem object

2022-01-11 Thread GitBox
everron opened a new issue #12118: URL: https://github.com/apache/arrow/issues/12118 Hello, I am encountering an issue when trying to read a parquet file using `read_parquet` with an `S3FileSystem` created with `s3_bucket()`. I created a worker that gets the last parquet file

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1154: POC: Async parquet reader

2022-01-11 Thread GitBox
codecov-commenter commented on pull request #1154: URL: https://github.com/apache/arrow-rs/pull/1154#issuecomment-1009835577 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1154?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow] thisisnic commented on a change in pull request #11942: ARROW-14762: [Doc] Additional info and resources

2022-01-11 Thread GitBox
thisisnic commented on a change in pull request #11942: URL: https://github.com/apache/arrow/pull/11942#discussion_r782023020 ## File path: docs/source/developers/guide/resources.rst ## @@ -27,3 +27,51 @@ Additional information and resourc

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r782034480 ## File path: cpp/examples/arrow/execution_plan_documentation_examples.cc ## @@ -0,0 +1,1125 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [arrow] ursabot edited a comment on pull request #12115: ARROW-15286: [Python] Convert indices passed to FileSystemDataset.take to array to avoid segfault

2022-01-11 Thread GitBox
ursabot edited a comment on pull request #12115: URL: https://github.com/apache/arrow/pull/12115#issuecomment-1009160125 Benchmark runs are scheduled for baseline = da5b0360aac308e15dd058b594a17224c8eb7e93 and contender = 43bc33b05660c5909a96e1d05850faf2fbd9752a. 43bc33b05660c5909a96e1d05

[GitHub] [arrow] github-actions[bot] commented on pull request #12119: ARROW-14754: [Doc] Steps in making your first PR - building R package

2022-01-11 Thread GitBox
github-actions[bot] commented on pull request #12119: URL: https://github.com/apache/arrow/pull/12119#issuecomment-1009856333

[GitHub] [arrow] dragosmg commented on a change in pull request #11921: ARROW-12743 [R] Add DESCRIPTION fields for dev dependencies

2022-01-11 Thread GitBox
dragosmg commented on a change in pull request #11921: URL: https://github.com/apache/arrow/pull/11921#discussion_r782045709 ## File path: r/vignettes/developers/workflow.Rmd ## @@ -4,6 +4,22 @@ knitr::opts_chunk$set(error = TRUE, eval = FALSE) ``` +The Arrow R package uses

[GitHub] [arrow] thisisnic commented on pull request #12119: ARROW-14754: [Doc] Steps in making your first PR - building R package

2022-01-11 Thread GitBox
thisisnic commented on pull request #12119: URL: https://github.com/apache/arrow/pull/12119#issuecomment-1009859790 @AlenkaF Please can you review this and let me know if this is the kind of thing you'd like in this section or if you'd prefer I add more detail?

[GitHub] [arrow-datafusion] tustvold commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-11 Thread GitBox
tustvold commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1009868992 > Will arrow-rs eventually support async file IO? Requiring a synchronous ChunkReader is currently a major limitation in supporting alternate ObjectStores FWIW it w

[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-11 Thread GitBox
dhruv9vats commented on a change in pull request #11946: URL: https://github.com/apache/arrow/pull/11946#discussion_r782060241 ## File path: cpp/src/arrow/record_batch.h ## @@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader { return batch; } + class RecordBatc

[GitHub] [arrow] github-actions[bot] commented on pull request #12120: ARROW-15279: [R] Update "writing bindings" dev docs based on user feedback

2022-01-11 Thread GitBox
github-actions[bot] commented on pull request #12120: URL: https://github.com/apache/arrow/pull/12120#issuecomment-1009875845

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-11 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r782061680 ## File path: cpp/examples/arrow/execution_plan_documentation_examples.cc ## @@ -0,0 +1,1125 @@ +// Licensed to the Apache Software Foundation (ASF) unde

[GitHub] [arrow] rok commented on pull request #12105: ARROW-14098: [C++] subtract(time, time) -> interval kernel

2022-01-11 Thread GitBox
rok commented on pull request #12105: URL: https://github.com/apache/arrow/pull/12105#issuecomment-1009879791 Switched to duration output.

[GitHub] [arrow] rok commented on a change in pull request #12105: ARROW-14098: [C++] subtract(time, time) -> interval kernel

2022-01-11 Thread GitBox
rok commented on a change in pull request #12105: URL: https://github.com/apache/arrow/pull/12105#discussion_r782066770 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc ## @@ -190,6 +190,13 @@ struct Subtract { } }; +struct SubtractTemporal32 { + template

[GitHub] [arrow-datafusion] alamb commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-11 Thread GitBox
alamb commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1009884544 > I am happy to help out with this if there are things people would particularly like to see ported across? I have heard lots of excitement about `async` IO (for parq

[GitHub] [arrow-datafusion] alamb closed issue #118: Add SQL support for IN expression

2022-01-11 Thread GitBox
alamb closed issue #118: URL: https://github.com/apache/arrow-datafusion/issues/118

[GitHub] [arrow-datafusion] alamb commented on issue #118: Add SQL support for IN expression

2022-01-11 Thread GitBox
alamb commented on issue #118: URL: https://github.com/apache/arrow-datafusion/issues/118#issuecomment-1009891051 Thanks @Ted-Jiang you are right! @seddonm1 added support in https://github.com/apache/arrow-datafusion/commit/93c6c8879c22f96481b5ce7b6057466986f8d340 I think @xudong9

[GitHub] [arrow-datafusion] alamb commented on issue #118: Add SQL support for IN expression

2022-01-11 Thread GitBox
alamb commented on issue #118: URL: https://github.com/apache/arrow-datafusion/issues/118#issuecomment-1009891655 There were some attempts to improve the performance of INLIST here: https://github.com/apache/arrow-datafusion/pull/806 https://github.com/apache/arrow-datafusion/pull/278 fr

[GitHub] [arrow-datafusion] alamb commented on pull request #1547: Add batch operations to stddev

2022-01-11 Thread GitBox
alamb commented on pull request #1547: URL: https://github.com/apache/arrow-datafusion/pull/1547#issuecomment-1009893054 Thank you @realno -- I will try and review this later today

[GitHub] [arrow] AlenkaF commented on a change in pull request #11942: ARROW-14762: [Doc] Additional info and resources

2022-01-11 Thread GitBox
AlenkaF commented on a change in pull request #11942: URL: https://github.com/apache/arrow/pull/11942#discussion_r782076351 ## File path: docs/source/developers/guide/resources.rst ## @@ -27,3 +27,51 @@ Additional information and resources

[GitHub] [arrow] dhruv9vats commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-11 Thread GitBox
dhruv9vats commented on a change in pull request #11946: URL: https://github.com/apache/arrow/pull/11946#discussion_r782081749 ## File path: cpp/src/arrow/record_batch.h ## @@ -234,6 +234,68 @@ class ARROW_EXPORT RecordBatchReader { return batch; } + class RecordBatc

[GitHub] [arrow-rs] jhorstmann commented on pull request #1150: Move simd right out of for_each loop

2022-01-11 Thread GitBox
jhorstmann commented on pull request #1150: URL: https://github.com/apache/arrow-rs/pull/1150#issuecomment-1009905929 Probably doesn't change the performance since the compiler would do this automatically, but makes the inner loop a bit easier to read :+1:

[GitHub] [arrow] paleolimbot commented on pull request #11730: ARROW-14745: [R] Enable true duckdb streaming

2022-01-11 Thread GitBox
paleolimbot commented on pull request #11730: URL: https://github.com/apache/arrow/pull/11730#issuecomment-1009906240 Trying another angle...only using `to_arrow()`, I can't get a segfault or any weird behaviour: ``` r library(arrow, warn.conflicts = FALSE) library(dply

[GitHub] [arrow-datafusion] xudong963 commented on issue #118: Add SQL support for IN expression

2022-01-11 Thread GitBox
xudong963 commented on issue #118: URL: https://github.com/apache/arrow-datafusion/issues/118#issuecomment-1009910254 > I think @xudong963 has been working on some subqueries (e.g. #1209 and #1373 / #1492) so it might be good to catch up on those efforts. Yes, I'm sure I'll continue

[GitHub] [arrow-datafusion] alamb commented on pull request #1542: Clarify docs about `Accumulator::update` and `Accumulator::update_batch`

2022-01-11 Thread GitBox
alamb commented on pull request #1542: URL: https://github.com/apache/arrow-datafusion/pull/1542#issuecomment-1009913175 > Perhaps we can also add the same notes on merge_batch, I think it applies there too. Added in e37679f97

[GitHub] [arrow-datafusion] xudong963 commented on issue #1544: Streaming support for DataFusion

2022-01-11 Thread GitBox
xudong963 commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1009914854 WOW, recently I also studied streaming, lol. I can join you and learn more about streaming with you.

[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1151: Add add_scalar kernel

2022-01-11 Thread GitBox
jhorstmann commented on a change in pull request #1151: URL: https://github.com/apache/arrow-rs/pull/1151#discussion_r782096256 ## File path: arrow/src/compute/kernels/arithmetic.rs ## @@ -1010,6 +1073,28 @@ where return math_op(left, right, |a, b| a + b); } +/// Add ev

[GitHub] [arrow-datafusion] alamb commented on issue #1544: Streaming support for DataFusion

2022-01-11 Thread GitBox
alamb commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1009919116 @hntd187 -- I think the proposal sounds good. I think you were asking about mechanics here: > we will at least have to develop the API inside datafusion for now

[GitHub] [arrow-datafusion] alamb edited a comment on issue #1544: Streaming support for DataFusion

2022-01-11 Thread GitBox
alamb edited a comment on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1009919116 @hntd187 -- I think the proposal sounds good. I think you were asking about mechanics here: > we will at least have to develop the API inside datafusion

[GitHub] [arrow-rs] tustvold opened a new pull request #1155: Restrict RecordReader and friends to POD types (#1132)

2022-01-11 Thread GitBox
tustvold opened a new pull request #1155: URL: https://github.com/apache/arrow-rs/pull/1155 # Which issue does this PR close? Closes #1132. # Rationale for this change See ticket # What changes are included in this PR? This restricts RecordReader and frien

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1155: Restrict RecordReader and friends to POD types (#1132)

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1155: URL: https://github.com/apache/arrow-rs/pull/1155#discussion_r782102315 ## File path: parquet/src/data_type.rs ## @@ -1032,6 +1033,21 @@ pub(crate) mod private { self } } + +/// A marker trait

[GitHub] [arrow-datafusion] alamb commented on pull request #1526: A simplified memory manager for query execution

2022-01-11 Thread GitBox
alamb commented on pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526#issuecomment-1009924630 > I think there is a gap between ExecPlan and MemoryConsumer. Since an execute method would be called multiple times with different partition, it's always the SendableR

[GitHub] [arrow-datafusion] tustvold commented on pull request #1526: A simplified memory manager for query execution

2022-01-11 Thread GitBox
tustvold commented on pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526#issuecomment-1009925733 > Should I make SendableRecordBatchStream pin arc instead of pin box and register each stream arc to runtime at each execute() last line? Not fully caught up,

[GitHub] [arrow-datafusion] tustvold edited a comment on pull request #1526: A simplified memory manager for query execution

2022-01-11 Thread GitBox
tustvold edited a comment on pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526#issuecomment-1009925733 > Should I make SendableRecordBatchStream pin arc instead of pin box and register each stream arc to runtime at each execute() last line? Not fully caug

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1537: Make call SchedulerServer::new once in ballista-scheduler process

2022-01-11 Thread GitBox
alamb commented on a change in pull request #1537: URL: https://github.com/apache/arrow-datafusion/pull/1537#discussion_r782112615 ## File path: ballista/rust/scheduler/src/main.rs ## @@ -62,14 +63,18 @@ async fn start_server( "Ballista v{} Scheduler listening on {:?}"

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1537: Make call SchedulerServer::new once in ballista-scheduler process

2022-01-11 Thread GitBox
alamb commented on a change in pull request #1537: URL: https://github.com/apache/arrow-datafusion/pull/1537#discussion_r782113988 ## File path: ballista/rust/scheduler/src/main.rs ## @@ -62,14 +63,18 @@ async fn start_server( "Ballista v{} Scheduler listening on {:?}"

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1041: Generify ColumnReaderImpl and RecordReader (#1040)

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1041: URL: https://github.com/apache/arrow-rs/pull/1041#discussion_r782119288 ## File path: parquet/src/arrow/record_reader/buffer.rs ## @@ -0,0 +1,196 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow-datafusion] Ted-Jiang commented on a change in pull request #1537: Make call SchedulerServer::new once in ballista-scheduler process

2022-01-11 Thread GitBox
Ted-Jiang commented on a change in pull request #1537: URL: https://github.com/apache/arrow-datafusion/pull/1537#discussion_r782122335 ## File path: ballista/rust/scheduler/src/main.rs ## @@ -62,14 +63,18 @@ async fn start_server( "Ballista v{} Scheduler listening on {

[GitHub] [arrow] zhihuiwan opened a new issue #12121: A MemoryError occurred when reading Arrow streams

2022-01-11 Thread GitBox
zhihuiwan opened a new issue #12121: URL: https://github.com/apache/arrow/issues/12121 The hdfs file size is 160G, and the machine memory is 64G. pyarrow.NativeFile does not support readline and readlines methods, but I want to read the content by line, so I did the following test:

[GitHub] [arrow-datafusion] Ted-Jiang commented on a change in pull request #1537: Make call SchedulerServer::new once in ballista-scheduler process

2022-01-11 Thread GitBox
Ted-Jiang commented on a change in pull request #1537: URL: https://github.com/apache/arrow-datafusion/pull/1537#discussion_r782124292 ## File path: ballista/rust/scheduler/src/main.rs ## @@ -62,14 +63,18 @@ async fn start_server( "Ballista v{} Scheduler listening on {

[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

2022-01-11 Thread GitBox
davisusanibar commented on a change in pull request #113: URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782127914 ## File path: java/source/schema.rst ## @@ -0,0 +1,330 @@ +=== +Working with schema +=== + +Common definition o

[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

2022-01-11 Thread GitBox
davisusanibar commented on a change in pull request #113: URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782129089 ## File path: java/source/schema.rst ## @@ -0,0 +1,330 @@ +=== +Working with schema +=== + +Common definition o

[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

2022-01-11 Thread GitBox
davisusanibar commented on a change in pull request #113: URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782129280 ## File path: java/source/io.rst ## @@ -0,0 +1,354 @@ + +Reading and writing data + + +Recipes

[GitHub] [arrow-cookbook] davisusanibar commented on a change in pull request #113: [Java]: Java cookbook recipes

2022-01-11 Thread GitBox
davisusanibar commented on a change in pull request #113: URL: https://github.com/apache/arrow-cookbook/pull/113#discussion_r782132581 ## File path: java/source/io.rst ## @@ -0,0 +1,354 @@ + +Reading and writing data + + +Recipes

[GitHub] [arrow-datafusion] yjshen commented on pull request #1526: A simplified memory manager for query execution

2022-01-11 Thread GitBox
yjshen commented on pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526#issuecomment-1009955495 @tustvold Thanks for bringing it up. I find the stream a single place to have all runtime entities be auto-registered to the memory manager at once. Maybe a wrapper ov

[GitHub] [arrow] pitrou commented on pull request #11876: ARROW-14479: [C++] Hash Join Microbenchmarks

2022-01-11 Thread GitBox
pitrou commented on pull request #11876: URL: https://github.com/apache/arrow/pull/11876#issuecomment-1009957069 No particular concern from me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow] lidavidm commented on pull request #12014: ARROW-10924: [C++] Validate temporal data in ValidateArrayFull

2022-01-11 Thread GitBox
lidavidm commented on pull request #12014: URL: https://github.com/apache/arrow/pull/12014#issuecomment-1009961789 FYI, that test failure in `GroupBy.MinMaxTypes` is here: https://github.com/apache/arrow/blob/d88e23273fd4eb7945a5fb94cfdb6315f412ea83/cpp/src/arrow/compute/kernels/hash_aggreg

[GitHub] [arrow] pitrou commented on pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-11 Thread GitBox
pitrou commented on pull request #11996: URL: https://github.com/apache/arrow/pull/11996#issuecomment-1009962326 > I think the really interesting case is working when there are no markers at all, which we can get to work in both cases. Great, I have no problem with this approach then

[GitHub] [arrow] pitrou commented on a change in pull request #11996: ARROW-15114: [C++] GcsFileSystem uses metadata for directory markers

2022-01-11 Thread GitBox
pitrou commented on a change in pull request #11996: URL: https://github.com/apache/arrow/pull/11996#discussion_r782141998 ## File path: cpp/src/arrow/filesystem/gcsfs.cc ## @@ -310,108 +318,127 @@ class GcsFileSystem::Impl { Result GetFileInfo(const GcsPath& path) { if

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1039: BooleanBufferBuilder::append_packed (#1038)

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1039: URL: https://github.com/apache/arrow-rs/pull/1039#discussion_r782142570 ## File path: arrow/src/array/builder.rs ## @@ -398,6 +399,95 @@ impl BooleanBufferBuilder { } } +/// Append `count` bits from `to_s

[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-11 Thread GitBox
pitrou commented on a change in pull request #11946: URL: https://github.com/apache/arrow/pull/11946#discussion_r782142890 ## File path: cpp/src/arrow/record_batch.h ## @@ -234,6 +234,67 @@ class ARROW_EXPORT RecordBatchReader { return batch; } + class RecordBatchRea

[GitHub] [arrow] pitrou commented on a change in pull request #11946: ARROW-13663: [C++] RecordBatchReader STL-like iteration

2022-01-11 Thread GitBox
pitrou commented on a change in pull request #11946: URL: https://github.com/apache/arrow/pull/11946#discussion_r782143512 ## File path: cpp/src/arrow/record_batch.h ## @@ -234,6 +234,68 @@ class ARROW_EXPORT RecordBatchReader { return batch; } + class RecordBatchRea

[GitHub] [arrow] lidavidm closed pull request #12104: ARROW-15269: [C++][Docs] Clarify that not all compute functions are invocable via CallFunction

2022-01-11 Thread GitBox
lidavidm closed pull request #12104: URL: https://github.com/apache/arrow/pull/12104

[GitHub] [arrow] pitrou commented on pull request #12076: ARROW-10317: [Python] Document compute function options

2022-01-11 Thread GitBox
pitrou commented on pull request #12076: URL: https://github.com/apache/arrow/pull/12076#issuecomment-1009966373 Ping @jorisvandenbossche

[GitHub] [arrow] jorisvandenbossche commented on pull request #11726: ARROW-14738: [Python][Doc] Make return types clickable

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #11726: URL: https://github.com/apache/arrow/pull/11726#issuecomment-1009968625 Personally, I find https://github.com/apache/arrow/pull/11726#issuecomment-999340607 a somewhat annoying issue to ship as is (the strange formatting of the type list

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #11726: ARROW-14738: [Python][Doc] Make return types clickable

2022-01-11 Thread GitBox
jorisvandenbossche edited a comment on pull request #11726: URL: https://github.com/apache/arrow/pull/11726#issuecomment-1009968625 Personally, I find https://github.com/apache/arrow/pull/11726#issuecomment-999340607 a somewhat annoying issue to ship as is (the strange formatting of the ty

[GitHub] [arrow] jorisvandenbossche closed pull request #12010: ARROW-6001 [Python]: Add from_pylist() and to_pylist() to pyarrow.Table to convert list of records

2022-01-11 Thread GitBox
jorisvandenbossche closed pull request #12010: URL: https://github.com/apache/arrow/pull/12010

[GitHub] [arrow] jorisvandenbossche commented on pull request #12010: ARROW-6001 [Python]: Add from_pylist() and to_pylist() to pyarrow.Table to convert list of records

2022-01-11 Thread GitBox
jorisvandenbossche commented on pull request #12010: URL: https://github.com/apache/arrow/pull/12010#issuecomment-1009970810 Thanks @AlenkaF !

[GitHub] [arrow] ursabot commented on pull request #12104: ARROW-15269: [C++][Docs] Clarify that not all compute functions are invocable via CallFunction

2022-01-11 Thread GitBox
ursabot commented on pull request #12104: URL: https://github.com/apache/arrow/pull/12104#issuecomment-1009973329 Benchmark runs are scheduled for baseline = d88e23273fd4eb7945a5fb94cfdb6315f412ea83 and contender = 7a0141a8cc867e5b406ed97e5decc227923eb3f5. 7a0141a8cc867e5b406ed97e5decc227

[GitHub] [arrow-rs] tustvold commented on a change in pull request #1039: BooleanBufferBuilder::append_packed (#1038)

2022-01-11 Thread GitBox
tustvold commented on a change in pull request #1039: URL: https://github.com/apache/arrow-rs/pull/1039#discussion_r782152854 ## File path: arrow/src/util/mod.rs ## @@ -18,6 +18,7 @@ #[cfg(feature = "test_utils")] pub mod bench_util; pub mod bit_chunk_iterator; +pub(crate) m
