[GitHub] [arrow] seddonm1 commented on pull request #8860: ARROW-10783: [Rust] [DataFusion] Implement row count statistics for Parquet TableProviderParquet statistics [WIP]

2020-12-08 Thread GitBox
seddonm1 commented on pull request #8860: URL: https://github.com/apache/arrow/pull/8860#issuecomment-740452777 Here is a link to the WIP I was doing: https://github.com/seddonm1/arrow/compare/master...seddonm1:parquet-statistics?expand=1

[GitHub] [arrow] jorisvandenbossche closed pull request #8775: ARROW-10742: [Python] Check mask when creating array from numpy

2020-12-08 Thread GitBox
jorisvandenbossche closed pull request #8775: URL: https://github.com/apache/arrow/pull/8775 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[GitHub] [arrow] jorisvandenbossche commented on pull request #8775: ARROW-10742: [Python] Check mask when creating array from numpy

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8775: URL: https://github.com/apache/arrow/pull/8775#issuecomment-740469065 Thanks @chrisavl !

[GitHub] [arrow] XiaokunDing closed pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
XiaokunDing closed pull request #8866: URL: https://github.com/apache/arrow/pull/8866

[GitHub] [arrow] xhochy closed pull request #8868: ARROW-10843: [C++] Add support for temporal types in sort family kernels

2020-12-08 Thread GitBox
xhochy closed pull request #8868: URL: https://github.com/apache/arrow/pull/8868

[GitHub] [arrow] romainfrancois commented on pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-12-08 Thread GitBox
romainfrancois commented on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-740504045 I believe `AppendMultiple()` is what I would be looking for. It would e.g. solve my dilemma about converting data frames to struct types ... I had missed the more eff

[GitHub] [arrow] XiaokunDing commented on pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
XiaokunDing commented on pull request #8866: URL: https://github.com/apache/arrow/pull/8866#issuecomment-740507022 @seddonm1 , Thanks for your help

[GitHub] [arrow] pitrou commented on pull request #8472: ARROW-8113: [C++] Lighter weight variant<>

2020-12-08 Thread GitBox
pitrou commented on pull request #8472: URL: https://github.com/apache/arrow/pull/8472#issuecomment-740511215 Ok, I'll rebase a last time to make sure this doesn't break anything.

[GitHub] [arrow] pitrou commented on a change in pull request #8818: ARROW-10788: [C++] Make S3 recursive tree walks parallel

2020-12-08 Thread GitBox
pitrou commented on a change in pull request #8818: URL: https://github.com/apache/arrow/pull/8818#discussion_r538196698 ## File path: cpp/src/arrow/filesystem/s3fs.cc ## @@ -1080,6 +1082,134 @@ void FileObjectToInfo(const S3Model::Object& obj, FileInfo* info) { info->set_m

[GitHub] [arrow] liyafan82 commented on pull request #6208: ARROW-7533: [Java] Move ArrowBufPointer out of the java the memory package

2020-12-08 Thread GitBox
liyafan82 commented on pull request #6208: URL: https://github.com/apache/arrow/pull/6208#issuecomment-740518112 I am going to close this PR, at least for now, as it has been a long time since the last comment was received.

[GitHub] [arrow] liyafan82 closed pull request #6208: ARROW-7533: [Java] Move ArrowBufPointer out of the java the memory package

2020-12-08 Thread GitBox
liyafan82 closed pull request #6208: URL: https://github.com/apache/arrow/pull/6208

[GitHub] [arrow] kiszk commented on a change in pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-12-08 Thread GitBox
kiszk commented on a change in pull request #7507: URL: https://github.com/apache/arrow/pull/7507#discussion_r538208202 ## File path: cpp/src/arrow/array/util.cc ## @@ -84,6 +283,12 @@ std::shared_ptr MakeArray(const std::shared_ptr& data) { return out; } +void SwapEndia

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8657: ARROW-7363: [Python] add combine_chunks method to ChunkedArray

2020-12-08 Thread GitBox
jorisvandenbossche commented on a change in pull request #8657: URL: https://github.com/apache/arrow/pull/8657#discussion_r538246392 ## File path: docs/source/python/api/tables.rst ## @@ -29,6 +29,7 @@ Factory Functions :toctree: ../generated/ chunked_array + combin

[GitHub] [arrow] jorisvandenbossche commented on pull request #8657: ARROW-7363: [Python] add combine_chunks method to ChunkedArray

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8657: URL: https://github.com/apache/arrow/pull/8657#issuecomment-740551800 Just a small doc comment, sorry for the delay here

[GitHub] [arrow] jorisvandenbossche commented on pull request #8504: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740553237 @kszucs Looking at the PR, this actually seems to be working and ready to merge?

[GitHub] [arrow] pitrou commented on a change in pull request #8868: ARROW-10843: [C++] Add support for temporal types in sort family kernels

2020-12-08 Thread GitBox
pitrou commented on a change in pull request #8868: URL: https://github.com/apache/arrow/pull/8868#discussion_r538249607 ## File path: cpp/src/arrow/compute/kernels/vector_sort.cc ## @@ -267,14 +267,19 @@ uint64_t* PartitionNulls(uint64_t* indices_begin, uint64_t* indices_end,

[GitHub] [arrow] jorisvandenbossche commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740556225 Rebased + created a JIRA ticket for it

[GitHub] [arrow] alamb closed pull request #8839: ARROW-10732: [Rust] [DataFusion] Integrate DFSchema as a step towards supporting qualified column names

2020-12-08 Thread GitBox
alamb closed pull request #8839: URL: https://github.com/apache/arrow/pull/8839

[GitHub] [arrow] jorisvandenbossche commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740561767 @github-actions crossbow submit test-conda-python-3.8-pandas-nightly

[GitHub] [arrow] github-actions[bot] commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740564301 Revision: 5d770f9b7bdf225e4227439b0ebbd875f7af8bfb Submitted crossbow builds: [ursa-labs/crossbow @ actions-744](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] Dandandan commented on pull request #8748: ARROW-10703: [Rust] [DataFusion] Make join not collect on every right part

2020-12-08 Thread GitBox
Dandandan commented on pull request #8748: URL: https://github.com/apache/arrow/pull/8748#issuecomment-740564846 Would be nice if we can continue with this work. @alamb @andygrove any idea how this fits in the current design, any alternatives?

[GitHub] [arrow] kszucs commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
kszucs commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740566945 Thanks @jorisvandenbossche! Yes, if everything is green then it should be good to go.

[GitHub] [arrow] kszucs commented on a change in pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-12-08 Thread GitBox
kszucs commented on a change in pull request #8650: URL: https://github.com/apache/arrow/pull/8650#discussion_r538269663 ## File path: r/src/r_to_arrow.cpp ## @@ -0,0 +1,814 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

[GitHub] [arrow] kszucs commented on a change in pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-12-08 Thread GitBox
kszucs commented on a change in pull request #8650: URL: https://github.com/apache/arrow/pull/8650#discussion_r538277187 ## File path: r/src/r_to_arrow.cpp ## @@ -0,0 +1,814 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

[GitHub] [arrow] alamb commented on pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
alamb commented on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-740585522 Well, I left the benchmarks running overnight on my cloud instance and they haven't finished yet... I am going to spend some time trying to pare them down to run in a more reasonabl

[GitHub] [arrow] pitrou closed pull request #8818: ARROW-10788: [C++] Make S3 recursive tree walks parallel

2020-12-08 Thread GitBox
pitrou closed pull request #8818: URL: https://github.com/apache/arrow/pull/8818

[GitHub] [arrow] pitrou closed pull request #8472: ARROW-8113: [C++] Lighter weight variant<>

2020-12-08 Thread GitBox
pitrou closed pull request #8472: URL: https://github.com/apache/arrow/pull/8472

[GitHub] [arrow] xhochy commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
xhochy commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740594431 It would be nice to have this in a second flavour: build with old, test with nightly numpy. Feel free to merge on green, I'll try to look into how the other could be achieved.

[GitHub] [arrow] alamb commented on pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
alamb commented on pull request #8821: URL: https://github.com/apache/arrow/pull/8821#issuecomment-740602719 I retriggered the C++ CI tests on github in hopes it passes on a subsequent run. @jorgecarleitao I propose we merge this PR in and then send a note to the dev@arrow mailing l

[GitHub] [arrow] romainfrancois commented on a change in pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-12-08 Thread GitBox
romainfrancois commented on a change in pull request #8650: URL: https://github.com/apache/arrow/pull/8650#discussion_r538349309 ## File path: r/src/r_to_arrow.cpp ## @@ -0,0 +1,814 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[GitHub] [arrow] romainfrancois commented on a change in pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-12-08 Thread GitBox
romainfrancois commented on a change in pull request #8650: URL: https://github.com/apache/arrow/pull/8650#discussion_r538351779 ## File path: r/src/r_to_arrow.cpp ## @@ -0,0 +1,814 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[GitHub] [arrow] alamb commented on a change in pull request #8867: ARROW-10842 [Rust] decouple IO from json schema inference code

2020-12-08 Thread GitBox
alamb commented on a change in pull request #8867: URL: https://github.com/apache/arrow/pull/8867#discussion_r538364348 ## File path: rust/arrow/src/json/reader.rs ## @@ -250,19 +326,17 @@ pub fn infer_json_schema( reader: &mut BufReader, max_read_records: Option, )

[GitHub] [arrow] jorgecarleitao commented on pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
jorgecarleitao commented on pull request #8821: URL: https://github.com/apache/arrow/pull/8821#issuecomment-740619423 Let's do that. I can write the note, as I can also describe the changes in more depth, as IMO they are relevant to other parts of the project also.

[GitHub] [arrow] alamb commented on a change in pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
alamb commented on a change in pull request #8866: URL: https://github.com/apache/arrow/pull/8866#discussion_r538371128 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef; use crate::error::Result; use crate

[GitHub] [arrow] alamb commented on pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
alamb commented on pull request #8821: URL: https://github.com/apache/arrow/pull/8821#issuecomment-740627085 @jorgecarleitao -- sounds good (with you writing the note) The windows C++ test is still failing in a strange way -- I can't seem to find `AMD64 Windows 2019 C++` running on

[GitHub] [arrow] jorgecarleitao commented on pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
jorgecarleitao commented on pull request #8821: URL: https://github.com/apache/arrow/pull/8821#issuecomment-740634083 @alamb , these are unrelated and I (unfortunately) see them often whenever I need to change dockerfiles and other CI files. I have been ignoring them, as I expect / hope th

[GitHub] [arrow] alamb commented on pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
alamb commented on pull request #8821: URL: https://github.com/apache/arrow/pull/8821#issuecomment-740635592 Ok, 🚀 it is then! cc @nevi-me @andygrove @kszucs

[GitHub] [arrow] alamb closed pull request #8821: ARROW-10792: [Rust] [CI] Modularize builds for faster build and smaller caches

2020-12-08 Thread GitBox
alamb closed pull request #8821: URL: https://github.com/apache/arrow/pull/8821

[GitHub] [arrow] andygrove commented on a change in pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
andygrove commented on a change in pull request #8866: URL: https://github.com/apache/arrow/pull/8866#discussion_r538442723 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef; use crate::error::Result; use c

[GitHub] [arrow] andygrove commented on pull request #8760: ARROW-10712: [Rust] Add tests to TPC-H benchmarks

2020-12-08 Thread GitBox
andygrove commented on pull request #8760: URL: https://github.com/apache/arrow/pull/8760#issuecomment-740658677 I will be getting back to this PR in the next day or two.

[GitHub] [arrow] jorisvandenbossche closed pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
jorisvandenbossche closed pull request #8504: URL: https://github.com/apache/arrow/pull/8504

[GitHub] [arrow] jorisvandenbossche commented on pull request #8504: ARROW-10845: [Python][CI] Build with nightly numpy and pandas artifacts

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8504: URL: https://github.com/apache/arrow/pull/8504#issuecomment-740665982 Indeed, that would be nice as well. I am not sure how easy it is with the current docker-compose workflow ..

[GitHub] [arrow] jorisvandenbossche opened a new pull request #8869: ARROW-10849: [Python] Handle numpy deprecation warnings for builtin type aliases

2020-12-08 Thread GitBox
jorisvandenbossche opened a new pull request #8869: URL: https://github.com/apache/arrow/pull/8869

[GitHub] [arrow] jorisvandenbossche commented on pull request #8869: ARROW-10849: [Python] Handle numpy deprecation warnings for builtin type aliases

2020-12-08 Thread GitBox
jorisvandenbossche commented on pull request #8869: URL: https://github.com/apache/arrow/pull/8869#issuecomment-740679201 @github-actions crossbow submit test-conda-python-3.8-pandas-nightly

[GitHub] [arrow] github-actions[bot] commented on pull request #8869: ARROW-10849: [Python] Handle numpy deprecation warnings for builtin type aliases

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8869: URL: https://github.com/apache/arrow/pull/8869#issuecomment-740681502 Revision: 98e93a999e93fd4bc84cc9f708921f565d368b37 Submitted crossbow builds: [ursa-labs/crossbow @ actions-745](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] github-actions[bot] commented on pull request #8869: ARROW-10849: [Python] Handle numpy deprecation warnings for builtin type aliases

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8869: URL: https://github.com/apache/arrow/pull/8869#issuecomment-740690559 https://issues.apache.org/jira/browse/ARROW-10849

[GitHub] [arrow] rj-atw commented on pull request #7767: ARROW-9453: [Rust] Wasm32 compilation support

2020-12-08 Thread GitBox
rj-atw commented on pull request #7767: URL: https://github.com/apache/arrow/pull/7767#issuecomment-740767675 I am still interested. Started new job, so diverted my attention. But hoping to close this out during holiday break. On Mon, Dec 7, 2020, 10:10 PM Jorge Leitao wrote

[GitHub] [arrow] houqp commented on a change in pull request #8867: ARROW-10842 [Rust] decouple IO from json schema inference code

2020-12-08 Thread GitBox
houqp commented on a change in pull request #8867: URL: https://github.com/apache/arrow/pull/8867#discussion_r538675155 ## File path: rust/arrow/src/json/reader.rs ## @@ -250,19 +326,17 @@ pub fn infer_json_schema( reader: &mut BufReader, max_read_records: Option, )

[GitHub] [arrow] houqp commented on pull request #8867: ARROW-10842 [Rust] decouple IO from json schema inference code

2020-12-08 Thread GitBox
houqp commented on pull request #8867: URL: https://github.com/apache/arrow/pull/8867#issuecomment-740808479 Thanks @alamb for the review, I will add more tests as well as also change to decouple IO from record batch reader.

[GitHub] [arrow] codecov-io commented on pull request #8863: ARROW-10837: [Rust][DataFusion] Use `Vec` for hash keys

2020-12-08 Thread GitBox
codecov-io commented on pull request #8863: URL: https://github.com/apache/arrow/pull/8863#issuecomment-740808850 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8863?src=pr&el=h1) Report > Merging [#8863](https://codecov.io/gh/apache/arrow/pull/8863?src=pr&el=desc) (3559d97) into

[GitHub] [arrow] Dandandan commented on pull request #8863: ARROW-10837: [Rust][DataFusion] Use `Vec` for hash keys

2020-12-08 Thread GitBox
Dandandan commented on pull request #8863: URL: https://github.com/apache/arrow/pull/8863#issuecomment-740809969 @jorgecarleitao Just extended the change for hash aggregates as well. Turns out, a good speedup as well for hash aggregate queries! [This version] ```

[GitHub] [arrow] jorgecarleitao closed pull request #8852: ARROW-10826: [Rust] Add support for FixedSizeBinaryArray to MutableArrayData

2020-12-08 Thread GitBox
jorgecarleitao closed pull request #8852: URL: https://github.com/apache/arrow/pull/8852

[GitHub] [arrow] fsaintjacques commented on pull request #8855: ARROW-9630: [Go] Export JSON reader and writer

2020-12-08 Thread GitBox
fsaintjacques commented on pull request #8855: URL: https://github.com/apache/arrow/pull/8855#issuecomment-740820889 This is not a JSON reader like the C++'s json reader. This is only meant to support the internal custom format used for integration testing between implementation languages.

[GitHub] [arrow] Dandandan commented on pull request #8865: ARROW-10839: [Rust] [Data Fusion] Implement BETWEEN operator

2020-12-08 Thread GitBox
Dandandan commented on pull request #8865: URL: https://github.com/apache/arrow/pull/8865#issuecomment-740824472 @seddonm1 could you rebase again?

[GitHub] [arrow] houqp commented on a change in pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

2020-12-08 Thread GitBox
houqp commented on a change in pull request #8853: URL: https://github.com/apache/arrow/pull/8853#discussion_r538703517 ## File path: rust/arrow/src/array/builder.rs ## @@ -3969,579 +3296,4 @@ mod tests { // Special error if the key overflows (256th entry) bui

[GitHub] [arrow] nealrichardson commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

2020-12-08 Thread GitBox
nealrichardson commented on a change in pull request #8824: URL: https://github.com/apache/arrow/pull/8824#discussion_r538745060 ## File path: .github/workflows/cpp.yml ## @@ -21,7 +21,7 @@ on: push: paths: - '.github/workflows/cpp.yml' - - 'ci/docker/**' +

[GitHub] [arrow] xhochy commented on pull request #8855: ARROW-9630: [Go] Export JSON reader and writer

2020-12-08 Thread GitBox
xhochy commented on pull request #8855: URL: https://github.com/apache/arrow/pull/8855#issuecomment-740901677 > This is not a JSON reader like the C++'s json reader. This is only meant to support the internal custom format used for integration testing between implementation languages. >

[GitHub] [arrow] xhochy edited a comment on pull request #8855: ARROW-9630: [Go] Export JSON reader and writer

2020-12-08 Thread GitBox
xhochy edited a comment on pull request #8855: URL: https://github.com/apache/arrow/pull/8855#issuecomment-740901677 > This is not a JSON reader like the C++'s json reader. This is only meant to support the internal custom format used for integration testing between implementation language

[GitHub] [arrow] seddonm1 commented on pull request #8865: ARROW-10839: [Rust] [Data Fusion] Implement BETWEEN operator

2020-12-08 Thread GitBox
seddonm1 commented on pull request #8865: URL: https://github.com/apache/arrow/pull/8865#issuecomment-740904102 rebased.

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

2020-12-08 Thread GitBox
jorgecarleitao commented on a change in pull request #8824: URL: https://github.com/apache/arrow/pull/8824#discussion_r538749024 ## File path: .github/workflows/cpp.yml ## @@ -21,7 +21,7 @@ on: push: paths: - '.github/workflows/cpp.yml' - - 'ci/docker/**' +

[GitHub] [arrow] seddonm1 commented on a change in pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
seddonm1 commented on a change in pull request #8866: URL: https://github.com/apache/arrow/pull/8866#discussion_r538750212 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef; use crate::error::Result; use cr

[GitHub] [arrow] seddonm1 removed a comment on pull request #8865: ARROW-10839: [Rust] [Data Fusion] Implement BETWEEN operator

2020-12-08 Thread GitBox
seddonm1 removed a comment on pull request #8865: URL: https://github.com/apache/arrow/pull/8865#issuecomment-740904102 rebased.

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Read RecordBatch in a different endian

2020-12-08 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-740923764 Addressed comments except the following two items - be addressing https://github.com/apache/arrow/pull/7507#discussion_r513711536 - postpone https://github.com/apache/arrow/pul

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8824: ARROW-10798: [CI] Made certain workflows be less triggered on changes of dockerfiles

2020-12-08 Thread GitBox
jorgecarleitao commented on a change in pull request #8824: URL: https://github.com/apache/arrow/pull/8824#discussion_r538756496 ## File path: .github/workflows/cpp.yml ## @@ -21,7 +21,7 @@ on: push: paths: - '.github/workflows/cpp.yml' - - 'ci/docker/**' +

[GitHub] [arrow] seddonm1 commented on pull request #8865: ARROW-10839: [Rust] [Data Fusion] Implement BETWEEN operator

2020-12-08 Thread GitBox
seddonm1 commented on pull request #8865: URL: https://github.com/apache/arrow/pull/8865#issuecomment-740930440 rebased.

[GitHub] [arrow] seddonm1 commented on pull request #8760: ARROW-10712: [Rust] Add tests to TPC-H benchmarks

2020-12-08 Thread GitBox
seddonm1 commented on pull request #8760: URL: https://github.com/apache/arrow/pull/8760#issuecomment-740934197 I can help with this if you can describe your plans.

[GitHub] [arrow] alamb commented on a change in pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
alamb commented on a change in pull request #8698: URL: https://github.com/apache/arrow/pull/8698#discussion_r538756445 ## File path: rust/parquet/src/data_type.rs ## @@ -327,7 +422,12 @@ pub trait AsBytes { pub trait SliceAsBytes: Sized { /// Returns slice of bytes for a

[GitHub] [arrow] alamb commented on pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
alamb commented on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-740944598 FYI @sunchao @nevi-me @jorgecarleitao -- I spent quite a while verifying the performance of this approach and reviewing the code (treatise is here: https://github.com/apache/arro

[GitHub] [arrow] andygrove commented on pull request #8760: ARROW-10712: [Rust] Add tests to TPC-H benchmarks

2020-12-08 Thread GitBox
andygrove commented on pull request #8760: URL: https://github.com/apache/arrow/pull/8760#issuecomment-740950757 Thanks @seddonm1 that would be great if you have the time. I was really just planning on addressing feedback. Feel free to push to this PR or create a new one to replace this.

[GitHub] [arrow] seddonm1 commented on pull request #8760: ARROW-10712: [Rust] Add tests to TPC-H benchmarks

2020-12-08 Thread GitBox
seddonm1 commented on pull request #8760: URL: https://github.com/apache/arrow/pull/8760#issuecomment-740954932 @andygrove No worries. Hopefully I can help on some of these easier tasks to free you up for the harder ones.

[GitHub] [arrow] jorgecarleitao commented on pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
jorgecarleitao commented on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-740958408 @alamb benchmarks formatted as a table (from worst to best; statistically insignificant changes and changes <5% ignored). No judgement, just info: | benchmark | variation (%)

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
jorgecarleitao edited a comment on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-740958408 @alamb benchmarks formatted as a table (from worst to best; statistically insignificant changes and changes <5% ignored). No judgement, just info: | benchmark | variati

[GitHub] [arrow] Dandandan commented on pull request #8685: ARROW-10216: [Rust] Simd implementation for primitive min/max kernels

2020-12-08 Thread GitBox
Dandandan commented on pull request #8685: URL: https://github.com/apache/arrow/pull/8685#issuecomment-740997894 @jhorstmann I wonder if we should support a standardized ordering like IEEE 754 instead? There is an (unstable) implementation in Rust std for it. https://doc.rust-l
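
A rough sketch of what a standardized ordering could look like for min/max. It assumes the standard-library implementation mentioned is `f64::total_cmp`; the link above is truncated, so this is a guess. That method was unstable at the time of this comment and was stabilized later, in Rust 1.62. The helpers `min_total` and `max_total` are hypothetical names for illustration and are not the Arrow min/max kernels.

```rust
/// Minimum of a slice under the IEEE 754 totalOrder predicate.
/// Hypothetical helper for illustration; not the Arrow kernel.
fn min_total(values: &[f64]) -> Option<f64> {
    // `total_cmp` gives a total order, so NaN and signed zeros compare deterministically.
    values.iter().copied().min_by(|a, b| a.total_cmp(b))
}

/// Maximum of a slice under the same total order.
/// Hypothetical helper for illustration; not the Arrow kernel.
fn max_total(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}
```

Under this ordering, `-0.0` sorts before `+0.0` and a positive NaN sorts after `+infinity`, so the result does not depend on where a NaN happens to sit in the input.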

[GitHub] [arrow] alamb commented on pull request #8698: ARROW-10636: [Rust][Parquet] Switch to Rust Stable by removing specialization in parquet

2020-12-08 Thread GitBox
alamb commented on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-741052690 Thank you @jorgecarleitao -- that table is much better than my hand wavy analysis.

[GitHub] [arrow] jorgecarleitao opened a new pull request #8870: ARROW-10854: [Rust] [DataFusion] Simplify logical plan

2020-12-08 Thread GitBox
jorgecarleitao opened a new pull request #8870: URL: https://github.com/apache/arrow/pull/8870 This PR simplifies the logical plan by removing `CsvScan`, `ParquetScan` and `InMemoryScan` and encapsulating all of them into `TableScan`. The underlying aspect here is that all these node
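
The consolidation described in this PR summary can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `TableSource` trait, the `CsvSource` and `MemorySource` types, and the fields of `TableScan` are hypothetical and do not reproduce DataFusion's actual definitions. The idea is only that one scan node holding a trait object can stand in for several format-specific scan nodes.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a table provider abstraction.
trait TableSource {
    fn describe(&self) -> String;
}

/// Hypothetical CSV-backed source.
struct CsvSource {
    path: String,
}

/// Hypothetical in-memory source.
struct MemorySource {
    num_rows: usize,
}

impl TableSource for CsvSource {
    fn describe(&self) -> String {
        format!("csv({})", self.path)
    }
}

impl TableSource for MemorySource {
    fn describe(&self) -> String {
        format!("memory({} rows)", self.num_rows)
    }
}

/// Simplified plan: a single scan variant instead of one per file format.
enum LogicalPlan {
    TableScan {
        source: Arc<dyn TableSource>,
        projection: Option<Vec<usize>>,
    },
    Limit {
        n: usize,
        input: Box<LogicalPlan>,
    },
}

/// Renders a plan as text; code walking the plan matches on one scan node only.
fn describe(plan: &LogicalPlan) -> String {
    match plan {
        LogicalPlan::TableScan { source, projection } => format!(
            "TableScan: source={} projection={:?}",
            source.describe(),
            projection
        ),
        LogicalPlan::Limit { n, input } => format!("Limit: {} <- {}", n, describe(input)),
    }
}
```

With this shape, supporting a new source in the sketch means implementing the trait rather than adding a plan variant and touching every `match` over the plan.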

[GitHub] [arrow] github-actions[bot] commented on pull request #8870: ARROW-10854: [Rust] [DataFusion] Simplify logical plan scans

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8870: URL: https://github.com/apache/arrow/pull/8870#issuecomment-741072900 https://issues.apache.org/jira/browse/ARROW-10854

[GitHub] [arrow] kou commented on a change in pull request #8868: ARROW-10843: [C++] Add support for temporal types in sort family kernels

2020-12-08 Thread GitBox
kou commented on a change in pull request #8868: URL: https://github.com/apache/arrow/pull/8868#discussion_r538898690 ## File path: cpp/src/arrow/compute/kernels/vector_sort.cc ## @@ -267,14 +267,19 @@ uint64_t* PartitionNulls(uint64_t* indices_begin, uint64_t* indices_end, /

[GitHub] [arrow] kou opened a new pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
kou opened a new pull request #8871: URL: https://github.com/apache/arrow/pull/8871

[GitHub] [arrow] kou commented on pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
kou commented on pull request #8871: URL: https://github.com/apache/arrow/pull/8871#issuecomment-741341269 @github-actions crossbow submit centos-8-*

[GitHub] [arrow] github-actions[bot] commented on pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8871: URL: https://github.com/apache/arrow/pull/8871#issuecomment-741343305 Revision: ffc14991c0fcca0a038e291b5f1214c6942ea49c Submitted crossbow builds: [ursa-labs/crossbow @ actions-746](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] github-actions[bot] commented on pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8871: URL: https://github.com/apache/arrow/pull/8871#issuecomment-741381752 https://issues.apache.org/jira/browse/ARROW-10857

[GitHub] [arrow] kou commented on pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
kou commented on pull request #8871: URL: https://github.com/apache/arrow/pull/8871#issuecomment-741458568 +1 g++ crash is another problem: ```text FAILED: src/arrow/CMakeFiles/arrow_objlib.dir/compute/kernels/codegen_internal.cc.o /usr/bin/c++ -DARROW_EXPORTING -DARR

[GitHub] [arrow] kou closed pull request #8871: ARROW-10857: [Packaging] Follow PowerTools repository name change on CentOS 8

2020-12-08 Thread GitBox
kou closed pull request #8871: URL: https://github.com/apache/arrow/pull/8871

[GitHub] [arrow] kou opened a new pull request #8872: ARROW-10858: [C++] Add missing Boost dependency with Visual C++

2020-12-08 Thread GitBox
kou opened a new pull request #8872: URL: https://github.com/apache/arrow/pull/8872 `cpp/src/arrow/util/int128_internal.h` requires Boost.

[GitHub] [arrow] github-actions[bot] commented on pull request #8872: ARROW-10858: [C++] Add missing Boost dependency with Visual C++

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8872: URL: https://github.com/apache/arrow/pull/8872#issuecomment-741491345 https://issues.apache.org/jira/browse/ARROW-10858

[GitHub] [arrow] polm opened a new issue #8873: pyarrow 3.9 wheels?

2020-12-08 Thread GitBox
polm opened a new issue #8873: URL: https://github.com/apache/arrow/issues/8873 Thanks for your work on pyarrow. Are there plans to release Python 3.9 wheels for pyarrow? The lack of wheels makes installing software that relies on pyarrow difficult; here's an example that bit me:

[GitHub] [arrow] polm closed issue #8873: pyarrow 3.9 wheels?

2020-12-08 Thread GitBox
polm closed issue #8873: URL: https://github.com/apache/arrow/issues/8873

[GitHub] [arrow] polm commented on issue #8873: pyarrow 3.9 wheels?

2020-12-08 Thread GitBox
polm commented on issue #8873: URL: https://github.com/apache/arrow/issues/8873#issuecomment-741509919 Ah, just saw #8610, sorry for the noise.

[GitHub] [arrow] jorgecarleitao opened a new pull request #8874: ARROW-10859: [Rust] [DataFusion] Made collect not require ExecutionContext

2020-12-08 Thread GitBox
jorgecarleitao opened a new pull request #8874: URL: https://github.com/apache/arrow/pull/8874 This PR observes that `ExecutionContext::collect(&self, plan: Arc)` does not use `self` in its implementation. Using this observation, it refactors `collect` out of `ExecutionContext` (
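
The refactoring pattern described above, moving a method that never reads `self` into a free function, can be sketched as follows. This is a simplified, synchronous illustration under stated assumptions: the `Plan` type and its `batches` field are hypothetical, and the real `collect` in DataFusion has a different signature.

```rust
/// Hypothetical, simplified plan: a list of already-computed batches.
struct Plan {
    batches: Vec<Vec<i32>>,
}

/// Hypothetical stand-in for the context type named in the pull request.
struct ExecutionContext;

impl ExecutionContext {
    /// The method form: `self` is accepted but never used, so the
    /// method can simply delegate to the free function below.
    fn collect(&self, plan: &Plan) -> Vec<i32> {
        collect(plan)
    }
}

/// The free-function form: callers no longer need a context
/// just to gather the results of a plan.
fn collect(plan: &Plan) -> Vec<i32> {
    plan.batches.iter().flatten().copied().collect()
}
```

Keeping the method as a thin wrapper is one possible way to preserve existing call sites; whether the pull request keeps or removes the method is not visible in the truncated message.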

[GitHub] [arrow] github-actions[bot] commented on pull request #8874: ARROW-10859: [Rust] [DataFusion] Made collect not require ExecutionContext

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8874: URL: https://github.com/apache/arrow/pull/8874#issuecomment-741529902 https://issues.apache.org/jira/browse/ARROW-10859

[GitHub] [arrow] codecov-io commented on pull request #8874: ARROW-10859: [Rust] [DataFusion] Made collect not require ExecutionContext

2020-12-08 Thread GitBox
codecov-io commented on pull request #8874: URL: https://github.com/apache/arrow/pull/8874#issuecomment-741530586 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=h1) Report > Merging [#8874](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=desc) (562bc1f) into

[GitHub] [arrow] jorgecarleitao opened a new pull request #8875: ARROW-10844: [Rust] [DataFusion] Allow joins after a table registration

2020-12-08 Thread GitBox
jorgecarleitao opened a new pull request #8875: URL: https://github.com/apache/arrow/pull/8875 This PR is built on top of #8874, and provides a modification to the `ExecutionContext` necessary to run joins where `register_table` is called between creation of `DataFrame`. The underl

[GitHub] [arrow] codecov-io edited a comment on pull request #8874: ARROW-10859: [Rust] [DataFusion] Made collect not require ExecutionContext

2020-12-08 Thread GitBox
codecov-io edited a comment on pull request #8874: URL: https://github.com/apache/arrow/pull/8874#issuecomment-741530586 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=h1) Report > Merging [#8874](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=desc) (976878f)

[GitHub] [arrow] codecov-io commented on pull request #8875: ARROW-10844: [Rust] [DataFusion] Allow joins after a table registration

2020-12-08 Thread GitBox
codecov-io commented on pull request #8875: URL: https://github.com/apache/arrow/pull/8875#issuecomment-741535928 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8875?src=pr&el=h1) Report > Merging [#8875](https://codecov.io/gh/apache/arrow/pull/8875?src=pr&el=desc) (4dd056c) into

[GitHub] [arrow] github-actions[bot] commented on pull request #8875: ARROW-10844: [Rust] [DataFusion] Allow joins after a table registration

2020-12-08 Thread GitBox
github-actions[bot] commented on pull request #8875: URL: https://github.com/apache/arrow/pull/8875#issuecomment-741536605 https://issues.apache.org/jira/browse/ARROW-10844

[GitHub] [arrow] codecov-io edited a comment on pull request #8875: ARROW-10844: [Rust] [DataFusion] Allow joins after a table registration

2020-12-08 Thread GitBox
codecov-io edited a comment on pull request #8875: URL: https://github.com/apache/arrow/pull/8875#issuecomment-741535928 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8875?src=pr&el=h1) Report > Merging [#8875](https://codecov.io/gh/apache/arrow/pull/8875?src=pr&el=desc) (925caf9)

[GitHub] [arrow] codecov-io edited a comment on pull request #8874: ARROW-10859: [Rust] [DataFusion] Made collect not require ExecutionContext

2020-12-08 Thread GitBox
codecov-io edited a comment on pull request #8874: URL: https://github.com/apache/arrow/pull/8874#issuecomment-741530586 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=h1) Report > Merging [#8874](https://codecov.io/gh/apache/arrow/pull/8874?src=pr&el=desc) (af52402)

[GitHub] [arrow] XiaokunDing commented on a change in pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
XiaokunDing commented on a change in pull request #8866: URL: https://github.com/apache/arrow/pull/8866#discussion_r539020866 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef; use crate::error::Result; use

[GitHub] [arrow] seddonm1 commented on a change in pull request #8866: ARROW-10781:[Rust] [DataFusion] add the 'Statistics' interface in data source

2020-12-08 Thread GitBox
seddonm1 commented on a change in pull request #8866: URL: https://github.com/apache/arrow/pull/8866#discussion_r539028012 ## File path: rust/datafusion/src/datasource/datasource.rs ## @@ -24,6 +24,15 @@ use crate::arrow::datatypes::SchemaRef; use crate::error::Result; use cr
