[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8630: ARROW-10540 [Rust] Improve filtering

2020-11-13 Thread GitBox
jorgecarleitao commented on a change in pull request #8630: URL: https://github.com/apache/arrow/pull/8630#discussion_r523390298 ## File path: rust/arrow/benches/filter_kernels.rs ## @@ -14,137 +14,136 @@ // KIND, either express or implied. See the License for the // specifi

[GitHub] [arrow] jorgecarleitao closed pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
jorgecarleitao closed pull request #8645: URL: https://github.com/apache/arrow/pull/8645 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

[GitHub] [arrow] nevi-me commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
nevi-me commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523384710 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the result

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
jorgecarleitao commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523374868 ## File path: rust/datafusion/src/physical_plan/filter.rs ## @@ -24,7 +24,7 @@ use std::sync::{Arc, Mutex}; use crate::error::{ExecutionError, Resul

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
jorgecarleitao commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523374604 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the

[GitHub] [arrow] nevi-me commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
nevi-me commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523368564 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the result

[GitHub] [arrow] arw2019 commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

2020-11-13 Thread GitBox
arw2019 commented on a change in pull request #8474: URL: https://github.com/apache/arrow/pull/8474#discussion_r523365931 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -151,6 +151,45 @@ std::unique_ptr MinMaxInit(KernelContext* ctx, const KernelInitArgs

[GitHub] [arrow] arw2019 commented on pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

2020-11-13 Thread GitBox
arw2019 commented on pull request #8474: URL: https://github.com/apache/arrow/pull/8474#issuecomment-727132950 > Since this is similar to #8294, and most review on the code happened there, does it makes sense to get that PR merged first? Yes, agreed - since this follows the pattern i
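
For readers new to the "any"/"all" boolean reductions discussed in #8294 and #8474, here is a minimal Rust sketch of their semantics. It assumes nulls are skipped, which is an assumption: the kernels' null handling is not visible in these excerpts. `any` and `all` are hypothetical helpers, not Arrow's C++ compute API.

```rust
/// Hypothetical "any": true if at least one non-null value is true (nulls skipped, by assumption).
fn any(values: &[Option<bool>]) -> bool {
    values.iter().any(|v| *v == Some(true))
}

/// Hypothetical "all": true unless some non-null value is false (nulls skipped, by assumption).
fn all(values: &[Option<bool>]) -> bool {
    values.iter().all(|v| *v != Some(false))
}

fn main() {
    let data = [Some(true), None, Some(false)];
    println!("any = {}, all = {}", any(&data), all(&data));
}
```

Under this convention, an empty (or all-null) input gives `any == false` and `all == true`, mirroring Rust's own `Iterator::any`/`Iterator::all`.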

[GitHub] [arrow] emkornfield commented on pull request #8644: ARROW-10573: [C++] Align written buffers to specified value

2020-11-13 Thread GitBox
emkornfield commented on pull request #8644: URL: https://github.com/apache/arrow/pull/8644#issuecomment-727073502 I'm -1 on allowing non-conforming IPC implementations.

[GitHub] [arrow] wesm commented on pull request #8644: ARROW-10573: [C++] Align written buffers to specified value

2020-11-13 Thread GitBox
wesm commented on pull request #8644: URL: https://github.com/apache/arrow/pull/8644#issuecomment-727068271 From the specification > Implementations are recommended to allocate memory on aligned addresses (multiple of 8- or 64-bytes) and pad (overallocate) to a length that is a mult
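
The padding rule quoted from the specification amounts to rounding each buffer length up to a multiple of 8 or 64 bytes. A minimal Rust sketch, assuming the alignment is a power of two; `padded_len` is a hypothetical helper, not an Arrow function.

```rust
/// Round `len` up to the next multiple of `alignment` (which must be a power of two).
fn padded_len(len: usize, alignment: usize) -> usize {
    assert!(alignment.is_power_of_two(), "alignment must be a power of two");
    // Adding alignment - 1 then masking off the low bits rounds up.
    (len + alignment - 1) & !(alignment - 1)
}

fn main() {
    for len in [0usize, 1, 8, 9, 64, 65] {
        println!("{len:>3} bytes -> 8: {:>3}, 64: {:>3}", padded_len(len, 8), padded_len(len, 64));
    }
}
```

The bit-mask form only works for power-of-two alignments, which covers both the 8- and 64-byte cases in the quoted text.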

[GitHub] [arrow] andygrove commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
andygrove commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523264105 ## File path: rust/datafusion/src/physical_plan/expressions.rs ## @@ -1288,18 +1363,84 @@ impl PhysicalExpr for BinaryExpr { Ok(self.left.nullabl

[GitHub] [arrow] github-actions[bot] commented on pull request #8661: ARROW-10581: [website] IPC dictionary reference to relevant section

2020-11-13 Thread GitBox
github-actions[bot] commented on pull request #8661: URL: https://github.com/apache/arrow/pull/8661#issuecomment-727064616 https://issues.apache.org/jira/browse/ARROW-10581

[GitHub] [arrow] njwhite commented on pull request #8644: ARROW-10573: [C++] Align written buffers to specified value

2020-11-13 Thread GitBox
njwhite commented on pull request #8644: URL: https://github.com/apache/arrow/pull/8644#issuecomment-727063708 @wesm I disagree with your assertion that it's only useful in an extraordinarily narrow use case - I've added a test case `test_contiguous_buffers_mixed_types` to show a zero-copy

[GitHub] [arrow] github-actions[bot] commented on pull request #8661: [doc] IPC dictionary reference to relevant section

2020-11-13 Thread GitBox
github-actions[bot] commented on pull request #8661: URL: https://github.com/apache/arrow/pull/8661#issuecomment-727050993 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] carols10cents commented on pull request #8641: ARROW-8853: [Rust] [Integration Testing] Enable Flight tests

2020-11-13 Thread GitBox
carols10cents commented on pull request #8641: URL: https://github.com/apache/arrow/pull/8641#issuecomment-727048909 Ok, so, now the integration test job got cancelled after 360 min, and suspiciously it appears to be cancelled [during the Flight tests](https://github.com/apache/arrow/pull/

[GitHub] [arrow] yordan-pavlov commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
yordan-pavlov commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523245452 ## File path: rust/datafusion/src/physical_plan/expressions.rs ## @@ -969,6 +975,42 @@ macro_rules! compute_utf8_op { }}; } +/// Invoke a comp

[GitHub] [arrow] Fonsan opened a new pull request #8661: [doc] IPC dictionary reference to relevant section

2020-11-13 Thread GitBox
Fonsan opened a new pull request #8661: URL: https://github.com/apache/arrow/pull/8661

[GitHub] [arrow] yordan-pavlov commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
yordan-pavlov commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523243341 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the

[GitHub] [arrow] yordan-pavlov commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
yordan-pavlov commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523241911 ## File path: rust/datafusion/src/physical_plan/expressions.rs ## @@ -1288,18 +1363,84 @@ impl PhysicalExpr for BinaryExpr { Ok(self.left.nul

[GitHub] [arrow] andygrove commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
andygrove commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-727043583 @Dandandan Thanks for the links. That addresses my concern. We do have a benchmark crate in this repo with instructions for running a TPC-H with larger data sets but I don't see

[GitHub] [arrow] andygrove commented on pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
andygrove commented on pull request #8660: URL: https://github.com/apache/arrow/pull/8660#issuecomment-727041322 @yordan-pavlov I took a quick skim through and this is looking really good! Could you rebase?

[GitHub] [arrow] andygrove commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
andygrove commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523238084 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -100,6 +100,30 @@ pub enum Distribution { SinglePartition, } +/// Represents the resu

[GitHub] [arrow] andygrove commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
andygrove commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523237664 ## File path: rust/datafusion/src/physical_plan/expressions.rs ## @@ -969,6 +975,42 @@ macro_rules! compute_utf8_op { }}; } +/// Invoke a compute

[GitHub] [arrow] andygrove commented on a change in pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
andygrove commented on a change in pull request #8660: URL: https://github.com/apache/arrow/pull/8660#discussion_r523237229 ## File path: rust/datafusion/src/physical_plan/expressions.rs ## @@ -1288,18 +1363,84 @@ impl PhysicalExpr for BinaryExpr { Ok(self.left.nullabl

[GitHub] [arrow] github-actions[bot] commented on pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
github-actions[bot] commented on pull request #8660: URL: https://github.com/apache/arrow/pull/8660#issuecomment-727004972 https://issues.apache.org/jira/browse/ARROW-10173

[GitHub] [arrow] yordan-pavlov opened a new pull request #8660: ARROW-10173: [Rust][DataFusion] Implement support for direct comparison to scalar values

2020-11-13 Thread GitBox
yordan-pavlov opened a new pull request #8660: URL: https://github.com/apache/arrow/pull/8660 This PR addresses the inefficient comparison to scalar values, where an array is built with the scalar value repeated, by changing the return value of expressions from `Result` to `Result` wh
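
The idea described in #8660, comparing against a scalar directly instead of first building an array with the scalar repeated, can be sketched in plain Rust. The `Value` enum and `gt` function below are hypothetical stand-ins (the PR's actual return type is elided in this excerpt), and `Vec<i64>` substitutes for an Arrow array.

```rust
/// Hypothetical expression result: either a full column or a single scalar.
enum Value {
    Array(Vec<i64>),
    Scalar(i64),
}

/// Evaluate `left > right` element-wise.
fn gt(left: &[i64], right: &Value) -> Vec<bool> {
    match right {
        Value::Array(r) => {
            assert_eq!(left.len(), r.len(), "arrays must have equal length");
            left.iter().zip(r).map(|(l, r)| l > r).collect()
        }
        // Compare each element against the scalar; no Vec of repeated copies is built.
        Value::Scalar(s) => left.iter().map(|l| l > s).collect(),
    }
}

fn main() {
    let column = vec![1, 5, 3];
    println!("{:?}", gt(&column, &Value::Scalar(2)));
    println!("{:?}", gt(&column, &Value::Array(vec![0, 5, 4])));
}
```

The scalar branch saves one allocation and one copy of length `left.len()` per evaluation, which is the inefficiency the PR description points at.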

[GitHub] [arrow] BryanCutler commented on pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform

2020-11-13 Thread GitBox
BryanCutler commented on pull request #8057: URL: https://github.com/apache/arrow/pull/8057#issuecomment-726960913 merged to master

[GitHub] [arrow] BryanCutler closed pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform

2020-11-13 Thread GitBox
BryanCutler closed pull request #8057: URL: https://github.com/apache/arrow/pull/8057

[GitHub] [arrow] BryanCutler commented on a change in pull request #8057: ARROW-9862: [Java] Enable UnsafeDirectLittleEndian on a big-endian platform

2020-11-13 Thread GitBox
BryanCutler commented on a change in pull request #8057: URL: https://github.com/apache/arrow/pull/8057#discussion_r523152926 ## File path: java/memory/memory-netty/src/main/java/io/netty/buffer/UnsafeDirectLittleEndian.java ## @@ -60,9 +59,6 @@ private UnsafeDirectLittle

[GitHub] [arrow] arw2019 commented on pull request #8657: ARROW-7363: [Python] add combine_chunks method to ChunkedArray

2020-11-13 Thread GitBox
arw2019 commented on pull request #8657: URL: https://github.com/apache/arrow/pull/8657#issuecomment-726943228 > I don't know if there is interest in having a C++ `ChunkedArray::CombineChunks()` method as well (similarly as there is a `Table::CombineChunks`), but that can also be added lat
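
A `combine_chunks` operation concatenates a chunked array's pieces into one contiguous array. A rough std-only Rust sketch, with `Vec<T>` standing in for an Arrow array and a hypothetical `combine_chunks` function (not the pyarrow method or any C++ API):

```rust
/// Hypothetical: concatenate all chunks into one contiguous vector.
fn combine_chunks<T: Clone>(chunks: &[Vec<T>]) -> Vec<T> {
    // Size the output once so the copy loop never reallocates.
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut out = Vec::with_capacity(total);
    for chunk in chunks {
        out.extend_from_slice(chunk);
    }
    out
}

fn main() {
    let chunks = vec![vec![1, 2], vec![3], vec![4, 5]];
    println!("{:?}", combine_chunks(&chunks));
}
```

A real implementation would also have to concatenate validity bitmaps and, for variable-width types, rebase offsets, which this sketch leaves out.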

[GitHub] [arrow] arw2019 commented on a change in pull request #8657: ARROW-7363: [Python] add combine_chunks method to ChunkedArray

2020-11-13 Thread GitBox
arw2019 commented on a change in pull request #8657: URL: https://github.com/apache/arrow/pull/8657#discussion_r523124148 ## File path: docs/source/python/api/tables.rst ## @@ -29,6 +29,7 @@ Factory Functions :toctree: ../generated/ chunked_array + combine_chunks R

[GitHub] [arrow] alamb commented on pull request #8553: ARROW-10366: [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate

2020-11-13 Thread GitBox
alamb commented on pull request #8553: URL: https://github.com/apache/arrow/pull/8553#issuecomment-726900488 I plan to merge this tomorrow unless I hear otherwise. @jorgecarleitao / @andygrove let me know if you have any concerns

[GitHub] [arrow] rdettai commented on pull request #8553: ARROW-10366: [Rust][DataFusion] Do not buffer intermediate results in merge or HashAggregate

2020-11-13 Thread GitBox
rdettai commented on pull request #8553: URL: https://github.com/apache/arrow/pull/8553#issuecomment-726895657 I had some code that was crashing because of the behavior the aggregate had when the wrapped exec first returned `Pending` on being polled. It now works perfectly with this PR! Tha

[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
rdettai commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726879119 Well, it is great that I did not start fixing the problem and I first focused on building a test that pinpointed the issue. Long live the TDD! 😄 I'll rebase this as soon as

[GitHub] [arrow] alamb closed pull request #8567: ARROW-10455: [Rust] [CI] Fixed error in caching files

2020-11-13 Thread GitBox
alamb closed pull request #8567: URL: https://github.com/apache/arrow/pull/8567

[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
alamb commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726864487 I will try to review it later today or tomorrow morning (UTC+5).

[GitHub] [arrow] alamb commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
alamb commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726864283 https://github.com/apache/arrow/pull/8553 may also be related, but it is a much more invasive change than this PR.

[GitHub] [arrow] github-actions[bot] commented on pull request #8659: ARROW-10480: [Python] don't infer compression by extension for Parquet

2020-11-13 Thread GitBox
github-actions[bot] commented on pull request #8659: URL: https://github.com/apache/arrow/pull/8659#issuecomment-726863521 https://issues.apache.org/jira/browse/ARROW-10480

[GitHub] [arrow] lidavidm opened a new pull request #8659: ARROW-10480: [Python] don't infer compression by extension for Parquet

2020-11-13 Thread GitBox
lidavidm opened a new pull request #8659: URL: https://github.com/apache/arrow/pull/8659 While files like "foo.parquet.gz" are nonstandard, we nonetheless shouldn't autodetect compression due to such a naming scheme. This is

[GitHub] [arrow] Dandandan edited a comment on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
Dandandan edited a comment on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726843795 A nice overview is listed here: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md comparing aHash to other algorithms. aHash passes the full suite of htt

[GitHub] [arrow] arw2019 commented on a change in pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data

2020-11-13 Thread GitBox
arw2019 commented on a change in pull request #8294: URL: https://github.com/apache/arrow/pull/8294#discussion_r523059162 ## File path: cpp/src/arrow/compute/api_aggregate.cc ## @@ -41,8 +41,12 @@ Result MinMax(const Datum& value, const MinMaxOptions& options, ExecConte ret

[GitHub] [arrow] Dandandan edited a comment on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
Dandandan edited a comment on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726843795 A nice overview is listed here: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md comparing aHash to other algorithms. aHash passes the full suite of http

[GitHub] [arrow] Dandandan commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
Dandandan commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726843795 A nice overview is listed here: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md comparing aHash to other algorithms. aHash passes the full suite of https://git
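
Because Rust's `HashMap` takes its hasher as a type parameter, replacing one hash function with another can, at the type level, amount to changing that parameter. The sketch below hand-rolls 64-bit FNV-1a with only the standard library to show where a hasher plugs in; `Fnv1a` and `FnvMap` are hypothetical names, and the `ahash` crate ships its own `BuildHasher` that would take the place of `BuildHasherDefault<Fnv1a>` (check that crate's documentation for the exact type).

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Minimal 64-bit FNV-1a hasher, for illustration only.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf2_9ce4_8422_2325) // FNV-1a 64-bit offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= u64::from(b); // FNV-1a: XOR first...
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3); // ...then multiply by the FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap whose hasher is FNV-1a instead of the default SipHash.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<Fnv1a>>;

fn main() {
    let mut counts: FnvMap<&str, u32> = FnvMap::default();
    for key in ["a", "b", "a"] {
        *counts.entry(key).or_insert(0) += 1;
    }
    println!("{:?}", counts.get("a"));
}
```

Only the hasher type changes; lookup and insertion code is identical, which is why benchmarks on realistic data sets (as asked in this thread) are the main way to judge such a switch.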

[GitHub] [arrow] andygrove commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
andygrove commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726839597 Do we have a feel for the performance implications of this change for large data sets as opposed to the micro benchmarks?

[GitHub] [arrow] kszucs commented on pull request #8567: ARROW-10455: [Rust] [CI] Fixed error in caching files

2020-11-13 Thread GitBox
kszucs commented on pull request #8567: URL: https://github.com/apache/arrow/pull/8567#issuecomment-726838793 @alamb nope, it's good to go.

[GitHub] [arrow] github-actions[bot] commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
github-actions[bot] commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726825009 https://issues.apache.org/jira/browse/ARROW-10577

[GitHub] [arrow] rdettai edited a comment on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
rdettai edited a comment on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726822182 I initially considered creating a first class citizen `YieldingExec` (https://gist.github.com/rdettai/c2045be688d457cb346c41e8769ed5d8), but as it will likely only be used

[GitHub] [arrow] rdettai commented on pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
rdettai commented on pull request #8658: URL: https://github.com/apache/arrow/pull/8658#issuecomment-726822182 I initially considered creating a first class citizen `YieldingExec` (https://gist.github.com/rdettai/c2045be688d457cb346c41e8769ed5d8), but as it will likely only be used in the

[GitHub] [arrow] rdettai opened a new pull request #8658: ARROW-10577: [Rust][DataFusion] Hash Aggregator stream finishes unexpectedly after going to Pending state

2020-11-13 Thread GitBox
rdettai opened a new pull request #8658: URL: https://github.com/apache/arrow/pull/8658 > This happens when executing a DataFusion query plan with hash aggregation where the data source is not ready on the first call by the Executor, and the async state machine is passed to a pending state
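
The failure mode in ARROW-10577 is an aggregation that treats `Pending` from its input as the end of the stream. A self-contained Rust sketch of the correct handling, using only `std::task`: `BatchStream`, `YieldOnce` (in the spirit of the `YieldingExec` idea mentioned above) and `poll_sum` are hypothetical and are not DataFusion's types.

```rust
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

/// Hypothetical minimal stream trait, shaped like `poll_next` on an async stream.
trait BatchStream {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>>;
}

/// Hypothetical source that is not ready on its first poll, then yields its batches.
struct YieldOnce {
    yielded: bool,
    batches: Vec<u64>,
}

impl BatchStream for YieldOnce {
    fn poll_next(&mut self, cx: &mut Context<'_>) -> Poll<Option<u64>> {
        if !self.yielded {
            self.yielded = true;
            cx.waker().wake_by_ref(); // signal that polling again will make progress
            return Poll::Pending;
        }
        if self.batches.is_empty() {
            Poll::Ready(None)
        } else {
            Poll::Ready(Some(self.batches.remove(0)))
        }
    }
}

/// Sum every batch. `Pending` means "not yet": the partial sum is kept and the
/// caller polls again. Only `Ready(None)` ends the aggregation.
fn poll_sum(input: &mut impl BatchStream, acc: &mut u64, cx: &mut Context<'_>) -> Poll<u64> {
    loop {
        match input.poll_next(cx) {
            Poll::Ready(Some(v)) => *acc += v,
            Poll::Ready(None) => return Poll::Ready(*acc),
            Poll::Pending => return Poll::Pending,
        }
    }
}

/// A waker that does nothing; enough for the busy-polling loop in `main`.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn noop(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, noop, noop, noop);
    // SAFETY: no vtable function dereferences the data pointer, so null is fine.
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

fn main() {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut source = YieldOnce { yielded: false, batches: vec![1, 2, 3] };
    let mut acc = 0;
    let total = loop {
        // A real executor would wait for a wake-up; this sketch simply polls again.
        if let Poll::Ready(t) = poll_sum(&mut source, &mut acc, &mut cx) {
            break t;
        }
    };
    println!("total = {total}");
}
```

The buggy behavior would correspond to mapping `Poll::Pending` to a finished result; keeping `acc` outside the poll call is what lets the aggregation resume where it left off.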

[GitHub] [arrow] kszucs commented on pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-11-13 Thread GitBox
kszucs commented on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-726818274 cc @bkietz since we co-authored the python-side refactor

[GitHub] [arrow] andygrove commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-11-13 Thread GitBox
andygrove commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-726811184 @alamb That is a good question. I have been too busy at work lately to work on Arrow/DataFusion/Ballista but I have been spending some time contemplating where to go next.

[GitHub] [arrow] bkietz commented on pull request #8472: ARROW-8113: [C++][WIP] Lighter weight variant<>

2020-11-13 Thread GitBox
bkietz commented on pull request #8472: URL: https://github.com/apache/arrow/pull/8472#issuecomment-726799854 @pitrou yes but not soon

[GitHub] [arrow] bkietz closed pull request #8652: ARROW-10566: [C++] Allow validating ArrayData directly

2020-11-13 Thread GitBox
bkietz closed pull request #8652: URL: https://github.com/apache/arrow/pull/8652

[GitHub] [arrow] bkietz commented on a change in pull request #8652: ARROW-10566: [C++] Allow validating ArrayData directly

2020-11-13 Thread GitBox
bkietz commented on a change in pull request #8652: URL: https://github.com/apache/arrow/pull/8652#discussion_r522984668 ## File path: cpp/src/arrow/array/validate.cc ## @@ -392,96 +376,159 @@ Status ValidateArray(const Array& array) { type.ToString(

[GitHub] [arrow] romainfrancois edited a comment on pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-11-13 Thread GitBox
romainfrancois edited a comment on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-726789546 ``` r library(arrow, warn.conflicts = FALSE) arrow:::vec_to_arrow(1:2, int32()) #> Array #> #> [ #> 1, #> 2 #> ] arrow:::vec_to_arr

[GitHub] [arrow] paddyhoran commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
paddyhoran commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726793208 > I prefer using t1ha over ahash, which is proven to be sound. The `t1ha` crate is `Licensed under zlib License`. I don't think that is compatible with Apache (`ahash` is).

[GitHub] [arrow] romainfrancois commented on pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-11-13 Thread GitBox
romainfrancois commented on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-726789546 ``` r library(arrow, warn.conflicts = FALSE) arrow:::vec_to_arrow(1:2, int32()) #> Array #> #> [ #> 1, #> 2 #> ] arrow:::vec_to_arrow(c(1,

[GitHub] [arrow] romainfrancois commented on pull request #8650: ARROW-10530: [R] Use Converter API to convert SEXP to Array/ChunkedArray

2020-11-13 Thread GitBox
romainfrancois commented on pull request #8650: URL: https://github.com/apache/arrow/pull/8650#issuecomment-726775206 Thanks @kszucs for the direct help. This is very far from done, but it's a start, and perhaps we can resume the conversation here. AFAIK, there is no R equivalent to

[GitHub] [arrow] jorisvandenbossche closed pull request #8212: ARROW-9636: [Python] Update documentation about 'LZO' compression in parquet.write_table

2020-11-13 Thread GitBox
jorisvandenbossche closed pull request #8212: URL: https://github.com/apache/arrow/pull/8212

[GitHub] [arrow] jorisvandenbossche commented on pull request #8212: ARROW-9636: [Python] Update documentation about 'LZO' compression in parquet.write_table

2020-11-13 Thread GitBox
jorisvandenbossche commented on pull request #8212: URL: https://github.com/apache/arrow/pull/8212#issuecomment-726769806 Since the Java implementation of Parquet has LZO, I suppose that's a good enough confirmation ;)

[GitHub] [arrow] vertexclique edited a comment on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-13 Thread GitBox
vertexclique edited a comment on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-726767196 Yes, it won't until that method is rewritten using the bit-slice iterator :) as written here: https://github.com/apache/arrow/pull/8645#issuecomment-725957761 p.s: totally

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8474: ARROW-10301: [C++][Compute] Implement "all" reduction kernel for boolean data

2020-11-13 Thread GitBox
jorisvandenbossche commented on a change in pull request #8474: URL: https://github.com/apache/arrow/pull/8474#discussion_r522953360 ## File path: cpp/src/arrow/compute/kernels/aggregate_basic.cc ## @@ -151,6 +151,45 @@ std::unique_ptr MinMaxInit(KernelContext* ctx, const Kern

[GitHub] [arrow] vertexclique edited a comment on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-13 Thread GitBox
vertexclique edited a comment on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-726767196 Yes, it won't until that method is rewritten using the bit slice iterator :) p.s: totally unrelated topic, how do you run Valgrind on mac?

[GitHub] [arrow] vertexclique commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-13 Thread GitBox
vertexclique commented on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-726767196 Yes, it won't until that method is rewritten using the bit slice iterator :)
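
The bit-slice iterator under discussion in #8598 walks a bitmap bit by bit, possibly starting at an offset. A minimal std-only Rust sketch, assuming Arrow's least-significant-bit-first numbering (bit `i` lives in byte `i / 8` at position `i % 8`); `bits` is a hypothetical helper, not the iterator from the PR.

```rust
/// Hypothetical: iterate `len` bits of an LSB-first bitmap, starting at bit `offset`.
/// Panics (safely) if the requested range runs past the end of `data`.
fn bits(data: &[u8], offset: usize, len: usize) -> impl Iterator<Item = bool> + '_ {
    (offset..offset + len).map(move |i| (data[i / 8] & (1u8 << (i % 8))) != 0)
}

fn main() {
    let bitmap = [0b0000_0101u8, 0b0000_0001];
    let v: Vec<bool> = bits(&bitmap, 0, 9).collect();
    println!("{v:?}");
}
```

Reading one bit per step keeps the sketch obviously correct; a production iterator would typically read whole words at a time to avoid the per-bit division and bounds check.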

[GitHub] [arrow] vertexclique commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
vertexclique commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726764889 I prefer using t1ha over ahash, which is proven to be sound.

[GitHub] [arrow] alamb closed pull request #8656: ARROW-10575: [Rust] Rename union.rs to be cosistent with other arrays

2020-11-13 Thread GitBox
alamb closed pull request #8656: URL: https://github.com/apache/arrow/pull/8656

[GitHub] [arrow] paddyhoran commented on pull request #8656: ARROW-10575: [Rust] Rename union.rs to be cosistent with other arrays

2020-11-13 Thread GitBox
paddyhoran commented on pull request #8656: URL: https://github.com/apache/arrow/pull/8656#issuecomment-726763619 > The https://github.com/apache/arrow/pull/8656/checks?check_run_id=1394039681 build on travis seems to have been queued for many hours at this point. I am thinking that mergi

[GitHub] [arrow] alamb commented on pull request #8654: ARROW-10572: [Rust][Datafusion] Use aHash instead of FnvHashMap

2020-11-13 Thread GitBox
alamb commented on pull request #8654: URL: https://github.com/apache/arrow/pull/8654#issuecomment-726760544 There appears to be a diff in the CI tests: https://github.com/apache/arrow/pull/8654/checks?check_run_id=1392391221 ``` execution::context::tests::count_distin

[GitHub] [arrow] alamb commented on pull request #8656: ARROW-10575: [Rust] Rename union.rs to be cosistent with other arrays

2020-11-13 Thread GitBox
alamb commented on pull request #8656: URL: https://github.com/apache/arrow/pull/8656#issuecomment-726759319 The https://github.com/apache/arrow/pull/8656/checks?check_run_id=1394039681 build on travis seems to have been queued for many hours at this point. I am thinking that merging this

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data

2020-11-13 Thread GitBox
jorisvandenbossche commented on a change in pull request #8294: URL: https://github.com/apache/arrow/pull/8294#discussion_r522942123 ## File path: cpp/src/arrow/compute/api_aggregate.cc ## @@ -41,8 +41,12 @@ Result MinMax(const Datum& value, const MinMaxOptions& options, ExecC

[GitHub] [arrow] alamb commented on pull request #8567: ARROW-10455: [Rust] [CI] Fixed error in caching files

2020-11-13 Thread GitBox
alamb commented on pull request #8567: URL: https://github.com/apache/arrow/pull/8567#issuecomment-726757507 @jorgecarleitao / @nevi-me / @kszucs is there any outstanding work for this PR, or shall we merge it in?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8294: ARROW-1846: [C++][Compute] Implement "any" reduction kernel for boolean data

2020-11-13 Thread GitBox
jorisvandenbossche commented on a change in pull request #8294: URL: https://github.com/apache/arrow/pull/8294#discussion_r522941736 ## File path: cpp/src/arrow/compute/api_aggregate.h ## @@ -154,7 +154,21 @@ Result MinMax(const Datum& value, const MinMaxO

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
jorgecarleitao edited a comment on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726756578 Thanks a lot, @alamb, really useful data points ❤️ For me that is enough of a reason: fix UB with `safe` code, and figure out a way to perform multi-bit assig

[GitHub] [arrow] jorgecarleitao commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
jorgecarleitao commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726756578 Thanks a lot, @alamb , really useful data points ❤️ For me that is enough of a reason: fix UB with `safe` code, and figure out a way to perform multi-bit assignment o

[GitHub] [arrow] alamb commented on pull request #8283: ARROW-9707: [Rust] [DataFusion] DataFusion Scheduler Prototype [WIP]

2020-11-13 Thread GitBox
alamb commented on pull request #8283: URL: https://github.com/apache/arrow/pull/8283#issuecomment-726756633 @andygrove I wonder what, if anything, you plan to do with this PR now

[GitHub] [arrow] alamb commented on a change in pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb commented on a change in pull request #8645: URL: https://github.com/apache/arrow/pull/8645#discussion_r522935301 ## File path: rust/arrow/src/util/bit_util.rs ## @@ -99,36 +99,6 @@ pub unsafe fn unset_bit_raw(data: *mut u8, i: usize) { *data.add(i >> 3) ^= BIT_MASK[

[GitHub] [arrow] alamb commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726749635 FWIW I ran the code in #8598 under valgrind and it does not appear to fix the issue https://github.com/apache/arrow/pull/8598#issuecomment-726749085

[GitHub] [arrow] alamb edited a comment on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb edited a comment on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726749635 FWIW I ran the code in #8598 under valgrind and it does not appear to fix the issue -- see details in https://github.com/apache/arrow/pull/8598#issuecomment-726749085

[GitHub] [arrow] alamb commented on pull request #8598: ARROW-10500: [Rust] Refactor bit slice, bit view iterator for array buffers

2020-11-13 Thread GitBox
alamb commented on pull request #8598: URL: https://github.com/apache/arrow/pull/8598#issuecomment-726749085 Some additional data: I ran the tests under valgrind (as described in https://github.com/apache/arrow/pull/8645#issuecomment-726736494) on this branch after rebasing against master.

[GitHub] [arrow] maartenbreddels commented on pull request #8628: ARROW-9489: [C++] Add fill_null kernel implementation for (array[string], scalar[string])

2020-11-13 Thread GitBox
maartenbreddels commented on pull request #8628: URL: https://github.com/apache/arrow/pull/8628#issuecomment-726748561 @pitrou i think this is ready to go/review.

[GitHub] [arrow] alamb commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726742048 > (if you be so kind, could you quickly run fd75933 , just to test whether this PR addresses the issue?) @jorgecarleitao -- I did so. There are no errors reported by valgrind

[GitHub] [arrow] jorgecarleitao commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
jorgecarleitao commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726737738 (if you be so kind, could you quickly run fd75933 , just to test whether this PR addresses the issue?)

[GitHub] [arrow] alamb commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726736494 In terms of evidence that there is a problem on master, I ran the arrow test suite under `valgrind` @ 30516049522c1a527ffb375e7790102f58edb4f9 on master and it does flag an invalid r

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8657: ARROW-7363: [Python] add combine_chunks method to ChunkedArray

2020-11-13 Thread GitBox
jorisvandenbossche commented on a change in pull request #8657: URL: https://github.com/apache/arrow/pull/8657#discussion_r522913647 ## File path: python/pyarrow/tests/test_array.py ## @@ -2643,6 +2643,15 @@ def test_concat_array_invalid_type(): pa.concat_arrays(arr)

[GitHub] [arrow] alamb commented on pull request #8645: ARROW-10561: [Rust] Simplified Buffer's `write` and `write_bytes` and fixed undefined behavior

2020-11-13 Thread GitBox
alamb commented on pull request #8645: URL: https://github.com/apache/arrow/pull/8645#issuecomment-726724609 I am going to take another hard look at https://github.com/apache/arrow/pull/8598 and see if we can get enough consensus to get it merged

[GitHub] [arrow] alamb commented on a change in pull request #8656: ARROW-10575: [Rust] Rename union.rs to be cosistent with other arrays

2020-11-13 Thread GitBox
alamb commented on a change in pull request #8656: URL: https://github.com/apache/arrow/pull/8656#discussion_r522898144 ## File path: rust/arrow/src/array/mod.rs ## @@ -121,8 +121,8 @@ pub use self::array_primitive::PrimitiveArray; pub use self::array_string::LargeStringArray;

[GitHub] [arrow] liyafan82 commented on a change in pull request #8210: ARROW-10031: [CI][Java] Support Java benchmark in Ursabot

2020-11-13 Thread GitBox
liyafan82 commented on a change in pull request #8210: URL: https://github.com/apache/arrow/pull/8210#discussion_r522824604 ## File path: java/performance/pom.xml ## @@ -169,10 +173,17 @@ ${benchmark.filter} -f

[GitHub] [arrow] bkietz closed pull request #8582: ARROW-10483: [C++] Move Executor to future.h

2020-11-13 Thread GitBox
bkietz closed pull request #8582: URL: https://github.com/apache/arrow/pull/8582

[GitHub] [arrow] chr1st1ank commented on issue #8607: Deletion of existing file when write_table fails

2020-11-13 Thread GitBox
chr1st1ank commented on issue #8607: URL: https://github.com/apache/arrow/issues/8607#issuecomment-726609910 This can be reproduced with the following commands in ipython. In effect the attempt to write to a file without write permissions to it results in the deletion of this file (of co