[GitHub] [arrow] Dandandan commented on pull request #9038: ARROW-10356: [Rust][DataFusion] Add support for is_in (WIP)

2020-12-30 Thread GitBox
Dandandan commented on pull request #9038: URL: https://github.com/apache/arrow/pull/9038#issuecomment-752382480 1. Yes I think there should be a different/more efficient implementation that handles the "scalar" case, where the scalar in this case is the list with values. 2. I believe t

[GitHub] [arrow] jorgecarleitao opened a new pull request #9044: ARROW-11045: [Rust] Fix performance issues of allocator

2020-12-30 Thread GitBox
jorgecarleitao opened a new pull request #9044: URL: https://github.com/apache/arrow/pull/9044 This PR addresses a performance issue in how we allocate and reallocate the `MutableBuffer`. # Problem See #9032 # This PR This PR changes `MutableBuffer::reserve` to c

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r55043 ## File path: rust/arrow/src/datatypes.rs ## @@ -1594,7 +1620,7 @@ impl Field { impl fmt::Display for Field { fn fmt(&self, f: &mut fmt::Formatter) -> f

[GitHub] [arrow] github-actions[bot] commented on pull request #9044: ARROW-11045: [Rust] Fix performance issues of allocator

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9044: URL: https://github.com/apache/arrow/pull/9044#issuecomment-752403701 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550113231 ## File path: rust/arrow/src/util/integration_util.rs ## @@ -60,13 +60,22 @@ pub struct ArrowJsonField { impl From<&Field> for ArrowJsonField { fn from(

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550113422 ## File path: rust/arrow/src/datatypes.rs ## @@ -1903,9 +1929,20 @@ mod tests { #[test] fn serde_struct_type() { +let kv_array = [("k".to_st

[GitHub] [arrow] mqy commented on pull request #9041: ARROW-11063: [Rust] [Breaking] Validate null counts when building arrays

2020-12-30 Thread GitBox
mqy commented on pull request #9041: URL: https://github.com/apache/arrow/pull/9041#issuecomment-752413596 @nevi-me formatting in this way should avoid the problem; I have fixed it in https://github.com/apache/arrow/pull/9025. Also, I'm thinking about adding `cargo +nightly-2020-11-24-x8

[GitHub] [arrow] mqy commented on a change in pull request #9011: ARROW-9777: [Rust] [IPC] write custom metadata [WIP]

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9011: URL: https://github.com/apache/arrow/pull/9011#discussion_r550138436 ## File path: rust/arrow/src/ipc/writer.rs ## @@ -355,41 +381,83 @@ pub struct FileWriter { impl FileWriter { /// Try create a new writer, with the sche

[GitHub] [arrow] mqy commented on pull request #9011: ARROW-9777: [Rust] [IPC] write custom metadata [WIP]

2020-12-30 Thread GitBox
mqy commented on pull request #9011: URL: https://github.com/apache/arrow/pull/9011#issuecomment-752415900 > Hey @mqy, I'm on vacation, so I haven't been checking the project much. > > [ARROW-10299](https://issues.apache.org/jira/browse/ARROW-10299) is for implementing ipc::MetadataV

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r549939736 ## File path: rust/arrow/src/datatypes.rs ## @@ -22,6 +22,7 @@ //! * [`Field`](crate::datatypes::Field) to describe one field within a schema. //! * [`Data

[GitHub] [arrow] Dandandan commented on pull request #9042: ARROW-11064: [Rust][DataFusion] Speed up hash join on smaller batches

2020-12-30 Thread GitBox
Dandandan commented on pull request #9042: URL: https://github.com/apache/arrow/pull/9042#issuecomment-752418069 @jorgecarleitao Yes, small things can have a big effect... The difference is a bit "magnified" though because of the `n*n` behavior in the join. If you have ideas of ho

[GitHub] [arrow] jhorstmann commented on pull request #9040: ARROW-11055: [Rust] [DataFusion] Support date_trunc function

2020-12-30 Thread GitBox
jhorstmann commented on pull request #9040: URL: https://github.com/apache/arrow/pull/9040#issuecomment-752424442 Cool! A few small comments: - In postgres the argument order is the other way around [`date_trunc('week', timestamp)`][1]. I haven't compared with other databases, but

[GitHub] [arrow] Dandandan commented on a change in pull request #9044: ARROW-11045: [Rust] Fix performance issues of allocator

2020-12-30 Thread GitBox
Dandandan commented on a change in pull request #9044: URL: https://github.com/apache/arrow/pull/9044#discussion_r550162644 ## File path: rust/arrow/src/memory.rs ## @@ -180,65 +187,62 @@ pub unsafe fn free_aligned(ptr: *mut u8, size: usize) { /// /// * new_size, when rounded

[GitHub] [arrow] alamb commented on a change in pull request #9043: ARROW-11058: [Rust] [DataFusion] Implement coalesce batches operator

2020-12-30 Thread GitBox
alamb commented on a change in pull request #9043: URL: https://github.com/apache/arrow/pull/9043#discussion_r550163355 ## File path: rust/datafusion/src/physical_plan/planner.rs ## @@ -110,6 +111,16 @@ impl DefaultPhysicalPlanner { // leaf node, children cannot be

[GitHub] [arrow] MarcoGorelli opened a new pull request #9045: Fixup pre-commit-config.yaml so that cmake format runs

2020-12-30 Thread GitBox
MarcoGorelli opened a new pull request #9045: URL: https://github.com/apache/arrow/pull/9045 Previously, cmake format wasn't running (likely because there were two `entry` keys, so the command was getting overridden). Furthermore, it was unnecessarily slow as it didn't take advantage of pre-c

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9044: ARROW-11045: [Rust] Fix performance issues of allocator

2020-12-30 Thread GitBox
jorgecarleitao commented on a change in pull request #9044: URL: https://github.com/apache/arrow/pull/9044#discussion_r550168432 ## File path: rust/arrow/src/memory.rs ## @@ -180,65 +187,62 @@ pub unsafe fn free_aligned(ptr: *mut u8, size: usize) { /// /// * new_size, when ro

[GitHub] [arrow] github-actions[bot] commented on pull request #9045: Fixup pre-commit-config.yaml so that cmake format runs

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9045: URL: https://github.com/apache/arrow/pull/9045#issuecomment-752475426 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] alamb opened a new pull request #9046: fix lint error

2020-12-30 Thread GitBox
alamb opened a new pull request #9046: URL: https://github.com/apache/arrow/pull/9046 Rustfmt error was introduced in this PR: https://github.com/apache/arrow/commit/30ce2eb5d4dc6136594f005f6b7ec7315afc9a88 This PR fixes the issue

[GitHub] [arrow] alamb commented on pull request #9041: ARROW-11063: [Rust] [Breaking] Validate null counts when building arrays

2020-12-30 Thread GitBox
alamb commented on pull request #9041: URL: https://github.com/apache/arrow/pull/9041#issuecomment-752490751 🤦 Sorry @nevi-me -- I also have a PR here to fix it: https://github.com/apache/arrow/pull/9046 I think it is blocking other PRs so I will merge https://github.com/apache/ar

[GitHub] [arrow] alamb commented on pull request #9046: ARROW-11073: [Rust] fix lint error in in /arrow/rust/arrow/src/ipc/reader.rs

2020-12-30 Thread GitBox
alamb commented on pull request #9046: URL: https://github.com/apache/arrow/pull/9046#issuecomment-752495469 Lint is passed, so I am merging this in to unblock master / other PRs. This is an automated message from the Apache

[GitHub] [arrow] alamb closed pull request #9046: ARROW-11073: [Rust] fix lint error in in /arrow/rust/arrow/src/ipc/reader.rs

2020-12-30 Thread GitBox
alamb closed pull request #9046: URL: https://github.com/apache/arrow/pull/9046

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550173817 ## File path: rust/arrow/src/datatypes.rs ## @@ -1903,9 +1929,20 @@ mod tests { #[test] fn serde_struct_type() { +let kv_array = [("k".to_st

[GitHub] [arrow] sweb opened a new pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
sweb opened a new pull request #9047: URL: https://github.com/apache/arrow/pull/9047 This PR adds capabilities to read decimal columns in parquet files that store them as i32 or i64. I tried to follow the approach in #8926 by using casts. However, there is an issue with my solution
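A minimal sketch of the idea behind reading such columns, not the code from this PR: when a Parquet decimal is physically stored as an int32 or int64, the stored integer is the unscaled value, so it can be widened to a 128-bit integer and interpreted with the column's scale. The function names `widen_i32`, `widen_i64`, and `format_decimal` are hypothetical and exist only for illustration.

```rust
/// Widen physically stored 32-bit unscaled values to i128 (hypothetical helper).
fn widen_i32(values: &[i32]) -> Vec<i128> {
    values.iter().map(|&v| i128::from(v)).collect()
}

/// Widen physically stored 64-bit unscaled values to i128 (hypothetical helper).
fn widen_i64(values: &[i64]) -> Vec<i128> {
    values.iter().map(|&v| i128::from(v)).collect()
}

/// Render an unscaled integer as a decimal string with `scale` fractional digits.
fn format_decimal(unscaled: i128, scale: u32) -> String {
    if scale == 0 {
        return unscaled.to_string();
    }
    let sign = if unscaled < 0 { "-" } else { "" };
    // Work on the magnitude so that the fractional part is never negative.
    let magnitude = unscaled.unsigned_abs();
    let divisor = 10u128.pow(scale);
    let integer = magnitude / divisor;
    let fraction = magnitude % divisor;
    // Left-pad the fraction with zeros up to `scale` digits.
    format!("{}{}.{:0width$}", sign, integer, fraction, width = scale as usize)
}
```

Widening is lossless because every i32 and i64 value fits in an i128; only the interpretation (the scale) comes from the column metadata.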

[GitHub] [arrow] github-actions[bot] commented on pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9047: URL: https://github.com/apache/arrow/pull/9047#issuecomment-752563024 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] nevi-me commented on pull request #9046: ARROW-11073: [Rust] fix lint error in in /arrow/rust/arrow/src/ipc/reader.rs

2020-12-30 Thread GitBox
nevi-me commented on pull request #9046: URL: https://github.com/apache/arrow/pull/9046#issuecomment-752593629 Thanks @alamb

[GitHub] [arrow] mqy commented on pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
mqy commented on pull request #9047: URL: https://github.com/apache/arrow/pull/9047#issuecomment-752615117 ARROW-${JIRA_ID} :)

[GitHub] [arrow] nevi-me commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
nevi-me commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550200770 ## File path: rust/arrow/src/datatypes.rs ## @@ -1594,7 +1620,7 @@ impl Field { impl fmt::Display for Field { fn fmt(&self, f: &mut fmt::Formatter)

[GitHub] [arrow] nevi-me commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
nevi-me commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550201222 ## File path: rust/arrow/src/datatypes.rs ## @@ -1903,9 +1929,20 @@ mod tests { #[test] fn serde_struct_type() { +let kv_array = [("k".t

[GitHub] [arrow] andygrove commented on a change in pull request #9043: ARROW-11058: [Rust] [DataFusion] Implement coalesce batches operator

2020-12-30 Thread GitBox
andygrove commented on a change in pull request #9043: URL: https://github.com/apache/arrow/pull/9043#discussion_r550216625 ## File path: rust/datafusion/src/physical_plan/coalesce_batches.rs ## @@ -0,0 +1,295 @@ +// Licensed to the Apache Software Foundation (ASF) under one +/

[GitHub] [arrow] github-actions[bot] commented on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9047: URL: https://github.com/apache/arrow/pull/9047#issuecomment-752645838 https://issues.apache.org/jira/browse/ARROW-11072

[GitHub] [arrow] andygrove commented on a change in pull request #9043: ARROW-11058: [Rust] [DataFusion] Implement coalesce batches operator

2020-12-30 Thread GitBox
andygrove commented on a change in pull request #9043: URL: https://github.com/apache/arrow/pull/9043#discussion_r550217442 ## File path: rust/datafusion/src/physical_plan/planner.rs ## @@ -110,6 +111,16 @@ impl DefaultPhysicalPlanner { // leaf node, children canno

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
github-actions[bot] removed a comment on pull request #9047: URL: https://github.com/apache/arrow/pull/9047#issuecomment-752563024 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Th

[GitHub] [arrow] Dandandan opened a new pull request #9048: ARROW-11076: [Rust][DataFusion] Join right refactor

2020-12-30 Thread GitBox
Dandandan opened a new pull request #9048: URL: https://github.com/apache/arrow/pull/9048 This applies some refactoring to `build_batch_from_indices` which is supposed to make further changes easier, e.g. solving https://issues.apache.org/jira/browse/ARROW-11030 * This starts handli

[GitHub] [arrow] Dandandan commented on pull request #9036: ARROW-11053: [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches

2020-12-30 Thread GitBox
Dandandan commented on pull request #9036: URL: https://github.com/apache/arrow/pull/9036#issuecomment-752675210 @andygrove thought more about this, I think we are able to use `indices.len()` for the *exact* required capacity rather than using previous sizes. I included the change among ot
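The capacity idea in this comment can be sketched as follows. This is an illustration under stated assumptions, not DataFusion's `build_batch_from_indices`: the function `take_with_exact_capacity` and its plain `Vec` output are hypothetical. The point is that a gather produces exactly one output slot per index, so `indices.len()` is the exact capacity and no estimate from earlier batches is needed.

```rust
/// Gather `values[i]` for every `i` in `indices` (hypothetical helper).
/// Out-of-bounds indices yield `None` instead of panicking.
fn take_with_exact_capacity(values: &[i64], indices: &[usize]) -> Vec<Option<i64>> {
    // One output slot per index: the required capacity is known up front.
    let mut out = Vec::with_capacity(indices.len());
    for &i in indices {
        out.push(values.get(i).copied());
    }
    out
}
```

Because the output never grows past `indices.len()`, the vector is allocated once and never reallocated while it is filled.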

[GitHub] [arrow] github-actions[bot] commented on pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9048: URL: https://github.com/apache/arrow/pull/9048#issuecomment-752678852 https://issues.apache.org/jira/browse/ARROW-11076

[GitHub] [arrow] mqy commented on a change in pull request #9025: ARROW-10259: [Rust] Add custom metadata to Field

2020-12-30 Thread GitBox
mqy commented on a change in pull request #9025: URL: https://github.com/apache/arrow/pull/9025#discussion_r550250838 ## File path: rust/arrow/src/datatypes.rs ## @@ -1903,9 +1929,20 @@ mod tests { #[test] fn serde_struct_type() { +let kv_array = [("k".to_st

[GitHub] [arrow] nealrichardson commented on a change in pull request #9034: ARROW-10733: [R] Improvements to Linux installation troubleshooting

2020-12-30 Thread GitBox
nealrichardson commented on a change in pull request #9034: URL: https://github.com/apache/arrow/pull/9034#discussion_r550251046 ## File path: r/vignettes/install.Rmd ## @@ -175,25 +181,24 @@ tune one of several parameters. Here are some known complications and ways to ad If

[GitHub] [arrow] nealrichardson closed pull request #9033: ARROW-11050: [R] Handle RecordBatch in write_parquet()

2020-12-30 Thread GitBox
nealrichardson closed pull request #9033: URL: https://github.com/apache/arrow/pull/9033

[GitHub] [arrow] nealrichardson commented on a change in pull request #9039: ARROW-10416: [R] Support Tables in Flight

2020-12-30 Thread GitBox
nealrichardson commented on a change in pull request #9039: URL: https://github.com/apache/arrow/pull/9039#discussion_r550251893 ## File path: r/tests/testthat/test-python-flight.R ## @@ -0,0 +1,63 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more cont

[GitHub] [arrow] nealrichardson closed pull request #9039: ARROW-10416: [R] Support Tables in Flight

2020-12-30 Thread GitBox
nealrichardson closed pull request #9039: URL: https://github.com/apache/arrow/pull/9039

[GitHub] [arrow] andygrove commented on pull request #9036: ARROW-11053: [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches

2020-12-30 Thread GitBox
andygrove commented on pull request #9036: URL: https://github.com/apache/arrow/pull/9036#issuecomment-752685504 Closed in favor of https://github.com/apache/arrow/pull/9048

[GitHub] [arrow] andygrove closed pull request #9036: ARROW-11053: [Rust] [DataFusion] Optimize joins with dynamic capacity for output batches

2020-12-30 Thread GitBox
andygrove closed pull request #9036: URL: https://github.com/apache/arrow/pull/9036

[GitHub] [arrow] andygrove commented on a change in pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
andygrove commented on a change in pull request #9048: URL: https://github.com/apache/arrow/pull/9048#discussion_r550257236 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -276,55 +276,58 @@ fn build_batch_from_indices( todo!("Create empty record bat

[GitHub] [arrow] andygrove commented on pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
andygrove commented on pull request #9048: URL: https://github.com/apache/arrow/pull/9048#issuecomment-752688847 For TPC-H q12 at SF=100 and 8 partitions:
| Batch Size | Master | #9043 | #9043 + This PR |
| --- | --- | --- | --- |
| 4096 | ??? | ??? | 25.2 s |
| 8192 | 617.5 s | 70

[GitHub] [arrow] andygrove edited a comment on pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
andygrove edited a comment on pull request #9048: URL: https://github.com/apache/arrow/pull/9048#issuecomment-752688847 For TPC-H q12 at SF=100 and 8 partitions:
| Batch Size | Master | #9043 | #9043 + This PR |
| --- | --- | --- | --- |
| 4096 | ??? | ??? | 25.2 s |
| 8192 |

[GitHub] [arrow] Dandandan commented on a change in pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
Dandandan commented on a change in pull request #9048: URL: https://github.com/apache/arrow/pull/9048#discussion_r550260158 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -276,55 +276,58 @@ fn build_batch_from_indices( todo!("Create empty record bat

[GitHub] [arrow] andygrove closed pull request #9043: ARROW-11058: [Rust] [DataFusion] Implement coalesce batches operator

2020-12-30 Thread GitBox
andygrove closed pull request #9043: URL: https://github.com/apache/arrow/pull/9043

[GitHub] [arrow] andygrove commented on pull request #9035: ARROW-11052: [Rust] [DataFusion] Implement metrics for HashJoinExec

2020-12-30 Thread GitBox
andygrove commented on pull request #9035: URL: https://github.com/apache/arrow/pull/9035#issuecomment-752691811 I will rebase this and address feedback once https://github.com/apache/arrow/pull/9048 is merged

[GitHub] [arrow] Dandandan commented on pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
Dandandan commented on pull request #9048: URL: https://github.com/apache/arrow/pull/9048#issuecomment-752692256 Awesome, better than I expected!

[GitHub] [arrow] nealrichardson closed pull request #9034: ARROW-10733: [R] Improvements to Linux installation troubleshooting

2020-12-30 Thread GitBox
nealrichardson closed pull request #9034: URL: https://github.com/apache/arrow/pull/9034

[GitHub] [arrow] jonkeane commented on a change in pull request #8947: ARROW-9187: [R] Add bindings for arithmetic kernels

2020-12-30 Thread GitBox
jonkeane commented on a change in pull request #8947: URL: https://github.com/apache/arrow/pull/8947#discussion_r550266972 ## File path: r/tests/testthat/test-dplyr.R ## @@ -133,6 +133,42 @@ test_that("filtering with expression", { ) }) +test_that("filtering with arithmet

[GitHub] [arrow] Dandandan commented on a change in pull request #9048: ARROW-11076: [Rust][DataFusion] Refactor usage of right indices in hash join

2020-12-30 Thread GitBox
Dandandan commented on a change in pull request #9048: URL: https://github.com/apache/arrow/pull/9048#discussion_r550269050 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -276,55 +276,58 @@ fn build_batch_from_indices( todo!("Create empty record bat

[GitHub] [arrow] alamb commented on a change in pull request #9043: ARROW-11058: [Rust] [DataFusion] Implement coalesce batches operator

2020-12-30 Thread GitBox
alamb commented on a change in pull request #9043: URL: https://github.com/apache/arrow/pull/9043#discussion_r550271524 ## File path: rust/datafusion/src/physical_plan/coalesce_batches.rs ## @@ -0,0 +1,295 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] jhorstmann commented on a change in pull request #8975: ARROW-10990: [Rust] Refactor simd comparison kernels to avoid out of bounds reads

2020-12-30 Thread GitBox
jhorstmann commented on a change in pull request #8975: URL: https://github.com/apache/arrow/pull/8975#discussion_r550272926 ## File path: rust/arrow/src/array/array_primitive.rs ## @@ -67,19 +67,6 @@ impl PrimitiveArray { self.data.is_empty() } -/// Returns

[GitHub] [arrow] AWSjswinney commented on pull request #8491: ARROW-10349: [Python] linux aarch64 wheels

2020-12-30 Thread GitBox
AWSjswinney commented on pull request #8491: URL: https://github.com/apache/arrow/pull/8491#issuecomment-752707138 I believe this PR now has everything needed to build and publish aarch64 wheels for manylinux2014. Thanks for reviewing it!

[GitHub] [arrow] carols10cents opened a new pull request #9049: ARROW-8853: [Rust] [Integration Testing] Enable Flight tests

2020-12-30 Thread GitBox
carols10cents opened a new pull request #9049: URL: https://github.com/apache/arrow/pull/9049 This PR has a few refactorings and then the main commit contains a new Flight integration test client and server 🎉 The middleware scenario tests are currently skipped because they will fail

[GitHub] [arrow] nealrichardson opened a new pull request #9050: ARROW-11079: [R] Catch up on changelog since 2.0

2020-12-30 Thread GitBox
nealrichardson opened a new pull request #9050: URL: https://github.com/apache/arrow/pull/9050

[GitHub] [arrow] nealrichardson commented on a change in pull request #8947: ARROW-9187: [R] Add bindings for arithmetic kernels

2020-12-30 Thread GitBox
nealrichardson commented on a change in pull request #8947: URL: https://github.com/apache/arrow/pull/8947#discussion_r550293049 ## File path: r/R/expression.R ## @@ -59,6 +59,44 @@ build_array_expression <- function(.Generic, e1, e2, ...) { } else { e1 <- .wrap_arrow(e

[GitHub] [arrow] github-actions[bot] commented on pull request #9050: ARROW-11079: [R] Catch up on changelog since 2.0

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9050: URL: https://github.com/apache/arrow/pull/9050#issuecomment-752727272 https://issues.apache.org/jira/browse/ARROW-11079

[GitHub] [arrow] github-actions[bot] commented on pull request #9049: ARROW-8853: [Rust] [Integration Testing] Enable Flight tests

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9049: URL: https://github.com/apache/arrow/pull/9049#issuecomment-752727271 https://issues.apache.org/jira/browse/ARROW-8853

[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

2020-12-30 Thread GitBox
sunchao commented on a change in pull request #9047: URL: https://github.com/apache/arrow/pull/9047#discussion_r550321322 ## File path: rust/parquet/src/arrow/schema.rs ## @@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> { } } +fn to_decimal(&self) -> Resu

[GitHub] [arrow] kou commented on pull request #8386: ARROW-10224: [Python] Add support for Python 3.9 except macOS wheel and Windows wheel

2020-12-30 Thread GitBox
kou commented on pull request #8386: URL: https://github.com/apache/arrow/pull/8386#issuecomment-752764866 We're still working on it at #8915.

[GitHub] [arrow] kou commented on a change in pull request #9045: Fixup pre-commit-config.yaml so that cmake format runs

2020-12-30 Thread GitBox
kou commented on a change in pull request #9045: URL: https://github.com/apache/arrow/pull/9045#discussion_r550335310 ## File path: .pre-commit-config.yaml ## @@ -40,9 +40,10 @@ repos: - id: cmake-format name: CMake Format language: python -entr

[GitHub] [arrow] kou commented on a change in pull request #9045: Fixup pre-commit-config.yaml so that cmake format runs

2020-12-30 Thread GitBox
kou commented on a change in pull request #9045: URL: https://github.com/apache/arrow/pull/9045#discussion_r550335701 ## File path: .pre-commit-config.yaml ## @@ -40,9 +40,10 @@ repos: - id: cmake-format name: CMake Format language: python -entr

[GitHub] [arrow] seddonm1 commented on pull request #9038: ARROW-10356: [Rust][DataFusion] Add support for is_in (WIP)

2020-12-30 Thread GitBox
seddonm1 commented on pull request #9038: URL: https://github.com/apache/arrow/pull/9038#issuecomment-752776874 > 1. Yes I think there should be a different/more efficient implementation that handles the "scalar" case, where the scalar in this case is the list with values. Agree. I can

[GitHub] [arrow] seddonm1 edited a comment on pull request #9038: ARROW-10356: [Rust][DataFusion] Add support for is_in (WIP)

2020-12-30 Thread GitBox
seddonm1 edited a comment on pull request #9038: URL: https://github.com/apache/arrow/pull/9038#issuecomment-752776874 > 1. Yes I think there should be a different/more efficient implementation that handles the "scalar" case, where the scalar in this case is the list with values. Ag

[GitHub] [arrow] nealrichardson closed pull request #9050: ARROW-11079: [R] Catch up on changelog since 2.0

2020-12-30 Thread GitBox
nealrichardson closed pull request #9050: URL: https://github.com/apache/arrow/pull/9050

[GitHub] [arrow] jonkeane opened a new pull request #9051: ARROW-10668: [R] Support for the .data pronoun

2020-12-30 Thread GitBox
jonkeane opened a new pull request #9051: URL: https://github.com/apache/arrow/pull/9051 and tests for the .env pronoun

[GitHub] [arrow] nealrichardson commented on pull request #9051: ARROW-10668: [R] Support for the .data pronoun

2020-12-30 Thread GitBox
nealrichardson commented on pull request #9051: URL: https://github.com/apache/arrow/pull/9051#issuecomment-752791737 Can you add a news bullet for this (making sure you're rebased on latest master)?

[GitHub] [arrow] nealrichardson commented on a change in pull request #9051: ARROW-10668: [R] Support for the .data pronoun

2020-12-30 Thread GitBox
nealrichardson commented on a change in pull request #9051: URL: https://github.com/apache/arrow/pull/9051#discussion_r550358517 ## File path: r/R/dplyr.R ## @@ -265,6 +267,8 @@ filter_mask <- function(.data) { # Then add the column references # Renaming is handled automa

[GitHub] [arrow] github-actions[bot] commented on pull request #9051: ARROW-10668: [R] Support for the .data pronoun

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9051: URL: https://github.com/apache/arrow/pull/9051#issuecomment-752794206 https://issues.apache.org/jira/browse/ARROW-10668

[GitHub] [arrow] seddonm1 commented on pull request #9038: ARROW-10356: [Rust][DataFusion] Add support for is_in (WIP)

2020-12-30 Thread GitBox
seddonm1 commented on pull request #9038: URL: https://github.com/apache/arrow/pull/9038#issuecomment-752807355 I have updated this PR with a reimplementation of the logic so that the kernel, which has two undesired behaviours (see points 1 and 2), is no longer invoked. It should also support

[GitHub] [arrow] codecov-io edited a comment on pull request #9007: ARROW-11029: [Rust] [DataFusion] Add documentation for code that determines number of rows per operator

2020-12-30 Thread GitBox
codecov-io edited a comment on pull request #9007: URL: https://github.com/apache/arrow/pull/9007#issuecomment-750924264 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9007?src=pr&el=h1) Report > Merging [#9007](https://codecov.io/gh/apache/arrow/pull/9007?src=pr&el=desc) (516bf56)

[GitHub] [arrow] ElenaHenderson opened a new pull request #9052: URSA-107 > Run Python benchmarks for each commit

2020-12-30 Thread GitBox
ElenaHenderson opened a new pull request #9052: URL: https://github.com/apache/arrow/pull/9052

[GitHub] [arrow] github-actions[bot] commented on pull request #9052: URSA-107 > Run Python benchmarks for each commit

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9052: URL: https://github.com/apache/arrow/pull/9052#issuecomment-752833739 Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? https://issues.apache.org/jira/browse/ARROW Then could

[GitHub] [arrow] liyafan82 opened a new pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2020-12-30 Thread GitBox
liyafan82 opened a new pull request #9053: URL: https://github.com/apache/arrow/pull/9053 By making it immutable, the following benefits can be obtained: 1. It makes the code easier to reason about. 2. It allows the JIT to make more optimizations. 3. Immutable objects can be shared,

[GitHub] [arrow] liyafan82 commented on pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2020-12-30 Thread GitBox
liyafan82 commented on pull request #9053: URL: https://github.com/apache/arrow/pull/9053#issuecomment-752834119 This PR also fixes a bug in ArrowMessage#ArrowMessage(ArrowDictionaryBatch, IpcOption)

[GitHub] [arrow] github-actions[bot] commented on pull request #9053: ARROW-11081: [Java] Make IPC option immutable

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9053: URL: https://github.com/apache/arrow/pull/9053#issuecomment-752838345 https://issues.apache.org/jira/browse/ARROW-11081

[GitHub] [arrow] ElenaHenderson closed pull request #9052: URSA-107 > Run Python benchmarks for each commit

2020-12-30 Thread GitBox
ElenaHenderson closed pull request #9052: URL: https://github.com/apache/arrow/pull/9052

[GitHub] [arrow] jorgecarleitao commented on pull request #9016: ARROW-11037: [Rust] Optimized creation of string array from iterator.

2020-12-30 Thread GitBox
jorgecarleitao commented on pull request #9016: URL: https://github.com/apache/arrow/pull/9016#issuecomment-752862936 I have re-opened this PR as #9032 shows that creating a buffer from a mutable buffer is 2x faster. This is
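
As background for this thread, a variable-length string array is stored as an offsets buffer plus a values buffer of concatenated bytes. The sketch below builds both from an iterator in one pass. It uses `Vec` as a stand-in for Arrow's `MutableBuffer` and is not the PR's implementation. The function name `string_buffers` and the `i32` offset width are assumptions made for illustration.

```rust
/// Hypothetical sketch: `Vec` stands in for Arrow's `MutableBuffer`.
/// Returns (offsets, values) for the given strings.
fn string_buffers<'a, I>(iter: I) -> (Vec<i32>, Vec<u8>)
where
    I: Iterator<Item = &'a str>,
{
    // Reserve from the iterator's lower size bound to limit reallocations.
    let (lower, _) = iter.size_hint();
    let mut offsets: Vec<i32> = Vec::with_capacity(lower + 1);
    let mut values: Vec<u8> = Vec::new();

    // The offsets buffer always starts at 0 and has one entry more than
    // the number of strings.
    offsets.push(0);
    for s in iter {
        values.extend_from_slice(s.as_bytes());
        // Each offset marks where the string just written ends.
        offsets.push(values.len() as i32);
    }
    (offsets, values)
}
```

String `i` occupies `values[offsets[i]..offsets[i + 1]]`, so an empty string appears as two equal consecutive offsets.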

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9016: ARROW-11037: [Rust] Optimized creation of string array from iterator.

2020-12-30 Thread GitBox
jorgecarleitao commented on a change in pull request #9016: URL: https://github.com/apache/arrow/pull/9016#discussion_r550412011 ## File path: rust/arrow/src/array/array_string.rs ## @@ -126,25 +126,28 @@ impl GenericStringArray { } pub(crate) fn from_vec(v: Vec<&s

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9016: ARROW-11037: [Rust] Optimized creation of string array from iterator.

2020-12-30 Thread GitBox
jorgecarleitao commented on a change in pull request #9016: URL: https://github.com/apache/arrow/pull/9016#discussion_r550416288 ## File path: rust/arrow/src/array/array_string.rs ## @@ -126,25 +126,28 @@ impl GenericStringArray { } pub(crate) fn from_vec(v: Vec<&s

[GitHub] [arrow] jorgecarleitao opened a new pull request #9054: ARROW-11082: [Rust] C data interface to largeUTF8

2020-12-30 Thread GitBox
jorgecarleitao opened a new pull request #9054: URL: https://github.com/apache/arrow/pull/9054 This also simplifies some code and adds a test for the boolean case, which is special due to bit-packing.
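
To illustrate why the boolean case is special, the following sketch packs booleans eight to a byte in least-significant-bit-first order, the bit numbering Arrow uses for its bitmaps. The helper names `pack_bools` and `get_bit` are hypothetical and are not taken from the arrow crate.

```rust
/// Hypothetical helper: packs booleans into bytes, least-significant bit first.
fn pack_bools(bits: &[bool]) -> Vec<u8> {
    // Round up so a trailing partial byte is still allocated.
    let mut out = vec![0u8; (bits.len() + 7) / 8];
    for (i, &b) in bits.iter().enumerate() {
        if b {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Hypothetical helper: reads bit `i` back from the packed bytes.
fn get_bit(bytes: &[u8], i: usize) -> bool {
    (bytes[i / 8] & (1u8 << (i % 8))) != 0
}
```

Because the byte length is rounded up, nine booleans occupy two bytes. The element count therefore has to be carried separately from the buffer length, which is one reason boolean arrays need their own test.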

[GitHub] [arrow] jorgecarleitao opened a new pull request #9055: ARROW-11084: [Rust] Fixed clippy

2020-12-30 Thread GitBox
jorgecarleitao opened a new pull request #9055: URL: https://github.com/apache/arrow/pull/9055

[GitHub] [arrow] github-actions[bot] commented on pull request #9054: ARROW-11082: [Rust] C data interface to largeUTF8

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9054: URL: https://github.com/apache/arrow/pull/9054#issuecomment-752880441 https://issues.apache.org/jira/browse/ARROW-11082

[GitHub] [arrow] github-actions[bot] commented on pull request #9055: ARROW-11084: [Rust] Fixed clippy

2020-12-30 Thread GitBox
github-actions[bot] commented on pull request #9055: URL: https://github.com/apache/arrow/pull/9055#issuecomment-752880440 https://issues.apache.org/jira/browse/ARROW-11084