[GitHub] [arrow-datafusion] voltcode opened a new issue #464: Question: Can DataFusion handle larger than RAM datasets?

2021-05-31 Thread GitBox
voltcode opened a new issue #464: URL: https://github.com/apache/arrow-datafusion/issues/464 I browsed the readme and slides but failed to grok - can DataFusion handle larger than RAM datasets? In other words, if I register multiple parquet files, which size exceeds RAM, will they get all

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #463: Add sort in window functions

2021-05-31 Thread GitBox
Jimexist opened a new pull request #463: URL: https://github.com/apache/arrow-datafusion/pull/463 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow-rs] Dandandan commented on a change in pull request #384: Implement faster arrow array reader

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r642802751 ## File path: parquet/src/util/memory.rs ## @@ -292,19 +292,28 @@ impl BufferPtr { } /// Returns slice of data in this buffer. +#[inline]

[GitHub] [arrow] cyb70289 commented on pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-05-31 Thread GitBox
cyb70289 commented on pull request #10364: URL: https://github.com/apache/arrow/pull/10364#issuecomment-851806478 > Please extract the decimal upscaling from the addition kernel into an implicit cast. This will simplify the addition kernel to stateless addition (IIUC) and give callers cont

[GitHub] [arrow] nirandaperera commented on a change in pull request #10410: ARROW-10640: [C++] A "where" kernel to combine two arrays based on a mask

2021-05-31 Thread GitBox
nirandaperera commented on a change in pull request #10410: URL: https://github.com/apache/arrow/pull/10410#discussion_r642728158 ## File path: cpp/src/arrow/compute/kernels/scalar_if_else_test.cc ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [arrow] nirandaperera commented on pull request #10410: ARROW-10640: [C++] A "where" kernel to combine two arrays based on a mask

2021-05-31 Thread GitBox
nirandaperera commented on pull request #10410: URL: https://github.com/apache/arrow/pull/10410#issuecomment-851743321 @github-actions autotune -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #353: Add append_slice for GenericStringBuilder

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #353: URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-848275425 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/353?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-rs] alippai commented on pull request #353: Add append_slice for GenericStringBuilder

2021-05-31 Thread GitBox
alippai commented on pull request #353: URL: https://github.com/apache/arrow-rs/pull/353#issuecomment-851701049 @ritchie46 updated the API with correct name + unsafe, also added benchmark. ``` bench_string/bench_string

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#issuecomment-851638878 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/459?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #384: Implement faster arrow array reader

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#issuecomment-851063613 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/384?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642571782 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -339,6 +325,36 @@ pin_project! { } } +fn hash_(group_values: &[ArrayRef]

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642675738 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] Dandandan commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
Dandandan commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851679821 Yeah I believe joins are reasonably fast currently. I do need to do some comparisons (e.g. add the join queries to https://github.com/h2oai/db-benchmark/pull/182) T

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #441: WIP: Add tokomak optimizer

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #441: URL: https://github.com/apache/arrow-datafusion/pull/441#issuecomment-850900230 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/441?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow-datafusion] Dandandan opened a new issue #462: Add support for recursive CTEs

2021-05-31 Thread GitBox
Dandandan opened a new issue #462: URL: https://github.com/apache/arrow-datafusion/issues/462 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Recursive CTEs are interesting to support more complex algorithms like graph processing

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642668210 ## File path: datafusion/src/optimizer/mod.rs ## @@ -25,4 +25,5 @@ pub mod hash_build_probe_order; pub mod limit_push_down; pub mod optimizer;

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642667853 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642667063 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642666851 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r64299 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] alippai edited a comment on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
alippai edited a comment on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851672062 In this case LSQB sounds to be a better first target. 👍 > So also into what vectorized engines (can) do here. I have a bad experience with dedicated "grap

[GitHub] [arrow-datafusion] alippai commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
alippai commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851672062 In this case LSQB sounds to be a better first target. 👍 > So also into what vectorized engines (can) do here. I have a bad experience with dedicated "graph engines",

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#discussion_r642666286 ## File path: datafusion/src/optimizer/remove_duplicate_filters.rs ## @@ -0,0 +1,611 @@ +// regarding copyright ownership. The ASF licenses this

[GitHub] [arrow-datafusion] Dandandan opened a new issue #461: Add support for semi (hash) join

2021-05-31 Thread GitBox
Dandandan opened a new issue #461: URL: https://github.com/apache/arrow-datafusion/issues/461 *Is your feature request related to a problem or challenge? Please describe what you are trying to do.* We should support semi join. This can also be used to execute IN and EXISTS. Explicit s

[GitHub] [arrow-datafusion] Dandandan opened a new issue #460: Add support for anti (hash) join

2021-05-31 Thread GitBox
Dandandan opened a new issue #460: URL: https://github.com/apache/arrow-datafusion/issues/460 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** We should support anti join. This can also be used to execute `NOT IN` and `NOT EXISTS`

[GitHub] [arrow-datafusion] Dandandan commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
Dandandan commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851666274 Thanks a lot again 👍 I think the challenging part with recursive CTE in DataFusion will be doing it efficiently with arrow data, as . So also into what vectorized

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #382: URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-datafusion] jgoday commented on pull request #436: Remove reundant filters (e.g. c> 5 AND c>5 --> c>5)

2021-05-31 Thread GitBox
jgoday commented on pull request #436: URL: https://github.com/apache/arrow-datafusion/pull/436#issuecomment-851664134 @Dandandan I've just added some more simplification rules (from https://github.com/Dandandan/datafusion-tokomak/blob/main/src/lib.rs#L44, as you mentioned before), What do

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#discussion_r642658594 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -77,16 +109,142 @@ impl ExecutionPlan for QueryStageExec { ) -> Re

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#discussion_r642657693 ## File path: ballista/rust/core/src/execution_plans/query_stage.rs ## @@ -31,26 +45,44 @@ use uuid::Uuid; #[derive(Debug, Clone)] pub struct Q

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#issuecomment-851638878 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/459?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #382: URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_ter

[GitHub] [arrow-datafusion] alippai commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
alippai commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851647603 For the LSQB here is the paper https://szarnyasg.github.io/tsmb-grades21/ms.pdf and a presentation https://docs.google.com/presentation/d/1pxyX_CWhFVYEttjTG2BrzuaMkEuLRxfhf5i

[GitHub] [arrow-datafusion] alippai commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
alippai commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851646571 @Dandandan I'm not sure on the recursive CTE implementation part, however PostgreSQL has a brief description on the algorithm https://www.postgresql.org/docs/current/que

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
codecov-commenter commented on pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#issuecomment-851638878 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/459?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow-datafusion] Dandandan commented on issue #451: Add Linked data benchmarks

2021-05-31 Thread GitBox
Dandandan commented on issue #451: URL: https://github.com/apache/arrow-datafusion/issues/451#issuecomment-851634623 I didn't hear of this benchmark before, thanks for referencing it! Sounds really cool/useful. I believe for graph processing you'll need (mostly) support for recursiv

[GitHub] [arrow-datafusion] andygrove commented on pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
andygrove commented on pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459#issuecomment-851632933 @edrevo fyi

[GitHub] [arrow-datafusion] andygrove opened a new pull request #459: Refactor QueryStageExec in preparation for implementing map-side shuffle

2021-05-31 Thread GitBox
andygrove opened a new pull request #459: URL: https://github.com/apache/arrow-datafusion/pull/459 # Which issue does this PR close? Closes #458 # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

[GitHub] [arrow-datafusion] andygrove opened a new issue #458: Ballista refactor QueryStageExec in preparation for map-side shuffle

2021-05-31 Thread GitBox
andygrove opened a new issue #458: URL: https://github.com/apache/arrow-datafusion/issues/458 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This issue is to track some refactoring work in preparation for implementing map-side s

[GitHub] [arrow-datafusion] mluts opened a new pull request #457: Add examples section to datafusion crate doc

2021-05-31 Thread GitBox
mluts opened a new pull request #457: URL: https://github.com/apache/arrow-datafusion/pull/457 # Which issue does this PR close? Closes #186. # Rationale for this change Making it easier for newcomers to run example code. Unfortunately I didn't find any

[GitHub] [arrow] pachadotdev commented on a change in pull request #9999: ARROW-11755: [R] Add tests from dplyr/test-mutate.r

2021-05-31 Thread GitBox
pachadotdev commented on a change in pull request #: URL: https://github.com/apache/arrow/pull/#discussion_r642601213 ## File path: r/tests/testthat/test-dplyr-mutate.R ## @@ -32,59 +51,23 @@ test_that("mutate() is lazy", { ) }) -test_that("basic mutate", { - exp

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
Dandandan edited a comment on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851552960 @jorgecarleitao Interesting! I did some earlier experiments with the vectorized hashing too (and saw similar speed ups for low-cardinality aggregates), but

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642571782 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -339,6 +325,36 @@ pin_project! { } } +fn hash_(group_values: &[ArrayRef]

[GitHub] [arrow-datafusion] andygrove opened a new issue #456: Ballista: Implement map-side of shuffle

2021-05-31 Thread GitBox
andygrove opened a new issue #456: URL: https://github.com/apache/arrow-datafusion/issues/456 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** As a small step towards implementing https://github.com/apache/arrow-datafusion/issues/

[GitHub] [arrow] Mgmaplus closed issue #10408: ipc attribute is not found

2021-05-31 Thread GitBox
Mgmaplus closed issue #10408: URL: https://github.com/apache/arrow/issues/10408

[GitHub] [arrow] Mgmaplus commented on issue #10408: ipc attribute is not found

2021-05-31 Thread GitBox
Mgmaplus commented on issue #10408: URL: https://github.com/apache/arrow/issues/10408#issuecomment-851558916 It was just a matter of reinstalling the dependencies; something in the env was not right

[GitHub] [arrow-datafusion] jorgecarleitao commented on a change in pull request #68: Experimenting with arrow2

2021-05-31 Thread GitBox
jorgecarleitao commented on a change in pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642561933 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -339,6 +325,36 @@ pin_project! { } } +fn hash_(group_values: &[Arra

[GitHub] [arrow] Mgmaplus removed a comment on issue #10408: ipc attribute is not found

2021-05-31 Thread GitBox
Mgmaplus removed a comment on issue #10408: URL: https://github.com/apache/arrow/issues/10408#issuecomment-851558679 It was just a matter of reinstalling the dependencies; something in the env was not right

[GitHub] [arrow] Mgmaplus commented on issue #10408: ipc attribute is not found

2021-05-31 Thread GitBox
Mgmaplus commented on issue #10408: URL: https://github.com/apache/arrow/issues/10408#issuecomment-851558679 It was just a matter of reinstalling the dependencies; something in the env was not right

[GitHub] [arrow] kiszk edited a comment on pull request #10404: ARROW-12876: [R] Fix build flags on Raspberry Pi

2021-05-31 Thread GitBox
kiszk edited a comment on pull request #10404: URL: https://github.com/apache/arrow/pull/10404#issuecomment-851556928 Interesting. I found similar reports (e.g. https://github.com/redis/redis/issues/6275) in other places. It would be great to put the log of this failure into here.

[GitHub] [arrow] kiszk commented on pull request #10404: ARROW-12876: [R] Fix build flags on Raspberry Pi

2021-05-31 Thread GitBox
kiszk commented on pull request #10404: URL: https://github.com/apache/arrow/pull/10404#issuecomment-851556928 Interesting. I found similar reports (e.g. https://github.com/redis/redis/issues/6275) in other places. It would be great to put the log of this failure into here. Is

[GitHub] [arrow-datafusion] Dandandan commented on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
Dandandan commented on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851552960 @jorgecarleitao Interesting! I did some earlier experiments with the vectorized hashing too (and saw similar speed ups for low-cardinality aggregates), but got a

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642554876 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -339,6 +325,36 @@ pin_project! { } } +fn hash_(group_values: &[ArrayRef]

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #68: Experimenting with arrow2

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#discussion_r642554087 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -339,6 +325,36 @@ pin_project! { } } +fn hash_(group_values: &[ArrayRef]

[GitHub] [arrow-datafusion] Jimexist commented on pull request #452: Optimize `nth_value`, remove `first_value`, `last_value` structs and use idiomatic rust style

2021-05-31 Thread GitBox
Jimexist commented on pull request #452: URL: https://github.com/apache/arrow-datafusion/pull/452#issuecomment-851542085 @alamb as promised :-)

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #452: Optimize `nth_value`, remove `first_value`, `last_value` structs and use idiomatic rust style

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #452: URL: https://github.com/apache/arrow-datafusion/pull/452#issuecomment-851118576 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/452?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #454: fix window aggregation with alias and add integration test case

2021-05-31 Thread GitBox
codecov-commenter commented on pull request #454: URL: https://github.com/apache/arrow-datafusion/pull/454#issuecomment-851540857 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/454?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comment

[GitHub] [arrow-datafusion] Jimexist opened a new issue #455: window function with alias is not properly rebased

2021-05-31 Thread GitBox
Jimexist opened a new issue #455: URL: https://github.com/apache/arrow-datafusion/issues/455 **Describe the bug** A window function with an alias is not properly rebased; a select with an alias will err. **To Reproduce** Steps to

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #454: fix window aggregation with alias

2021-05-31 Thread GitBox
Jimexist opened a new pull request #454: URL: https://github.com/apache/arrow-datafusion/pull/454 # Which issue does this PR close? fix window aggregation with alias Closes #. # Rationale for this change # What changes are included in this PR? # Ar

[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
jorgecarleitao commented on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851522837 fwiw, I have used the hashing [in the experimental branch](https://github.com/apache/arrow-datafusion/pull/68/files#diff-03876812a8bef4074e517600fdcf8e6b49f1ea24df44905

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
Dandandan edited a comment on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-850491025 @ravlio You are very right, that part is suboptimal. Also in hash aggregates there are a couple of other things: * Keys are created by row and indexe

[GitHub] [arrow-datafusion] Dandandan commented on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
Dandandan commented on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851518881 Thanks for the input @jhorstmann - that adds some support for the idea! Something like ~2x speed up for more challenging queries where DF currently "struggles" (or bigger f

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-05-31 Thread GitBox
nevi-me commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642516947 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -87,17 +92,31 @@ impl ArrowWriter { "Record batch schema does not match writer schem

[GitHub] [arrow] github-actions[bot] commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-05-31 Thread GitBox
github-actions[bot] commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-851476508 Revision: c40f74f0ae57f5589c139aaffc64a23ae80618c0 Submitted crossbow builds: [ursacomputing/crossbow @ actions-444](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] kszucs commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-05-31 Thread GitBox
kszucs commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-851475938 @github-actions crossbow submit java-jars

[GitHub] [arrow] AlenkaF commented on a change in pull request #10334: ARROW-12198: [R] bindings for strptime

2021-05-31 Thread GitBox
AlenkaF commented on a change in pull request #10334: URL: https://github.com/apache/arrow/pull/10334#discussion_r642422637 ## File path: r/tests/testthat/test-dplyr-string-functions.R ## @@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", { t

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #452: Optimize `nth_value`, remove `first_value`, `last_value` structs and use idiomatic rust style

2021-05-31 Thread GitBox
codecov-commenter edited a comment on pull request #452: URL: https://github.com/apache/arrow-datafusion/pull/452#issuecomment-851118576 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/452?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+

[GitHub] [arrow] github-actions[bot] commented on pull request #10425: ARROW-12910: [Gandiva][C++]Add support for ADD and SUBTRACT functions receiving time intervals

2021-05-31 Thread GitBox
github-actions[bot] commented on pull request #10425: URL: https://github.com/apache/arrow/pull/10425#issuecomment-851422579 https://issues.apache.org/jira/browse/ARROW-12910

[GitHub] [arrow-datafusion] jhorstmann commented on issue #418: [question] performance considerations of create_key_for_col (HashAggregate)

2021-05-31 Thread GitBox
jhorstmann commented on issue #418: URL: https://github.com/apache/arrow-datafusion/issues/418#issuecomment-851422139 > I think what would be best in the long run is building a mutable typed array based for the aggregation states, and keeping only the _offsets_ to that array in a hash tabl

[GitHub] [arrow] jvictorhuguenin opened a new pull request #10425: ARROW-12910: [Gandiva][C++]Add support for ADD and SUBTRACT functions receiving time intervals

2021-05-31 Thread GitBox
jvictorhuguenin opened a new pull request #10425: URL: https://github.com/apache/arrow/pull/10425 …r month intervals

[GitHub] [arrow] kszucs closed pull request #10357: [Release] Verify 4.0.1 RC1

2021-05-31 Thread GitBox
kszucs closed pull request #10357: URL: https://github.com/apache/arrow/pull/10357

[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #298: Support window functions with empty `OVER` clause

2021-05-31 Thread GitBox
Jimexist edited a comment on issue #298: URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848809874 - [x] https://github.com/apache/arrow-datafusion/pull/375 to add window function support, streaming, and `row_number` - [x] https://github.com/apache/arrow-datafusi

[GitHub] [arrow-rs] yordan-pavlov commented on a change in pull request #384: Implement faster arrow array reader

2021-05-31 Thread GitBox
yordan-pavlov commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r642402579 ## File path: parquet/src/util/memory.rs ## @@ -292,19 +292,28 @@ impl BufferPtr { } /// Returns slice of data in this buffer. +#[inl

[GitHub] [arrow-rs] yordan-pavlov commented on a change in pull request #384: Implement faster arrow array reader

2021-05-31 Thread GitBox
yordan-pavlov commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r642397656 ## File path: arrow/src/compute/kernels/filter.rs ## @@ -59,19 +59,14 @@ pub(crate) struct SlicesIterator<'a> { } impl<'a> SlicesIterator<'a> { -

[GitHub] [arrow-rs] alamb commented on pull request #383: Add set_bit to BooleanBufferBuilder to allow mutating bit in index

2021-05-31 Thread GitBox
alamb commented on pull request #383: URL: https://github.com/apache/arrow-rs/pull/383#issuecomment-851398299 Thanks @boazberman ! Here is the Slack link @Dandandan is referring to: https://the-asf.slack.com/archives/C01QUFS30TD/p1622403410176000 FWIW @tustvold implemented so

[GitHub] [arrow-rs] alamb commented on pull request #383: Add set_bit to BooleanBufferBuilder to allow mutating bit in index

2021-05-31 Thread GitBox
alamb commented on pull request #383: URL: https://github.com/apache/arrow-rs/pull/383#issuecomment-851395966 FYI @tustvold

[GitHub] [arrow-rs] alamb commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-05-31 Thread GitBox
alamb commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642388141 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -1176,31 +1236,51 @@ mod tests { let raw_values: Vec<_> = (0..SMALL_SIZE as i64).collect();

[GitHub] [arrow-rs] alamb commented on issue #343: Add a RecordBatch::split to split large batches into a set of smaller batches

2021-05-31 Thread GitBox
alamb commented on issue #343: URL: https://github.com/apache/arrow-rs/issues/343#issuecomment-851388566 @nevi-me notes on #381 > we would need to account for its individual array offsets, as there is never a guarantee that a record batch has all child arrays starting from the sam

[GitHub] [arrow-datafusion] alamb merged pull request #450: Refactor Ballista executor so that FlightService delegates to an Executor struct

2021-05-31 Thread GitBox
alamb merged pull request #450: URL: https://github.com/apache/arrow-datafusion/pull/450

[GitHub] [arrow-datafusion] alamb closed issue #449: Refactor Ballista to separate Flight logic from execution logic

2021-05-31 Thread GitBox
alamb closed issue #449: URL: https://github.com/apache/arrow-datafusion/issues/449

[GitHub] [arrow-datafusion] alamb closed issue #428: enrich integration test to include aggregate csv data

2021-05-31 Thread GitBox
alamb closed issue #428: URL: https://github.com/apache/arrow-datafusion/issues/428

[GitHub] [arrow-datafusion] alamb merged pull request #425: include test data and add aggregation tests in integration test

2021-05-31 Thread GitBox
alamb merged pull request #425: URL: https://github.com/apache/arrow-datafusion/pull/425

[GitHub] [arrow-datafusion] alamb closed issue #298: Support window functions with empty `OVER` clause

2021-05-31 Thread GitBox
alamb closed issue #298: URL: https://github.com/apache/arrow-datafusion/issues/298

[GitHub] [arrow-datafusion] alamb merged pull request #403: add `first_value`, `last_value`, and `nth_value` built-in window functions

2021-05-31 Thread GitBox
alamb merged pull request #403: URL: https://github.com/apache/arrow-datafusion/pull/403

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #415: use prettier to auto format md files

2021-05-31 Thread GitBox
alamb commented on a change in pull request #415: URL: https://github.com/apache/arrow-datafusion/pull/415#discussion_r642379602 ## File path: .github/workflows/dev.yml ## @@ -23,7 +23,22 @@ on: pull_request: jobs: + prettier: +runs-on: ubuntu-latest +steps: +

[GitHub] [arrow-rs] crepererum commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-05-31 Thread GitBox
crepererum commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642369269 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -87,17 +92,31 @@ impl ArrowWriter { "Record batch schema does not match writer sc

[GitHub] [arrow-rs] Dandandan commented on a change in pull request #384: Implement faster arrow array reader

2021-05-31 Thread GitBox
Dandandan commented on a change in pull request #384: URL: https://github.com/apache/arrow-rs/pull/384#discussion_r642361176 ## File path: arrow/src/compute/kernels/filter.rs ## @@ -59,19 +59,14 @@ pub(crate) struct SlicesIterator<'a> { } impl<'a> SlicesIterator<'a> { -

[GitHub] [arrow-rs] nevi-me commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-05-31 Thread GitBox
nevi-me commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642357899 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -87,17 +92,31 @@ impl ArrowWriter { "Record batch schema does not match writer schem

[GitHub] [arrow] kszucs closed pull request #10416: ARROW-12895: [CI] Use "concurrency" setting on Github Actions to cancel stale jobs

2021-05-31 Thread GitBox
kszucs closed pull request #10416: URL: https://github.com/apache/arrow/pull/10416

[GitHub] [arrow] kszucs commented on pull request #10416: ARROW-12895: [CI] Use "concurrency" setting on Github Actions to cancel stale jobs

2021-05-31 Thread GitBox
kszucs commented on pull request #10416: URL: https://github.com/apache/arrow/pull/10416#issuecomment-851341284 > > If I understand correctly this change should cancel pull request builds as well, like subsequent pushes to the same PR. > > Yep Thanks for confirming it! Let's t

[GitHub] [arrow] potiuk commented on pull request #10416: ARROW-12895: [CI] Use "concurrency" setting on Github Actions to cancel stale jobs

2021-05-31 Thread GitBox
potiuk commented on pull request #10416: URL: https://github.com/apache/arrow/pull/10416#issuecomment-851320624 > If I understand correctly this change should cancel pull request builds as well, like subsequent pushes to the same PR. Yep

[GitHub] [arrow] kszucs commented on pull request #10416: ARROW-12895: [CI] Use "concurrency" setting on Github Actions to cancel stale jobs

2021-05-31 Thread GitBox
kszucs commented on pull request #10416: URL: https://github.com/apache/arrow/pull/10416#issuecomment-851319673 > The configuration cancels pending jobs for the master branch, right? If I understand correctly this change should cancel pull request builds as well, like subsequent push

[GitHub] [arrow] kszucs commented on pull request #10374: [Release] Verify 4.0.1 RC1 [WIP]

2021-05-31 Thread GitBox
kszucs commented on pull request #10374: URL: https://github.com/apache/arrow/pull/10374#issuecomment-851316557 The vote has passed, so closing.

[GitHub] [arrow] kszucs closed pull request #10374: [Release] Verify 4.0.1 RC1 [WIP]

2021-05-31 Thread GitBox
kszucs closed pull request #10374: URL: https://github.com/apache/arrow/pull/10374

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #453: use prettier check in CI

2021-05-31 Thread GitBox
Jimexist opened a new pull request #453: URL: https://github.com/apache/arrow-datafusion/pull/453 # Which issue does this PR close? Closes #415 # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #415: use prettier to auto format md files

2021-05-31 Thread GitBox
Jimexist commented on a change in pull request #415: URL: https://github.com/apache/arrow-datafusion/pull/415#discussion_r642280510 ## File path: .github/workflows/dev.yml ## @@ -23,7 +23,22 @@ on: pull_request: jobs: + prettier: +runs-on: ubuntu-latest +steps:

[GitHub] [arrow-rs] crepererum commented on a change in pull request #381: Respect max rowgroup size in Arrow writer

2021-05-31 Thread GitBox
crepererum commented on a change in pull request #381: URL: https://github.com/apache/arrow-rs/pull/381#discussion_r642270850 ## File path: parquet/src/arrow/arrow_writer.rs ## @@ -87,17 +92,31 @@ impl ArrowWriter { "Record batch schema does not match writer sc
