[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request #1530: Add load test command in tpch.rs.

2022-01-09 Thread GitBox
Ted-Jiang opened a new pull request #1530: URL: https://github.com/apache/arrow-datafusion/pull/1530 # Which issue does this PR close? #1521 Closes #. # Rationale for this change Add loadtest for testing the robustness of the Ballista # What changes are inclu

[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #1145: Fix undefined behavor in GenericStringArray::from_iter_values

2022-01-09 Thread GitBox
jhorstmann commented on a change in pull request #1145: URL: https://github.com/apache/arrow-rs/pull/1145#discussion_r780760173 ## File path: arrow/src/array/array_string.rs ## @@ -187,8 +187,15 @@ impl GenericStringArray { offsets.push(length_so_far);

[GitHub] [arrow-rs] jhorstmann commented on issue #1136: Interval comparisons with `simd` feature asserts

2022-01-09 Thread GitBox
jhorstmann commented on issue #1136: URL: https://github.com/apache/arrow-rs/issues/1136#issuecomment-1008270231 The issue seems to be with the relatively new `MonthDayNano` interval type, which is stored as `i128`. The simd machinery does not yet support that type, since it works with 512

[GitHub] [arrow-rs] jhorstmann edited a comment on issue #1136: Interval comparisons with `simd` feature asserts

2022-01-09 Thread GitBox
jhorstmann edited a comment on issue #1136: URL: https://github.com/apache/arrow-rs/issues/1136#issuecomment-1008270231 The issue seems to be with the relatively new `MonthDayNano` interval type, which is stored as `i128`. The simd machinery does not yet support that type, since it works w

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
Ted-Jiang opened a new issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531 **Describe the bug** 1. start scheduler `RUST_LOG=INFO cargo run --bin ballista-scheduler` 2. start one executor `RUST_LOG=INFO cargo run --release --bin ballista-executor` 3. run

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
Ted-Jiang commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008272250 All 16 tokio-runtime-worker threads are locked at sled::subscriber::Subscribers `register` or `reserve`. `register`: creates a Subscriber on some key in sled. `reserve`: c

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
Ted-Jiang commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008278405 I found `ballista_scheduler::SchedulerServer::new` 11 times in process https://github.com/apache/arrow-datafusion/blob/847e78a675703c24933af5d6a429c2576bc14e9d/ba

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
Ted-Jiang commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008278877 I think it should create only one synchronize_job_status_loop in one ballista_scheduler process. @andygrove @Dandandan @alamb FYI -- This is an automated m

[GitHub] [arrow-datafusion] alamb opened a new issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
alamb opened a new issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Datafusion currently relies on the https://github.com/apache/arrow-rs implementation of A

[GitHub] [arrow-datafusion] alamb commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
alamb commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008288047 I believe the current proposal is to make an official arrow branch in datafusion: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-1007955176, which is proba

[GitHub] [arrow-datafusion] selvavm opened a new issue #1533: When using Dataframe getting empty row but pretty print contain rows

2022-01-09 Thread GitBox
selvavm opened a new issue #1533: URL: https://github.com/apache/arrow-datafusion/issues/1533 **Describe the bug** Getting empty array when using parquet but prints data **To Reproduce** let df = df .aggregate( vec![col("name")],

[GitHub] [arrow] cyb70289 commented on a change in pull request #12084: ARROW-15029: [C++] Split compute/kernels/scalar_string.cc

2022-01-09 Thread GitBox
cyb70289 commented on a change in pull request #12084: URL: https://github.com/apache/arrow/pull/12084#discussion_r780778052 ## File path: cpp/src/arrow/compute/kernels/scalar_string_internal.cc ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow-datafusion] alamb merged pull request #1528: Correct typos in README

2022-01-09 Thread GitBox
alamb merged pull request #1528: URL: https://github.com/apache/arrow-datafusion/pull/1528 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-

[GitHub] [arrow] jvanstraten commented on a change in pull request #12084: ARROW-15029: [C++] Split compute/kernels/scalar_string.cc

2022-01-09 Thread GitBox
jvanstraten commented on a change in pull request #12084: URL: https://github.com/apache/arrow/pull/12084#discussion_r780780327 ## File path: cpp/src/arrow/compute/kernels/scalar_string_internal.cc ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under o

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r780780395 ## File path: cpp/src/arrow/util/bit_block_counter.h ## @@ -424,6 +565,141 @@ class ARROW_EXPORT OptionalBinaryBitBlockCounter { } }; +/// \brief A c

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r780780625 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -746,6 +787,111 @@ std::shared_ptr MakeScalarMinMax(std::string name, return func;

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r780780707 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -746,6 +787,111 @@ std::shared_ptr MakeScalarMinMax(std::string name, return func;

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r780780727 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -746,6 +787,111 @@ std::shared_ptr MakeScalarMinMax(std::string name, return func;

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1526: A simplified memory manager for query execution

2022-01-09 Thread GitBox
alamb commented on a change in pull request #1526: URL: https://github.com/apache/arrow-datafusion/pull/1526#discussion_r780779376 ## File path: ballista/rust/executor/src/collect.rs ## @@ -75,11 +76,12 @@ impl ExecutionPlan for CollectExec { async fn execute( &se

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r780786506 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -156,39 +210,50 @@ struct Maximum { } }; +// Check if timestamp timezones are com

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1525: Add stddev operator

2022-01-09 Thread GitBox
alamb commented on a change in pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#discussion_r780787572 ## File path: datafusion/src/physical_plan/expressions/variance.rs ## @@ -0,0 +1,376 @@ +// Licensed to the Apache Software Foundation (ASF) under

[GitHub] [arrow-datafusion] alamb commented on pull request #1525: Add stddev operator

2022-01-09 Thread GitBox
alamb commented on pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#issuecomment-1008307171 > I want to discuss the aggregator interface a bit further. The update function doesn't seem to provide much value with the current architecture, it seems under all cir

[GitHub] [arrow-rs] alamb commented on a change in pull request #1145: Fix undefined behavor in GenericStringArray::from_iter_values

2022-01-09 Thread GitBox
alamb commented on a change in pull request #1145: URL: https://github.com/apache/arrow-rs/pull/1145#discussion_r780788179 ## File path: arrow/src/array/array_string.rs ## @@ -187,8 +187,15 @@ impl GenericStringArray { offsets.push(length_so_far); val

[GitHub] [arrow-rs] jhorstmann opened a new pull request #1146: Implement SIMD comparison operations for types with less than 4 lanes (i128)

2022-01-09 Thread GitBox
jhorstmann opened a new pull request #1146: URL: https://github.com/apache/arrow-rs/pull/1146 # Which issue does this PR close? Implements comparison for simd types with less than 8 lanes. Closes #1136 . # What changes are included in this PR? This PR changes the

[GitHub] [arrow-rs] alamb commented on a change in pull request #1145: Fix undefined behavor in GenericStringArray::from_iter_values

2022-01-09 Thread GitBox
alamb commented on a change in pull request #1145: URL: https://github.com/apache/arrow-rs/pull/1145#discussion_r780795424 ## File path: arrow/src/array/array_string.rs ## @@ -187,8 +187,13 @@ impl GenericStringArray { offsets.push(length_so_far); val

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1146: Implement SIMD comparison operations for types with less than 4 lanes (i128)

2022-01-09 Thread GitBox
codecov-commenter commented on pull request #1146: URL: https://github.com/apache/arrow-rs/pull/1146#issuecomment-1008318204 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1146?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] alamb opened a new pull request #1147: Internal Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec`

2022-01-09 Thread GitBox
alamb opened a new pull request #1147: URL: https://github.com/apache/arrow-rs/pull/1147 # Which issue does this PR close? Minor change # Rationale for this change Less code is easier to maintain and faster to compile I noticed this while working on https://gith

[GitHub] [arrow-rs] alamb commented on issue #197: String and BinaryArray created from iterators that don't accurately report size can lead to undefined behavior

2022-01-09 Thread GitBox
alamb commented on issue #197: URL: https://github.com/apache/arrow-rs/issues/197#issuecomment-1008319749 An update: manually audited all the uses of the `size_hint` in the arrow codebase: ```shell rg -n -H --no-heading -e 'size_hint' $(git rev-parse --show-toplevel || pwd) /U

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1145: Fix undefined behavor in GenericStringArray::from_iter_values

2022-01-09 Thread GitBox
codecov-commenter edited a comment on pull request #1145: URL: https://github.com/apache/arrow-rs/pull/1145#issuecomment-1007976168 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1145?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1147: Internal Remove `GenericStringArray::from_vec` and `GenericStringArray::from_opt_vec`

2022-01-09 Thread GitBox
codecov-commenter commented on pull request #1147: URL: https://github.com/apache/arrow-rs/pull/1147#issuecomment-1008321424 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1147?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] alamb opened a new pull request #1148: Document safety justification of some uses of `from_trusted_len_iter`

2022-01-09 Thread GitBox
alamb opened a new pull request #1148: URL: https://github.com/apache/arrow-rs/pull/1148 # Which issue does this PR close? Re https://github.com/apache/arrow-rs/issues/197 # Rationale for this change As part of #197 I reviewed all uses of `from_trusted_len_iter`; While

[GitHub] [arrow] nealrichardson commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
nealrichardson commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008322593 The C Data Interface is the way you pass a block of memory in process. I don't know the specifics of the Rust implementation to guide you further, maybe others can help there

[GitHub] [arrow] alamb commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
alamb commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008323398 I think what you are looking for is called the `ffi` module in arrow-rs: https://docs.rs/arrow/6.5.0/arrow/ffi/index.html Perhaps something like ```rust let array = unsafe

[GitHub] [arrow] alamb edited a comment on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
alamb edited a comment on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008323398 I think what you are looking for is called the `ffi` module in arrow-rs: https://docs.rs/arrow/6.5.0/arrow/ffi/index.html Perhaps something like ```rust let (array_

[GitHub] [arrow] alamb commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
alamb commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008323791 FWIW I think `polars` uses `arrow2` which also has an `ffi` module, but the interface is different: https://docs.rs/arrow2/0.8.1/arrow2/ffi/index.html -- This is an automated messag

[GitHub] [arrow-datafusion] alamb commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
alamb commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008324233 hi @Ted-Jiang -- thanks for the report. I am not super familiar with the ballista internals but it definitely sounds like a bug to me. We would welcome a pull request with

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1148: Document safety justification of some uses of `from_trusted_len_iter`

2022-01-09 Thread GitBox
codecov-commenter commented on pull request #1148: URL: https://github.com/apache/arrow-rs/pull/1148#issuecomment-1008324491 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1148?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-datafusion] alamb commented on issue #1533: When using Dataframe getting empty row but pretty print contain rows

2022-01-09 Thread GitBox
alamb commented on issue #1533: URL: https://github.com/apache/arrow-datafusion/issues/1533#issuecomment-1008324922 Hi @selvavm -- this is very strange. What happens when you print out the entire `results`? ```rust println!("Min for Aaa is {:#?}", results); ``` If you

[GitHub] [arrow-datafusion] thinkharderdev commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
thinkharderdev commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008329590 Will `arrow-rs` eventually support async file IO? Requiring a synchronous `ChunkReader` is currently a major limitation in supporting alternate `ObjectStore`s --

[GitHub] [arrow] jorgecarleitao commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
jorgecarleitao commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008330881 Hi. Thanks for the ping @nealrichardson . Thanks for the initiative, @multimeric , super cool! Note that the C data interface is designed for _intra_ process comm

[GitHub] [arrow] jorgecarleitao edited a comment on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
jorgecarleitao edited a comment on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008330881 Hi. Thanks for the ping @nealrichardson . Thanks for the initiative, @multimeric , super cool! Note that the C data interface is designed for _intra_ proce

[GitHub] [arrow] jorgecarleitao edited a comment on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
jorgecarleitao edited a comment on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008330881 Hi. Thanks for the ping @nealrichardson . Thanks for the initiative, @multimeric , super cool! Note that the C data interface is designed for _intra_ proce

[GitHub] [arrow] jorgecarleitao edited a comment on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
jorgecarleitao edited a comment on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008330881 Hi. Thanks for the ping @nealrichardson . Thanks for the initiative, @multimeric , super cool! Note that the C data interface is designed for _intra_ proce

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1008333615 @edponce Thanks for making it work. Other than formatting, tests seem to pass for commit https://github.com/apache/arrow/pull/11882/commits/b84ca3b63fcbf13b03e13c40ada6a7e93b141

[GitHub] [arrow] ritchie46 commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
ritchie46 commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008334083 If you are also using polars/arrow logical types (`Categorical`, `Datetime`, `Date`, `Duration`, or `Time`) you must ensure that the internal chunks of `Series` are coerced to the

[GitHub] [arrow-datafusion] realno commented on pull request #1525: Add stddev operator

2022-01-09 Thread GitBox
realno commented on pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#issuecomment-1008337842 > > I want to discuss the aggregator interface a bit further. The update function doesn't seem to provide much value with the current architecture, it seems under all

[GitHub] [arrow] zeroshade commented on pull request #11538: ARROW-13986: [Go][Parquet] Add File Writers and tests

2022-01-09 Thread GitBox
zeroshade commented on pull request #11538: URL: https://github.com/apache/arrow/pull/11538#issuecomment-1008339296 @emkornfield just bumping to make sure you see I addressed the comments you left. If I don't hear back from you in a few days I'll merge this as you said you don't think you

[GitHub] [arrow] zeroshade closed pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-09 Thread GitBox
zeroshade closed pull request #11832: URL: https://github.com/apache/arrow/pull/11832 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsub

[GitHub] [arrow] ursabot commented on pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-09 Thread GitBox
ursabot commented on pull request #11832: URL: https://github.com/apache/arrow/pull/11832#issuecomment-1008345195 Benchmark runs are scheduled for baseline = 4c2294c7173abf6a9920f09520d8cbc56c361ddc and contender = 25cd0078b0a9a913f2443e447afe89beb81e8760. 25cd0078b0a9a913f2443e447afe89be

[GitHub] [arrow] ursabot edited a comment on pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-09 Thread GitBox
ursabot edited a comment on pull request #11832: URL: https://github.com/apache/arrow/pull/11832#issuecomment-1008345195 Benchmark runs are scheduled for baseline = 4c2294c7173abf6a9920f09520d8cbc56c361ddc and contender = 25cd0078b0a9a913f2443e447afe89beb81e8760. 25cd0078b0a9a913f2443e447

[GitHub] [arrow-datafusion] hntd187 commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
hntd187 commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008352621 I guess, what are the reasons switching would be a bad idea? Like what is the delta between what they both currently provide? -- This is an automated message from the Ap

[GitHub] [arrow] edponce commented on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
edponce commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1008355611 @bkmgit For Python, you would only need to add bindings for `BetweenOptions` because the compute function is "binded" automatically (`call_function` mechanism). I suggest that

[GitHub] [arrow-datafusion] houqp commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
houqp commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008356293 Thank you @alamb for bringing this up! > I believe the current proposal is to make an official arrow branch in datafusion: #68 (comment), which is probably a step towa

[GitHub] [arrow-datafusion] houqp merged pull request #1530: Add load test command in tpch.rs.

2022-01-09 Thread GitBox
houqp merged pull request #1530: URL: https://github.com/apache/arrow-datafusion/pull/1530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-

[GitHub] [arrow-datafusion] houqp commented on pull request #1530: Add load test command in tpch.rs.

2022-01-09 Thread GitBox
houqp commented on pull request #1530: URL: https://github.com/apache/arrow-datafusion/pull/1530#issuecomment-1008356941 Thanks @Ted-Jiang ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow-datafusion] james727 opened a new pull request #1534: Mark ARRAY_AGG(DISTINCT ...) not implemented

2022-01-09 Thread GitBox
james727 opened a new pull request #1534: URL: https://github.com/apache/arrow-datafusion/pull/1534 # Which issue does this PR close? This partially addresses https://github.com/apache/arrow-datafusion/issues/1512 # Rationale for this change Right now `array_agg(distinct ...)`

[GitHub] [arrow] ursabot edited a comment on pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-09 Thread GitBox
ursabot edited a comment on pull request #11832: URL: https://github.com/apache/arrow/pull/11832#issuecomment-1008345195 Benchmark runs are scheduled for baseline = 4c2294c7173abf6a9920f09520d8cbc56c361ddc and contender = 25cd0078b0a9a913f2443e447afe89beb81e8760. 25cd0078b0a9a913f2443e447

[GitHub] [arrow-datafusion] matthewmturner commented on issue #1515: high level roadmap for Arrow / Datafusion

2022-01-09 Thread GitBox
matthewmturner commented on issue #1515: URL: https://github.com/apache/arrow-datafusion/issues/1515#issuecomment-1008363268 @Dandandan @pjmore do you think the work on tokomak optimizer could be added to roadmap? -- This is an automated message from the Apache Git Service. To respond t

[GitHub] [arrow-datafusion] alamb commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
alamb commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008364021 > Will arrow-rs eventually support async file IO? Requiring a synchronous ChunkReader is currently a major limitation in supporting alternate ObjectStores I think some

[GitHub] [arrow-datafusion] houqp commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
houqp commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008364868 > @houqp can you make a PR? Would you like me to? @yjshen ? For sure, I can help create that PR :+1: -- This is an automated message from the Apache Git Service. To

[GitHub] [arrow] ursabot edited a comment on pull request #11832: ARROW-5599: [Go] Migrate array.{Interface,Record,Column,Chunked,Table} to arrow.{Array,Record,Column,Chunked,Table}

2022-01-09 Thread GitBox
ursabot edited a comment on pull request #11832: URL: https://github.com/apache/arrow/pull/11832#issuecomment-1008345195 Benchmark runs are scheduled for baseline = 4c2294c7173abf6a9920f09520d8cbc56c361ddc and contender = 25cd0078b0a9a913f2443e447afe89beb81e8760. 25cd0078b0a9a913f2443e447

[GitHub] [arrow] edponce edited a comment on pull request #11882: ARROW-9843: [C++] Implement Between ternary kernel

2022-01-09 Thread GitBox
edponce edited a comment on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1008355611 @bkmgit It is a team effort 👍🏾! For Python, you would only need to add bindings for `BetweenOptions` because the compute function is "binded" automatically (`call_functi

[GitHub] [arrow] kou commented on pull request #12107: ARROW-15288: [GLib] Add garrow_execute_plan_build_hash_join_node()

2022-01-09 Thread GitBox
kou commented on pull request #12107: URL: https://github.com/apache/arrow/pull/12107#issuecomment-1008401605 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubsc

[GitHub] [arrow] kou closed pull request #12107: ARROW-15288: [GLib] Add garrow_execute_plan_build_hash_join_node()

2022-01-09 Thread GitBox
kou closed pull request #12107: URL: https://github.com/apache/arrow/pull/12107 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1145: Fix undefined behavor in GenericStringArray::from_iter_values

2022-01-09 Thread GitBox
codecov-commenter edited a comment on pull request #1145: URL: https://github.com/apache/arrow-rs/pull/1145#issuecomment-1007976168 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1145?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm

[GitHub] [arrow] ursabot commented on pull request #12107: ARROW-15288: [GLib] Add garrow_execute_plan_build_hash_join_node()

2022-01-09 Thread GitBox
ursabot commented on pull request #12107: URL: https://github.com/apache/arrow/pull/12107#issuecomment-1008422136 Benchmark runs are scheduled for baseline = 25cd0078b0a9a913f2443e447afe89beb81e8760 and contender = f7bd4c3904e30ec63263eab1cb59876c15f67d5a. f7bd4c3904e30ec63263eab1cb59876c

[GitHub] [arrow] github-actions[bot] commented on pull request #12108: ARROW-14531: [Ruby] Add Arrow::Table#join

2022-01-09 Thread GitBox
github-actions[bot] commented on pull request #12108: URL: https://github.com/apache/arrow/pull/12108#issuecomment-1008422840 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] domoritz edited a comment on pull request #10371: ARROW-12549: [JS] Table and RecordBatch should not extend Vector, make JS lib smaller

2022-01-09 Thread GitBox
domoritz edited a comment on pull request #10371: URL: https://github.com/apache/arrow/pull/10371#issuecomment-1008213010 - [ ] Remove index subscript code - [x] Update docs - [ ] Apply memoization in iterator visitor -- This is an automated message from the Apache Git Service. To r

[GitHub] [arrow] ursabot edited a comment on pull request #12107: ARROW-15288: [GLib] Add garrow_execute_plan_build_hash_join_node()

2022-01-09 Thread GitBox
ursabot edited a comment on pull request #12107: URL: https://github.com/apache/arrow/pull/12107#issuecomment-1008422136 Benchmark runs are scheduled for baseline = 25cd0078b0a9a913f2443e447afe89beb81e8760 and contender = f7bd4c3904e30ec63263eab1cb59876c15f67d5a. f7bd4c3904e30ec63263eab1c

[GitHub] [arrow-datafusion] hntd187 commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

2022-01-09 Thread GitBox
hntd187 commented on issue #1532: URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1008428765 > > I guess, what are the reasons switching would be a bad idea? Like what is the delta between what they both currently provide? > > IMHO, the main downside is the

[GitHub] [arrow] edponce commented on a change in pull request #12084: ARROW-15029: [C++] Split compute/kernels/scalar_string.cc

2022-01-09 Thread GitBox
edponce commented on a change in pull request #12084: URL: https://github.com/apache/arrow/pull/12084#discussion_r780836474 ## File path: cpp/src/arrow/compute/kernels/scalar_string_internal.cc ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +

[GitHub] [arrow] edponce commented on a change in pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-09 Thread GitBox
edponce commented on a change in pull request #12081: URL: https://github.com/apache/arrow/pull/12081#discussion_r780837727 ## File path: python/pyarrow/pandas_compat.py ## @@ -934,7 +934,7 @@ def _reconstruct_index(table, index_descriptors, all_columns):

[GitHub] [arrow] edponce commented on a change in pull request #11978: ARROW-15137: [Dev] Update archery crossbow latest-prefix to work with nightly dates

2022-01-09 Thread GitBox
edponce commented on a change in pull request #11978: URL: https://github.com/apache/arrow/pull/11978#discussion_r780838857 ## File path: dev/archery/archery/crossbow/core.py ## @@ -536,17 +536,34 @@ def _latest_prefix_id(self, prefix): latest = -1 return

[GitHub] [arrow] edponce commented on a change in pull request #11942: ARROW-14762: [Doc] Additional info and resources

2022-01-09 Thread GitBox
edponce commented on a change in pull request #11942: URL: https://github.com/apache/arrow/pull/11942#discussion_r780839108 ## File path: docs/source/developers/guide/resources.rst ## @@ -27,3 +27,52 @@ Additional information and resources

[GitHub] [arrow] ursabot edited a comment on pull request #12107: ARROW-15288: [GLib] Add garrow_execute_plan_build_hash_join_node()

2022-01-09 Thread GitBox
ursabot edited a comment on pull request #12107: URL: https://github.com/apache/arrow/pull/12107#issuecomment-1008422136 Benchmark runs are scheduled for baseline = 25cd0078b0a9a913f2443e447afe89beb81e8760 and contender = f7bd4c3904e30ec63263eab1cb59876c15f67d5a. f7bd4c3904e30ec63263eab1c

[GitHub] [arrow] multimeric commented on issue #12102: How to pass an in-memory arrow object from Rust into R

2022-01-09 Thread GitBox
multimeric commented on issue #12102: URL: https://github.com/apache/arrow/issues/12102#issuecomment-1008448831 Thanks all, the help is greatly appreciated. I'll try the FFI interface, it seems to be what I want. It looks like the workflow will involve `let ptr = export_array_to_c(so

[GitHub] [arrow-rs] liukun4515 commented on a change in pull request #1141: Update version to 7.0.0 and update CHANGELOG

2022-01-09 Thread GitBox
liukun4515 commented on a change in pull request #1141: URL: https://github.com/apache/arrow-rs/pull/1141#discussion_r780867767 ## File path: CHANGELOG.md ## @@ -19,8 +19,146 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/arrow/blob/master/

[GitHub] [arrow-datafusion] liukun4515 commented on issue #1521: Add load test in tpch.rs

2022-01-09 Thread GitBox
liukun4515 commented on issue #1521: URL: https://github.com/apache/arrow-datafusion/issues/1521#issuecomment-1008497764 The issue will be closed automatically, with `Close #1521` in the pull request. Just like Close #1521. @Ted-Jiang -- This is an automated message from the Apache

[GitHub] [arrow-datafusion] Ted-Jiang closed issue #1521: Add load test in tpch.rs

2022-01-09 Thread GitBox
Ted-Jiang closed issue #1521: URL: https://github.com/apache/arrow-datafusion/issues/1521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-u

[GitHub] [arrow-datafusion] liukun4515 opened a new issue #1535: support hasher for decimal type

2022-01-09 Thread GitBox
liukun4515 opened a new issue #1535: URL: https://github.com/apache/arrow-datafusion/issues/1535 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** part of #122 We need to group by decimal column like below ``` SELECT c
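For context on the request above, grouping by a decimal column requires that each decimal value can be hashed. The following is a minimal sketch only, not DataFusion's implementation: it assumes every value in the column is held as an unscaled `i128` with a shared scale, and the helper names `hash_decimal` and `group_counts` are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash one decimal value given as its unscaled i128 representation.
/// Assumption: all values in the column share the same scale, so equal
/// decimals always have equal unscaled integers.
pub fn hash_decimal(unscaled: i128) -> u64 {
    let mut hasher = DefaultHasher::new();
    unscaled.hash(&mut hasher);
    hasher.finish()
}

/// Count rows per distinct decimal value: a toy stand-in for
/// a GROUP BY with COUNT(*) over a single decimal column.
pub fn group_counts(column: &[i128]) -> HashMap<i128, usize> {
    let mut counts = HashMap::new();
    for &value in column {
        *counts.entry(value).or_insert(0) += 1;
    }
    counts
}
```

With a scale of 2, the values 123.45, 1.00 and 123.45 would be passed as `[12345, 100, 12345]`, and the two equal values land in the same group.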

[GitHub] [arrow-datafusion] liukun4515 commented on issue #1522: support sorting decimal data type

2022-01-09 Thread GitBox
liukun4515 commented on issue #1522: URL: https://github.com/apache/arrow-datafusion/issues/1522#issuecomment-1008499067 It's resolved in my branch, and will be cherry-picked later -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

[GitHub] [arrow-datafusion] liukun4515 commented on issue #1535: support hasher for decimal type

2022-01-09 Thread GitBox
liukun4515 commented on issue #1535: URL: https://github.com/apache/arrow-datafusion/issues/1535#issuecomment-1008499198 It will be done after the other decimal-related pull requests are merged. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

[GitHub] [arrow] guyuqi commented on a change in pull request #12009: ARROW-15172: [Go] Add Arm64 Neon implementation for Arrow-math

2022-01-09 Thread GitBox
guyuqi commented on a change in pull request #12009: URL: https://github.com/apache/arrow/pull/12009#discussion_r780871634 ## File path: go/arrow/math/Makefile ## @@ -37,27 +41,42 @@ INTEL_SOURCES := \ int64_avx2_amd64.s int64_sse4_amd64.s \ uint64_avx2_amd64.s

[GitHub] [arrow] vibhatha opened a new pull request #12110: Arrow 15212: [C++] Handle suffix argument in joins

2022-01-09 Thread GitBox
vibhatha opened a new pull request #12110: URL: https://github.com/apache/arrow/pull/12110 In this PR, - [x] Replaced the prefixes with suffixes - [x] Added a test case to check suffixes This change is made for consistency with the join APIs in `dplyr` for R and

[GitHub] [arrow] github-actions[bot] commented on pull request #12110: Arrow 15212: [C++] Handle suffix argument in joins

2022-01-09 Thread GitBox
github-actions[bot] commented on pull request #12110: URL: https://github.com/apache/arrow/pull/12110#issuecomment-1008502841 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you op

[GitHub] [arrow-datafusion] selvavm opened a new issue #1536: Not able to get the table from register_listing_table

2022-01-09 Thread GitBox
selvavm opened a new issue #1536: URL: https://github.com/apache/arrow-datafusion/issues/1536 **Describe the bug** When I want to generate a dataframe over **To Reproduce** let file_format = ParquetFormat::default().with_enable_pruning(true); let

[GitHub] [arrow] vibhatha commented on a change in pull request #12033: ARROW-15091: [C++][Doc] Document nodes in C++ streaming execution engine

2022-01-09 Thread GitBox
vibhatha commented on a change in pull request #12033: URL: https://github.com/apache/arrow/pull/12033#discussion_r780896887 ## File path: docs/source/cpp/streaming_execution.rst ## @@ -175,9 +175,607 @@ their completion:: // alive until this future is marked finished.

[GitHub] [arrow-datafusion] houqp commented on issue #1536: Not able to get the table from register_listing_table

2022-01-09 Thread GitBox
houqp commented on issue #1536: URL: https://github.com/apache/arrow-datafusion/issues/1536#issuecomment-1008552422 Did `register_listing_table` return an error for you? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

[GitHub] [arrow-datafusion] houqp commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
houqp commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008553539 Thanks @Ted-Jiang for the deepdive, if you look at the comment above the tokio spawn line you linked, it also mentioned we should only run a single loop. Perhaps we should f

[GitHub] [arrow-datafusion] houqp edited a comment on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
houqp edited a comment on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008553539 Thanks @Ted-Jiang for the deepdive, if you look at the comment above the tokio spawn line you linked, it also mentioned we should only run a single loop. Perhaps we s

[GitHub] [arrow-datafusion] selvavm commented on issue #1536: Not able to get the table from register_listing_table

2022-01-09 Thread GitBox
selvavm commented on issue #1536: URL: https://github.com/apache/arrow-datafusion/issues/1536#issuecomment-1008559282 @houqp No. `ctx.table` returns an error. My knowledge on Parquet is limited. Sorry for that. Below are the things I tried, - Changed the `uri` field in regis

[GitHub] [arrow-datafusion] selvavm edited a comment on issue #1536: Not able to get the table from register_listing_table

2022-01-09 Thread GitBox
selvavm edited a comment on issue #1536: URL: https://github.com/apache/arrow-datafusion/issues/1536#issuecomment-1008559282 @houqp No. `ctx.table` returns an error. My knowledge on Parquet is limited. Sorry for that. Below are the things I tried, - Changed the `uri` field i

[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #1531: Ballista scheduler dead loop in loadtest

2022-01-09 Thread GitBox
Ted-Jiang commented on issue #1531: URL: https://github.com/apache/arrow-datafusion/issues/1531#issuecomment-1008574403 > Thanks @Ted-Jiang for the deepdive, if you look at the comment above the tokio spawn line you linked, it also mentioned we should only run a single loop. Perhaps we sh

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1489: add dependbot

2022-01-09 Thread GitBox
xudong963 commented on pull request #1489: URL: https://github.com/apache/arrow-datafusion/pull/1489#issuecomment-1008605323 Unfortunately, it didn't work -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[GitHub] [arrow-datafusion] liukun4515 commented on issue #700: Improve performance polling / task sharing mechanism in Ballista

2022-01-09 Thread GitBox
liukun4515 commented on issue #700: URL: https://github.com/apache/arrow-datafusion/issues/700#issuecomment-1008607240 @mingmwang can you update the status? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request #1537: Make call SchedulerServer::new once in ballista-scheduler process

2022-01-09 Thread GitBox
Ted-Jiang opened a new pull request #1537: URL: https://github.com/apache/arrow-datafusion/pull/1537 # Which issue does this PR close? Closes #1531 . Avoids calling `tokio::spawn(async move { state_clone.synchronize_job_status_loop(). ` multiple times -- This is an automat
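The goal stated in this pull request, starting the status-synchronization loop only once per process, can also be illustrated with a guard. The sketch below is not the change made in the PR (which, per its title, constructs the server once); it swaps in `std::sync::Once` and a plain thread instead of tokio, and the names `new_scheduler_server` and `loops_started` are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Once;

/// Number of times the background loop has been started (illustration only).
static LOOPS_STARTED: AtomicUsize = AtomicUsize::new(0);
/// Guard that lets the start-up closure run at most once per process.
static START_LOOP: Once = Once::new();

/// Hypothetical stand-in for a constructor that may be called many times.
/// Only the first call starts the background task.
pub fn new_scheduler_server() {
    START_LOOP.call_once(|| {
        LOOPS_STARTED.fetch_add(1, Ordering::SeqCst);
        // A real scheduler would run its synchronization loop here;
        // this placeholder thread returns immediately.
        std::thread::spawn(|| {});
    });
}

/// Report how many background loops were started.
pub fn loops_started() -> usize {
    LOOPS_STARTED.load(Ordering::SeqCst)
}
```

However many times `new_scheduler_server` is called, `loops_started` stays at one, which is the property the pull request is after.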

[GitHub] [arrow] colinbs opened a new pull request #12111: c_glib/README.md: fixes wrong build directory

2022-01-09 Thread GitBox
colinbs opened a new pull request #12111: URL: https://github.com/apache/arrow/pull/12111 The "How to build by users"->"Others" section in `c_glib/README.md` states the wrong directory when installing. -- This is an automated message from the Apache Git Service. To respond to the message
