[GitHub] [arrow] arthursunbao commented on issue #10885: Does arrow intends to support IDL schema like protobuf?

2021-08-05 Thread GitBox
arthursunbao commented on issue #10885: URL: https://github.com/apache/arrow/issues/10885#issuecomment-894037170 Hi westonpace, Thanks for your quick response. Our scenario is like this: We have a recommendation system and we want to transfer the user data from kafka

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683978762 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values =

[GitHub] [arrow-datafusion] yjshen edited a comment on issue #824: A global, shared `ExecutionContext`

2021-08-05 Thread GitBox
yjshen edited a comment on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-894031550 If a singleton `context` is not the preferred way, is it possible to control total memory usage for DataFusion across all physical operators? - Should we extend

[GitHub] [arrow-datafusion] yjshen commented on issue #824: A global, shared `ExecutionContext`

2021-08-05 Thread GitBox
yjshen commented on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-894031550 If a singleton `context` is not the preferred way, is it possible to control total memory usage for DataFusion across all physical operators? - Should we extend `Execut

[GitHub] [arrow] JaguarPaw2409 commented on pull request #10450: ARROW-9947: [Python] High-level Python API for Parquet encryption of files.

2021-08-05 Thread GitBox
JaguarPaw2409 commented on pull request #10450: URL: https://github.com/apache/arrow/pull/10450#issuecomment-894030465 Is there any time frame to merge this request? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683975473 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values =

[GitHub] [arrow-datafusion] houqp commented on issue #825: Add documentation for support for skipping Parquet row groups

2021-08-05 Thread GitBox
houqp commented on issue #825: URL: https://github.com/apache/arrow-datafusion/issues/825#issuecomment-894029329 seems like something that would be a good fit for design doc or user guide.

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan edited a comment on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894021924 ![image](https://user-images.githubusercontent.com/163737/128462553-e5beabfe-1fd3-45e2-accc-4d7ec044d45c.png) Some profiling output of the gby_null_new branch.

[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan edited a comment on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894016110 Some good news, this is quite like mostly an improvement on the (more challenging) db-benchmark aggregates. Master: ``` q1 took 37 ms q2 took 325 ms

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894021924 ![image](https://user-images.githubusercontent.com/163737/128462553-e5beabfe-1fd3-45e2-accc-4d7ec044d45c.png) Some profiling output of the gby_null_new branch.

[GitHub] [arrow-datafusion] houqp commented on issue #824: A global, shared `ExecutionContext`

2021-08-05 Thread GitBox
houqp commented on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-894021340 @yjshen based on the sample code in your io source PR, it looks like you want the following API as a consumer of datafusion: ```rust let ctx = ExecutionContext::get();

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894016110 Some good news, this is quite like mostly an improvement on the (more challenging) db-benchmark aggregates. Master: ``` q1 took 37 ms q2 took 325 ms q3 to

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-894008907 Cool. I am doing some tests with db-benchmark, which includes some more challenging queries.

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #827: Use `RawTable` API in hash join

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #827: URL: https://github.com/apache/arrow-datafusion/pull/827#discussion_r683951579 ## File path: datafusion/src/physical_plan/hash_join.rs ## @@ -476,18 +483,14 @@ fn update_hash( // insert hashes to key of the hashmap

[GitHub] [arrow-datafusion] andygrove opened a new pull request #831: Add minimal crate documentation for Ballista crates

2021-08-05 Thread GitBox
andygrove opened a new pull request #831: URL: https://github.com/apache/arrow-datafusion/pull/831 # Which issue does this PR close? Closes #830. # Rationale for this change Adds crate documentation that will appear in crates.io when the crates are publishe

[GitHub] [arrow-datafusion] andygrove opened a new issue #830: Add crate documentation for Ballista crates

2021-08-05 Thread GitBox
andygrove opened a new issue #830: URL: https://github.com/apache/arrow-datafusion/issues/830 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** There is currently no crate documentation for the Ballista crates. **Describe the

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683939523 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values =

[GitHub] [arrow] cyb70289 commented on a change in pull request #10890: ARROW-13575: [C++] Add hash_product kernel

2021-08-05 Thread GitBox
cyb70289 commented on a change in pull request #10890: URL: https://github.com/apache/arrow/pull/10890#discussion_r683931261 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate.cc ## @@ -1011,6 +1011,108 @@ struct GroupedSumFactory { InputType argument_type; }; +//

[GitHub] [arrow-datafusion] andygrove opened a new pull request #829: Add ballista-examples to docker build

2021-08-05 Thread GitBox
andygrove opened a new pull request #829: URL: https://github.com/apache/arrow-datafusion/pull/829 # Which issue does this PR close? Closes #828. # Rationale for this change Fixes docker build and the integration tests. # What changes are included in

[GitHub] [arrow-datafusion] andygrove opened a new issue #828: Ballista docker images fail to build

2021-08-05 Thread GitBox
andygrove opened a new issue #828: URL: https://github.com/apache/arrow-datafusion/issues/828 **Describe the bug** I ran `./dev/integration-tests.sh` with latest from master just now and the docker build phase failed with: ``` Step 20/52 : RUN cargo chef cook $RELEASE_FLAG --re

[GitHub] [arrow] rok commented on pull request #10647: ARROW-13174: [C++][Compute] Add strftime kernel

2021-08-05 Thread GitBox
rok commented on pull request #10647: URL: https://github.com/apache/arrow/pull/10647#issuecomment-893973246 @pitrou I refactored to have two kernel generators now - `MakeTemporal` and `MakeSimpleUnaryTemporal`. Could you please review if this is dry enough now?

[GitHub] [arrow] aocsa commented on a change in pull request #10802: ARROW-1568: [C++] Implement Drop Null Kernel for Arrays

2021-08-05 Thread GitBox
aocsa commented on a change in pull request #10802: URL: https://github.com/apache/arrow/pull/10802#discussion_r683908465 ## File path: cpp/src/arrow/compute/kernels/vector_selection.cc ## @@ -2146,6 +2146,184 @@ class TakeMetaFunction : public MetaFunction { } }; +// ---

[GitHub] [arrow] nealrichardson commented on pull request #10870: ARROW-12540: [C++] Implementing casting support from date32/date64 to uft8/large_utf8

2021-08-05 Thread GitBox
nealrichardson commented on pull request #10870: URL: https://github.com/apache/arrow/pull/10870#issuecomment-893925233 Thanks for doing this! 🙏 In the future, it can be a good idea to grep the source for the Jira issue number to see if there are any TODOs or skipped tests related to the i

[GitHub] [arrow-cookbook] westonpace opened a new pull request #15: Prevent force pushes to main

2021-08-05 Thread GitBox
westonpace opened a new pull request #15: URL: https://github.com/apache/arrow-cookbook/pull/15 Seems like a good idea

[GitHub] [arrow-cookbook] westonpace opened a new pull request #14: Adding thisisnic and amol- as collaborators

2021-08-05 Thread GitBox
westonpace opened a new pull request #14: URL: https://github.com/apache/arrow-cookbook/pull/14 This gives `thisisnic` and `amol-` the ability to "assign, edit, and close issues and pull requests". Given their help so far I think this should be pretty straightforward.

[GitHub] [arrow-cookbook] westonpace commented on issue #9: gh-pages or Apache hosting?

2021-08-05 Thread GitBox
westonpace commented on issue #9: URL: https://github.com/apache/arrow-cookbook/issues/9#issuecomment-893873553 It should be pretty quick (I hope) to fix whatever we break. I guess I don't know what the refresh rate is for the site publishing. I'll ask on Zulip.

[GitHub] [arrow-cookbook] westonpace commented on issue #9: gh-pages or Apache hosting?

2021-08-05 Thread GitBox
westonpace commented on issue #9: URL: https://github.com/apache/arrow-cookbook/issues/9#issuecomment-893873312 ![image](https://user-images.githubusercontent.com/1696093/128431859-fe3ceae2-ddc0-43e2-b7c4-67a24b977a87.png)

[GitHub] [arrow-cookbook] westonpace opened a new issue #13: Change builds to use prebuilt binaries

2021-08-05 Thread GitBox
westonpace opened a new issue #13: URL: https://github.com/apache/arrow-cookbook/issues/13 Right now the CI build is pretty slow and a fair amount of time is spent downloading / installing build dependencies and building Arrow. We should be able to use prebuilt binaries for this. T

[GitHub] [arrow-cookbook] thisisnic commented on issue #12: Should recipes use / demonstrate deprecated methods?

2021-08-05 Thread GitBox
thisisnic commented on issue #12: URL: https://github.com/apache/arrow-cookbook/issues/12#issuecomment-893870552 My initial thoughts are that we should remove them otherwise it could send out a confusing message to readers. I suppose as well, this is intrinsically linked to the question o

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
alamb commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683834491 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values = grou

[GitHub] [arrow-cookbook] westonpace opened a new issue #12: [DISCUSS] Should recipes use / demonstrate deprecated methods?

2021-08-05 Thread GitBox
westonpace opened a new issue #12: URL: https://github.com/apache/arrow-cookbook/issues/12 Related: Should we leave recipes around that are no longer valid at all on the latest release of Arrow (e.g. the feature has been since removed, presumably after being deprecated for some amount of t

[GitHub] [arrow-cookbook] westonpace closed issue #11: [DISCUSS] Handling Arrow Versioning

2021-08-05 Thread GitBox
westonpace closed issue #11: URL: https://github.com/apache/arrow-cookbook/issues/11

[GitHub] [arrow-cookbook] westonpace opened a new issue #11: [DISCUSS] Handling Arrow Versioning

2021-08-05 Thread GitBox
westonpace opened a new issue #11: URL: https://github.com/apache/arrow-cookbook/issues/11 I think there are a number of questions around Arrow versioning: 1. Should recipes be based on the latest released version of the implementation? Or should they be based on the nightly build o

[GitHub] [arrow-cookbook] westonpace commented on issue #9: gh-pages or Apache hosting?

2021-08-05 Thread GitBox
westonpace commented on issue #9: URL: https://github.com/apache/arrow-cookbook/issues/9#issuecomment-893859299 Well, the docs for `.asf.yaml` (https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-Specifyingasub-directorytopublishto) say you can u

[GitHub] [arrow-cookbook] thisisnic commented on issue #9: gh-pages or Apache hosting?

2021-08-05 Thread GitBox
thisisnic commented on issue #9: URL: https://github.com/apache/arrow-cookbook/issues/9#issuecomment-893858271 That second URL certainly looks nicer, what kind of synchronization might be needed?

[GitHub] [arrow-cookbook] thisisnic opened a new issue #10: Allow cookbook to pair build version with release version

2021-08-05 Thread GitBox
thisisnic opened a new issue #10: URL: https://github.com/apache/arrow-cookbook/issues/10 As more features are added to Apache Arrow, we might want to build versions of the cookbook that are relevant to that release. I'm not sure what a good strategy would be in terms of adding conte

[GitHub] [arrow-cookbook] westonpace opened a new issue #9: gh-pages or Apache hosting?

2021-08-05 Thread GitBox
westonpace opened a new issue #9: URL: https://github.com/apache/arrow-cookbook/issues/9 It appears that in addition to gh-pages we can use Apache hosting. The only real difference would be the URLs. https://apache.github.io/arrow-cookbook https://arrow.apache.org/cookbook

[GitHub] [arrow-cookbook] wesm commented on issue #8: [DISCUSS] Do we want to feed github notifications to zulip and/or a mailing list?

2021-08-05 Thread GitBox
wesm commented on issue #8: URL: https://github.com/apache/arrow-cookbook/issues/8#issuecomment-893844190 The main stream of e-mail notifications about commits or issues should match up with what apache/arrow does Regarding Zulip, I think you're referring to http://ursalabs.zulipchat

[GitHub] [arrow-cookbook] westonpace opened a new issue #8: [DISCUSS] Do we want to feed github notifications to zulip and/or a mailing list?

2021-08-05 Thread GitBox
westonpace opened a new issue #8: URL: https://github.com/apache/arrow-cookbook/issues/8 It appears for the ML it is a mere matter of updating `.asf.yaml`: ``` notifications: commits: comm...@foo.apache.org issues: iss...@foo.apache.org pullrequests: d...@

[GitHub] [arrow-cookbook] westonpace merged pull request #7: remove duplicated sections and add ASF license in .asf.yaml

2021-08-05 Thread GitBox
westonpace merged pull request #7: URL: https://github.com/apache/arrow-cookbook/pull/7

[GitHub] [arrow] kkraus14 commented on a change in pull request #10856: [RFC] Arrow Compute Serialized Intermediate Representation draft for discussion

2021-08-05 Thread GitBox
kkraus14 commented on a change in pull request #10856: URL: https://github.com/apache/arrow/pull/10856#discussion_r683802355 ## File path: format/ComputeIR.fbs ## @@ -0,0 +1,521 @@ +/// Licensed to the Apache Software Foundation (ASF) under one +/// or more contributor license

[GitHub] [arrow] lidavidm closed pull request #10880: ARROW-13509: [C++] Take kernel with empty inputs

2021-08-05 Thread GitBox
lidavidm closed pull request #10880: URL: https://github.com/apache/arrow/pull/10880

[GitHub] [arrow] lidavidm closed pull request #10870: ARROW-12540: [C++] Implementing casting support from date32/date64 to uft8/large_utf8

2021-08-05 Thread GitBox
lidavidm closed pull request #10870: URL: https://github.com/apache/arrow/pull/10870

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #827: Use `RawTable` API in hash join

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #827: URL: https://github.com/apache/arrow-datafusion/pull/827#discussion_r683788036 ## File path: datafusion/src/physical_plan/hash_join.rs ## @@ -78,7 +81,14 @@ use log::debug; // but the values don't match. Those are checked i

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #827: Use `RawTable` API in hash join

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #827: URL: https://github.com/apache/arrow-datafusion/pull/827#discussion_r683787741 ## File path: datafusion/src/physical_plan/hash_join.rs ## @@ -476,18 +483,14 @@ fn update_hash( // insert hashes to key of the hashmap

[GitHub] [arrow-datafusion] alamb commented on pull request #827: Use `RawTable` API in hash join

2021-08-05 Thread GitBox
alamb commented on pull request #827: URL: https://github.com/apache/arrow-datafusion/pull/827#issuecomment-893795473 I will review this carefully tomorrow

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
alamb commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683777504 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values = grou

[GitHub] [arrow] kou commented on pull request #10033: ARROW-12388: [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva

2021-08-05 Thread GitBox
kou commented on pull request #10033: URL: https://github.com/apache/arrow/pull/10033#issuecomment-893779610 Thanks!

[GitHub] [arrow-datafusion] alamb commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-893778745 I got enough of the approach described by @Dandandan in https://github.com/apache/arrow-datafusion/issues/790#issuecomment-893232614 working in https://github.com/apache/arrow-

[GitHub] [arrow] kou commented on pull request #10865: ARROW-3699: [C++] Dockerfile for testing 32-bit C++ build

2021-08-05 Thread GitBox
kou commented on pull request #10865: URL: https://github.com/apache/arrow/pull/10865#issuecomment-89393 I don't think that we need more i386 builds. We can detect general i386 related build/test failures by the added Debian task.

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893774630 Revision: 115afdc3c779847c2aed239c310001be94be35e2 Submitted crossbow builds: [ursacomputing/crossbow @ actions-726](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] jonkeane commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
jonkeane commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893774131 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest Still not totally general, but another try.

[GitHub] [arrow] lidavidm commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
lidavidm commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683758563 ## File path: cpp/src/arrow/filesystem/s3fs.cc ## @@ -483,6 +484,29 @@ std::string FormatRange(int64_t start, int64_t length) { return ss.str(); }

[GitHub] [arrow] lidavidm commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
lidavidm commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683758372 ## File path: cpp/src/arrow/filesystem/s3fs.cc ## @@ -483,6 +484,29 @@ std::string FormatRange(int64_t start, int64_t length) { return ss.str(); }

[GitHub] [arrow] lidavidm commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
lidavidm commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683756665 ## File path: cpp/src/arrow/filesystem/s3fs.h ## @@ -69,6 +69,13 @@ enum class S3CredentialsKind : int8_t { WebIdentity }; +/// Pure virtual class

[GitHub] [arrow] lidavidm commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
lidavidm commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683755534 ## File path: cpp/src/arrow/filesystem/s3fs.h ## @@ -69,6 +69,22 @@ enum class S3CredentialsKind : int8_t { WebIdentity }; +/// Pure virtual class

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683754338 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values =

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #808: (WIP) Rework GroupByHash to for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
Dandandan commented on a change in pull request #808: URL: https://github.com/apache/arrow-datafusion/pull/808#discussion_r683752478 ## File path: datafusion/src/physical_plan/hash_aggregate.rs ## @@ -363,55 +348,74 @@ fn group_aggregate_batch( let mut group_by_values =

[GitHub] [arrow-datafusion] kszucs commented on pull request #524: Expose ExecutionContext.register_csv to the python bindings

2021-08-05 Thread GitBox
kszucs commented on pull request #524: URL: https://github.com/apache/arrow-datafusion/pull/524#issuecomment-893760912 @alamb this should be good to go, though we should revisit the FFI bindings in arrow-rs and a potential `arrow-rs <-> pyarrow` bridge implemented in arrow-rs in the future

[GitHub] [arrow] github-actions[bot] commented on pull request #10841: ARROW-13511: [CI][R] Fail in the docker build step if R deps don't install

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10841: URL: https://github.com/apache/arrow/pull/10841#issuecomment-893760223 Revision: 69ff20b2afba02b7ddfde316234a61403771959a Submitted crossbow builds: [ursacomputing/crossbow @ actions-725](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] jonkeane commented on pull request #10841: ARROW-13511: [CI][R] Fail in the docker build step if R deps don't install

2021-08-05 Thread GitBox
jonkeane commented on pull request #10841: URL: https://github.com/apache/arrow/pull/10841#issuecomment-893759682 @github-actions crossbow submit -g r

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893756286 Revision: 6d4f79567f6bda7ec0e21e3ae772143448cb583e Submitted crossbow builds: [ursacomputing/crossbow @ actions-724](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] jonkeane commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
jonkeane commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893755892 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest This isn't a complete solution (i.e. it likely will not work on non-lto builds), but should be eno

[GitHub] [arrow] github-actions[bot] removed a comment on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] removed a comment on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893755475 Revision: e0396dc7b609f01d3f95ad875c58bcccf5e03181 Submitted crossbow builds: [ursacomputing/crossbow @ actions-723](https://github.com/ursacomputing/c

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893755475 Revision: e0396dc7b609f01d3f95ad875c58bcccf5e03181 Submitted crossbow builds: [ursacomputing/crossbow @ actions-723](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] jonkeane removed a comment on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
jonkeane removed a comment on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893754857 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest This is not a complete solution (e.g. won't work well on non-LTO platforms), but should te

[GitHub] [arrow] jonkeane commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
jonkeane commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893754857 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest This is not a complete solution (e.g. won't work well on non-LTO platforms), but should test if we

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893727566 Ok, that builds and links but segfaults on load. @jonkeane could you add back in here your cmake changes (I think you force-pushed them out of your previous branch) and t

[GitHub] [arrow] lidavidm commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
lidavidm commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683730535 ## File path: cpp/src/arrow/filesystem/s3fs.h ## @@ -69,6 +69,13 @@ enum class S3CredentialsKind : int8_t { WebIdentity }; +/// Pure virtual class

[GitHub] [arrow] neil-b commented on a change in pull request #10877: ARROW-13508: [C++] Support custom retry strategies in S3Options

2021-08-05 Thread GitBox
neil-b commented on a change in pull request #10877: URL: https://github.com/apache/arrow/pull/10877#discussion_r683729032 ## File path: cpp/src/arrow/filesystem/s3fs.h ## @@ -69,6 +69,13 @@ enum class S3CredentialsKind : int8_t { WebIdentity }; +/// Pure virtual class fo

[GitHub] [arrow] github-actions[bot] commented on pull request #10890: ARROW-13575: [C++] Add hash_product kernel

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10890: URL: https://github.com/apache/arrow/pull/10890#issuecomment-893724407 https://issues.apache.org/jira/browse/ARROW-13575

[GitHub] [arrow-datafusion] alamb commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-893712779 @Dandandan I used your approach in https://github.com/apache/arrow-datafusion/pull/808/commits/9ad6719155932e69e211e3f90cf5e4beb0bdc0ad and it seems to have worked. I want to

[GitHub] [arrow] westonpace commented on issue #10885: Does arrow intends to support IDL schema like protobuf?

2021-08-05 Thread GitBox
westonpace commented on issue #10885: URL: https://github.com/apache/arrow/issues/10885#issuecomment-893706079 Can you expand a little bit on what problem you are trying to solve? There is a schema. It can be serialized to parquet and to the Arrow IPC format. It specifies the name and d

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893703559 Revision: e0396dc7b609f01d3f95ad875c58bcccf5e03181 Submitted crossbow builds: [ursacomputing/crossbow @ actions-722](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893701879 Thanks for opening a pull request! If this is not a [minor PR](https://github.com/apache/arrow/blob/master/CONTRIBUTING.md#Minor-Fixes). Could you ope

[GitHub] [arrow] nealrichardson commented on pull request #10889: [WIP] Try LTO again

2021-08-05 Thread GitBox
nealrichardson commented on pull request #10889: URL: https://github.com/apache/arrow/pull/10889#issuecomment-893701978 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow] westonpace closed pull request #10882: ARROW-13567: [C++] ConvertOptions::Defaults leaves `timestamp_parsers` uninitialized

2021-08-05 Thread GitBox
westonpace closed pull request #10882: URL: https://github.com/apache/arrow/pull/10882 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsu

[GitHub] [arrow] westonpace commented on pull request #10882: ARROW-13567: [C++] ConvertOptions::Defaults leaves `timestamp_parsers` uninitialized

2021-08-05 Thread GitBox
westonpace commented on pull request #10882: URL: https://github.com/apache/arrow/pull/10882#issuecomment-893696362 Ah, please forgive my foolishness. I'll reopen if I find the root cause but now I'm wondering if I was just testing with R code built against an older version of Arrow. --

[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #824: A global, shared `ExecutionContext`

2021-08-05 Thread GitBox
jorgecarleitao commented on issue #824: URL: https://github.com/apache/arrow-datafusion/issues/824#issuecomment-893683156 Thanks for the suggestion. I do not think we should do this in `DataFusion`. These use-cases imo should be addressed by consumers of DataFusion that decide how they

[GitHub] [arrow] nealrichardson commented on pull request #10710: ARROW-11460: [R] Use system libraries if present on Linux

2021-08-05 Thread GitBox
nealrichardson commented on pull request #10710: URL: https://github.com/apache/arrow/pull/10710#issuecomment-893681346 @jonkeane can you help make sure that this isn't breaking any builds? crossbow keeps returning failures, I haven't had the chance to examine why. This PR shouldn't be cha

[GitHub] [arrow] bkietz commented on a change in pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

2021-08-05 Thread GitBox
bkietz commented on a change in pull request #10431: URL: https://github.com/apache/arrow/pull/10431#discussion_r683685887 ## File path: cpp/src/arrow/dataset/file_skyhook.h ## @@ -0,0 +1,275 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

[GitHub] [arrow] nealrichardson commented on a change in pull request #10888: ARROW-13560: [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys

2021-08-05 Thread GitBox
nealrichardson commented on a change in pull request #10888: URL: https://github.com/apache/arrow/pull/10888#discussion_r683683809 ## File path: r/R/dataset-scan.R ## @@ -85,10 +85,22 @@ Scanner$create <- function(dataset, # To handle mutate() on Table/RecordBatch, we ne

[GitHub] [arrow] nealrichardson commented on a change in pull request #10888: ARROW-13560: [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys

2021-08-05 Thread GitBox
nealrichardson commented on a change in pull request #10888: URL: https://github.com/apache/arrow/pull/10888#discussion_r683681911 ## File path: r/R/dataset-scan.R ## @@ -85,10 +85,22 @@ Scanner$create <- function(dataset, # To handle mutate() on Table/RecordBatch, we ne

[GitHub] [arrow] github-actions[bot] commented on pull request #10888: ARROW-13560: [R] Allow Scanner$create() to accept filter / project even with arrow_dplyr_querys

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10888: URL: https://github.com/apache/arrow/pull/10888#issuecomment-893669493 https://issues.apache.org/jira/browse/ARROW-13560 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] nealrichardson closed pull request #10829: ARROW-13489: [R] Bump CI jobs after 5.0.0

2021-08-05 Thread GitBox
nealrichardson closed pull request #10829: URL: https://github.com/apache/arrow/pull/10829 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-

[GitHub] [arrow-datafusion] alamb commented on issue #790: Rework GroupByHash for faster performance and support grouping by nulls

2021-08-05 Thread GitBox
alamb commented on issue #790: URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-893661708 > I put an example of the latest suggestion here: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=9157e7ae2ad4d84f8bd6c358c42722cb That is very co

[GitHub] [arrow] github-actions[bot] commented on pull request #10887: ARROW-13311: [C++][Documentation] Document hash aggregate kernels

2021-08-05 Thread GitBox
github-actions[bot] commented on pull request #10887: URL: https://github.com/apache/arrow/pull/10887#issuecomment-893659166 https://issues.apache.org/jira/browse/ARROW-13311 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] lidavidm commented on pull request #10870: ARROW-12540: [C++] Implementing casting support from date32/date64 to uft8/large_utf8

2021-08-05 Thread GitBox
lidavidm commented on pull request #10870: URL: https://github.com/apache/arrow/pull/10870#issuecomment-893658081 I think the failures here are flukes (GHA seems to be having a bit of trouble), let's see when the Travis build finishes. -- This is an automated message from the Apache Git

[GitHub] [arrow] lidavidm commented on pull request #10880: ARROW-13509: [C++] Take kernel with empty inputs

2021-08-05 Thread GitBox
lidavidm commented on pull request #10880: URL: https://github.com/apache/arrow/pull/10880#issuecomment-893657720 The AppVeyor failure looks like a flake (s3fs on windows being flaky). I kicked off the 'Dev PR' pipeline again though I expect that to also be a fluke. -- This is an automat

[GitHub] [arrow] lidavidm opened a new pull request #10887: ARROW-13311: [C++][Documentation] Document hash aggregate kernels

2021-08-05 Thread GitBox
lidavidm opened a new pull request #10887: URL: https://github.com/apache/arrow/pull/10887 This adds a section about hash aggregates. Also, adds Kleene logic to the hash any/all kernels. -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] lidavidm commented on pull request #10887: ARROW-13311: [C++][Documentation] Document hash aggregate kernels

2021-08-05 Thread GitBox
lidavidm commented on pull request #10887: URL: https://github.com/apache/arrow/pull/10887#issuecomment-893657014 ![image](https://user-images.githubusercontent.com/327919/128396204-b5ff53db-5882-4e69-b20a-4363332bb64f.png) -- This is an automated message from the Apache Git Service.

[GitHub] [arrow-datafusion] alamb commented on pull request #719: Optimize min/max queries with table statistics

2021-08-05 Thread GitBox
alamb commented on pull request #719: URL: https://github.com/apache/arrow-datafusion/pull/719#issuecomment-893655863 I will plan to merge this PR once the CI tests pass -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #811: Source ext for remote files read

2021-08-05 Thread GitBox
alamb commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r683659016 ## File path: datafusion/src/execution/context.rs ## @@ -125,12 +127,26 @@ pub struct ExecutionContext { pub state: Arc>, } +lazy_static! {

[GitHub] [arrow-datafusion] alamb merged pull request #797: Better join order resolution logic

2021-08-05 Thread GitBox
alamb merged pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow-datafusion] alamb commented on pull request #797: Better join order resolution logic

2021-08-05 Thread GitBox
alamb commented on pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#issuecomment-893654846 Thanks again @seddonm1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] romgrk-comparative commented on issue #10803: Reading strings efficiently in C++

2021-08-05 Thread GitBox
romgrk-comparative commented on issue #10803: URL: https://github.com/apache/arrow/issues/10803#issuecomment-893651302 Alright, yes that was `DictionaryArray`, thanks for the precisions! -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[GitHub] [arrow] romgrk-comparative closed issue #10803: Reading strings efficiently in C++

2021-08-05 Thread GitBox
romgrk-comparative closed issue #10803: URL: https://github.com/apache/arrow/issues/10803 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-u

[GitHub] [arrow-datafusion] Dandandan opened a new pull request #827: Use `RawTable` API in hash join

2021-08-05 Thread GitBox
Dandandan opened a new pull request #827: URL: https://github.com/apache/arrow-datafusion/pull/827 # Which issue does this PR close? Closes #826 # Rationale for this change # What changes are included in this PR? # Are there any user-facing change

[GitHub] [arrow] aucahuasi commented on pull request #10880: ARROW-13509: [C++] Take kernel with empty inputs

2021-08-05 Thread GitBox
aucahuasi commented on pull request #10880: URL: https://github.com/apache/arrow/pull/10880#issuecomment-893634764 > LGTM. I left a suggestion for the tests since we have some helper functions to make those cases easier to write. I applied the suggestions, but let me know if I forgot
