[GitHub] [arrow-datafusion] xudong963 commented on issue #1544: Streaming support for DataFusion

2022-01-17 Thread GitBox
xudong963 commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1014255048 > anyone looking to assist or just review. If ready, please ping me! -- This is an automated message from the Apache Git Service. To respond to the message, plea

[GitHub] [arrow] ursabot edited a comment on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1014212304 Benchmark runs are scheduled for baseline = 0c1fd88953585485b772dfd405bbc5b1b5417324 and contender = f549a15c6e7613afee3a1af07c00bcc6959f7690. Results will be available

[GitHub] [arrow-datafusion] Jimexist commented on issue #1544: Streaming support for DataFusion

2022-01-17 Thread GitBox
Jimexist commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1014296068 @hntd187 if you want you can transfer that to datafusion-contrib for better visibility.

[GitHub] [arrow-datafusion] gaojun2048 opened a new issue #1594: [ballista] Reduce the dependencies between datafusion function and ballista proto

2022-01-17 Thread GitBox
gaojun2048 opened a new issue #1594: URL: https://github.com/apache/arrow-datafusion/issues/1594 Currently, to define our own UDF/UDAF functions, we also have to modify the Ballista implementation, because `ScalarFunction` and `AggregateFunction` are defined as enums in ballista's pro

[GitHub] [arrow] zhanglistar opened a new issue #12166: Why arrow::adapters::orc::ORCFileReader does not provide statistics interface?

2022-01-17 Thread GitBox
zhanglistar opened a new issue #12166: URL: https://github.com/apache/arrow/issues/12166 There is currently no statistics interface for ORC files in arrow::adapters::orc::ORCFileReader. Why? Thanks in advance.

[GitHub] [arrow-rs] tustvold commented on issue #1108: Add native comparison kernel support for BinaryArray

2022-01-17 Thread GitBox
tustvold commented on issue #1108: URL: https://github.com/apache/arrow-rs/issues/1108#issuecomment-1014309178 Yes it should be largely just a case of adding `_binary` versions of the `_utf8` kernels in that module. Typically there are three variants of each operator: * One w

[GitHub] [arrow-rs] tustvold edited a comment on issue #1108: Add native comparison kernel support for BinaryArray

2022-01-17 Thread GitBox
tustvold edited a comment on issue #1108: URL: https://github.com/apache/arrow-rs/issues/1108#issuecomment-1014309178 Yes it should be largely just a case of adding `_binary` versions of the `_utf8` kernels in that module. Typically there are three variants of each operator:

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12153: ARROW-15338: [Python] Add `pyarrow.orc.read_table` API

2022-01-17 Thread GitBox
jorisvandenbossche commented on a change in pull request #12153: URL: https://github.com/apache/arrow/pull/12153#discussion_r785757447 ## File path: python/pyarrow/orc.py ## @@ -175,3 +176,33 @@ def write_table(table, where): writer = ORCWriter(where) writer.write(tab

[GitHub] [arrow] AlenkaF opened a new pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
AlenkaF opened a new pull request #12167: URL: https://github.com/apache/arrow/pull/12167 Adding last corrections to the New Contributors Guide before the new release.

[GitHub] [arrow] github-actions[bot] commented on pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
github-actions[bot] commented on pull request #12167: URL: https://github.com/apache/arrow/pull/12167#issuecomment-1014323636 https://issues.apache.org/jira/browse/ARROW-15337

[GitHub] [arrow] ursabot edited a comment on pull request #12164: ARROW-14183: [C++] Improve select_k_unstable performance

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12164: URL: https://github.com/apache/arrow/pull/12164#issuecomment-1014212304 Benchmark runs are scheduled for baseline = 0c1fd88953585485b772dfd405bbc5b1b5417324 and contender = f549a15c6e7613afee3a1af07c00bcc6959f7690. Results will be available

[GitHub] [arrow-rs] HaoYang670 commented on issue #1108: Add native comparison kernel support for BinaryArray

2022-01-17 Thread GitBox
HaoYang670 commented on issue #1108: URL: https://github.com/apache/arrow-rs/issues/1108#issuecomment-1014329838 Thank you, Raphael! I'd like to try if no one else has been doing it.

[GitHub] [arrow] thisisnic closed pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
thisisnic closed pull request #12167: URL: https://github.com/apache/arrow/pull/12167

[GitHub] [arrow] ursabot commented on pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
ursabot commented on pull request #12167: URL: https://github.com/apache/arrow/pull/12167#issuecomment-1014339535 Benchmark runs are scheduled for baseline = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810 and contender = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd. 7e012736611cdda2b1d7082f41bb2e77

[GitHub] [arrow-datafusion] viirya opened a new pull request #1595: Fix null comparison for Parquet pruning predicate

2022-01-17 Thread GitBox
viirya opened a new pull request #1595: URL: https://github.com/apache/arrow-datafusion/pull/1595 # Which issue does this PR close? Closes #1591. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chang

[GitHub] [arrow] ursabot edited a comment on pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12167: URL: https://github.com/apache/arrow/pull/12167#issuecomment-1014339535 Benchmark runs are scheduled for baseline = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810 and contender = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd. 7e012736611cdda2b1d7082f4

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
jorisvandenbossche commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r785833236 ## File path: python/pyarrow/_compute.pyx ## @@ -785,6 +785,30 @@ class ElementWiseAggregateOptions(_ElementWiseAggregateOptions): se

[GitHub] [arrow] ursabot edited a comment on pull request #12151: ARROW-15335: [Java] Fix setPosition call in UnionListReader for empty List

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12151: URL: https://github.com/apache/arrow/pull/12151#issuecomment-1014218781 Benchmark runs are scheduled for baseline = 0c1fd88953585485b772dfd405bbc5b1b5417324 and contender = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810. 13b66b57b454d2b6c4ea35e3d

[GitHub] [arrow-datafusion] liukun4515 commented on issue #1273: Question: Is the Ballista project providing value to the overall DataFusion project?

2022-01-17 Thread GitBox
liukun4515 commented on issue #1273: URL: https://github.com/apache/arrow-datafusion/issues/1273#issuecomment-1014403293 > @realno @liukun4515 do you have any plans for ballista that you think are worth adding to the Q1 roadmap? @matthewmturner Sorry for my limited time, May

[GitHub] [arrow] thisisnic commented on a change in pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-17 Thread GitBox
thisisnic commented on a change in pull request #12154: URL: https://github.com/apache/arrow/pull/12154#discussion_r785872219 ## File path: r/R/util.R ## @@ -209,3 +209,74 @@ handle_csv_read_error <- function(e, schema) { abort(e) } + + +parse_period_unit <- function(x) {

[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #1082: Optimized ByteArrayReader (#1040)

2022-01-17 Thread GitBox
codecov-commenter edited a comment on pull request #1082: URL: https://github.com/apache/arrow-rs/pull/1082#issuecomment-998923929 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1082?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] rok commented on pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-01-17 Thread GitBox
rok commented on pull request #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1014440937 > I think I've taken this as far as I can right now? I had been hoping to write shims to support the `week_start` and `change_on_boundary` arguments but I'm not sure if that's poss

[GitHub] [arrow] ursabot edited a comment on pull request #12151: ARROW-15335: [Java] Fix setPosition call in UnionListReader for empty List

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12151: URL: https://github.com/apache/arrow/pull/12151#issuecomment-1014218781 Benchmark runs are scheduled for baseline = 0c1fd88953585485b772dfd405bbc5b1b5417324 and contender = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810. 13b66b57b454d2b6c4ea35e3d

[GitHub] [arrow-datafusion] yjshen opened a new pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-17 Thread GitBox
yjshen opened a new pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596 # Which issue does this PR close? Closes #1571 and #1572. # Rationale for this change 1. `in_mem_sort` and `SortPreservingMergeStream` share a similar goal. 2.

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

2022-01-17 Thread GitBox
alamb commented on a change in pull request #1589: URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785931197 ## File path: datafusion/src/from_slice.rs ## @@ -19,27 +19,119 @@ //! //! This file essentially exists to ease the transition onto arrow2 -use

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-17 Thread GitBox
yjshen commented on a change in pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#discussion_r785935850 ## File path: datafusion/src/physical_plan/sorts/external_sort.rs ## @@ -353,12 +386,79 @@ pub struct ExternalSortExec { input: Arc, ///

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

2022-01-17 Thread GitBox
Jimexist commented on a change in pull request #1589: URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785949640 ## File path: datafusion/src/from_slice.rs ## @@ -19,27 +19,119 @@ //! //! This file essentially exists to ease the transition onto arrow2 -

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

2022-01-17 Thread GitBox
Jimexist commented on a change in pull request #1589: URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785950014 ## File path: datafusion/src/from_slice.rs ## @@ -19,27 +19,119 @@ //! //! This file essentially exists to ease the transition onto arrow2 -

[GitHub] [arrow] pitrou commented on pull request #12143: ARROW-15324: [C++] Avoid crashing when HDFS file fails closing

2022-01-17 Thread GitBox
pitrou commented on pull request #12143: URL: https://github.com/apache/arrow/pull/12143#issuecomment-1014478996 Ping @kszucs :-)

[GitHub] [arrow-rs] Jimexist opened a new pull request #1188: add from_iter_values for binary array

2022-01-17 Thread GitBox
Jimexist opened a new pull request #1188: URL: https://github.com/apache/arrow-rs/pull/1188 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing change

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1589: support from_slice for binary, string, and boolean array types

2022-01-17 Thread GitBox
Jimexist commented on a change in pull request #1589: URL: https://github.com/apache/arrow-datafusion/pull/1589#discussion_r785977125 ## File path: datafusion/src/from_slice.rs ## @@ -19,27 +19,119 @@ //! //! This file essentially exists to ease the transition onto arrow2 -

[GitHub] [arrow] ursabot edited a comment on pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12167: URL: https://github.com/apache/arrow/pull/12167#issuecomment-1014339535 Benchmark runs are scheduled for baseline = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810 and contender = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd. 7e012736611cdda2b1d7082f4

[GitHub] [arrow] github-actions[bot] commented on pull request #12168: ARROW-15316 [R] Make a one-function pointer function

2022-01-17 Thread GitBox
github-actions[bot] commented on pull request #12168: URL: https://github.com/apache/arrow/pull/12168#issuecomment-1014490049 https://issues.apache.org/jira/browse/ARROW-15316

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1188: add from_iter_values for binary array

2022-01-17 Thread GitBox
codecov-commenter commented on pull request #1188: URL: https://github.com/apache/arrow-rs/pull/1188#issuecomment-1014500835 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1188?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow] kszucs closed pull request #12143: ARROW-15324: [C++] Avoid crashing when HDFS file fails closing

2022-01-17 Thread GitBox
kszucs closed pull request #12143: URL: https://github.com/apache/arrow/pull/12143

[GitHub] [arrow] ursabot commented on pull request #12143: ARROW-15324: [C++] Avoid crashing when HDFS file fails closing

2022-01-17 Thread GitBox
ursabot commented on pull request #12143: URL: https://github.com/apache/arrow/pull/12143#issuecomment-1014516988 Benchmark runs are scheduled for baseline = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd and contender = c4ef0486b16112813857e587dab84b3461b90542. c4ef0486b16112813857e587dab84b34

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786000919 ## File path: python/pyarrow/_compute.pyx ## @@ -785,6 +785,30 @@ class ElementWiseAggregateOptions(_ElementWiseAggregateOptions): self._set_opti

[GitHub] [arrow] ursabot edited a comment on pull request #12143: ARROW-15324: [C++] Avoid crashing when HDFS file fails closing

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12143: URL: https://github.com/apache/arrow/pull/12143#issuecomment-1014516988 Benchmark runs are scheduled for baseline = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd and contender = c4ef0486b16112813857e587dab84b3461b90542. c4ef0486b16112813857e587d

[GitHub] [arrow] lidavidm commented on pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
lidavidm commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1014555006 @bkmgit as Eduardo mentioned, would you like to write a short description for the PR? (It's the first post at top.) This gets used in the commit message.

[GitHub] [arrow] lidavidm commented on a change in pull request #11982: ARROW-15313: [C++][Java][FlightRPC] Implement type info method to flight-sql

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #11982: URL: https://github.com/apache/arrow/pull/11982#discussion_r786014493 ## File path: format/FlightSql.proto ## @@ -867,6 +867,69 @@ enum SqlSupportsConvert { SQL_CONVERT_VARCHAR = 19; } +/* + * Represents a request to

[GitHub] [arrow-rs] HaoYang670 commented on issue #1108: Add native comparison kernel support for BinaryArray

2022-01-17 Thread GitBox
HaoYang670 commented on issue #1108: URL: https://github.com/apache/arrow-rs/issues/1108#issuecomment-1014561953 Following your advice, I will file 2 PRs to match this feature. The first PR will add support for fully qualified binary array. The second PR will support comparison for dynamic

[GitHub] [arrow] lidavidm commented on a change in pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12100: URL: https://github.com/apache/arrow/pull/12100#discussion_r786017483 ## File path: cpp/src/arrow/compute/exec/aggregate_node.cc ## @@ -165,7 +166,18 @@ class ScalarAggregateNode : public ExecNode { const char* kind_name

[GitHub] [arrow] lidavidm commented on a change in pull request #12100: ARROW-15061: [C++] Add logging for kernel functions and exec plan nodes

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12100: URL: https://github.com/apache/arrow/pull/12100#discussion_r786021110 ## File path: cpp/src/arrow/compute/exec/sink_node.cc ## @@ -275,6 +302,11 @@ struct OrderBySinkNode final : public SinkNode { } void InputReceiv

[GitHub] [arrow] lidavidm commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786022104 ## File path: cpp/src/arrow/compute/kernels/scalar_nested.cc ## @@ -17,6 +17,7 @@ // Vector kernels involving nested types +#include Review commen

[GitHub] [arrow] bkmgit commented on pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
bkmgit commented on pull request #11882: URL: https://github.com/apache/arrow/pull/11882#issuecomment-1014568870 @lidavidm Hopefully ok now. Thanks for the explanation.

[GitHub] [arrow-datafusion] alamb merged pull request #1562: Consolidate `batch_size` configuration in `ExecutionConfig`, `RuntimeConfig` and `PhysicalPlanConfig`

2022-01-17 Thread GitBox
alamb merged pull request #1562: URL: https://github.com/apache/arrow-datafusion/pull/1562

[GitHub] [arrow-datafusion] alamb closed issue #1565: Consolidate various configurations options

2022-01-17 Thread GitBox
alamb closed issue #1565: URL: https://github.com/apache/arrow-datafusion/issues/1565

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1593: Add support show tables and show columns for ballista

2022-01-17 Thread GitBox
alamb commented on a change in pull request #1593: URL: https://github.com/apache/arrow-datafusion/pull/1593#discussion_r786025309 ## File path: ballista/rust/client/src/context.rs ## @@ -256,6 +288,14 @@ impl BallistaContext { ) }; +let is_show

[GitHub] [arrow-datafusion] alamb commented on pull request #1559: Remove call_ip in the SchedulerServer

2022-01-17 Thread GitBox
alamb commented on pull request #1559: URL: https://github.com/apache/arrow-datafusion/pull/1559#issuecomment-1014572980 I took the liberty of `git merge apache/master` on this branch to hopefully get a clean CI run and then will merge. Sorry for the delay @yahoNanJing

[GitHub] [arrow-rs] Jimexist opened a new pull request #1189: update nightly version for miri

2022-01-17 Thread GitBox
Jimexist opened a new pull request #1189: URL: https://github.com/apache/arrow-rs/pull/1189 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing change

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #1597: update nightly version

2022-01-17 Thread GitBox
Jimexist opened a new pull request #1597: URL: https://github.com/apache/arrow-datafusion/pull/1597 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1554: support mathematics operation for decimal data type

2022-01-17 Thread GitBox
alamb commented on a change in pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554#discussion_r786034055 ## File path: datafusion/src/physical_plan/coercion_rule/binary_rule.rs ## @@ -162,12 +162,141 @@ fn get_comparison_common_decimal_type( } }

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1582: remove update and merge from accumulator

2022-01-17 Thread GitBox
alamb commented on a change in pull request #1582: URL: https://github.com/apache/arrow-datafusion/pull/1582#discussion_r786044972 ## File path: datafusion/src/physical_plan/expressions/array_agg.rs ## @@ -137,6 +129,39 @@ impl Accumulator for ArrayAggAccumulator { }

[GitHub] [arrow-datafusion] alamb commented on issue #1591: Parquet pruning predicate is not handling null comparisons correctly

2022-01-17 Thread GitBox
alamb commented on issue #1591: URL: https://github.com/apache/arrow-datafusion/issues/1591#issuecomment-1014593828 It should be noted that this is an 'optimization' bug rather than a correctness bug -- in the sense that returning `false` means "don't filter the row group" and returning

[GitHub] [arrow] ursabot edited a comment on pull request #12167: ARROW-15337: [Doc] New contributors guide updates

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12167: URL: https://github.com/apache/arrow/pull/12167#issuecomment-1014339535 Benchmark runs are scheduled for baseline = 13b66b57b454d2b6c4ea35e3d19adbdd85b17810 and contender = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd. 7e012736611cdda2b1d7082f4

[GitHub] [arrow] kszucs closed pull request #12165: ARROW-15334: [CI][GLib][Windows] Use Ruby 3.1

2022-01-17 Thread GitBox
kszucs closed pull request #12165: URL: https://github.com/apache/arrow/pull/12165 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[GitHub] [arrow-cookbook] lidavidm commented on a change in pull request #114: Fix hyperlink

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #114: URL: https://github.com/apache/arrow-cookbook/pull/114#discussion_r786057619 ## File path: r/content/arrays.Rmd ## @@ -185,4 +185,4 @@ Scalar, Array, and ChunkedArray objects. The returned object will be an Arrow o ### S

[GitHub] [arrow-datafusion] alamb commented on issue #924: Add a separate configuration setting for parallelism of scanning parquet files

2022-01-17 Thread GitBox
alamb commented on issue #924: URL: https://github.com/apache/arrow-datafusion/issues/924#issuecomment-1014606297 @yjshen started consolidating such config settings in https://github.com/apache/arrow-datafusion/pull/1562

[GitHub] [arrow] ursabot commented on pull request #12165: ARROW-15334: [CI][GLib][Windows] Use Ruby 3.1

2022-01-17 Thread GitBox
ursabot commented on pull request #12165: URL: https://github.com/apache/arrow/pull/12165#issuecomment-1014606048 Benchmark runs are scheduled for baseline = c4ef0486b16112813857e587dab84b3461b90542 and contender = bbbe668ef54f7679f01a3d8b76cf23a365006e74. bbbe668ef54f7679f01a3d8b76cf23a3

[GitHub] [arrow-datafusion] Jimexist commented on a change in pull request #1582: remove update and merge from accumulator

2022-01-17 Thread GitBox
Jimexist commented on a change in pull request #1582: URL: https://github.com/apache/arrow-datafusion/pull/1582#discussion_r786060808 ## File path: datafusion/src/physical_plan/expressions/array_agg.rs ## @@ -137,6 +129,39 @@ impl Accumulator for ArrayAggAccumulator {

[GitHub] [arrow-datafusion] xudong963 commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-17 Thread GitBox
xudong963 commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1014608623 add tests and fix the extreme case. cc @alamb @pjmore

[GitHub] [arrow-datafusion] Jimexist opened a new issue #1598: Follow up on the removal of `update` and `merge` in `Accumulator`

2022-01-17 Thread GitBox
Jimexist opened a new issue #1598: URL: https://github.com/apache/arrow-datafusion/issues/1598 I think it would be a good idea as a follow on PR to update these implementations to avoid the use of ScalarValue. I'll try to find some time this week to do it (I need some coding time :)

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-17 Thread GitBox
xudong963 commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786061330 ## File path: datafusion/src/sql/planner.rs ## @@ -3818,6 +3832,31 @@ mod tests { \n TableScan: public.person projection=None";

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-17 Thread GitBox
xudong963 commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786062776 ## File path: datafusion/src/logical_plan/plan.rs ## @@ -214,8 +220,14 @@ pub struct Extension { pub node: Arc, } +impl PartialEq for E

[GitHub] [arrow] pitrou commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
pitrou commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786042927 ## File path: cpp/src/arrow/compute/kernels/codegen_internal.h ## @@ -445,6 +445,29 @@ static void VisitTwoArrayValuesInline(const ArrayData& arr0, const

[GitHub] [arrow-datafusion] alamb commented on issue #1527: Error reading Parquet files after schema evolution

2022-01-17 Thread GitBox
alamb commented on issue #1527: URL: https://github.com/apache/arrow-datafusion/issues/1527#issuecomment-1014610661 Thanks for the report @capkurmagati -- I am not sure if your use case ever worked (in which case it is a bug). Regardless, as @tustvold mentions, we basically have th

[GitHub] [arrow] kszucs commented on pull request #11182: ARROW-14034: [Java] Unexpected Allocator states created after allocating buffer whose AllocationManager has different size from the requested

2022-01-17 Thread GitBox
kszucs commented on pull request #11182: URL: https://github.com/apache/arrow/pull/11182#issuecomment-1014613271 ping @zhztheplayer @emkornfield -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow-datafusion] xudong963 commented on issue #1576: casting `Int64` to `Float64` unsuccessfully caused tpch8 to fail

2022-01-17 Thread GitBox
xudong963 commented on issue #1576: URL: https://github.com/apache/arrow-datafusion/issues/1576#issuecomment-1014614011 Thanks @alamb, I'll try it

[GitHub] [arrow] kszucs commented on pull request #10114: ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a directory

2022-01-17 Thread GitBox
kszucs commented on pull request #10114: URL: https://github.com/apache/arrow/pull/10114#issuecomment-1014615916 Thanks everyone, merging!

[GitHub] [arrow] kszucs closed pull request #10114: ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a directory

2022-01-17 Thread GitBox
kszucs closed pull request #10114: URL: https://github.com/apache/arrow/pull/10114

[GitHub] [arrow-datafusion] alamb merged pull request #1589: support from_slice for binary, string, and boolean array types

2022-01-17 Thread GitBox
alamb merged pull request #1589: URL: https://github.com/apache/arrow-datafusion/pull/1589

[GitHub] [arrow-datafusion] alamb commented on issue #162: TPC-H Query 8

2022-01-17 Thread GitBox
alamb commented on issue #162: URL: https://github.com/apache/arrow-datafusion/issues/162#issuecomment-1014618116 @xudong963 has filed https://github.com/apache/arrow-datafusion/issues/1576 to track this issue 👍

[GitHub] [arrow-datafusion] alamb closed issue #1321: Sql query `LEFT JOIN WHERE right IS NULL` return unexpected result.

2022-01-17 Thread GitBox
alamb closed issue #1321: URL: https://github.com/apache/arrow-datafusion/issues/1321

[GitHub] [arrow-datafusion] alamb commented on issue #1321: Sql query `LEFT JOIN WHERE right IS NULL` return unexpected result.

2022-01-17 Thread GitBox
alamb commented on issue #1321: URL: https://github.com/apache/arrow-datafusion/issues/1321#issuecomment-1014620290 closing as a dupe of https://github.com/apache/arrow-datafusion/issues/1586

[GitHub] [arrow] ursabot edited a comment on pull request #12165: ARROW-15334: [CI][GLib][Windows] Use Ruby 3.1

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12165: URL: https://github.com/apache/arrow/pull/12165#issuecomment-1014606048 Benchmark runs are scheduled for baseline = c4ef0486b16112813857e587dab84b3461b90542 and contender = bbbe668ef54f7679f01a3d8b76cf23a365006e74. bbbe668ef54f7679f01a3d8b7

[GitHub] [arrow] ursabot commented on pull request #10114: ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a directory

2022-01-17 Thread GitBox
ursabot commented on pull request #10114: URL: https://github.com/apache/arrow/pull/10114#issuecomment-1014623484 Benchmark runs are scheduled for baseline = bbbe668ef54f7679f01a3d8b76cf23a365006e74 and contender = e12a4545bdc5a8683c8dfdbb0468922d444c0500. e12a4545bdc5a8683c8dfdbb0468922d

[GitHub] [arrow] lidavidm commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786060176 ## File path: cpp/src/arrow/compute/api_scalar.cc ## @@ -573,6 +605,7 @@ void RegisterScalarOptions(FunctionRegistry* registry) { DCHECK_OK(registry->

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786076345 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -156,39 +212,52 @@ struct Maximum { } }; +// Check if timestamp timezones are com

[GitHub] [arrow-datafusion] hntd187 commented on issue #1544: Streaming support for DataFusion

2022-01-17 Thread GitBox
hntd187 commented on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1014625104 > @hntd187 if you want you can transfer that to datafusion-contrib for better visibility. Perfect, I wasn’t sure what my permissions were. I’ll do that right now.

[GitHub] [arrow] pitrou commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
pitrou commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786076989 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -156,39 +212,52 @@ struct Maximum { } }; +// Check if timestamp timezones are com

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1554: support mathematics operation for decimal data type

2022-01-17 Thread GitBox
liukun4515 commented on a change in pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554#discussion_r786079635 ## File path: datafusion/src/physical_plan/coercion_rule/binary_rule.rs ## @@ -162,12 +162,141 @@ fn get_comparison_common_decimal_type(

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1554: support mathematics operation for decimal data type

2022-01-17 Thread GitBox
liukun4515 commented on a change in pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554#discussion_r786079490 ## File path: datafusion/src/physical_plan/coercion_rule/binary_rule.rs ## @@ -162,12 +162,141 @@ fn get_comparison_common_decimal_type(

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-17 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786079703 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -772,6 +1107,16 @@ const FunctionDoc less_equal_doc{ ("A null on either side emits

[GitHub] [arrow] lidavidm commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786082875 ## File path: cpp/src/arrow/compute/kernels/scalar_nested.cc ## @@ -429,6 +430,97 @@ const FunctionDoc make_struct_doc{"Wrap Arrays into a StructArray",

[GitHub] [arrow-datafusion] hntd187 edited a comment on issue #1544: Streaming support for DataFusion

2022-01-17 Thread GitBox
hntd187 edited a comment on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1014625104 > @hntd187 if you want you can transfer that to datafusion-contrib for better visibility. Perfect, I wasn’t sure what my permissions were. I’ll do that right no

[GitHub] [arrow] lidavidm commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-17 Thread GitBox
lidavidm commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786084635 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,30 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestScala

[GitHub] [arrow-datafusion] alamb commented on issue #587: Optionally Limit memory used by DataFusion plan

2022-01-17 Thread GitBox
alamb commented on issue #587: URL: https://github.com/apache/arrow-datafusion/issues/587#issuecomment-1014631605 I wrote up some thoughts about externalized joins on https://github.com/apache/arrow-datafusion/issues/1599

[GitHub] [arrow] ursabot edited a comment on pull request #10114: ARROW-12480: [Java][Dataset] FileSystemDataset: Support reading from a directory

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #10114: URL: https://github.com/apache/arrow/pull/10114#issuecomment-1014623484 Benchmark runs are scheduled for baseline = bbbe668ef54f7679f01a3d8b76cf23a365006e74 and contender = e12a4545bdc5a8683c8dfdbb0468922d444c0500. e12a4545bdc5a8683c8dfdbb0

[GitHub] [arrow-datafusion] alamb commented on issue #1599: Memory Limited Joins (Externalized / Spill)

2022-01-17 Thread GitBox
alamb commented on issue #1599: URL: https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1014635302 I would love to implement this algorithm in DataFusion: https://arxiv.org/abs/2010.00152 Sort-based grouping and aggregation Thanh Do, Goetz Graefe https://a

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-17 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786089476 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,30 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow-datafusion] alamb edited a comment on issue #1599: Memory Limited Joins (Externalized / Spill)

2022-01-17 Thread GitBox
alamb edited a comment on issue #1599: URL: https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1014635302 I would love to implement this algorithm in DataFusion: https://arxiv.org/abs/2010.00152 Sort-based grouping and aggregation Thanh Do, Goetz Graefe

[GitHub] [arrow-datafusion] xudong963 commented on issue #1599: Memory Limited Joins (Externalized / Spill)

2022-01-17 Thread GitBox
xudong963 commented on issue #1599: URL: https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1014644855 > I would love to implement this algorithm in DataFusion: > > https://arxiv.org/abs/2010.00152 Sort-based grouping and aggregation Thanh Do, Goetz Graefe Ma

[GitHub] [arrow-datafusion] hntd187 edited a comment on issue #1544: Streaming support for DataFusion

2022-01-17 Thread GitBox
hntd187 edited a comment on issue #1544: URL: https://github.com/apache/arrow-datafusion/issues/1544#issuecomment-1014206250 I'm checking in my basic PoC work. I still have much to do, but it is available for anyone looking to assist or just review. https://github.com/datafusion-contrib/datafusion-stream

[GitHub] [arrow] ursabot edited a comment on pull request #12143: ARROW-15324: [C++] Avoid crashing when HDFS file fails closing

2022-01-17 Thread GitBox
ursabot edited a comment on pull request #12143: URL: https://github.com/apache/arrow/pull/12143#issuecomment-1014516988 Benchmark runs are scheduled for baseline = 7e012736611cdda2b1d7082f41bb2e77eb16bbbd and contender = c4ef0486b16112813857e587dab84b3461b90542. c4ef0486b16112813857e587d

[GitHub] [arrow-datafusion] Jimexist merged pull request #1597: update nightly version

2022-01-17 Thread GitBox
Jimexist merged pull request #1597: URL: https://github.com/apache/arrow-datafusion/pull/1597

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1554: support mathematics operation for decimal data type

2022-01-17 Thread GitBox
alamb commented on a change in pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554#discussion_r786099180 ## File path: datafusion/src/physical_plan/coercion_rule/binary_rule.rs ## @@ -162,12 +162,141 @@ fn get_comparison_common_decimal_type( } }

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-17 Thread GitBox
xudong963 commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786099463 ## File path: datafusion/src/sql/planner.rs ## @@ -731,20 +732,33 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

[GitHub] [arrow-datafusion] alamb commented on pull request #1595: Fix null comparison for Parquet pruning predicate

2022-01-17 Thread GitBox
alamb commented on pull request #1595: URL: https://github.com/apache/arrow-datafusion/pull/1595#issuecomment-1014653042 Thank you @viirya -- I will try to review this carefully, but likely won't be able to do so until tomorrow

[GitHub] [arrow-datafusion] alamb commented on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-17 Thread GitBox
alamb commented on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1014654729 Thank you @yjshen -- this looks awesome -- I will try and review this carefully, but likely won't have time until tomorrow

[GitHub] [arrow-datafusion] hntd187 opened a new pull request #1600: Experimental commit for stream processing

2022-01-17 Thread GitBox
hntd187 opened a new pull request #1600: URL: https://github.com/apache/arrow-datafusion/pull/1600 # Which issue does this PR close? #1544 # Rationale for this change Just some basic hooking into logical plan where necessary. I'm not sure if this is the correct way to do
