[GitHub] [arrow] ovr commented on pull request #9449: ARROW-11563: [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))

2021-02-12 Thread GitBox
ovr commented on pull request #9449: URL: https://github.com/apache/arrow/pull/9449#issuecomment-778066965 Just a notice: Rebased This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] liyafan82 commented on a change in pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-02-12 Thread GitBox
liyafan82 commented on a change in pull request #8949: URL: https://github.com/apache/arrow/pull/8949#discussion_r575091316 ## File path: java/vector/pom.xml ## @@ -74,6 +74,11 @@ org.slf4j slf4j-api + Review comment: @emkornfield Sounds reasonab

[GitHub] [arrow] liyafan82 commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-02-12 Thread GitBox
liyafan82 commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-778084083 > @liyafan82 could you enable the java integration test to confirm that reading the files generated by C++ works before we merge (once we verify it is working I can take a final

[GitHub] [arrow] alamb commented on pull request #9376: ARROW-11446: [DataFusion] Added support for scalarValue in Builtin functions.

2021-02-12 Thread GitBox
alamb commented on pull request #9376: URL: https://github.com/apache/arrow/pull/9376#issuecomment-778126649 @seddonm1 -- what do you think about merge order of this PR and #9243 ? (which will conflict)

[GitHub] [arrow] alamb commented on pull request #9449: ARROW-11563: [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))

2021-02-12 Thread GitBox
alamb commented on pull request #9449: URL: https://github.com/apache/arrow/pull/9449#issuecomment-778131728 Thanks again @ovr !

[GitHub] [arrow] alamb closed pull request #9449: ARROW-11563: [Rust] Support Cast(Utf8, TimeStamp(Nanoseconds, None))

2021-02-12 Thread GitBox
alamb closed pull request #9449: URL: https://github.com/apache/arrow/pull/9449

[GitHub] [arrow] seddonm1 commented on pull request #9376: ARROW-11446: [DataFusion] Added support for scalarValue in Builtin functions.

2021-02-12 Thread GitBox
seddonm1 commented on pull request #9376: URL: https://github.com/apache/arrow/pull/9376#issuecomment-778133305 Unfortunately (for me) this logically does go first as being able to identify ScalarValue would give a huge performance advantage. I am happy to rework the other one after

[GitHub] [arrow] alamb commented on pull request #9402: ARROW-11481: [Rust] More cast implementations

2021-02-12 Thread GitBox
alamb commented on pull request #9402: URL: https://github.com/apache/arrow/pull/9402#issuecomment-778133435 @jorgecarleitao do you think this one is ready to go ?

[GitHub] [arrow] alamb closed pull request #9445: ARROW-11557: [Rust][Datafusion] Add deregister_table

2021-02-12 Thread GitBox
alamb closed pull request #9445: URL: https://github.com/apache/arrow/pull/9445

[GitHub] [arrow] alamb commented on pull request #9445: ARROW-11557: [Rust][Datafusion] Add deregister_table

2021-02-12 Thread GitBox
alamb commented on pull request #9445: URL: https://github.com/apache/arrow/pull/9445#issuecomment-778134379 Thanks @marcprux !

[GitHub] [arrow] alamb commented on pull request #9412: ARROW-11491: [Rust] support JSON schema inference for nested list and struct

2021-02-12 Thread GitBox
alamb commented on pull request #9412: URL: https://github.com/apache/arrow/pull/9412#issuecomment-778134781 @houqp -- I think this one is ready to go other than > The only question I think should be answered / explained before this is merged is why the checked in file test/data/mi

[GitHub] [arrow] alamb commented on pull request #9353: ARROW-11420: [Rust] Added support to length of Binary and List.

2021-02-12 Thread GitBox
alamb commented on pull request #9353: URL: https://github.com/apache/arrow/pull/9353#issuecomment-778135054 @jorgecarleitao -- this one needs a rebase and then I think it is ready to go

[GitHub] [arrow] aitor94 commented on pull request #8491: ARROW-10349: [Python] linux aarch64 wheels

2021-02-12 Thread GitBox
aitor94 commented on pull request #8491: URL: https://github.com/apache/arrow/pull/8491#issuecomment-778135205 I'm trying to install pyarrow in a m6g instance on AWS and I can't. This PR may help to solve my issue. [https://stackoverflow.com/questions/64928357/how-to-install-pyarrow-in-ama

[GitHub] [arrow] alamb commented on pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
alamb commented on pull request #9233: URL: https://github.com/apache/arrow/pull/9233#issuecomment-778135913 This one is now ready for review. cc @andygrove @seddonm1 @jhorstmann and @Dandandan

[GitHub] [arrow] alamb commented on a change in pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
alamb commented on a change in pull request #9233: URL: https://github.com/apache/arrow/pull/9233#discussion_r575154628 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -398,97 +405,165 @@ fn group_aggregate_batch( Ok(accumulators) } -/// Create a

[GitHub] [arrow] alamb commented on pull request #9376: ARROW-11446: [DataFusion] Added support for scalarValue in Builtin functions.

2021-02-12 Thread GitBox
alamb commented on pull request #9376: URL: https://github.com/apache/arrow/pull/9376#issuecomment-778137269 Sounds good -- @jorgecarleitao let's get it merged ! It looks like it needs another rebase and then we'll get it in

[GitHub] [arrow] nevi-me commented on a change in pull request #9353: ARROW-11420: [Rust] Added support to length of Binary and List.

2021-02-12 Thread GitBox
nevi-me commented on a change in pull request #9353: URL: https://github.com/apache/arrow/pull/9353#discussion_r575168065 ## File path: rust/arrow/src/compute/kernels/length.rs ## @@ -24,42 +24,77 @@ use crate::{ }; use std::sync::Arc; -fn length_string(array: &Array, data_

[GitHub] [arrow] lidavidm closed pull request #9433: ARROW-11539: [Developer][Archery] Change items_per_seconds units

2021-02-12 Thread GitBox
lidavidm closed pull request #9433: URL: https://github.com/apache/arrow/pull/9433

[GitHub] [arrow] marcprux opened a new pull request #9479: ARROW-11586: [Rust][Datafusion][WIP] Remove force unwrap

2021-02-12 Thread GitBox
marcprux opened a new pull request #9479: URL: https://github.com/apache/arrow/pull/9479 Fix for https://issues.apache.org/jira/browse/ARROW-11586

[GitHub] [arrow] github-actions[bot] commented on pull request #9479: ARROW-11586: [Rust][Datafusion][WIP] Remove force unwrap

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9479: URL: https://github.com/apache/arrow/pull/9479#issuecomment-778196690 https://issues.apache.org/jira/browse/ARROW-11586

[GitHub] [arrow] codecov-io commented on pull request #9479: ARROW-11586: [Rust][Datafusion][WIP] Remove force unwrap

2021-02-12 Thread GitBox
codecov-io commented on pull request #9479: URL: https://github.com/apache/arrow/pull/9479#issuecomment-778207974 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9479?src=pr&el=h1) Report > Merging [#9479](https://codecov.io/gh/apache/arrow/pull/9479?src=pr&el=desc) (ed800b7) into

[GitHub] [arrow] abreis commented on pull request #9454: ARROW-11572: [Rust] Add a kernel for division by single scalar

2021-02-12 Thread GitBox
abreis commented on pull request #9454: URL: https://github.com/apache/arrow/pull/9454#issuecomment-778208765 > @abreis , I am not convinced that that is sufficient, unfortunately, because it excludes all types that are not Numeric (i.e. all dates and times for primitives, as well as all o

[GitHub] [arrow] abreis edited a comment on pull request #9454: ARROW-11572: [Rust] Add a kernel for division by single scalar

2021-02-12 Thread GitBox
abreis edited a comment on pull request #9454: URL: https://github.com/apache/arrow/pull/9454#issuecomment-778208765 > @abreis , I am not convinced that that is sufficient, unfortunately, because it excludes all types that are not Numeric (i.e. all dates and times for primitives, as well a

[GitHub] [arrow] lidavidm opened a new pull request #9480: ARROW-11596: [Python][Dataset] make ScanTask.execute() eager

2021-02-12 Thread GitBox
lidavidm opened a new pull request #9480: URL: https://github.com/apache/arrow/pull/9480 This changes 2 things: - ScanTask.execute() is now eagerly evaluated, so any work involved in creating a record batch reader is done up front. For example, a Parquet file will actually begin reading
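The eager-versus-lazy distinction this PR describes can be illustrated with plain Python generators. This is only a sketch and does not use pyarrow: `open_reader`, `lazy_batches`, and `eager_batches` are hypothetical names, and the `events` list stands in for up-front work such as opening a file.

```python
from typing import Iterator, List

# Records when the (simulated) up-front work actually happens.
events: List[str] = []


def open_reader(name: str) -> List[int]:
    """Hypothetical stand-in for the expensive setup of a batch reader."""
    events.append(f"opened {name}")
    return [1, 2, 3]


def lazy_batches(name: str) -> Iterator[int]:
    # Generator function: nothing in the body runs until the first next().
    reader = open_reader(name)
    yield from reader


def eager_batches(name: str) -> Iterator[int]:
    # Regular function: the setup runs at call time, before any iteration.
    reader = open_reader(name)
    return iter(reader)


lazy = lazy_batches("lazy.parquet")
assert events == []  # the lazy version has not opened anything yet

eager = eager_batches("eager.parquet")
assert events == ["opened eager.parquet"]  # the eager version already has

first = next(lazy)  # only now does the lazy version do its setup
```

With the eager form, setup errors surface at the call site rather than at the first iteration, which is the usual reason for preferring it.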

[GitHub] [arrow] github-actions[bot] commented on pull request #9480: ARROW-11596: [Python][Dataset] make ScanTask.execute() eager

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9480: URL: https://github.com/apache/arrow/pull/9480#issuecomment-778212915 https://issues.apache.org/jira/browse/ARROW-11596

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9480: ARROW-11596: [Python][Dataset] make ScanTask.execute() eager

2021-02-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9480: URL: https://github.com/apache/arrow/pull/9480#discussion_r575250046 ## File path: python/pyarrow/_dataset.pyx ## @@ -2125,12 +2125,44 @@ cdef class ScanTask(_Weakrefable): --- record_batches

[GitHub] [arrow] lidavidm commented on a change in pull request #9480: ARROW-11596: [Python][Dataset] make ScanTask.execute() eager

2021-02-12 Thread GitBox
lidavidm commented on a change in pull request #9480: URL: https://github.com/apache/arrow/pull/9480#discussion_r575252412 ## File path: python/pyarrow/_dataset.pyx ## @@ -2125,12 +2125,44 @@ cdef class ScanTask(_Weakrefable): --- record_batches : iterator

[GitHub] [arrow] emkornfield commented on pull request #9474: ARROW-10420: [C++] Refactor io and filesystem APIs to take an IOContext

2021-02-12 Thread GitBox
emkornfield commented on pull request #9474: URL: https://github.com/apache/arrow/pull/9474#issuecomment-778245429 Drive by naming nit: IoContext

[GitHub] [arrow] emkornfield edited a comment on pull request #9474: ARROW-10420: [C++] Refactor io and filesystem APIs to take an IOContext

2021-02-12 Thread GitBox
emkornfield edited a comment on pull request #9474: URL: https://github.com/apache/arrow/pull/9474#issuecomment-778245429 naming nit: IoContext?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9386: ARROW-11373: [Python][Docs] Add example of specifying type for a column when reading csv file

2021-02-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9386: URL: https://github.com/apache/arrow/pull/9386#discussion_r575303498 ## File path: docs/source/python/csv.rst ## @@ -75,7 +75,22 @@ Customized conversion - To alter how CSV data is converted

[GitHub] [arrow] jorisvandenbossche commented on pull request #9466: ARROW-11379: [C++][Dataset] Better formatting for timestamp scalars

2021-02-12 Thread GitBox
jorisvandenbossche commented on pull request #9466: URL: https://github.com/apache/arrow/pull/9466#issuecomment-778278059 The diff shows a change in the testing submodule, is that expected?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9470: ARROW-11480: [Python] Test filtering on INT96 timestamps

2021-02-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9470: URL: https://github.com/apache/arrow/pull/9470#discussion_r575324170 ## File path: python/pyarrow/tests/parquet/test_dataset.py ## @@ -352,6 +352,25 @@ def test_filters_cutoff_exclusive_datetime(tempdir, use_legac

[GitHub] [arrow] bkietz commented on pull request #9466: ARROW-11379: [C++][Dataset] Better formatting for timestamp scalars

2021-02-12 Thread GitBox
bkietz commented on pull request #9466: URL: https://github.com/apache/arrow/pull/9466#issuecomment-778279411 No, there should be no change to `testing/`. I'll fix that

[GitHub] [arrow] bkietz commented on a change in pull request #9470: ARROW-11480: [Python] Test filtering on INT96 timestamps

2021-02-12 Thread GitBox
bkietz commented on a change in pull request #9470: URL: https://github.com/apache/arrow/pull/9470#discussion_r575325063 ## File path: python/pyarrow/tests/parquet/test_dataset.py ## @@ -352,6 +352,25 @@ def test_filters_cutoff_exclusive_datetime(tempdir, use_legacy_dataset):

[GitHub] [arrow] andygrove opened a new pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec [WIP]

2021-02-12 Thread GitBox
andygrove opened a new pull request #9481: URL: https://github.com/apache/arrow/pull/9481 To make it easier to implement serde for `HashAggregateExec` we need access to the schema that the aggregate expressions are compiled against. For `Partial` aggregates this is the same as the schema o

[GitHub] [arrow] github-actions[bot] commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec [WIP]

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778280588 https://issues.apache.org/jira/browse/ARROW-11606

[GitHub] [arrow] andygrove commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec [WIP]

2021-02-12 Thread GitBox
andygrove commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778280783 @jorgecarleitao Does this make sense?

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #9323: ARROW-10438: [C++][Dataset] Partitioning::Format on nulls

2021-02-12 Thread GitBox
jorisvandenbossche commented on a change in pull request #9323: URL: https://github.com/apache/arrow/pull/9323#discussion_r575328856 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1587,33 +1587,54 @@ def test_open_dataset_non_existing_file(): @pytest.mark.parquet

[GitHub] [arrow] lidavidm opened a new pull request #9482: ARROW-11601: [C++][Python][Dataset] expose Parquet pre-buffer option

2021-02-12 Thread GitBox
lidavidm opened a new pull request #9482: URL: https://github.com/apache/arrow/pull/9482 This exposes the pre-buffering option that was implemented for the base Parquet reader in Datasets. To summarize, the option coalesces and buffers ranges of a file based on the columns and row g

[GitHub] [arrow] github-actions[bot] commented on pull request #9482: ARROW-11601: [C++][Python][Dataset] expose Parquet pre-buffer option

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9482: URL: https://github.com/apache/arrow/pull/9482#issuecomment-778300420 https://issues.apache.org/jira/browse/ARROW-11601

[GitHub] [arrow] jorgecarleitao commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec [WIP]

2021-02-12 Thread GitBox
jorgecarleitao commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778320725 Makes a lot of sense to me. 👍 Does it address the issue on Ballista?

[GitHub] [arrow] andygrove commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec [WIP]

2021-02-12 Thread GitBox
andygrove commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778323450 Yes, I think so. See https://github.com/ballista-compute/ballista/pull/505 ... I am going to finish testing this over the weekend.

[GitHub] [arrow] marcprux commented on pull request #9479: ARROW-11586: [Rust][Datafusion] Remove force unwrap

2021-02-12 Thread GitBox
marcprux commented on pull request #9479: URL: https://github.com/apache/arrow/pull/9479#issuecomment-778369173 I've un-drafted it.

[GitHub] [arrow] westonpace commented on a change in pull request #9323: ARROW-10438: [C++][Dataset] Partitioning::Format on nulls

2021-02-12 Thread GitBox
westonpace commented on a change in pull request #9323: URL: https://github.com/apache/arrow/pull/9323#discussion_r575451640 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -74,15 +74,26 @@ Status KeyValuePartitioning::SetDefaultValuesFromKeys(const Expression& expr,

[GitHub] [arrow] codecov-io commented on pull request #9376: ARROW-11446: [DataFusion] Added support for scalarValue in Builtin functions.

2021-02-12 Thread GitBox
codecov-io commented on pull request #9376: URL: https://github.com/apache/arrow/pull/9376#issuecomment-778382279 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9376?src=pr&el=h1) Report > Merging [#9376](https://codecov.io/gh/apache/arrow/pull/9376?src=pr&el=desc) (41e8f26) into

[GitHub] [arrow] westonpace commented on a change in pull request #9323: ARROW-10438: [C++][Dataset] Partitioning::Format on nulls

2021-02-12 Thread GitBox
westonpace commented on a change in pull request #9323: URL: https://github.com/apache/arrow/pull/9323#discussion_r575453002 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1587,33 +1587,54 @@ def test_open_dataset_non_existing_file(): @pytest.mark.parquet @pytest

[GitHub] [arrow] dianaclarke commented on pull request #9272: [WIP] Benchmark placebo

2021-02-12 Thread GitBox
dianaclarke commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-778424894 @ursabot please benchmark

[GitHub] [arrow] seddonm1 commented on a change in pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
seddonm1 commented on a change in pull request #9233: URL: https://github.com/apache/arrow/pull/9233#discussion_r575535244 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -398,97 +405,165 @@ fn group_aggregate_batch( Ok(accumulators) } -/// Creat

[GitHub] [arrow] seddonm1 commented on a change in pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
seddonm1 commented on a change in pull request #9233: URL: https://github.com/apache/arrow/pull/9233#discussion_r575535374 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -398,97 +405,165 @@ fn group_aggregate_batch( Ok(accumulators) } -/// Creat

[GitHub] [arrow] nealrichardson commented on pull request #9272: [WIP] Benchmark placebo

2021-02-12 Thread GitBox
nealrichardson commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-778471641 @github-actions rebase

[GitHub] [arrow] seddonm1 commented on a change in pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
seddonm1 commented on a change in pull request #9233: URL: https://github.com/apache/arrow/pull/9233#discussion_r575535635 ## File path: rust/datafusion/src/physical_plan/hash_aggregate.rs ## @@ -398,97 +405,165 @@ fn group_aggregate_batch( Ok(accumulators) } -/// Creat

[GitHub] [arrow] nealrichardson opened a new pull request #9483: ARROW-11610: [C++] Download boost from sourceforge instead of bintray

2021-02-12 Thread GitBox
nealrichardson opened a new pull request #9483: URL: https://github.com/apache/arrow/pull/9483 This also removes some bintray URLs where we had mirrored dependency versions for redundancy (cf. ARROW-11611); they're not active anymore anyway because we've bumped the dependency versions and

[GitHub] [arrow] seddonm1 commented on pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

2021-02-12 Thread GitBox
seddonm1 commented on pull request #9233: URL: https://github.com/apache/arrow/pull/9233#issuecomment-778472259 @alamb looks good to me 👍

[GitHub] [arrow] github-actions[bot] commented on pull request #9483: ARROW-11610: [C++] Download boost from sourceforge instead of bintray

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9483: URL: https://github.com/apache/arrow/pull/9483#issuecomment-778472380 https://issues.apache.org/jira/browse/ARROW-11610

[GitHub] [arrow] github-actions[bot] commented on pull request #9484: ARROW-11614: Fix round() logic to return positive zero when argument is zero

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9484: URL: https://github.com/apache/arrow/pull/9484#issuecomment-778482095 https://issues.apache.org/jira/browse/ARROW-11614

[GitHub] [arrow] sagnikc-dremio opened a new pull request #9484: ARROW-11614: Fix round() logic to return positive zero when argument is zero

2021-02-12 Thread GitBox
sagnikc-dremio opened a new pull request #9484: URL: https://github.com/apache/arrow/pull/9484 Previously, round(0.0) and round(0.0, out_scale) were returning -0.0; with this patch, round() returns +0.0.
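For readers unfamiliar with signed zero, the following sketch shows why -0.0 and +0.0 are easy to confuse and one common way to normalize the sign. It is an illustration in plain Python, not the code from this patch; `is_negative_zero` and `normalize_zero` are hypothetical helper names.

```python
import math


def is_negative_zero(x: float) -> bool:
    """True only for -0.0; the == comparison alone cannot tell the zeros apart."""
    return x == 0.0 and math.copysign(1.0, x) < 0.0


def normalize_zero(x: float) -> float:
    """Map -0.0 to +0.0 and leave every other value unchanged.

    Under the default IEEE 754 rounding mode, (-0.0) + (+0.0) is +0.0,
    while adding +0.0 to any nonzero value returns that value exactly.
    """
    return x + 0.0


# The two zeros compare equal, so a plain equality test hides the sign.
assert -0.0 == 0.0
assert is_negative_zero(-0.0)
assert not is_negative_zero(normalize_zero(-0.0))
```

The sign matters mostly when a result is printed, serialized, or used as a divisor, since equality comparisons alone will not reveal the difference.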

[GitHub] [arrow] alamb closed pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec

2021-02-12 Thread GitBox
alamb closed pull request #9481: URL: https://github.com/apache/arrow/pull/9481

[GitHub] [arrow] alamb closed pull request #9479: ARROW-11586: [Rust][Datafusion] Remove force unwrap

2021-02-12 Thread GitBox
alamb closed pull request #9479: URL: https://github.com/apache/arrow/pull/9479

[GitHub] [arrow] alamb commented on pull request #9479: ARROW-11586: [Rust][Datafusion] Remove force unwrap

2021-02-12 Thread GitBox
alamb commented on pull request #9479: URL: https://github.com/apache/arrow/pull/9479#issuecomment-778491382 Thanks @marcprux

[GitHub] [arrow] alamb commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec

2021-02-12 Thread GitBox
alamb commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778492158 FWIW @jorgecarleitao and @andygrove -- I have been using DataFusion quite heavily in IOx (e.g. https://github.com/influxdata/influxdb_iox/pull/795) and it is doing great. DataFusi

[GitHub] [arrow] andygrove commented on pull request #9481: ARROW-11606: [Rust] [DataFusion] Add input schema to HashAggregateExec

2021-02-12 Thread GitBox
andygrove commented on pull request #9481: URL: https://github.com/apache/arrow/pull/9481#issuecomment-778503743 Thanks for the quick review and merge @jorgecarleitao and @alamb. I think distributed query execution will be working well enough in Ballista to support some TPC-H queries this

[GitHub] [arrow] BryanCutler commented on pull request #9187: ARROW-11223: [Java] Fix: BaseVariableWidthVector/BaseLargeVariableWidthVector setNull() and getBufferSizeFor() trigger offset buffer overf

2021-02-12 Thread GitBox
BryanCutler commented on pull request #9187: URL: https://github.com/apache/arrow/pull/9187#issuecomment-778506839 @WeichenXu123 we would just need to add the following API to `BaseVariableWidthVector` (and possibly `BaseRepeatedValueVector`) ```Java public int getBufferSizeFor(fi

[GitHub] [arrow] ursabot commented on pull request #9272: [WIP] Benchmark placebo

2021-02-12 Thread GitBox
ursabot commented on pull request #9272: URL: https://github.com/apache/arrow/pull/9272#issuecomment-778520842 ubuntu-20.04-x86_64: https://conbench.ursa.dev/compare/runs/139e0ea9-58ba-4b90-9eb7-c1e91f4daebd...223ea2f6-3e1f-4bc2-86d9-36113b073403/ dgx-ubuntu-18.04-x86_64: https://conben

[GitHub] [arrow] nealrichardson commented on a change in pull request #9423: ARROW-9856: [R] Add bindings for string compute functions

2021-02-12 Thread GitBox
nealrichardson commented on a change in pull request #9423: URL: https://github.com/apache/arrow/pull/9423#discussion_r575584575 ## File path: r/tests/testthat/test-dplyr.R ## @@ -256,6 +262,29 @@ test_that("filter() with %in%", { ) }) +test_that("filter() with string ops

[GitHub] [arrow] houqp commented on pull request #9412: ARROW-11491: [Rust] support JSON schema inference for nested list and struct

2021-02-12 Thread GitBox
houqp commented on pull request #9412: URL: https://github.com/apache/arrow/pull/9412#issuecomment-778534603 sorry for the delay, i have been busy lately. will update the PR this weekend to address all the feedback commented so far.

[GitHub] [arrow] seddonm1 opened a new pull request #9485: ARROW-11616: [Rust][DataFusion] Add collect_partitioned on DataFrame

2021-02-12 Thread GitBox
seddonm1 opened a new pull request #9485: URL: https://github.com/apache/arrow/pull/9485 The DataFrame API has a `collect` method which invokes the `collect(plan: Arc) -> Result>` function which will collect records into a single vector of RecordBatches removing any partitioning via `Merg

[GitHub] [arrow] github-actions[bot] commented on pull request #9485: ARROW-11616: [Rust][DataFusion] Add collect_partitioned on DataFrame

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9485: URL: https://github.com/apache/arrow/pull/9485#issuecomment-778550493 https://issues.apache.org/jira/browse/ARROW-11616

[GitHub] [arrow] codecov-io commented on pull request #9485: ARROW-11616: [Rust][DataFusion] Add collect_partitioned on DataFrame

2021-02-12 Thread GitBox
codecov-io commented on pull request #9485: URL: https://github.com/apache/arrow/pull/9485#issuecomment-778552128 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9485?src=pr&el=h1) Report > Merging [#9485](https://codecov.io/gh/apache/arrow/pull/9485?src=pr&el=desc) (5ed4321) into

[GitHub] [arrow] emkornfield commented on pull request #8949: ARROW-10880: [Java] Support compressing RecordBatch IPC buffers by LZ4

2021-02-12 Thread GitBox
emkornfield commented on pull request #8949: URL: https://github.com/apache/arrow/pull/8949#issuecomment-778562980 > > before we merge (once we verify it is working I can take a final look) > > Sure. I will do some tests for that. To run tests it should be sufficient to unskip t

[GitHub] [arrow] projjal opened a new pull request #9486: ARROW-11617: [C++][Gandiva] Fix nested if-else optimisation in gandiva

2021-02-12 Thread GitBox
projjal opened a new pull request #9486: URL: https://github.com/apache/arrow/pull/9486 In gandiva, when we have nested if-else statements we reuse the local bitmap and treat it as a single logical if - elseif - ... - else condition. However, when we have, say, another function between them

[GitHub] [arrow] github-actions[bot] commented on pull request #9486: ARROW-11617: [C++][Gandiva] Fix nested if-else optimisation in gandiva

2021-02-12 Thread GitBox
github-actions[bot] commented on pull request #9486: URL: https://github.com/apache/arrow/pull/9486#issuecomment-778565617 https://issues.apache.org/jira/browse/ARROW-11617

[GitHub] [arrow] nevi-me commented on a change in pull request #9425: ARROW-11504: [Rust] Added checks to List DataType.

2021-02-12 Thread GitBox
nevi-me commented on a change in pull request #9425: URL: https://github.com/apache/arrow/pull/9425#discussion_r575625396 ## File path: rust/arrow/src/array/array_list.rs ## @@ -31,12 +31,19 @@ use crate::datatypes::*; /// trait declaring an offset size, relevant for i32 vs

[GitHub] [arrow] codecov-io commented on pull request #9469: ARROW-11599: [Rust] Add function to create array with all nulls

2021-02-12 Thread GitBox
codecov-io commented on pull request #9469: URL: https://github.com/apache/arrow/pull/9469#issuecomment-778569205 # [Codecov](https://codecov.io/gh/apache/arrow/pull/9469?src=pr&el=h1) Report > Merging [#9469](https://codecov.io/gh/apache/arrow/pull/9469?src=pr&el=desc) (8c45e47) into

[GitHub] [arrow] projjal closed pull request #9486: ARROW-11617: [C++][Gandiva] Fix nested if-else optimisation in gandiva

2021-02-12 Thread GitBox
projjal closed pull request #9486: URL: https://github.com/apache/arrow/pull/9486