[GitHub] [arrow] ursabot edited a comment on pull request #12174: ARROW-15356: [Ruby] Add support for .arrows extension

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12174: URL: https://github.com/apache/arrow/pull/12174#issuecomment-1015100154 Benchmark runs are scheduled for baseline = 1fc9a2982d5876b84c4cf64d557054480a65d0c9 and contender = 8254615b9af90ea35583c1f2903bcc3f7f966968. 8254615b9af90ea35583c1f29

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1554: support mathematics operation for decimal data type

2022-01-18 Thread GitBox
liukun4515 commented on a change in pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554#discussion_r786495477 ## File path: datafusion/src/physical_plan/coercion_rule/binary_rule.rs ## @@ -162,12 +162,141 @@ fn get_comparison_common_decimal_type(

[GitHub] [arrow] bkmgit commented on a change in pull request #11882: ARROW-9843: [C++][Python] Implement Between ternary kernel and Python bindings

2022-01-18 Thread GitBox
bkmgit commented on a change in pull request #11882: URL: https://github.com/apache/arrow/pull/11882#discussion_r786497997 ## File path: cpp/src/arrow/compute/kernels/scalar_compare.cc ## @@ -156,39 +212,52 @@ struct Maximum { } }; +// Check if timestamp timezones are com

[GitHub] [arrow-datafusion] yjshen commented on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen commented on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 I first ran the bench `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` and saw no noticeable difference between this branch with

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 # [bench] `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` No noticeable difference between this branch with which it [or

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 ## [bench] `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` No noticeable difference between this branch with which it [o

[GitHub] [arrow] AlenkaF commented on pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
AlenkaF commented on pull request #12081: URL: https://github.com/apache/arrow/pull/12081#issuecomment-1015171005 @jorisvandenbossche could you give a final look at this PR please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 ## 1. [bench] `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` No noticeable difference between this branch with which it

[GitHub] [arrow] ursabot edited a comment on pull request #12174: ARROW-15356: [Ruby] Add support for .arrows extension

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12174: URL: https://github.com/apache/arrow/pull/12174#issuecomment-1015100154 Benchmark runs are scheduled for baseline = 1fc9a2982d5876b84c4cf64d557054480a65d0c9 and contender = 8254615b9af90ea35583c1f2903bcc3f7f966968. 8254615b9af90ea35583c1f29

[GitHub] [arrow] jorisvandenbossche closed pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
jorisvandenbossche closed pull request #12081: URL: https://github.com/apache/arrow/pull/12081

[GitHub] [arrow] jorisvandenbossche closed pull request #12169: MINOR: [Docs] Fix tabs usage in building.rst

2022-01-18 Thread GitBox
jorisvandenbossche closed pull request #12169: URL: https://github.com/apache/arrow/pull/12169

[GitHub] [arrow] jorisvandenbossche commented on pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
jorisvandenbossche commented on pull request #12171: URL: https://github.com/apache/arrow/pull/12171#issuecomment-1015184600 > We could merge this one and rebase #12169 to fix the actual build error. Whoops, I only see this now and just merged #12169

[GitHub] [arrow] ursabot commented on pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
ursabot commented on pull request #12081: URL: https://github.com/apache/arrow/pull/12081#issuecomment-1015185515 Benchmark runs are scheduled for baseline = 8254615b9af90ea35583c1f2903bcc3f7f966968 and contender = cec5a178e101e101d678776021d4469ec5f4947c. cec5a178e101e101d678776021d4469e

[GitHub] [arrow] ursabot commented on pull request #12169: MINOR: [Docs] Fix tabs usage in building.rst

2022-01-18 Thread GitBox
ursabot commented on pull request #12169: URL: https://github.com/apache/arrow/pull/12169#issuecomment-1015185569 Benchmark runs are scheduled for baseline = cec5a178e101e101d678776021d4469ec5f4947c and contender = 3e7a9f3e97f4cd847887703da039e88a608780c7. 3e7a9f3e97f4cd847887703da039e88a

[GitHub] [arrow-datafusion] houqp commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-18 Thread GitBox
houqp commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786524099 ## File path: datafusion/src/sql/planner.rs ## @@ -731,20 +732,33 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

[GitHub] [arrow-datafusion] houqp commented on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-18 Thread GitBox
houqp commented on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1015189038 As a future follow-up task, I think there is value in handling this plan rewrite in our optimizer layer so dataframe users can benefit from it as well.

[GitHub] [arrow-datafusion] houqp edited a comment on pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-18 Thread GitBox
houqp edited a comment on pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#issuecomment-1015189038 As a future follow-up task, I think there is value in handling this plan rewrite in our optimizer layer so dataframe API users can benefit from it as well.

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-18 Thread GitBox
xudong963 commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786527051 ## File path: datafusion/src/sql/planner.rs ## @@ -731,20 +732,33 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

[GitHub] [arrow-datafusion] xudong963 commented on a change in pull request #1566: fix: sql planner creates cross join instead of inner join from select predicates

2022-01-18 Thread GitBox
xudong963 commented on a change in pull request #1566: URL: https://github.com/apache/arrow-datafusion/pull/1566#discussion_r786529381 ## File path: datafusion/src/sql/planner.rs ## @@ -731,20 +732,33 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { }

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

2022-01-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #11993: URL: https://github.com/apache/arrow/pull/11993#discussion_r786528062 ## File path: python/pyarrow/array.pxi ## @@ -988,13 +988,51 @@ cdef class Array(_PandasConvertible): def nbytes(self): """

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #12171: URL: https://github.com/apache/arrow/pull/12171#discussion_r786536217 ## File path: .github/workflows/docs.yml ## @@ -32,6 +32,7 @@ on: - 'ci/scripts/js_build.sh' - 'ci/scripts/python_build.sh'

[GitHub] [arrow] ursabot edited a comment on pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12081: URL: https://github.com/apache/arrow/pull/12081#issuecomment-1015185515 Benchmark runs are scheduled for baseline = 8254615b9af90ea35583c1f2903bcc3f7f966968 and contender = cec5a178e101e101d678776021d4469ec5f4947c. cec5a178e101e101d67877602

[GitHub] [arrow] vibhatha commented on a change in pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

2022-01-18 Thread GitBox
vibhatha commented on a change in pull request #11993: URL: https://github.com/apache/arrow/pull/11993#discussion_r786540742 ## File path: python/pyarrow/array.pxi ## @@ -988,13 +988,51 @@ cdef class Array(_PandasConvertible): def nbytes(self): """ Total

[GitHub] [arrow] vibhatha commented on pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

2022-01-18 Thread GitBox
vibhatha commented on pull request #11993: URL: https://github.com/apache/arrow/pull/11993#issuecomment-1015206272 @jorisvandenbossche also do you see why the Python/AMD64 Conda Python 3.9 Sphinx & Numpydoc workflow is failing? I also cannot build the docs locally due to the same error.

[GitHub] [arrow] ursabot edited a comment on pull request #12169: MINOR: [Docs] Fix tabs usage in building.rst

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12169: URL: https://github.com/apache/arrow/pull/12169#issuecomment-1015185569 Benchmark runs are scheduled for baseline = cec5a178e101e101d678776021d4469ec5f4947c and contender = 3e7a9f3e97f4cd847887703da039e88a608780c7. 3e7a9f3e97f4cd847887703da

[GitHub] [arrow] vibhatha opened a new pull request #12175: ARROW-15154: [R] Expose ReferencedBufferSize to R [WIP]

2022-01-18 Thread GitBox
vibhatha opened a new pull request #12175: URL: https://github.com/apache/arrow/pull/12175 In this PR, the `ReferencedBufferSize` functionality is exposed in the R API. This is integrated for: - [x] Array - [ ] ChunkedArray - [ ] RecordBatch - [ ] Table

[GitHub] [arrow] github-actions[bot] commented on pull request #12175: ARROW-15154: [R] Expose ReferencedBufferSize to R [WIP]

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12175: URL: https://github.com/apache/arrow/pull/12175#issuecomment-1015213943

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 ## 1. [bench] `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` No noticeable difference between this branch with which it

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015165921 ## 1. [bench] `sort_limit_query_sql` ``` cargo criterion --bench sort_limit_query_sql ``` No noticeable difference between this branch with which it

[GitHub] [arrow] AlenkaF opened a new pull request #12176: ARROW-10140: [Python][C++] Add test for map column of a parquet file created from pyarrow and pandas

2022-01-18 Thread GitBox
AlenkaF opened a new pull request #12176: URL: https://github.com/apache/arrow/pull/12176 Adding a test to `parquet/test_pandas.py` for the case where a `pa.Table` is created from pandas with a map column. cc @jorisvandenbossche

[GitHub] [arrow] github-actions[bot] commented on pull request #12176: ARROW-10140: [Python][C++] Add test for map column of a parquet file created from pyarrow and pandas

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12176: URL: https://github.com/apache/arrow/pull/12176#issuecomment-1015226918 https://issues.apache.org/jira/browse/ARROW-10140

[GitHub] [arrow] kszucs commented on a change in pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
kszucs commented on a change in pull request #12171: URL: https://github.com/apache/arrow/pull/12171#discussion_r786583420 ## File path: .github/workflows/docs.yml ## @@ -32,6 +32,7 @@ on: - 'ci/scripts/js_build.sh' - 'ci/scripts/python_build.sh' - 'ci/scri

[GitHub] [arrow] kszucs commented on a change in pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
kszucs commented on a change in pull request #12171: URL: https://github.com/apache/arrow/pull/12171#discussion_r786583420 ## File path: .github/workflows/docs.yml ## @@ -32,6 +32,7 @@ on: - 'ci/scripts/js_build.sh' - 'ci/scripts/python_build.sh' - 'ci/scri

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
jorisvandenbossche commented on a change in pull request #12171: URL: https://github.com/apache/arrow/pull/12171#discussion_r786591471 ## File path: .github/workflows/docs.yml ## @@ -32,6 +32,7 @@ on: - 'ci/scripts/js_build.sh' - 'ci/scripts/python_build.sh'

[GitHub] [arrow] pitrou commented on a change in pull request #12091: ARROW-14798: [C++][Python][R] Add container window to PrettyPrintOptions

2022-01-18 Thread GitBox
pitrou commented on a change in pull request #12091: URL: https://github.com/apache/arrow/pull/12091#discussion_r786581854 ## File path: cpp/src/arrow/pretty_print.h ## @@ -36,12 +36,13 @@ struct PrettyPrintOptions { PrettyPrintOptions() = default; PrettyPrintOptions(in

[GitHub] [arrow] kszucs commented on a change in pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
kszucs commented on a change in pull request #12171: URL: https://github.com/apache/arrow/pull/12171#discussion_r786594998 ## File path: .github/workflows/docs.yml ## @@ -32,6 +32,7 @@ on: - 'ci/scripts/js_build.sh' - 'ci/scripts/python_build.sh' - 'ci/scri

[GitHub] [arrow-rs] tustvold commented on pull request #1189: update nightly version for miri

2022-01-18 Thread GitBox
tustvold commented on pull request #1189: URL: https://github.com/apache/arrow-rs/pull/1189#issuecomment-1015258084 I think this might have broken something: the nightly SIMD tests are now failing, complaining about a missing llvm_asm, which appears to relate to https://github.com/rust-lang/p

[GitHub] [arrow] jorisvandenbossche commented on pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

2022-01-18 Thread GitBox
jorisvandenbossche commented on pull request #11993: URL: https://github.com/apache/arrow/pull/11993#issuecomment-1015264187 You can ignore the Python doc failure; that was failing on master as well (I actually just merged a fix for that, so if you rebase, that should be resolved)

[GitHub] [arrow-datafusion] yjshen commented on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen commented on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015265203 Flamegraph with this PR: ![flamegraph_svg](https://user-images.githubusercontent.com/1387718/149917828-b4a6618b-bc57-4ecc-bd9a-e24422e4e1fd.png) After a q

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015265203 After a quick investigation with flamegraph: Flamegraph with this PR: ![flamegraph_svg](https://user-images.githubusercontent.com/1387718/149917828-b4a

[GitHub] [arrow] rok commented on a change in pull request #12141: ARROW-14100: [C++] subtract(duration, duration) -> duration kernel

2022-01-18 Thread GitBox
rok commented on a change in pull request #12141: URL: https://github.com/apache/arrow/pull/12141#discussion_r786609488 ## File path: cpp/src/arrow/compute/kernels/scalar_temporal_test.cc ## @@ -976,6 +976,23 @@ TEST_F(ScalarTemporalTest, TestTemporalDifference) { } } +TE

[GitHub] [arrow] ursabot edited a comment on pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12081: URL: https://github.com/apache/arrow/pull/12081#issuecomment-1015185515 Benchmark runs are scheduled for baseline = 8254615b9af90ea35583c1f2903bcc3f7f966968 and contender = cec5a178e101e101d678776021d4469ec5f4947c. cec5a178e101e101d67877602

[GitHub] [arrow] kszucs closed pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
kszucs closed pull request #12171: URL: https://github.com/apache/arrow/pull/12171

[GitHub] [arrow] pitrou commented on a change in pull request #12077: ARROW-15109: [Python] Add show_info() to print build, component, and system info

2022-01-18 Thread GitBox
pitrou commented on a change in pull request #12077: URL: https://github.com/apache/arrow/pull/12077#discussion_r786623993 ## File path: python/pyarrow/__init__.py ## @@ -30,7 +30,9 @@ """ import gc as _gc +import importlib import os as _os +import platform Review comment

[GitHub] [arrow] ursabot commented on pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
ursabot commented on pull request #12171: URL: https://github.com/apache/arrow/pull/12171#issuecomment-1015292438 Benchmark runs are scheduled for baseline = 3e7a9f3e97f4cd847887703da039e88a608780c7 and contender = 0174d394a00e9985cd6fd238cf1a409792c583ea. 0174d394a00e9985cd6fd238cf1a4097

[GitHub] [arrow] ursabot edited a comment on pull request #12081: ARROW-10643: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12081: URL: https://github.com/apache/arrow/pull/12081#issuecomment-1015185515 Benchmark runs are scheduled for baseline = 8254615b9af90ea35583c1f2903bcc3f7f966968 and contender = cec5a178e101e101d678776021d4469ec5f4947c. cec5a178e101e101d67877602

[GitHub] [arrow] ursabot edited a comment on pull request #12171: ARROW-15355: [Docs] Trigger sphinx build on documentation changes

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12171: URL: https://github.com/apache/arrow/pull/12171#issuecomment-1015292438 Benchmark runs are scheduled for baseline = 3e7a9f3e97f4cd847887703da039e88a608780c7 and contender = 0174d394a00e9985cd6fd238cf1a409792c583ea. 0174d394a00e9985cd6fd238c

[GitHub] [arrow-datafusion] yjshen commented on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen commented on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015308440 The logic for ExternalSort: 1. get a batch from the input, sort it, and buffer it in memory 2. when the memory threshold is met, do an in-memory sort that performs an "N-way merge" and spill

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015308440 The logic for ExternalSort: 1. get a batch from the input, sort it, and buffer it in memory 2. when the memory threshold is met, do an "N-way merge" and spill the results

[GitHub] [arrow] kszucs commented on pull request #12177: ARROW-15323: [CI] Nightly spark integration builds are failing

2022-01-18 Thread GitBox
kszucs commented on pull request #12177: URL: https://github.com/apache/arrow/pull/12177#issuecomment-1015311048 @github-actions crossbow submit *spark*

[GitHub] [arrow] github-actions[bot] commented on pull request #12177: ARROW-15323: [CI] Nightly spark integration builds are failing

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12177: URL: https://github.com/apache/arrow/pull/12177#issuecomment-1015311205 https://issues.apache.org/jira/browse/ARROW-15323

[GitHub] [arrow] github-actions[bot] commented on pull request #12177: ARROW-15323: [CI] Nightly spark integration builds are failing

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12177: URL: https://github.com/apache/arrow/pull/12177#issuecomment-1015313897 Revision: eab972fad8d48774729f83527d6f34902fb662bb Submitted crossbow builds: [ursacomputing/crossbow @ actions-1398](https://github.com/ursacomputing/crossbo

[GitHub] [arrow-datafusion] yjshen edited a comment on pull request #1596: Consolidate sort and external_sort, consolidate N-way merge sort

2022-01-18 Thread GitBox
yjshen edited a comment on pull request #1596: URL: https://github.com/apache/arrow-datafusion/pull/1596#issuecomment-1015308440 The logic for ExternalSort: 1. get a batch from the input, sort it, and buffer it in memory 2. when the memory threshold is met, do an "N-way merge" and spill the results
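
The two ExternalSort steps quoted above can be sketched roughly as follows. This is a minimal Python illustration, not DataFusion's Rust implementation: the function name `external_sort`, the `max_buffered_rows` threshold, and the in-memory `spilled` list standing in for spill files are all hypothetical, and the final merge pass is an assumption because the quoted comment is cut off before describing it.

```python
import heapq
from typing import Iterable, List


def external_sort(batches: Iterable[List[int]], max_buffered_rows: int) -> List[int]:
    """Sort rows arriving in batches while bounding how many rows are buffered."""
    buffered: List[List[int]] = []  # sorted batches currently held "in memory"
    buffered_rows = 0
    spilled: List[List[int]] = []   # hypothetical stand-in for sorted runs on disk

    for batch in batches:
        # Step 1: sort each incoming batch and buffer it.
        buffered.append(sorted(batch))
        buffered_rows += len(batch)
        # Step 2: once the threshold is met, N-way merge the buffer and spill it.
        if buffered_rows >= max_buffered_rows:
            spilled.append(list(heapq.merge(*buffered)))
            buffered, buffered_rows = [], 0

    # Assumed final pass: merge the remaining buffered batches with all spilled runs.
    return list(heapq.merge(*buffered, *spilled))
```

Here `heapq.merge` plays the role of the N-way merge: it lazily combines any number of already sorted inputs into one sorted stream, which is why each batch is sorted before it is buffered.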

[GitHub] [arrow-datafusion] xudong963 edited a comment on issue #1586: datafusion doesn't process predicate pushdown correctly when there is outer join

2022-01-18 Thread GitBox
xudong963 edited a comment on issue #1586: URL: https://github.com/apache/arrow-datafusion/issues/1586#issuecomment-1014726665 > I can try this one if no one is working on it yet. I believe no one is working on it, please do it! Thanks, @james727 If you have any problems, please fee

[GitHub] [arrow-rs] alamb commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
alamb commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015324441 > This would allow IOx, or potentially DataFusion depending on where the logic for this eventually sits, to do the following for pushing down predicates, in addition to the current r

[GitHub] [arrow-rs] nevi-me commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
nevi-me commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015330895 In the Impala implementation, there was negligible impact on unsorted/random data https://blog.cloudera.com/speeding-up-select-queries-with-parquet-page-indexes/. If parquet-rs ca

[GitHub] [arrow-datafusion] alamb commented on issue #1273: Question: Is the Ballista project providing value to the overall DataFusion project?

2022-01-18 Thread GitBox
alamb commented on issue #1273: URL: https://github.com/apache/arrow-datafusion/issues/1273#issuecomment-1015344104 > > @alamb Actually I'm quite curious on the point of datafusion not being used standalone. > > On my side, my plan was to use datafusion (likely via the Python binding

[GitHub] [arrow-rs] e-dard opened a new pull request #1196: feat: add support for casting Duration/Interval to Int64Array

2022-01-18 Thread GitBox
e-dard opened a new pull request #1196: URL: https://github.com/apache/arrow-rs/pull/1196 # Which issue does this PR close? Closes #685. # What changes are included in this PR? This PR adds support for casting from all the `Duration` types to `Int64A

[GitHub] [arrow-rs] alamb commented on pull request #1082: parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040)

2022-01-18 Thread GitBox
alamb commented on pull request #1082: URL: https://github.com/apache/arrow-rs/pull/1082#issuecomment-1015349375 I also ran the tests from the latest master branch of datafusion against this branch and they all passed. Not that it is the most thorough coverage of the parquet format, but it

[GitHub] [arrow-rs] alamb closed issue #1186: Parquet reader should be able to read structs within list

2022-01-18 Thread GitBox
alamb closed issue #1186: URL: https://github.com/apache/arrow-rs/issues/1186

[GitHub] [arrow-rs] alamb merged pull request #1187: feat(parquet): support for reading structs nested within lists

2022-01-18 Thread GitBox
alamb merged pull request #1187: URL: https://github.com/apache/arrow-rs/pull/1187

[GitHub] [arrow-rs] alamb commented on pull request #1187: feat(parquet): support for reading structs nested within lists

2022-01-18 Thread GitBox
alamb commented on pull request #1187: URL: https://github.com/apache/arrow-rs/pull/1187#issuecomment-1015349868 Thanks again @helgikrs

[GitHub] [arrow-rs] tustvold commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
tustvold commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015350353 > I think it would be best to implement in DataFusion if at all possible Agreed, I was somewhat hedging here :laughing: > In the Impala implementation, there was neg

[GitHub] [arrow-rs] alamb opened a new issue #1197: Remove `ArrowArrayReader` in parquet implementation

2022-01-18 Thread GitBox
alamb opened a new issue #1197: URL: https://github.com/apache/arrow-rs/issues/1197 https://github.com/apache/arrow-rs/pull/1082 removed the use of `ArrowArrayReader` in favor of a different approach. This ticket tracks removing `ArrowArrayReader`, added in #384(?). @yordan-pavlov rep

[GitHub] [arrow-rs] alamb commented on pull request #1082: parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040)

2022-01-18 Thread GitBox
alamb commented on pull request #1082: URL: https://github.com/apache/arrow-rs/pull/1082#issuecomment-1015352358 https://github.com/apache/arrow-rs/issues/1197 tracks ArrowArrayReader removal

[GitHub] [arrow-rs] alamb merged pull request #1082: parquet: Optimized ByteArrayReader, Add UTF-8 Validation (#1040)

2022-01-18 Thread GitBox
alamb merged pull request #1082: URL: https://github.com/apache/arrow-rs/pull/1082

[GitHub] [arrow-rs] alamb closed issue #786: Soundness: reading parquet with invalid utf8 results in UB

2022-01-18 Thread GitBox
alamb closed issue #786: URL: https://github.com/apache/arrow-rs/issues/786

[GitHub] [arrow] ursabot edited a comment on pull request #12169: MINOR: [Docs] Fix tabs usage in building.rst

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12169: URL: https://github.com/apache/arrow/pull/12169#issuecomment-1015185569 Benchmark runs are scheduled for baseline = cec5a178e101e101d678776021d4469ec5f4947c and contender = 3e7a9f3e97f4cd847887703da039e88a608780c7. 3e7a9f3e97f4cd847887703da

[GitHub] [arrow-rs] alamb commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
alamb commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015358947 > I think the predicate evaluation would best live in parquet as it can get complex for some pages. So datafusion and other processing engine implementing the logic on their own woul

[GitHub] [arrow-datafusion] alamb commented on pull request #1582: remove update and merge from accumulator

2022-01-18 Thread GitBox
alamb commented on pull request #1582: URL: https://github.com/apache/arrow-datafusion/pull/1582#issuecomment-1015359458 Thanks @Jimexist -- this is a great piece of work

[GitHub] [arrow-datafusion] liukun4515 opened a new pull request #1603: add test for decimal to decimal

2022-01-18 Thread GitBox
liukun4515 opened a new pull request #1603: URL: https://github.com/apache/arrow-datafusion/pull/1603 # Which issue does this PR close? - add test for decimal to decimal - move decimal test together part of #1443 # Rationale for this change # What cha

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1603: add test for decimal to decimal

2022-01-18 Thread GitBox
liukun4515 commented on a change in pull request #1603: URL: https://github.com/apache/arrow-datafusion/pull/1603#discussion_r786701728 ## File path: datafusion/src/physical_plan/expressions/cast.rs ## @@ -269,6 +269,168 @@ mod tests { }}; } +fn create_decim

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1603: add test for decimal to decimal

2022-01-18 Thread GitBox
liukun4515 commented on a change in pull request #1603: URL: https://github.com/apache/arrow-datafusion/pull/1603#discussion_r786702031 ## File path: datafusion/src/physical_plan/expressions/try_cast.rs ## @@ -227,6 +227,155 @@ mod tests { }}; } +#[test] +

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1603: add test for decimal to decimal

2022-01-18 Thread GitBox
liukun4515 commented on a change in pull request #1603: URL: https://github.com/apache/arrow-datafusion/pull/1603#discussion_r786702227 ## File path: datafusion/src/physical_plan/expressions/try_cast.rs ## @@ -227,6 +227,155 @@ mod tests { }}; } +#[test] +

[GitHub] [arrow-datafusion] liukun4515 commented on a change in pull request #1603: add test for decimal to decimal

2022-01-18 Thread GitBox
liukun4515 commented on a change in pull request #1603: URL: https://github.com/apache/arrow-datafusion/pull/1603#discussion_r786702536 ## File path: datafusion/src/physical_plan/expressions/cast.rs ## @@ -269,6 +269,168 @@ mod tests { }}; } +fn create_decim

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786706752 ## File path: cpp/src/arrow/compute/api_scalar.cc ## @@ -254,6 +254,26 @@ struct EnumTraits } }; +template <> +struct EnumTraits +: BasicEnum

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786708740 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,56 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786716020 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,56 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786708740 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,56 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow] pitrou commented on a change in pull request #12105: ARROW-14098: [C++] subtract(time, time) -> duration kernel

2022-01-18 Thread GitBox
pitrou commented on a change in pull request #12105: URL: https://github.com/apache/arrow/pull/12105#discussion_r786724844 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc ## @@ -2428,6 +2435,20 @@ void RegisterScalarArithmetic(FunctionRegistry* registry) {

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1192: Update parquet crate readme

2022-01-18 Thread GitBox
codecov-commenter commented on pull request #1192: URL: https://github.com/apache/arrow-rs/pull/1192#issuecomment-1015385265 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1192?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=T

[GitHub] [arrow-rs] tustvold commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
tustvold commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015385787 Shall I create a separate ticket in that case for directly evaluating predicates against encoded data, I think the two problems are separable? -- This is an automated message fr

[GitHub] [arrow] pitrou commented on pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

2022-01-18 Thread GitBox
pitrou commented on pull request #9702: URL: https://github.com/apache/arrow/pull/9702#issuecomment-1015391525 > Oops I was using my Chloe DB / Chloe QL account again.. No worries :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log

[GitHub] [arrow] AlenkaF opened a new pull request #12178: ARROW-9664: [Python] Array/ChunkedArray.to_pandas do not support types_mapper keyword

2022-01-18 Thread GitBox
AlenkaF opened a new pull request #12178: URL: https://github.com/apache/arrow/pull/12178 This PR tries to add a `types_mapper` argument to the Array and ChunkedArray `to_pandas` methods. To be used like in `Table.to_pandas()` where `types_mapper` needs to be a function or a dictionary mapping (`

[GitHub] [arrow] github-actions[bot] commented on pull request #12178: ARROW-9664: [Python] Array/ChunkedArray.to_pandas do not support types_mapper keyword

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12178: URL: https://github.com/apache/arrow/pull/12178#issuecomment-1015397639 https://issues.apache.org/jira/browse/ARROW-9664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] ursabot edited a comment on pull request #12169: MINOR: [Docs] Fix tabs usage in building.rst

2022-01-18 Thread GitBox
ursabot edited a comment on pull request #12169: URL: https://github.com/apache/arrow/pull/12169#issuecomment-1015185569 Benchmark runs are scheduled for baseline = cec5a178e101e101d678776021d4469ec5f4947c and contender = 3e7a9f3e97f4cd847887703da039e88a608780c7. 3e7a9f3e97f4cd847887703da

[GitHub] [arrow] dragosmg opened a new pull request #12179: ARROW-14609 [R] left_join by argument error message mismatch

2022-01-18 Thread GitBox
dragosmg opened a new pull request #12179: URL: https://github.com/apache/arrow/pull/12179 This PR makes {arrow} join error messages triggered by wrong column specification in `by` closer to the {dplyr} ones ``` # dplyr error message > left_join(iris, iris, by = "made_up_colname")

[GitHub] [arrow] github-actions[bot] commented on pull request #12179: ARROW-14609 [R] left_join by argument error message mismatch

2022-01-18 Thread GitBox
github-actions[bot] commented on pull request #12179: URL: https://github.com/apache/arrow/pull/12179#issuecomment-1015412111 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[GitHub] [arrow] pitrou commented on a change in pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

2022-01-18 Thread GitBox
pitrou commented on a change in pull request #9702: URL: https://github.com/apache/arrow/pull/9702#discussion_r786735652 ## File path: cpp/src/arrow/adapters/orc/adapter.cc ## @@ -628,41 +733,86 @@ class ArrowOutputStream : public liborc::OutputStream { int64_t length_; };

[GitHub] [arrow-rs] alamb opened a new issue #1198: Broken CI SIMD and wasm checks on master

2022-01-18 Thread GitBox
alamb opened a new issue #1198: URL: https://github.com/apache/arrow-rs/issues/1198 The CI wasm and SIMD tests are now failing on master: https://github.com/apache/arrow-rs/runs/4852992130?check_suite_focus=true https://github.com/apache/arrow-rs/runs/4852992553?check_suite_focus=t

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786716020 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,56 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow-rs] alamb opened a new pull request #1199: Revert "update nightly version for miri (#1189)"

2022-01-18 Thread GitBox
alamb opened a new pull request #1199: URL: https://github.com/apache/arrow-rs/pull/1199 # Which issue does this PR close? Closes https://github.com/apache/arrow-rs/issues/1198 # Rationale for this change Master CI checks seem to be broken # What changes

[GitHub] [arrow] dhruv9vats commented on a change in pull request #12162: ARROW-15089: [C++][Compute] Implement kernel to lookup a MapArray item for a given key

2022-01-18 Thread GitBox
dhruv9vats commented on a change in pull request #12162: URL: https://github.com/apache/arrow/pull/12162#discussion_r786716020 ## File path: cpp/src/arrow/compute/kernels/scalar_nested_test.cc ## @@ -225,6 +225,56 @@ TEST(TestScalarNested, StructField) { } } +TEST(TestSca

[GitHub] [arrow-rs] alamb commented on pull request #1189: update nightly version for miri

2022-01-18 Thread GitBox
alamb commented on pull request #1189: URL: https://github.com/apache/arrow-rs/pull/1189#issuecomment-1015421681 PR to revert: https://github.com/apache/arrow-rs/pull/1199 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

[GitHub] [arrow-rs] alamb commented on pull request #1195: Add comparison support for fully qualified BinaryArray

2022-01-18 Thread GitBox
alamb commented on pull request #1195: URL: https://github.com/apache/arrow-rs/pull/1195#issuecomment-1015422203 Hi @HaoYang670 and @gangliao - sorry for the CI failures. I believe they will be fixed once https://github.com/apache/arrow-rs/pull/1199 is merged -- This is an automated me

[GitHub] [arrow] iajoiner commented on a change in pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

2022-01-18 Thread GitBox
iajoiner commented on a change in pull request #9702: URL: https://github.com/apache/arrow/pull/9702#discussion_r786768417 ## File path: cpp/src/arrow/adapters/orc/adapter.cc ## @@ -628,41 +733,86 @@ class ArrowOutputStream : public liborc::OutputStream { int64_t length_; }

[GitHub] [arrow] iajoiner commented on a change in pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

2022-01-18 Thread GitBox
iajoiner commented on a change in pull request #9702: URL: https://github.com/apache/arrow/pull/9702#discussion_r786769958 ## File path: python/pyarrow/_orc.pyx ## @@ -36,7 +36,233 @@ from pyarrow.lib cimport (check_status, _Weakrefable, pyarrow_unwra

[GitHub] [arrow-rs] alamb commented on issue #1191: Parquet Scan Filter

2022-01-18 Thread GitBox
alamb commented on issue #1191: URL: https://github.com/apache/arrow-rs/issues/1191#issuecomment-1015423740 > Shall I create a separate ticket in that case for directly evaluating predicates against encoded data, I think the two problems are separable? I probably don't fully unde

[GitHub] [arrow] pitrou commented on a change in pull request #9702: ARROW-11297: [C++][Python] Add ORC writer options

2022-01-18 Thread GitBox
pitrou commented on a change in pull request #9702: URL: https://github.com/apache/arrow/pull/9702#discussion_r786770655 ## File path: python/pyarrow/_orc.pyx ## @@ -36,7 +36,233 @@ from pyarrow.lib cimport (check_status, _Weakrefable, pyarrow_unwrap_

[GitHub] [arrow-datafusion] alamb merged pull request #1554: support mathematics operation for decimal data type

2022-01-18 Thread GitBox
alamb merged pull request #1554: URL: https://github.com/apache/arrow-datafusion/pull/1554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-
