[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889867688 ## parquet/src/file/metadata.rs: ## @@ -223,6 +223,7 @@ pub struct RowGroupMetaData { num_rows: i64, total_byte_size: i64, schema_descr: SchemaDescPtr,

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889861992 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2702: Make sure that the data types are supported in hashjoin before genera…

2022-06-05 Thread GitBox
codecov-commenter commented on PR #2702: URL: https://github.com/apache/arrow-datafusion/pull/2702#issuecomment-1147067806 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2702?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow] github-actions[bot] commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1147061211 Revision: debbd3e4a290e4cbe9975b46ef8a5839e29a81b1 Submitted crossbow builds: [ursacomputing/crossbow @ actions-5dfae3fa6a](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
kou commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1147060454 @github-actions crossbow submit java-jars -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

[GitHub] [arrow] kou commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
kou commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1147060381 No problem. :-)

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889827232 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow] kou commented on pull request #13292: ARROW-16721: [C++] Drop support for bundled Thrift < 0.13

2022-06-05 Thread GitBox
kou commented on PR #13292: URL: https://github.com/apache/arrow/pull/13292#issuecomment-1147046948 I've fixed the R failures, but new R failures are happening. The new failures also happen on master: https://github.com/apache/arrow/runs/6747296145?check_suite_focus=true It seems that du

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889821082 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889820667 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,472 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow] REASY commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
REASY commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1147041598 @kou seems like I messed up with rebase/force-push, I can see you had to fix the errors again, sorry.

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1798: add parquet-fromcsv (#1)

2022-06-05 Thread GitBox
codecov-commenter commented on PR #1798: URL: https://github.com/apache/arrow-rs/pull/1798#issuecomment-1147026395 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1798?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] kazuk opened a new pull request, #1798: add parquet-fromcsv (#1)

2022-06-05 Thread GitBox
kazuk opened a new pull request, #1798: URL: https://github.com/apache/arrow-rs/pull/1798 Add a command-line tool to convert CSV to Parquet. # Which issue does this PR close? Closes #1797 . # Rationale for this change # What changes are included in

[GitHub] [arrow-rs] kazuk opened a new issue, #1797: Command line tool for convert CSV to Parquet

2022-06-05 Thread GitBox
kazuk opened a new issue, #1797: URL: https://github.com/apache/arrow-rs/issues/1797 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

[GitHub] [arrow] cyb70289 commented on a diff in pull request #13244: ARROW-12626: [C++] Support toolchain xsimd, update toolchain version to version 8.1.0

2022-06-05 Thread GitBox
cyb70289 commented on code in PR #13244: URL: https://github.com/apache/arrow/pull/13244#discussion_r889794635 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -2234,16 +2234,22 @@ if((NOT ARROW_SIMD_LEVEL STREQUAL "NONE") OR (NOT ARROW_RUNTIME_SIMD_LEVEL STREQ else()

[GitHub] [arrow-datafusion] AssHero commented on issue #2145: Join on path partitioned columns fails with error

2022-06-05 Thread GitBox
AssHero commented on issue #2145: URL: https://github.com/apache/arrow-datafusion/issues/2145#issuecomment-1146984550 I think we can check whether the data types are supported in hash join or not. If not supported, we can use cross join instead, and we can support more data types later in

[GitHub] [arrow] kou merged pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
kou merged PR #13307: URL: https://github.com/apache/arrow/pull/13307

[GitHub] [arrow] kou commented on pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
kou commented on PR #13307: URL: https://github.com/apache/arrow/pull/13307#issuecomment-1146983525 +1

[GitHub] [arrow-datafusion] AssHero opened a new pull request, #2702: Make sure that the data types are supported in hashjoin before genera…

2022-06-05 Thread GitBox
AssHero opened a new pull request, #2702: URL: https://github.com/apache/arrow-datafusion/pull/2702 # Which issue does this PR close? Closes #2145 # Rationale for this change Before generating the hash join logical plan, make sure the data types in equal conditions are sup

[GitHub] [arrow] kou commented on pull request #13244: ARROW-12626: [C++] Support toolchain xsimd, update toolchain version to version 8.1.0

2022-06-05 Thread GitBox
kou commented on PR #13244: URL: https://github.com/apache/arrow/pull/13244#issuecomment-1146974625 > (Can we build a Docker image in the CI job when we don't have `ghcr.io/ursacomputing/arrow:python-*` image?) In other words, why do we specify the `--no-build` option for Windows jobs?

[GitHub] [arrow] kou commented on pull request #13244: ARROW-12626: [C++] Support toolchain xsimd, update toolchain version to version 8.1.0

2022-06-05 Thread GitBox
kou commented on PR #13244: URL: https://github.com/apache/arrow/pull/13244#issuecomment-1146973126 @kszucs I want to update vcpkg's version but it causes `wheel-windows-*` job failures. e.g.: https://github.com/ursacomputing/crossbow/runs/6719159641?check_suite_focus=true#step:6:13

[GitHub] [arrow] github-actions[bot] commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1146971134 Revision: 6ba22ed36390cafae66906b1e9e6c5090c9e1896 Submitted crossbow builds: [ursacomputing/crossbow @ actions-97b278922c](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] zhangyingmath opened a new issue, #13316: ValueError: assignment destination is read-only after converting pyarrow table to pandas with split_blocks=True

2022-06-05 Thread GitBox
zhangyingmath opened a new issue, #13316: URL: https://github.com/apache/arrow/issues/13316 ```tbl = pa.Table.from_arrays([pa.array([1,2]), pa.array([1.0, 2.0])], names=['f', 'g']) df1 = tbl.to_pandas(split_blocks=True) df1.loc[0, 'f'] = 100 -

[GitHub] [arrow] kou commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-06-05 Thread GitBox
kou commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1146970433 @github-actions crossbow submit java-jars

[GitHub] [arrow] kou commented on a diff in pull request #13244: ARROW-12626: [C++] Support toolchain xsimd, update toolchain version to version 8.1.0

2022-06-05 Thread GitBox
kou commented on code in PR #13244: URL: https://github.com/apache/arrow/pull/13244#discussion_r889780669 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -2234,16 +2234,22 @@ if((NOT ARROW_SIMD_LEVEL STREQUAL "NONE") OR (NOT ARROW_RUNTIME_SIMD_LEVEL STREQ else() set(A

[GitHub] [arrow] djnavarro commented on pull request #12154: ARROW-14821: [R] Implement bindings for lubridate's floor_date, ceiling_date, and round_date

2022-06-05 Thread GitBox
djnavarro commented on PR #12154: URL: https://github.com/apache/arrow/pull/12154#issuecomment-1146910405 @rok Awesome! I'll rebase and update the lubridate bindings asap. Yay!

[GitHub] [arrow] nealrichardson merged pull request #13312: MINOR: [R] Drop opensuse42 build and update opensuse15

2022-06-05 Thread GitBox
nealrichardson merged PR #13312: URL: https://github.com/apache/arrow/pull/13312

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1796: Use IPC row count info in IPC reader

2022-06-05 Thread GitBox
codecov-commenter commented on PR #1796: URL: https://github.com/apache/arrow-rs/pull/1796#issuecomment-1146881007 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1796?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] viirya opened a new pull request, #1796: Use IPC row count info in IPC reader

2022-06-05 Thread GitBox
viirya opened a new pull request, #1796: URL: https://github.com/apache/arrow-rs/pull/1796 # Which issue does this PR close? Closes #1783. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-ballista] andygrove commented on pull request #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
andygrove commented on PR #59: URL: https://github.com/apache/arrow-ballista/pull/59#issuecomment-1146874637 I ran the integration tests (`./dev/integration-tests.sh`) and they ran without issue, so that gives me confidence that no regressions are introduced here.

[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
andygrove commented on code in PR #59: URL: https://github.com/apache/arrow-ballista/pull/59#discussion_r889731148 ## ballista/rust/core/proto/ballista.proto: ## @@ -622,6 +622,37 @@ enum JoinSide{ ///

[GitHub] [arrow-rs] tustvold commented on issue #1783: `flight_data_to_arrow_batch` does not support `RecordBatch`es with no columns

2022-06-05 Thread GitBox
tustvold commented on issue #1783: URL: https://github.com/apache/arrow-rs/issues/1783#issuecomment-1146873675 Thanks for the report, I'll bash this one out before the next release unless someone else picks it up 😀

[GitHub] [arrow-ballista] thinkharderdev commented on a diff in pull request #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
thinkharderdev commented on code in PR #59: URL: https://github.com/apache/arrow-ballista/pull/59#discussion_r889722012 ## ballista/rust/core/proto/ballista.proto: ## @@ -622,6 +622,37 @@ enum JoinSide{ //

[GitHub] [arrow-ballista] andygrove commented on a diff in pull request #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
andygrove commented on code in PR #59: URL: https://github.com/apache/arrow-ballista/pull/59#discussion_r889718557 ## ballista/rust/core/proto/ballista.proto: ## @@ -622,6 +622,37 @@ enum JoinSide{ ///

[GitHub] [arrow-rs] datapythonista opened a new pull request, #1795: Fix typos in the Memory and Buffers section of the docs home

2022-06-05 Thread GitBox
datapythonista opened a new pull request, #1795: URL: https://github.com/apache/arrow-rs/pull/1795 # Which issue does this PR close? None # Rationale for this change Just fixing a couple of minor typos in https://docs.rs/arrow/latest/arrow/index.html#memory-and-buffers

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1794: Write validity buffer for UnionArray in V4 IPC message

2022-06-05 Thread GitBox
codecov-commenter commented on PR #1794: URL: https://github.com/apache/arrow-rs/pull/1794#issuecomment-1146855155 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1794?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-rs] viirya opened a new pull request, #1794: Write validity buffer for UnionArray in V4 IPC message

2022-06-05 Thread GitBox
viirya opened a new pull request, #1794: URL: https://github.com/apache/arrow-rs/pull/1794 # Which issue does this PR close? Closes #1793. # Rationale for this change # What changes are included in this PR? # Are there any user-facing chan

[GitHub] [arrow-rs] viirya opened a new issue, #1793: IPC writer should write validity buffer for UnionArray in V4 IPC message

2022-06-05 Thread GitBox
viirya opened a new issue, #1793: URL: https://github.com/apache/arrow-rs/issues/1793 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** In V4, only null types have no validity bitmap. So in V4 message, IPC writer should still wr

[GitHub] [arrow-ballista] thinkharderdev commented on pull request #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
thinkharderdev commented on PR #59: URL: https://github.com/apache/arrow-ballista/pull/59#issuecomment-1146851519 cc @andygrove @yahoNanJing @Ted-Jiang

[GitHub] [arrow-ballista] thinkharderdev opened a new pull request, #59: [Draft] Support for multi-scheduler deployments

2022-06-05 Thread GitBox
thinkharderdev opened a new pull request, #59: URL: https://github.com/apache/arrow-ballista/pull/59 # Which issue does this PR close? Closes #39 Posting this draft PR for review and feedback but there are some more TODO items still in progress (mostly around cleanup a

[GitHub] [arrow] github-actions[bot] commented on pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #13307: URL: https://github.com/apache/arrow/pull/13307#issuecomment-1146842582 Revision: e94f7ecde6285576b98c47991f54a50d8237bc0d Submitted crossbow builds: [ursacomputing/crossbow @ actions-52476e28be](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
kou commented on PR #13307: URL: https://github.com/apache/arrow/pull/13307#issuecomment-1146842451 @github-actions crossbow submit almalinux-* amazon-linux-* centos-*

[GitHub] [arrow-datafusion] waynexia commented on pull request #2648: Support logical plan compilation

2022-06-05 Thread GitBox
waynexia commented on PR #2648: URL: https://github.com/apache/arrow-datafusion/pull/2648#issuecomment-1146836673 >`datafusion-contrib` is not under Apache governance so there is more freedom there to move fast when prototyping. You can merge your own PRs for example while you are the only

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2701: Move DefaultTableSource to datasource module

2022-06-05 Thread GitBox
codecov-commenter commented on PR #2701: URL: https://github.com/apache/arrow-datafusion/pull/2701#issuecomment-1146825389 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2701?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2700: MINOR: Move some tests from core to expr

2022-06-05 Thread GitBox
codecov-commenter commented on PR #2700: URL: https://github.com/apache/arrow-datafusion/pull/2700#issuecomment-1146817032 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2700?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] andygrove commented on issue #2683: Remove some of the re-exports within the core crate

2022-06-05 Thread GitBox
andygrove commented on issue #2683: URL: https://github.com/apache/arrow-datafusion/issues/2683#issuecomment-1146816350 I now plan on working on this after the 9.0.0 release

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2701: MINOR: Move DefaultTableSource to datasource module

2022-06-05 Thread GitBox
andygrove opened a new pull request, #2701: URL: https://github.com/apache/arrow-datafusion/pull/2701 # Which issue does this PR close? N/A # Rationale for this change The `core/logical_plan` module is now redundant and mostly consists of re-exports. `Defaul

[GitHub] [arrow-ballista] andygrove commented on pull request #58: Add ballista python module

2022-06-05 Thread GitBox
andygrove commented on PR #58: URL: https://github.com/apache/arrow-ballista/pull/58#issuecomment-1146815286 @thinkharderdev @yahoNanJing @realno @Ted-Jiang fyi

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2700: MINOR: Move some tests from core to expr

2022-06-05 Thread GitBox
andygrove opened a new pull request, #2700: URL: https://github.com/apache/arrow-datafusion/pull/2700 # Which issue does this PR close? N/A # Rationale for this change Some tests got left behind during the recent refactoring # What changes are included

[GitHub] [arrow-rs] datapythonista commented on issue #1613: Improve arrow-rs examples

2022-06-05 Thread GitBox
datapythonista commented on issue #1613: URL: https://github.com/apache/arrow-rs/issues/1613#issuecomment-1146809855 I'm planning to work on this. What I'd personally do, is to have many small examples of increasing complexity. So, besides examples and recipes, it can be used as a tutorial,

[GitHub] [arrow] cyb70289 commented on a diff in pull request #13244: ARROW-12626: [C++] Support toolchain xsimd, update toolchain version to version 8.1.0

2022-06-05 Thread GitBox
cyb70289 commented on code in PR #13244: URL: https://github.com/apache/arrow/pull/13244#discussion_r889692916 ## cpp/cmake_modules/ThirdpartyToolchain.cmake: ## @@ -2234,16 +2234,22 @@ if((NOT ARROW_SIMD_LEVEL STREQUAL "NONE") OR (NOT ARROW_RUNTIME_SIMD_LEVEL STREQ else()

[GitHub] [arrow] github-actions[bot] commented on pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
github-actions[bot] commented on PR #13307: URL: https://github.com/apache/arrow/pull/13307#issuecomment-1146800854 Revision: f13deda865d4735a632e5896f57e8633dfff7d68 Submitted crossbow builds: [ursacomputing/crossbow @ actions-3c3046a807](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #13307: ARROW-16745: [Packaging][RPM] Add support for AlmaLinux 9

2022-06-05 Thread GitBox
kou commented on PR #13307: URL: https://github.com/apache/arrow/pull/13307#issuecomment-1146800691 @github-actions crossbow submit almalinux-* amazon-linux-* centos-*

[GitHub] [arrow-ballista] andygrove commented on pull request #58: Add ballista python module

2022-06-05 Thread GitBox
andygrove commented on PR #58: URL: https://github.com/apache/arrow-ballista/pull/58#issuecomment-1146800436 Thank you for the contribution @nl5887. Having a Python repl as part of Ballista will make it much more accessible. Because this builds on the `datafusion-python` module which

[GitHub] [arrow-datafusion] andygrove merged pull request #2686: MINOR: Move `simplify_expression` rule to `datafusion-optimizer` crate

2022-06-05 Thread GitBox
andygrove merged PR #2686: URL: https://github.com/apache/arrow-datafusion/pull/2686

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2699: Sort preserving `SortMergeJoin`

2022-06-05 Thread GitBox
codecov-commenter commented on PR #2699: URL: https://github.com/apache/arrow-datafusion/pull/2699#issuecomment-1146775508 # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2699?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_

[GitHub] [arrow-datafusion] korowa opened a new pull request, #2699: Sort preserving `SortMergeJoin`

2022-06-05 Thread GitBox
korowa opened a new pull request, #2699: URL: https://github.com/apache/arrow-datafusion/pull/2699 # Which issue does this PR close? Closes #2698. # Rationale for this change Defined ordering of `SortMergeJoin` output may be helpful for query planner/optimiz

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
tustvold commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889668210 ## parquet/src/file/metadata.rs: ## @@ -223,6 +223,7 @@ pub struct RowGroupMetaData { num_rows: i64, total_byte_size: i64, schema_descr: SchemaDescPtr,

[GitHub] [arrow-datafusion] korowa opened a new issue, #2698: Sort preserving MergeJoin

2022-06-05 Thread GitBox
korowa opened a new issue, #2698: URL: https://github.com/apache/arrow-datafusion/issues/2698 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Existing `SortMergeJoin` creates output batches [in following sequence](https://gith

[GitHub] [arrow-rs] nevi-me merged pull request #1789: Read and skip validity buffer of UnionType Array for V4 ipc message

2022-06-05 Thread GitBox
nevi-me merged PR #1789: URL: https://github.com/apache/arrow-rs/pull/1789

[GitHub] [arrow-rs] nevi-me closed issue #1788: Rust IPC Read should be able to read V4 UnionType Array

2022-06-05 Thread GitBox
nevi-me closed issue #1788: Rust IPC Read should be able to read V4 UnionType Array URL: https://github.com/apache/arrow-rs/issues/1788

[GitHub] [arrow-rs] viirya commented on pull request #1789: Read and skip validity buffer of UnionType Array for V4 ipc message

2022-06-05 Thread GitBox
viirya commented on PR #1789: URL: https://github.com/apache/arrow-rs/pull/1789#issuecomment-1146761003 Thanks @nevi-me Yes, if specifying `metadata_version` as V4, the ipc writer also needs to write the validity buffer for UnionType Array. Currently it doesn't, although it doesn't affec

[GitHub] [arrow-ballista] nl5887 opened a new pull request, #58: Add ballista python module

2022-06-05 Thread GitBox
nl5887 opened a new pull request, #58: URL: https://github.com/apache/arrow-ballista/pull/58 # Rationale for this change This will introduce the ballista python module, similar to the datafusion python module.

[GitHub] [arrow-rs] Ted-Jiang opened a new issue, #1792: Enable column_page_reader read specific row ranges record

2022-06-05 Thread GitBox
Ted-Jiang opened a new issue, #1792: URL: https://github.com/apache/arrow-rs/issues/1792 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889659815 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889658973 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow-rs] Ted-Jiang commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889658890 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
viirya commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889658426 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-rs] viirya commented on a diff in pull request #1791: feat:Implement page filtering with Row Alignment

2022-06-05 Thread GitBox
viirya commented on code in PR #1791: URL: https://github.com/apache/arrow-rs/pull/1791#discussion_r889658333 ## parquet/src/file/page_index/range.rs: ## @@ -0,0 +1,474 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements.

[GitHub] [arrow-datafusion] Ted-Jiang commented on a diff in pull request #2694: combine limit and offset

2022-06-05 Thread GitBox
Ted-Jiang commented on code in PR #2694: URL: https://github.com/apache/arrow-datafusion/pull/2694#discussion_r889657737 ## datafusion/core/src/physical_plan/limit.rs: ## @@ -176,17 +184,18 @@ impl ExecutionPlan for GlobalLimitExec { fn statistics(&self) -> Statistics {