[GitHub] [arrow-rs] codecov-commenter commented on pull request #1721: Remove `null_count` from `ArrayData::try_new()`

2022-05-21 Thread GitBox
codecov-commenter commented on PR #1721: URL: https://github.com/apache/arrow-rs/pull/1721#issuecomment-1133802617 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1721?src=pr&el=h1)

[GitHub] [arrow-rs] HaoYang670 opened a new pull request, #1721: Remove `null_count` from `ArrayData::try_new()`

2022-05-21 Thread GitBox
HaoYang670 opened a new pull request, #1721: URL: https://github.com/apache/arrow-rs/pull/1721 Signed-off-by: remzi <1371656737...@gmail.com> # Which issue does this PR close? Closes #911. # Rationale for this change Avoid the inconsistency between `ArrayDat

[GitHub] [arrow-rs] HaoYang670 commented on issue #807: Buffer::bit_slice has incorrect length for aligned offset

2022-05-21 Thread GitBox
HaoYang670 commented on issue #807: URL: https://github.com/apache/arrow-rs/issues/807#issuecomment-1133781662 BTW, is it better to move `bit_slice` to `Bitmap`, as it is a bitwise operation? -- This is an automated message from the Apache Git Service. To respond to the message, please lo

[GitHub] [arrow-rs] Ismail-Maj opened a new pull request, #1720: Implementation string concat

2022-05-21 Thread GitBox
Ismail-Maj opened a new pull request, #1720: URL: https://github.com/apache/arrow-rs/pull/1720 # Which issue does this PR close? Closes #1540. # Rationale for this change # What changes are included in this PR? Add an implementation for string concat and

[GitHub] [arrow] kou commented on pull request #13203: ARROW-16617: [C++] Add support for multi-byte system error message on Windows

2022-05-21 Thread GitBox
kou commented on PR #13203: URL: https://github.com/apache/arrow/pull/13203#issuecomment-1133777285 > Did you validate that it fixes the issue? Partially, yes. I just confirmed that C++ uses UTF-8 instead of the current code page but didn't confirm for PyArrow. I used `$BUI

[GitHub] [arrow-rs] viirya commented on issue #1404: Triage remaining integration test failures with other Arrow implementations

2022-05-21 Thread GitBox
viirya commented on issue #1404: URL: https://github.com/apache/arrow-rs/issues/1404#issuecomment-1133776644 Okay, I fixed all integration test failures. Currently the only failing test case is `generate_decimal256_case` which needs #131. Not sure if it is straightforward to add the Decimal

[GitHub] [arrow-rs] viirya commented on pull request #1713: Fix incorrect null_count in `generate_unions_case` integration test

2022-05-21 Thread GitBox
viirya commented on PR #1713: URL: https://github.com/apache/arrow-rs/pull/1713#issuecomment-1133776061 Merged. Thanks @tustvold for review.

[GitHub] [arrow-rs] viirya merged pull request #1713: Fix incorrect null_count in `generate_unions_case` integration test

2022-05-21 Thread GitBox
viirya merged PR #1713: URL: https://github.com/apache/arrow-rs/pull/1713

[GitHub] [arrow-rs] viirya closed issue #1712: Fix incorrect null_count in `generate_unions_case` integration test

2022-05-21 Thread GitBox
viirya closed issue #1712: Fix incorrect null_count in `generate_unions_case` integration test URL: https://github.com/apache/arrow-rs/issues/1712

[GitHub] [arrow-rs] viirya merged pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

2022-05-21 Thread GitBox
viirya merged PR #1714: URL: https://github.com/apache/arrow-rs/pull/1714

[GitHub] [arrow-rs] viirya closed issue #1707: `ArrayData::try_new` cannot always return expected error.

2022-05-21 Thread GitBox
viirya closed issue #1707: `ArrayData::try_new` cannot always return expected error. URL: https://github.com/apache/arrow-rs/issues/1707

[GitHub] [arrow] kou commented on a diff in pull request #13203: ARROW-16617: [C++] Add support for multi-byte system error message on Windows

2022-05-21 Thread GitBox
kou commented on code in PR #13203: URL: https://github.com/apache/arrow/pull/13203#discussion_r878765786 ## cpp/src/arrow/util/io_util.cc: ## @@ -203,16 +203,26 @@ std::string ErrnoMessage(int errnum) { return std::strerror(errnum); } #if _WIN32 std::string WinErrorMessage

[GitHub] [arrow] kou commented on a diff in pull request #13203: ARROW-16617: [C++] Add support for multi-byte system error message on Windows

2022-05-21 Thread GitBox
kou commented on code in PR #13203: URL: https://github.com/apache/arrow/pull/13203#discussion_r878765662 ## cpp/src/arrow/util/io_util.cc: ## @@ -203,16 +203,26 @@ std::string ErrnoMessage(int errnum) { return std::strerror(errnum); } #if _WIN32 std::string WinErrorMessage

[GitHub] [arrow-rs] HaoYang670 commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

2022-05-21 Thread GitBox
HaoYang670 commented on PR #1714: URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133774255 Thank you for your explanation @tustvold. I will run some experiments to measure how much `null_count` contributes to performance.

[GitHub] [arrow] kou commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-05-21 Thread GitBox
kou commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1133773589 @raulcd Right. We don't test the built jars for now. We should test them. We can add extra (GitHub Actions) jobs to https://github.com/apache/arrow/blob/master/dev/tasks/java-jars/github.yml

[GitHub] [arrow] github-actions[bot] commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-05-21 Thread GitBox
github-actions[bot] commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1133772582 Revision: ad62c6f561d846838dd868dc2fb992706005e043 Submitted crossbow builds: [ursacomputing/crossbow @ actions-f6bd02cb91](https://github.com/ursacomputing/crossbow/bra

[GitHub] [arrow] kou commented on pull request #13157: ARROW-16584: [Java] Java JNI with S3 support

2022-05-21 Thread GitBox
kou commented on PR #13157: URL: https://github.com/apache/arrow/pull/13157#issuecomment-1133772469 @github-actions crossbow submit java-jars

[GitHub] [arrow] kou commented on pull request #13184: ARROW-16602: [Dev] Use GitHub API to merge pull request

2022-05-21 Thread GitBox
kou commented on PR #13184: URL: https://github.com/apache/arrow/pull/13184#issuecomment-1133771030 Good catch! I've updated the documentation and added support for `~/.config/arrow/merge.py`, plus a prompt for the case where `ARROW_GITHUB_API_TOKEN` is not set.

[GitHub] [arrow-ballista] andygrove opened a new pull request, #34: MINOR: Improve the examples

2022-05-21 Thread GitBox
andygrove opened a new pull request, #34: URL: https://github.com/apache/arrow-ballista/pull/34 # Which issue does this PR close? N/A # Rationale for this change Make the examples easier to run # What changes are included in this PR? Better

[GitHub] [arrow-ballista] andygrove merged pull request #33: Improve top-level README

2022-05-21 Thread GitBox
andygrove merged PR #33: URL: https://github.com/apache/arrow-ballista/pull/33

[GitHub] [arrow] kou merged pull request #13209: MINOR: [Docs] Fix typo in communication.rst

2022-05-21 Thread GitBox
kou merged PR #13209: URL: https://github.com/apache/arrow/pull/13209

[GitHub] [arrow] kou commented on a diff in pull request #13206: ARROW-15906: [C++][Python][R] By default, don't create or delete S3 buckets

2022-05-21 Thread GitBox
kou commented on code in PR #13206: URL: https://github.com/apache/arrow/pull/13206#discussion_r878756026 ## cpp/src/arrow/filesystem/s3fs.h: ## @@ -130,6 +130,9 @@ struct ARROW_EXPORT S3Options { /// Whether OutputStream writes will be issued in the background, without bloc

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
codecov-commenter commented on PR #1719: URL: https://github.com/apache/arrow-rs/pull/1719#issuecomment-1133756007 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1719?src=pr&el=h1)

[GitHub] [arrow-ballista] andygrove commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
andygrove commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133747900 I have a PR open (https://github.com/apache/arrow-ballista/pull/33) to improve the top-level README to better describe the current state of the project and the future direction,

[GitHub] [arrow-ballista] andygrove opened a new pull request, #33: Improve top-level README

2022-05-21 Thread GitBox
andygrove opened a new pull request, #33: URL: https://github.com/apache/arrow-ballista/pull/33 # Which issue does this PR close? Closes https://github.com/apache/arrow-ballista/issues/3 # Rationale for this change The README has not been updated in a long ti

[GitHub] [arrow-ballista] andygrove opened a new issue, #32: Adopt substrait.io for serializing query plans

2022-05-21 Thread GitBox
andygrove opened a new issue, #32: URL: https://github.com/apache/arrow-ballista/issues/32 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Ballista (and DataFusion) has a proprietary protobuf-based format for serializing query pla

[GitHub] [arrow-ballista] thinkharderdev commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
thinkharderdev commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133741610 Great, thanks for the feedback everyone! I would summarize the key points as: 1. Based on this and previous discussions, the goal is not to create yet another

[GitHub] [arrow-ballista] GavinRay97 commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
GavinRay97 commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133730265 > To be transparent, my team is building a query engine which is sensitive to time-to-first-result latency so we are very interested in fully streaming execution (and hoping to

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1719: URL: https://github.com/apache/arrow-rs/pull/1719#discussion_r878743074 ## parquet/src/file/writer.rs: ## @@ -92,102 +111,90 @@ pub trait FileWriter { /// All columns should be written sequentially; the main workflow is: /// - Request th

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1719: URL: https://github.com/apache/arrow-rs/pull/1719#discussion_r878742794 ## parquet/src/arrow/arrow_writer.rs: ## @@ -65,7 +67,7 @@ pub struct ArrowWriter { max_row_group_size: usize, } -impl ArrowWriter { +impl ArrowWriter { Revie

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1719: URL: https://github.com/apache/arrow-rs/pull/1719#discussion_r878742710 ## parquet/src/file/writer.rs: ## @@ -393,39 +387,91 @@ impl RowGroupWriter for SerializedRowGroupWriter .set_num_rows(self.total_rows_written.unwrap

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1719: URL: https://github.com/apache/arrow-rs/pull/1719#discussion_r878742614 ## parquet/src/file/mod.rs: ## @@ -48,12 +48,14 @@ //! let props = Arc::new(WriterProperties::builder().build()); //! let file = fs::File::create(&path).unwrap(); /

[GitHub] [arrow-rs] tustvold opened a new pull request, #1719: Rustify parquet writer (#1717) (#1163)

2022-05-21 Thread GitBox
tustvold opened a new pull request, #1719: URL: https://github.com/apache/arrow-rs/pull/1719 # Which issue does this PR close? Closes #1717 Part of #1163 # Rationale for this change See tickets, but in short the current write path makes use of a lot of custom IO

[GitHub] [arrow-ballista] andygrove commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
andygrove commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133705734 Thanks for starting this discussion @thinkharderdev :heart: With Ballista moving to this new repository I think it is an excellent time to "reboot" the project and assess

[GitHub] [arrow-datafusion] thinkharderdev commented on pull request #2572: Decouple FileFormat from datafusion_data_access

2022-05-21 Thread GitBox
thinkharderdev commented on PR #2572: URL: https://github.com/apache/arrow-datafusion/pull/2572#issuecomment-1133703572 Seems reasonable to me

[GitHub] [arrow] nealrichardson commented on issue #13211: [R] Can't binary install arrow 8.0.0 from RStudio Public Package Manager

2022-05-21 Thread GitBox
nealrichardson commented on issue #13211: URL: https://github.com/apache/arrow/issues/13211#issuecomment-1133700301 @glin FYI, is this known? Let me know if I can help debug anything.

[GitHub] [arrow-rs] viirya commented on pull request #1713: Fix incorrect null_count in `generate_unions_case` integration test

2022-05-21 Thread GitBox
viirya commented on PR #1713: URL: https://github.com/apache/arrow-rs/pull/1713#issuecomment-1133669525 @tustvold Yea, that's correct.

[GitHub] [arrow-rs] pacman82 commented on issue #1687: Support writing parquet to stdout

2022-05-21 Thread GitBox
pacman82 commented on issue #1687: URL: https://github.com/apache/arrow-rs/issues/1687#issuecomment-1133664190 @tustvold Great! This will unblock new features in the downstream crate `odbc2parquet`. 🙇

[GitHub] [arrow] eitsupi commented on issue #13211: [R] Can't binary install arrow 8.0.0 from RStudio Public Package Manager

2022-05-21 Thread GitBox
eitsupi commented on issue #13211: URL: https://github.com/apache/arrow/issues/13211#issuecomment-1133660907 It appears that binary installation is possible with bionic. https://packagemanager.rstudio.com/__api__/repos/1/binaries?distribution=focal&r_version=4.2&packages=arrow htt

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2584: Add optimizer rule to remove `OFFSET 0`

2022-05-21 Thread GitBox
andygrove opened a new issue, #2584: URL: https://github.com/apache/arrow-datafusion/issues/2584 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Add optimizer rule to remove `OFFSET 0` since it is a no-op **Describe the solu
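The rule requested in #2584 can be sketched on a toy plan type. This is only an illustration of the idea that `OFFSET 0` is a no-op: the `Plan` enum and `remove_offset_zero` function below are hypothetical and are not DataFusion's `LogicalPlan` or its optimizer API.

```rust
/// A toy logical plan; these variants are illustrative only and do not
/// mirror DataFusion's `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan { table: String },
    Offset { skip: usize, input: Box<Plan> },
    Limit { fetch: usize, input: Box<Plan> },
}

/// Rewrites the plan bottom-up, replacing every `Offset { skip: 0, .. }`
/// with its input, since skipping zero rows changes nothing.
fn remove_offset_zero(plan: Plan) -> Plan {
    match plan {
        Plan::Offset { skip, input } => {
            // Optimize the child first so nested `OFFSET 0` nodes also go away.
            let input = remove_offset_zero(*input);
            if skip == 0 {
                input
            } else {
                Plan::Offset { skip, input: Box::new(input) }
            }
        }
        Plan::Limit { fetch, input } => Plan::Limit {
            fetch,
            input: Box::new(remove_offset_zero(*input)),
        },
        // Leaf node: nothing to rewrite.
        scan @ Plan::Scan { .. } => scan,
    }
}
```

Applied to `Limit(10, Offset(0, Scan(t)))`, the function returns `Limit(10, Scan(t))`; an `Offset` with a non-zero `skip` is left in place.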

[GitHub] [arrow] WillAyd commented on pull request #12963: ARROW-16234: [C++] Vector Kernel for Rank

2022-05-21 Thread GitBox
WillAyd commented on PR #12963: URL: https://github.com/apache/arrow/pull/12963#issuecomment-1133658440 Thanks for the heads up. No rush at all - enjoy the time away

[GitHub] [arrow] eitsupi opened a new issue, #13211: [R] Can't binary install arrow 8.0.0 from RStudio Public Package Manager

2022-05-21 Thread GitBox
eitsupi opened a new issue, #13211: URL: https://github.com/apache/arrow/issues/13211 When we try to install arrow from `https://packagemanager.rstudio.com/cran/__linux__/focal/latest` on Ubuntu 20.04, it seems to perform a source install. Since the other packages are binary installs, it

[GitHub] [arrow-datafusion] andygrove commented on issue #2502: [EPIC] Move Ballista to new arrow-ballista repo

2022-05-21 Thread GitBox
andygrove commented on issue #2502: URL: https://github.com/apache/arrow-datafusion/issues/2502#issuecomment-1133656420 The next step is to review & merge https://github.com/apache/arrow-datafusion/pull/2582

[GitHub] [arrow-datafusion] andygrove commented on pull request #2582: Build against `arrow-ballista` in CI

2022-05-21 Thread GitBox
andygrove commented on PR #2582: URL: https://github.com/apache/arrow-datafusion/pull/2582#issuecomment-1133652171 @liukun4515 @yahoNanJing @mingmwang @thinkharderdev @gaojun2048 @realno @thinkharderdev @matthewmturner @yjshen @alamb @tustvold fyi

[GitHub] [arrow-ballista] andygrove merged pull request #31: MINOR: Use datafusion rev `cb84504fed4e613c9ed18c4e2a2022c701add2d9`

2022-05-21 Thread GitBox
andygrove merged PR #31: URL: https://github.com/apache/arrow-ballista/pull/31

[GitHub] [arrow-datafusion] andygrove opened a new issue, #2583: Improve process for making changes to both `arrow-datafusion` and `arrow-ballista`

2022-05-21 Thread GitBox
andygrove opened a new issue, #2583: URL: https://github.com/apache/arrow-datafusion/issues/2583 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** As a follow-on to https://github.com/apache/arrow-datafusion/pull/2582 it would be ni

[GitHub] [arrow] Jokser commented on issue #13208: Protobuf specification for arrow types

2022-05-21 Thread GitBox
Jokser commented on issue #13208: URL: https://github.com/apache/arrow/issues/13208#issuecomment-1133649363 @lidavidm Thank you. I'll look in this direction.

[GitHub] [arrow-ballista] andygrove opened a new pull request, #31: MINOR: Use datafusion rev `cb84504fed4e613c9ed18c4e2a2022c701add2d9`

2022-05-21 Thread GitBox
andygrove opened a new pull request, #31: URL: https://github.com/apache/arrow-ballista/pull/31 # Which issue does this PR close? N/A # Rationale for this change Keep up to date with latest DataFusion # What changes are included in this PR?

[GitHub] [arrow-datafusion] andygrove commented on pull request #2576: Move `LogicalPlanBuilder` to `datafusion-expr` crate

2022-05-21 Thread GitBox
andygrove commented on PR #2576: URL: https://github.com/apache/arrow-datafusion/pull/2576#issuecomment-1133646688 @alamb This PR is finally ready for review ... this is what all the previous refactoring PRs were leading to.

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2575: Remove `Union::alias`

2022-05-21 Thread GitBox
andygrove commented on code in PR #2575: URL: https://github.com/apache/arrow-datafusion/pull/2575#discussion_r878701773 ## datafusion/core/src/optimizer/filter_push_down.rs: ## @@ -894,27 +889,6 @@ mod tests { Ok(()) } -#[test] -fn union_all_with_alias()

[GitHub] [arrow] github-actions[bot] commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

2022-05-21 Thread GitBox
github-actions[bot] commented on PR #13210: URL: https://github.com/apache/arrow/pull/13210#issuecomment-1133642708 :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.

[GitHub] [arrow] github-actions[bot] commented on pull request #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

2022-05-21 Thread GitBox
github-actions[bot] commented on PR #13210: URL: https://github.com/apache/arrow/pull/13210#issuecomment-1133642701 https://issues.apache.org/jira/browse/ARROW-16607

[GitHub] [arrow] nealrichardson opened a new pull request, #13210: ARROW-16607: [R] Improve KeyValueMetadata handling

2022-05-21 Thread GitBox
nealrichardson opened a new pull request, #13210: URL: https://github.com/apache/arrow/pull/13210 * Pushes KVM handling into ExecPlan so that Run() preserves the R metadata we want. * Also pushes special handling for a kind of collapsed query from collect() into Build(). * Better enc

[GitHub] [arrow-datafusion] andygrove commented on a diff in pull request #2575: Remove `Union::alias`

2022-05-21 Thread GitBox
andygrove commented on code in PR #2575: URL: https://github.com/apache/arrow-datafusion/pull/2575#discussion_r878699688 ## datafusion/core/src/optimizer/filter_push_down.rs: ## @@ -894,27 +889,6 @@ mod tests { Ok(()) } -#[test] -fn union_all_with_alias()

[GitHub] [arrow-datafusion] andygrove opened a new pull request, #2582: Build against arrow-ballista in CI

2022-05-21 Thread GitBox
andygrove opened a new pull request, #2582: URL: https://github.com/apache/arrow-datafusion/pull/2582 # Which issue does this PR close? Part of https://github.com/apache/arrow-datafusion/issues/2502 # Rationale for this change We need to build against `arrow-

[GitHub] [arrow-rs] tustvold commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

2022-05-21 Thread GitBox
tustvold commented on PR #1714: URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133637853 > we should try to remove the null_count field from ArrayData Often faster kernel implementations are possible if one can ignore nulls, as such null_count helps to inform this sel
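The point about `null_count` guiding kernel selection can be sketched without the arrow crate. The code below is a minimal illustration, assuming an Arrow-style validity bitmap (least-significant bit first, set bit means valid); the function names `is_valid` and `sum_i64` are hypothetical and are not arrow-rs APIs.

```rust
/// Returns whether bit `i` is set in a validity bitmap stored
/// least-significant-bit first (the layout Arrow uses).
fn is_valid(validity: &[u8], i: usize) -> bool {
    (validity[i / 8] & (1 << (i % 8))) != 0
}

/// Sums `values`, skipping slots marked null in `validity`.
/// Assumption: `null_count` equals the number of unset bits among the
/// first `values.len()` bits of `validity`.
fn sum_i64(values: &[i64], validity: &[u8], null_count: usize) -> i64 {
    if null_count == 0 {
        // Fast path: no nulls, so the bitmap is never consulted.
        return values.iter().sum();
    }
    // Slow path: test the validity bit for every slot.
    values
        .iter()
        .enumerate()
        .filter(|(i, _)| is_valid(validity, *i))
        .map(|(_, v)| *v)
        .sum()
}
```

With a cached `null_count`, the choice between the two paths is a single integer comparison; without it, the caller would have to scan the bitmap (or always take the slow path) before knowing that nulls can be ignored.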

[GitHub] [arrow-rs] HaoYang670 commented on pull request #1714: Check the length of `null_bit_buffer` in `ArrayData::try_new()`

2022-05-21 Thread GitBox
HaoYang670 commented on PR #1714: URL: https://github.com/apache/arrow-rs/pull/1714#issuecomment-1133634661 > But if we don't, we will check the length of `null_bit_buffer` twice. Maybe in the long term, we should try to remove the `null_count` field from `ArrayData`. I don't find why

[GitHub] [arrow-rs] tustvold commented on issue #1687: Support writing parquet to stdout

2022-05-21 Thread GitBox
tustvold commented on issue #1687: URL: https://github.com/apache/arrow-rs/issues/1687#issuecomment-1133632599 I'm currently working on this as part of fixing #1717

[GitHub] [arrow] assignUser commented on pull request #13149: ARROW-16403:[R][CI] Create Crossbow task for R nightly builds

2022-05-21 Thread GitBox
assignUser commented on PR #13149: URL: https://github.com/apache/arrow/pull/13149#issuecomment-1133630688 A full run based on apache/arrow master: [here](https://github.com/assignUser/test-repo-a/actions/runs/2362911923)

[GitHub] [arrow-rs] tustvold commented on pull request #1499: Alternative implementation of nullif kernel by slicing nested buffers

2022-05-21 Thread GitBox
tustvold commented on PR #1499: URL: https://github.com/apache/arrow-rs/pull/1499#issuecomment-1133630070 Another random thought would be to store a bit_offset on `Buffer` instead of a byte offset :thinking: I think this would allow removing the `offset` from ArrayData and a max offset of 2

[GitHub] [arrow-rs] tustvold commented on issue #807: Buffer::bit_slice has incorrect length for aligned offset

2022-05-21 Thread GitBox
tustvold commented on issue #807: URL: https://github.com/apache/arrow-rs/issues/807#issuecomment-1133629771 Currently, as alluded to by @bjchambers, the length of a `Buffer` is somewhat immaterial, with length solely a property of the Array / ArrayData, and the size of the underlying buffers

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2580: Support limit pushdown through left right outer join

2022-05-21 Thread GitBox
Ted-Jiang commented on PR #2580: URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133628760 > Yes, you are right @Ted-Jiang -- I was confused about the side of the join 🤦 > > Upon more thought I think this is correct. > > Thank you again. > > I'll
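The reasoning behind this conclusion can be sketched with a toy join. A left outer join emits at least one output row for every left row, so the first `n` output rows can only come from the first `n` left rows. The code below is an illustration under the assumption of a deterministic, order-preserving nested-loop join; all function names are hypothetical and this is not DataFusion's implementation. The same argument applies to the right input of a right outer join.

```rust
/// Left outer join on equal keys, nested-loop style. Output rows follow the
/// order of `left`, and every left row yields at least one output row.
fn left_outer_join(left: &[i32], right: &[i32]) -> Vec<(i32, Option<i32>)> {
    let mut out = Vec::new();
    for &l in left {
        let mut matched = false;
        for &r in right {
            if l == r {
                out.push((l, Some(r)));
                matched = true;
            }
        }
        if !matched {
            // Unmatched left rows are kept, padded with a null right side.
            out.push((l, None));
        }
    }
    out
}

/// Join everything, then keep the first `n` rows.
fn limit_after_join(left: &[i32], right: &[i32], n: usize) -> Vec<(i32, Option<i32>)> {
    left_outer_join(left, right).into_iter().take(n).collect()
}

/// Push the limit to the left input first. The outer limit is still needed
/// because one left row can match several right rows.
fn limit_pushed_to_left(left: &[i32], right: &[i32], n: usize) -> Vec<(i32, Option<i32>)> {
    let limited_left = &left[..n.min(left.len())];
    left_outer_join(limited_left, right).into_iter().take(n).collect()
}
```

For an inner join the same rewrite would not be safe, because limited input rows can be filtered out by the join and leave fewer than `n` results.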

[GitHub] [arrow-datafusion] alamb commented on pull request #2580: Support limit pushdown through left right outer join

2022-05-21 Thread GitBox
alamb commented on PR #2580: URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133626760 > One question is df LEFT JOIN is equal to LEFT OUTER JOIN? Yes

[GitHub] [arrow-rs] HaoYang670 commented on issue #807: Buffer::bit_slice has incorrect length for aligned offset

2022-05-21 Thread GitBox
HaoYang670 commented on issue #807: URL: https://github.com/apache/arrow-rs/issues/807#issuecomment-1133622635 cc @alamb @viirya @tustvold

[GitHub] [arrow-rs] HaoYang670 commented on issue #807: Buffer::bit_slice has incorrect length for aligned offset

2022-05-21 Thread GitBox
HaoYang670 commented on issue #807: URL: https://github.com/apache/arrow-rs/issues/807#issuecomment-1133619890 Maybe an alternative way is to add the `length` field for `Buffer`, such as: ```rust pub struct Buffer { /// the internal byte buffer. data: Arc, /// T

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #2572: Decouple FileFormat from datafusion_data_access

2022-05-21 Thread GitBox
tustvold commented on code in PR #2572: URL: https://github.com/apache/arrow-datafusion/pull/2572#discussion_r878690223 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -63,8 +67,9 @@ pub trait FileFormat: Send + Sync + fmt::Debug { /// TODO: should the file sour

[GitHub] [arrow-rs] tustvold commented on issue #1574: protoc failed: Unknown flag: `--experimental_allow_proto3_optional`

2022-05-21 Thread GitBox
tustvold commented on issue #1574: URL: https://github.com/apache/arrow-rs/issues/1574#issuecomment-1133612714 FWIW I believe if you uninstall `protobuf-compiler` and have `cmake` installed, prost will automatically compile the `protobuf-compiler` from source.

[GitHub] [arrow-rs] Ismail-Maj commented on issue #1574: protoc failed: Unknown flag: `--experimental_allow_proto3_optional`

2022-05-21 Thread GitBox
Ismail-Maj commented on issue #1574: URL: https://github.com/apache/arrow-rs/issues/1574#issuecomment-1133609135 Building protobuf from source solved this problem for me; I also had to uninstall the Ubuntu package `protobuf-compiler`.

[GitHub] [arrow-rs] tustvold commented on issue #1708: Roundtrip failure when using DELTA_BINARY_PACKED

2022-05-21 Thread GitBox
tustvold commented on issue #1708: URL: https://github.com/apache/arrow-rs/issues/1708#issuecomment-1133608391 Hi, thank you for the report. I'll take a look either today or tomorrow, it's likely a bug in the offset tracking logic

[GitHub] [arrow-rs] asayers commented on issue #1708: Roundtrip failure when using DELTA_BINARY_PACKED

2022-05-21 Thread GitBox
asayers commented on issue #1708: URL: https://github.com/apache/arrow-rs/issues/1708#issuecomment-1133607984 @tustvold do you have any ideas about what could be causing this?

[GitHub] [arrow-datafusion] Ted-Jiang commented on pull request #2580: Support limit pushdown through left right outer join

2022-05-21 Thread GitBox
Ted-Jiang commented on PR #2580: URL: https://github.com/apache/arrow-datafusion/pull/2580#issuecomment-1133607491 > Thank you for the contribution @Ted-Jiang, however I don't think this is a valid optimization. > > Specifically, because a join can filter out rows, if you limit the inp

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2572: Decouple FileFormat from datafusion_data_access

2022-05-21 Thread GitBox
alamb commented on code in PR #2572: URL: https://github.com/apache/arrow-datafusion/pull/2572#discussion_r878686694 ## datafusion/core/src/datasource/file_format/avro.rs: ## @@ -359,36 +358,13 @@ mod tests { async fn get_exec( file_name: &str, -projectio

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2572: Decouple FileFormat from datafusion_data_access

2022-05-21 Thread GitBox
alamb commented on code in PR #2572: URL: https://github.com/apache/arrow-datafusion/pull/2572#discussion_r878686331 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -63,8 +67,9 @@ pub trait FileFormat: Send + Sync + fmt::Debug { /// TODO: should the file source

[GitHub] [arrow-ballista] alamb commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
alamb commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133605057 Somewhat of a tangent for this discussion, but my project (https://github.com/influxdata/influxdb_iox) is also interested in time to first result (aka streaming) execution, as well

[GitHub] [arrow] lidavidm commented on issue #13208: Protobuf specification for arrow types

2022-05-21 Thread GitBox
lidavidm commented on issue #13208: URL: https://github.com/apache/arrow/issues/13208#issuecomment-1133604025 Arrow Flight just encodes Arrow schemas by serializing/deserializing them and using a bytes field. Would that work? That way you wouldn't need to re-model schemas in Protobuf and yo

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2575: Remove `Union::alias`

2022-05-21 Thread GitBox
alamb commented on code in PR #2575: URL: https://github.com/apache/arrow-datafusion/pull/2575#discussion_r878683861 ## datafusion/core/src/logical_plan/builder.rs: ## @@ -432,8 +432,50 @@ impl LogicalPlanBuilder { } /// Apply a union, preserving duplicate rows -

[GitHub] [arrow-datafusion] alamb closed issue #2346: SQL planner should use `TableSource` not `TableProvider`

2022-05-21 Thread GitBox
alamb closed issue #2346: SQL planner should use `TableSource` not `TableProvider` URL: https://github.com/apache/arrow-datafusion/issues/2346

[GitHub] [arrow-datafusion] alamb merged pull request #2569: `LogicalPlanBuilder` now uses `TableSource` instead of `TableProvider`

2022-05-21 Thread GitBox
alamb merged PR #2569: URL: https://github.com/apache/arrow-datafusion/pull/2569

[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2569: `LogicalPlanBuilder` now uses `TableSource` instead of `TableProvider`

2022-05-21 Thread GitBox
alamb commented on code in PR #2569: URL: https://github.com/apache/arrow-datafusion/pull/2569#discussion_r878683705 ## datafusion/core/src/logical_plan/builder.rs: ## @@ -191,16 +191,16 @@ impl LogicalPlanBuilder { /// Convert a table provider into a builder with a TableSc

[GitHub] [arrow-rs] alamb commented on a diff in pull request #1716: Add explicit column mask construction in parquet: `ProjectionMask` (#1701)

2022-05-21 Thread GitBox
alamb commented on code in PR #1716: URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878681742 ## parquet/src/arrow/mod.rs: ## @@ -133,11 +140,71 @@ pub use self::arrow_reader::ParquetFileArrowReader; pub use self::arrow_writer::ArrowWriter; #[cfg(feature = "asyn

[GitHub] [arrow] 3AceShowHand opened a new pull request, #13209: MINOR: [Docs] Fix typo in communication.rst

2022-05-21 Thread GitBox
3AceShowHand opened a new pull request, #13209: URL: https://github.com/apache/arrow/pull/13209 Signed-off-by: 3AceShowHand

[GitHub] [arrow-rs] alamb opened a new issue, #1718: Support encoding a single parquet file using multiple threads

2022-05-21 Thread GitBox
alamb opened a new issue, #1718: URL: https://github.com/apache/arrow-rs/issues/1718 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The encoding / compression is most often the bottleneck for increasing the throughput of writing

[GitHub] [arrow] dragosmg commented on a diff in pull request #13196: ARROW-16407: [R] Extend `parse_date_time` to cover hour, dates, and minutes components

2022-05-21 Thread GitBox
dragosmg commented on code in PR #13196: URL: https://github.com/apache/arrow/pull/13196#discussion_r878680645 ## r/tests/testthat/test-dplyr-funcs-datetime.R: ## @@ -1942,3 +1925,206 @@ test_that("lubridate's fast_strptime", { collect() ) }) + +test_that("parse_date_

[GitHub] [arrow-rs] alamb opened a new issue, #1717: Trying to write parquet file in parallel results in corrupt file

2022-05-21 Thread GitBox
alamb opened a new issue, #1717: URL: https://github.com/apache/arrow-rs/issues/1717 **Describe the bug** (from the mailing list) Apparently, you can make a program that appears to write a parquet file in parallel, but it will currently produce corrupt parquet data. **To R

[GitHub] [arrow] dragosmg commented on a diff in pull request #13196: ARROW-16407: [R] Extend `parse_date_time` to cover hour, dates, and minutes components

2022-05-21 Thread GitBox
dragosmg commented on code in PR #13196: URL: https://github.com/apache/arrow/pull/13196#discussion_r878679926 ## r/R/dplyr-datetime-helpers.R: ## @@ -201,19 +213,130 @@ build_formats <- function(orders) { } build_format_from_order <- function(order) { - year_chars <- c("%y

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1716: Add explicit column mask construction (#1701)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1716: URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878679531 ## parquet/src/arrow/async_reader.rs: ## @@ -166,32 +167,17 @@ impl ParquetRecordBatchStreamBuilder { } /// Only read data from the provided column indexe

[GitHub] [arrow-rs] tustvold commented on pull request #1716: Add explicit column mask construction (#1701)

2022-05-21 Thread GitBox
tustvold commented on PR #1716: URL: https://github.com/apache/arrow-rs/pull/1716#issuecomment-1133596264 As a happy side-effect this actually found and fixed a bug in the handling of nested projection pushdown in ParquetRecordBatchStream

[GitHub] [arrow-rs] codecov-commenter commented on pull request #1716: Add explicit column mask construction (#1701)

2022-05-21 Thread GitBox
codecov-commenter commented on PR #1716: URL: https://github.com/apache/arrow-rs/pull/1716#issuecomment-1133595638 # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/1716?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+S

[GitHub] [arrow-datafusion] tustvold opened a new issue, #2581: Introduce ProjectionMask To Allow Nested Projection Pushdown

2022-05-21 Thread GitBox
tustvold opened a new issue, #2581: URL: https://github.com/apache/arrow-datafusion/issues/2581 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Currently projection indices are pushed down to scans as `Vec`. This creates some

[GitHub] [arrow-datafusion] tustvold commented on issue #2453: Incorrect Parquet Projection For Nested Types

2022-05-21 Thread GitBox
tustvold commented on issue #2453: URL: https://github.com/apache/arrow-datafusion/issues/2453#issuecomment-1133593168 https://github.com/apache/arrow-rs/pull/1716 contains a new API that should fix this

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #1716: Add explicit column mask construction (#1701)

2022-05-21 Thread GitBox
tustvold commented on code in PR #1716: URL: https://github.com/apache/arrow-rs/pull/1716#discussion_r878677472 ## parquet/src/arrow/schema.rs: ## @@ -155,24 +100,24 @@ fn get_arrow_schema_from_metadata(encoded_meta: &str) -> Result { Ok(message) => message

[GitHub] [arrow-rs] tustvold opened a new pull request, #1716: Add explicit column mask construction (#1701)

2022-05-21 Thread GitBox
tustvold opened a new pull request, #1716: URL: https://github.com/apache/arrow-rs/pull/1716 # Which issue does this PR close? Closes #1701. # Rationale for this change The current API is confusing, surfacing errors at runtime, and is liable to accidental misuse - https

[GitHub] [arrow-ballista] thinkharderdev commented on issue #30: [Discuss] Ballista Future Direction

2022-05-21 Thread GitBox
thinkharderdev commented on issue #30: URL: https://github.com/apache/arrow-ballista/issues/30#issuecomment-1133589773 > I am curious when you said fully streaming execution did you mean like Flink? Exactly, the current execution model is basically Flink Batch execution, but what we

[GitHub] [arrow-datafusion] Ted-Jiang opened a new pull request, #2580: Support limit pushdown through left right outer join

2022-05-21 Thread GitBox
Ted-Jiang opened a new pull request, #2580: URL: https://github.com/apache/arrow-datafusion/pull/2580 # Which issue does this PR close? run ``` explain select * from order left join item on order.o_orderkey = item.l_orderkey limit 1; ``` before ``` | log

[GitHub] [arrow] Jokser opened a new issue, #13208: Protobuf specification for arrow types

2022-05-21 Thread GitBox
Jokser opened a new issue, #13208: URL: https://github.com/apache/arrow/issues/13208 Hi everybody. I develop gRPC services that operate with arrow types in order to perform some DDL actions (e.g. create a table with columns that have arrow types). In order to do it, I have to repeat ar

[GitHub] [arrow] pitrou commented on pull request #12963: ARROW-16234: [C++] Vector Kernel for Rank

2022-05-21 Thread GitBox
pitrou commented on PR #12963: URL: https://github.com/apache/arrow/pull/12963#issuecomment-1133557926 @WillAyd for the record I'll be on holiday this week and will take a look in ~10 days. Thanks for your patience :-)

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2579: Push Limit through Join

2022-05-21 Thread GitBox
Ted-Jiang opened a new issue, #2579: URL: https://github.com/apache/arrow-datafusion/issues/2579 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A clear and concise description of what the problem is. Ex. I'm always frustrated whe