Re: [VOTE] Add RLE Arrays to Arrow Format

2023-01-09 Thread Matt Topol
Thanks Antoine! I'll go respond to your comments now!

On Mon, Jan 9, 2023 at 11:01 AM Antoine Pitrou wrote:
> I've commented on the PR. I'm +1 on the principle and on the proposed
> format / layout additions.
>
> Regards
>
> Antoine.
>
> On 14/12/2022 at 17:27, Matt Topol wrote:

Re: [VOTE] Add RLE Arrays to Arrow Format

2023-01-09 Thread Antoine Pitrou
I've commented on the PR. I'm +1 on the principle and on the proposed format / layout additions.

Regards

Antoine.

On 14/12/2022 at 17:27, Matt Topol wrote:
> Hello, I'd like to propose adding the RLE type based on earlier discussions[1][2] to the Arrow format:
> - Columnar Format

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-19 Thread Matthew Topol
Huzzah! That brings us to 3 +1 (binding) votes, and 1 +1 (non-binding) vote! The vote passes! I've updated the PR for the format changes (on their own) here: https://github.com/apache/arrow/pull/14176 and will follow it up with updating the other PRs as I can. If anyone could comment / approve

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-19 Thread Ian Cook
@Matt Topol: Yes, a change of the name to "run-end encoding" changes my (non-binding) vote to a +1.

On Mon, Dec 19, 2022 at 3:32 PM Matthew Topol wrote:
> Okay, slight edit to my previous email: It was brought to my attention that
> we need at least 3 +1 binding votes, so this vote is still

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-19 Thread Jorge Cardoso Leitão
+1

Thanks a lot for all this. Really exciting!!

On Mon, 19 Dec 2022, 17:56 Matt Topol wrote:
> That leaves us with a total vote of +1.5 so the vote carries with the
> caveat of changing the name to be Run End Encoded rather than Run Length
> Encoded (unless this means I need to do a new vote

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-19 Thread Matthew Topol
Okay, slight edit to my previous email: It was brought to my attention that we need at least 3 +1 binding votes, so this vote is still open for the moment. @IanCook: With the change of the name to RunEndEncoding is that sufficient to change your vote to a +1? On Mon, Dec 19, 2022 at 12:57 PM

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-19 Thread Matt Topol
That leaves us with a total vote of +1.5 so the vote carries with the caveat of changing the name to be Run End Encoded rather than Run Length Encoded (unless this means I need to do a new vote with the changed name? This is my first time doing one of these so please correct me if I need to do a

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-16 Thread Weston Pace
+1 I agree that run-end encoding makes more sense but also don't see it as a deal breaker. The most compelling counter-argument I've seen for new types is to avoid a schism where some implementations do not support the newer types. However, for the type proposed here I think the risk is low

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-16 Thread Andrew Lamb
+1 on the proposal as written.

I think it makes sense and offers exciting opportunities for faster computation (especially for cases where Parquet files can be decoded directly into such an array, avoiding unpacking; RLE-encoded dictionaries are quite compelling).

I would prefer to use the term

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-14 Thread Matt Topol
I'm not at all opposed to renaming it as `Run-End-Encoding` if that would be preferable. Hopefully others will chime in with their feedback.

--Matt

On Wed, Dec 14, 2022 at 12:09 PM Ian Cook wrote:
> Thank you Matt, Tobias, and others for the great work on this.
>
> I am -0.5 on this proposal

Re: [VOTE] Add RLE Arrays to Arrow Format

2022-12-14 Thread Ian Cook
Thank you Matt, Tobias, and others for the great work on this. I am -0.5 on this proposal in its current form because (pardon the pedantry) what we have implemented here is not run-length encoding; it is run-end encoding. Based on community input, the choice was made to store run ends instead of
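The distinction drawn in this message can be illustrated with a small sketch. This is not the Arrow implementation or its API; the function names below (run_length_encode, run_end_encode, value_at) are hypothetical and chosen only for illustration. Run-length encoding stores the length of each run, while run-end encoding stores the cumulative position at which each run ends. One commonly cited consequence, assumed here rather than quoted from the thread, is that sorted run ends allow a logical index to be located by binary search.

```python
from bisect import bisect_right
from itertools import groupby


def run_length_encode(values):
    """Return (run_values, run_lengths): one entry per run of equal values."""
    run_values, run_lengths = [], []
    for value, group in groupby(values):
        run_values.append(value)
        run_lengths.append(sum(1 for _ in group))
    return run_values, run_lengths


def run_end_encode(values):
    """Return (run_values, run_ends): run_ends[i] is the exclusive end index of run i."""
    run_values, run_lengths = run_length_encode(values)
    run_ends, total = [], 0
    for length in run_lengths:
        total += length  # cumulative sum turns lengths into end positions
        run_ends.append(total)
    return run_values, run_ends


def value_at(run_values, run_ends, index):
    """Look up a logical index by binary search over the sorted run ends."""
    logical_length = run_ends[-1] if run_ends else 0
    if not 0 <= index < logical_length:
        raise IndexError(index)
    # The first run whose end is strictly greater than index contains it.
    return run_values[bisect_right(run_ends, index)]


data = ["a", "a", "a", "b", "c", "c"]
print(run_length_encode(data))  # (['a', 'b', 'c'], [3, 1, 2])
print(run_end_encode(data))     # (['a', 'b', 'c'], [3, 4, 6])
print(value_at(*run_end_encode(data), 3))  # b
```

With run lengths alone, the same lookup would need a linear scan (or a precomputed prefix sum) to find which run contains a given index.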

[VOTE] Add RLE Arrays to Arrow Format

2022-12-14 Thread Matt Topol
Hello,

I'd like to propose adding the RLE type based on earlier discussions[1][2] to the Arrow format:

- Columnar Format description: https://github.com/apache/arrow/pull/1/files#diff-8b68cf6859e881f2357f5df64bb073135d7ff6eeb51f116418660b3856564c60
- Flatbuffers changes: