On 08/07/2022 at 15:19, Wes McKinney wrote:
* I believe that having a Type::RLE is the right approach in C++ and
it makes dynamic dispatch everywhere in the library pretty
straightforward.
+1 on this, as it will raise a nice NotImplemented error for existing
code rather than crash or corr
hi all,
Just catching up on this e-mail thread from last month. Since I've
been neck deep refactoring the kernels code the last few weeks I have
a few thoughts about this:
* How we implement and use RLE in the C++ library and Acero is
separate from how RLE will be represented in the Arrow IPC for
A format where run lengths and values are interleaved would almost certainly be
worse than having them separate. For example, unary scalar kernel evaluation is
exactly the same as on raw arrays when they are not interleaved. Further, in
the context of vectorization, a vectorized load into the ar
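To illustrate the point about separate versus interleaved layouts, here is a minimal Python sketch. It is not Arrow code: the function names (rle_encode, rle_decode, unary_kernel_separate, unary_kernel_interleaved) and the list-based layout are assumptions made for illustration only. With separate arrays, the unary kernel touches the values exactly as it would a plain array and reuses the run lengths unchanged; with an interleaved layout it has to stride over every other slot.

```python
def rle_encode(values):
    """Encode a plain list into two separate lists: run values and run lengths."""
    run_values, run_lengths = [], []
    for v in values:
        if run_values and run_values[-1] == v:
            run_lengths[-1] += 1
        else:
            run_values.append(v)
            run_lengths.append(1)
    return run_values, run_lengths


def rle_decode(run_values, run_lengths):
    """Expand separate run values / run lengths back into a plain list."""
    out = []
    for v, n in zip(run_values, run_lengths):
        out.extend([v] * n)
    return out


def unary_kernel_separate(run_values, run_lengths, fn):
    """Separate layout: apply fn to the values list as if it were a plain array.

    The run lengths are passed through untouched.
    """
    return [fn(v) for v in run_values], run_lengths


def unary_kernel_interleaved(interleaved, fn):
    """Interleaved layout [len0, val0, len1, val1, ...] (hypothetical).

    The kernel must skip the length slots, so it cannot treat the
    buffer as a contiguous array of values.
    """
    out = list(interleaved)
    for i in range(1, len(out), 2):
        out[i] = fn(out[i])
    return out
```

For example, encoding [1, 1, 1, 2, 2, 3] and applying `lambda x: x * 10` through unary_kernel_separate only evaluates the function three times, once per run.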
RLE would probably have some benefits that make sense to evaluate. I
would personally go in the direction of having a minimal benchmarking suite
for some of the cases where we expect to see the most benefit (e.g. filtering)
so we can discuss with real numbers.
Also, the currently proposed format d
I created a Jira for adding RLE as ARROW-16771, and draft PRs:
- https://github.com/apache/arrow/pull/13330
Encode/Decode functions (currently for fixed-width types only)
- https://github.com/apache/arrow/pull/1
For updating docs
Best,
Tobias
Am Dienstag, dem 31.05.2022 um 17:13 -0500 s
I think the biggest benefit of RLE is not on-the-wire compression, as that
can be done via more general purpose compression schemes as Antoine
mentions.
The biggest benefit of RLE is that it allows operating directly and very
efficiently on the "encoded" form -- for example, you can apply filters
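As a rough illustration of operating on the encoded form, the Python sketch below applies a value predicate once per run instead of once per element. The function name rle_filter and the two-list representation are assumptions for this example, not part of any Arrow API.

```python
def rle_filter(run_values, run_lengths, predicate):
    """Keep only the runs whose value satisfies the predicate.

    The predicate is evaluated once per run, never per decoded element.
    Note: dropping a run can leave two runs of the same value next to
    each other; this sketch does not merge them.
    """
    kept_values, kept_lengths = [], []
    for value, length in zip(run_values, run_lengths):
        if predicate(value):
            kept_values.append(value)
            kept_lengths.append(length)
    return kept_values, kept_lengths
```

With run values [1, 5, 1, 7] and run lengths [3, 2, 4, 1], which stand for ten decoded elements, filtering on `value < 5` calls the predicate four times and yields two surviving runs of the value 1.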
On Friday, 03.06.2022 at 09:32 -0700, Micah Kornfield wrote:
> >
> > Thinking about compatibility with existing software, RLE could
> > possibly
> > even be made an Extension Type that follows the layout of a struct of
> > int32 and the encoded value type. I'm wondering whether this would
> > be
>
> Thinking about compatibility with existing software, RLE could possibly
> even be made an Extension Type that follows the layout of a struct of
> int32 and the encoded value type. I'm wondering whether this would be
> better for compatibility.
I might be misunderstanding this proposal, but I don'
> Well, Arrow C++ does not have a notion of encoding distinct from the
> data type. Adding such a notion would risk breaking compatibility for
> all existing software that hasn't been upgraded to dispatch based on
> encoding.
Thinking about compatibility with existing software, RLE could possibl
Would it make sense to make a draft PR with your branch so that folks can
comment on specific parts of it?
Neal
On Wed, Jun 1, 2022 at 10:20 AM Tobias Zagorni
wrote:
> On Tuesday, 31.05.2022 at 12:41 -0700, Micah Kornfield wrote:
> >
> > - Should we allow multiple runs of the same value f
On Tuesday, 31.05.2022 at 12:41 -0700, Micah Kornfield wrote:
>
> - Should we allow multiple runs of the same value following each
> other?
> > Otherwise we would either need a pass to correct this after a lot
> > of
> > operations, or make RLE-aware versions of their kernels.
>
> Is there
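The "pass to correct this" mentioned in the quoted question could look roughly like the Python sketch below. It is only an assumption about what such a normalization step might do: the name merge_adjacent_runs and the two-list representation are hypothetical, and the handling of zero-length runs is a choice made for this example.

```python
def merge_adjacent_runs(run_values, run_lengths):
    """Collapse consecutive runs that carry the same value into one run.

    Zero-length runs are dropped first so they cannot separate two
    runs that would otherwise be merged.
    """
    merged_values, merged_lengths = [], []
    for value, length in zip(run_values, run_lengths):
        if length == 0:
            continue
        if merged_values and merged_values[-1] == value:
            merged_lengths[-1] += length
        else:
            merged_values.append(value)
            merged_lengths.append(length)
    return merged_values, merged_lengths
```

The pass is linear in the number of runs, not in the decoded length, so running it after an operation costs little compared with decoding. Whether it is needed at all depends on whether the format allows repeated values in consecutive runs.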
> I don't think replacing Scalar compute paths with dedicated paths for
> RLE-encoded data would ever be a simplification. Also, when a kernel
> hasn't been upgraded with a native path for RLE data, former Scalar
> Datums would now be expanded to the full RLE-decoded version before
> running the ke
I haven't had a chance to look at the branch in detail, but if you can
provide a pointer to a specification or other details about the
proposed memory format for RLE (basically: what would be added to the
columnar documentation as well as the Flatbuffers schema files), it
would be helpful so it can
Hi,
On Tuesday, 31.05.2022 at 21:12 +0200, Antoine Pitrou wrote:
>
> Hi,
>
> On 31/05/2022 at 20:24, Tobias Zagorni wrote:
> > Hi, I'm currently working on adding Run-Length encoding to arrow. I
> > created a function to dictionary-encode arrays here (currently only
> > for
> > fixed le
On 31/05/2022 at 21:41, Micah Kornfield wrote:
I'm currently working on adding Run-Length encoding to arrow.
Nice
What are the intended use cases for this:
- external engines want to provide run-length encoded data to work on
using arrow?
It is more than just external engines. Many p
>
> I'm currently working on adding Run-Length encoding to arrow.
Nice
> What are the intended use cases for this:
> - external engines want to provide run-length encoded data to work on
> using arrow?
>
It is more than just external engines. Many popular file formats support
RLE encoding. Bei
Hi,
On 31/05/2022 at 20:24, Tobias Zagorni wrote:
Hi, I'm currently working on adding Run-Length encoding to arrow. I
created a function to dictionary-encode arrays here (currently only for
fixed length types):
https://github.com/apache/arrow/compare/master...zagto:rle?expand=1
The general
Hi, I'm currently working on adding Run-Length encoding to arrow. I
created a function to dictionary-encode arrays here (currently only for
fixed length types):
https://github.com/apache/arrow/compare/master...zagto:rle?expand=1
The general idea is that RLE data will be a nested data type, with a