Hey Evan,
thank you for the interest.
There has been some effort to compress floating-point data on the Parquet
side, namely the BYTE_STREAM_SPLIT encoding. On its own it does not compress
floating-point data, but it makes the data more compressible for when a
compressor, such as ZSTD or LZ4, is applied afterwards.
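To make the idea concrete, here is a minimal sketch in Python (not Arrow's actual C++ implementation) of a byte-stream-split style transposition: byte k of every float32 value is gathered into stream k, so the highly repetitive sign/exponent bytes end up contiguous. The function names are illustrative, not Parquet API.

```python
import struct

def byte_stream_split(values):
    """Transpose a list of float32 values so that byte k of every
    value is grouped into stream k (a sketch of the encoding idea)."""
    raw = b"".join(struct.pack("<f", v) for v in values)
    # One stream per byte position of the 4-byte float32.
    return b"".join(bytes(raw[i::4]) for i in range(4))

def byte_stream_unsplit(encoded):
    """Invert the transposition and decode back to Python floats."""
    n = len(encoded) // 4
    streams = [encoded[i * n:(i + 1) * n] for i in range(4)]
    raw = bytes(b for j in range(n) for b in (s[j] for s in streams))
    return [struct.unpack("<f", raw[i:i + 4])[0]
            for i in range(0, len(raw), 4)]
```

The transposition is lossless and adds no compression by itself; the gain only appears once a general-purpose compressor runs over the reordered bytes.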
Dear all,
since this patch modifies the API and touches a lot of files to propagate the
information through the stack, it would be great to receive some more
constructive reviews on what makes sense and what doesn't.
Patch:
https://github.com/apache/arrow/pull/5071
[C++] Expose codec compr
> > I hope this feature can be implemented in Arrow soon, so that we can use
> > it in our system.
> >
> > Best,
> > Liya Fan
> >
> > On Thu, Jul 11, 2019 at 5:55 PM Radev, Martin wrote:
> >
> > > Hello Liya Fan,
> > >
> >
On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin wrote:
> Hello people,
>
>
> there has been discussion on the Apache Parquet mailing list on adding a
> new encoder for FP data.
> The reason for this is that the compressors supported by Apache Parquet
> (zstd, gzip, etc.) do not compress raw FP data well.
Hello people,
there has been discussion on the Apache Parquet mailing list on adding a new
encoder for FP data.
The reason for this is that the compressors supported by Apache Parquet (zstd,
gzip, etc.) do not compress raw FP data well.
In my investigation it turns out that a very simple transformation of the data
makes it considerably more compressible.
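As a rough illustration of the effect, the snippet below compares zlib (standing in for zstd/gzip, which are not in the Python standard library) on raw little-endian float32 bytes versus the same bytes after a byte-stream-split style transposition. The data series is made up for the example; with slowly varying values the sign/exponent bytes are nearly constant, so the transposed layout compresses noticeably better.

```python
import random
import struct
import zlib

random.seed(0)
# A slowly varying float32 series: the high (sign/exponent) bytes repeat
# heavily, while the low mantissa bytes are effectively noise.
values = [100.0 + 0.01 * i + random.random() * 1e-4 for i in range(10000)]
raw = b"".join(struct.pack("<f", v) for v in values)

# Byte-stream-split style transposition: byte i of every value -> stream i.
split = b"".join(bytes(raw[i::4]) for i in range(4))

print("raw compressed:  ", len(zlib.compress(raw)))
print("split compressed:", len(zlib.compress(split)))
```

On data like this the split layout compresses to a fraction of the raw layout's size, which is the motivation for adding the encoding to Parquet.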