Agreed, but even then, if some Parquet files are generated inside of a
well-defined system which only needs to be interoperable with itself,
it's not necessarily harmful to allow LZ4 compression when writing new files.
Regards
Antoine.
On 13/07/2020 at 17:07, Wes McKinney wrote:
> I didn’t s
On Mon, Jul 13, 2020 at 11:15 AM Antoine Pitrou wrote:
>
>
> I'm not sure that's a good idea. There are probably Parquet files that
> are only ever used with the Arrow implementation (Arrow C++, Arrow
> Python, Arrow R...).
I tend to agree with Antoine here. As an alternative to disabling the
co
I didn’t say to disable _reading_ them, only writing them.
On Mon, Jul 13, 2020 at 4:15 AM Antoine Pitrou wrote:
>
> I'm not sure that's a good idea. There are probably Parquet files that
> are only ever used with the Arrow implementation (Arrow C++, Arrow
> Python, Arrow R...).
>
> I admit I'm
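The distinction drawn above (keep reading LZ4 files, stop writing new ones) can be sketched as a small policy check. This is only an illustration: `WRITE_DISABLED_CODECS`, `CodecDisabledError` and `check_codec` are hypothetical names, not part of Arrow or Parquet C++.

```python
# Hypothetical policy sketch: codecs that may still be read but not written.
WRITE_DISABLED_CODECS = frozenset({"LZ4"})


class CodecDisabledError(ValueError):
    """Raised when a codec is requested for an operation that is disabled."""


def check_codec(codec: str, mode: str) -> None:
    """Raise if `codec` may not be used for `mode` ("read" or "write")."""
    if mode not in ("read", "write"):
        raise ValueError(f"unknown mode: {mode!r}")
    # Reading stays allowed so that existing files remain accessible;
    # only the creation of new files with the codec is refused.
    if mode == "write" and codec.upper() in WRITE_DISABLED_CODECS:
        raise CodecDisabledError(
            f"writing {codec.upper()}-compressed files is disabled"
        )
```

With this shape, files written inside a closed system can still be opened, while no new files with the disputed encoding are produced.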
I'll volunteer to disable writing/reading LZ4. I'll submit a patch in the next
few days.
On 2020/07/12 22:11:33, Wes McKinney wrote:
> Since there hasn't been other movement on this, we need to disable
> writing LZ4-compressed files until this can be investigated more
> thoroughly. If someone w
I'm not sure that's a good idea. There are probably Parquet files that
are only ever used with the Arrow implementation (Arrow C++, Arrow
Python, Arrow R...).
I admit I'm also not terribly bothered about this, since the Parquet
community itself doesn't seem to care much about the issue (it has
Since there hasn't been other movement on this, we need to disable
writing LZ4-compressed files until this can be investigated more
thoroughly. If someone wants to submit a patch, that would be helpful;
otherwise I can take a look in the next couple of days.
On Thu, Jul 2, 2020 at 12:50 PM Antoine Pitro
On Mon, Jul 6, 2020 at 11:08 AM Antoine Pitrou wrote:
>
>
> On 06/07/2020 at 17:57, Steve Kim wrote:
> > The Parquet format specification is ambiguous about the exact details of
> > LZ4 compression. However, the *de facto* reference implementation in Java
> > (parquet-mr) uses the Hadoop LZ4 cod
> Would that keep compatibility with existing files produced by Parquet C++?
Changing the lz4 implementation to be compatible with parquet-mr/hadoop
would break compatibility with any existing files that were written by
Parquet C++ using lz4 compression. I believe that it is not possible to
reliab
On 06/07/2020 at 17:57, Steve Kim wrote:
> The Parquet format specification is ambiguous about the exact details of
> LZ4 compression. However, the *de facto* reference implementation in Java
> (parquet-mr) uses the Hadoop LZ4 codec.
>
> I think that it is important for Parquet C++ to have com
The Parquet format specification is ambiguous about the exact details of
LZ4 compression. However, the *de facto* reference implementation in Java
(parquet-mr) uses the Hadoop LZ4 codec.
I think that it is important for Parquet C++ to have compatibility and
feature parity with parquet-mr when poss
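For readers unfamiliar with why the Hadoop LZ4 codec differs from a raw LZ4 block, here is a minimal sketch of the framing. It rests on an assumption that is not stated in this thread: that the Hadoop codec prefixes each raw LZ4 block with two big-endian 32-bit integers (uncompressed size, then compressed size). The function names are hypothetical, and the compressed payload is treated as opaque bytes, so no LZ4 library is needed.

```python
import struct

# Assumed Hadoop-style header: big-endian uint32 uncompressed size,
# followed by big-endian uint32 compressed size.
HEADER = struct.Struct(">II")


def wrap_hadoop_frame(raw_block: bytes, uncompressed_size: int) -> bytes:
    """Prefix an already-compressed raw LZ4 block with the assumed header."""
    return HEADER.pack(uncompressed_size, len(raw_block)) + raw_block


def split_hadoop_frames(buf: bytes) -> list:
    """Split a buffer into (uncompressed_size, raw_block) pairs."""
    frames = []
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated frame header")
        uncompressed_size, compressed_size = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if compressed_size > len(buf) - offset:
            raise ValueError("frame payload exceeds buffer")
        frames.append((uncompressed_size, buf[offset:offset + compressed_size]))
        offset += compressed_size
    return frames
```

A reader that expects a bare LZ4 block would try to decode the 8 header bytes as compressed data, which is one way the two implementations can fail to read each other's files.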
Well, it depends how important speed is, but LZ4 has extremely fast
decompression, even compared to Snappy:
https://github.com/lz4/lz4#benchmarks
Regards
Antoine.
On 02/07/2020 at 19:47, Christian Hudon wrote:
> At least for us, the advantages of Parquet are speed and interoperability
> in
At least for us, the advantages of Parquet are speed and interoperability
in the context of longer-term data storage, so I would tend to say
"reasonably conservative".
On Wed, Jul 1, 2020 at 09:32, Antoine Pitrou wrote:
>
> I don't have a sense of how conservative Parquet users generally
I don't have a sense of how conservative Parquet users generally are.
Is it worth adding an LZ4_FRAMED compression option in the Parquet
format, or would people just not use it?
Regards
Antoine.
On Tue, 30 Jun 2020 14:33:17 +0200
"Uwe L. Korn" wrote:
> I'm also in favor of disabling support f
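As background to the LZ4_FRAMED idea raised above: the standard LZ4 frame format begins with the magic number 0x184D2204 stored little-endian, whereas a raw LZ4 block has no marker at all. A minimal sketch of checking for it follows; `looks_like_lz4_frame` is a hypothetical helper, and a match is only a hint, since a raw block could happen to start with the same four bytes.

```python
import struct

# LZ4 frame format magic number, stored little-endian on disk.
LZ4_FRAME_MAGIC = struct.pack("<I", 0x184D2204)


def looks_like_lz4_frame(buf: bytes) -> bool:
    """Return True if `buf` starts with the LZ4 frame magic number.

    This is a heuristic: a raw block carries no marker, so it may
    coincidentally begin with the same bytes.
    """
    return buf[:4] == LZ4_FRAME_MAGIC
```

A self-describing frame of this kind is what would make a separate, explicitly named codec easier to validate than the current ambiguous one.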
I'm also in favor of disabling support for now. Having to deal with broken
files or the detection of various incompatible implementations in the long-term
will harm more than not supporting LZ4 for a while. Snappy is generally more
used than LZ4 in this category as it has been available since th
On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou wrote:
>
>
> On 25/06/2020 at 00:02, Wes McKinney wrote:
> > hi folks,
> >
> > (cross-posting to dev@arrow and dev@parquet since there are
> > stakeholders in both places)
> >
> > It seems there are still problems at least with the C++ implementatio
On 25/06/2020 at 00:02, Wes McKinney wrote:
> hi folks,
>
> (cross-posting to dev@arrow and dev@parquet since there are
> stakeholders in both places)
>
> It seems there are still problems at least with the C++ implementation
> of LZ4 compression in Parquet files
>
> https://issues.apache.or
hi folks,
(cross-posting to dev@arrow and dev@parquet since there are
stakeholders in both places)
It seems there are still problems at least with the C++ implementation
of LZ4 compression in Parquet files
https://issues.apache.org/jira/browse/PARQUET-1241
https://issues.apache.org/jira/browse/P