Zip is a file format, not a codec. Zip archives can use various codecs,
most commonly DEFLATE. The codecs supported by the Parquet file format
are a different set, described in
https://github.com/apache/parquet-format/blob/master/Compression.md.
Since Zip is therefore neither sensible nor possible inside a Parquet
file, the only way to achieve what you describe would be to put a
Parquet file inside a Zip archive. This would be perverse and misguided,
though possibly still queryable, since Drill might transparently do the
right thing to decode it anyway. Using a supported codec within the
Parquet file format and forgetting about Zip is certainly the better
approach. If you want compression ratios comparable to those found in
Zip files, choose GZip and pay for it in CPU cycles. When Drill gains
support for Zstandard there will be little reason to choose anything
else.
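
To illustrate the container-versus-codec distinction, here is a minimal
sketch using only the Python standard library. It is not Drill or Parquet
code; the payload and the member name "data.csv" are made up for the
example. It writes a Zip archive in memory, reads back which codec the
archive recorded for its member, and compresses the same bytes with gzip,
which wraps the same DEFLATE codec.

```python
import gzip
import io
import zipfile

# Hypothetical sample payload; any repetitive bytes will do.
payload = b"drill,parquet,compression\n" * 2000

# Zip is a container: each member records which codec compressed it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("data.csv", payload)

# Re-open the archive and inspect the member's metadata.
with zipfile.ZipFile(buf) as zf:
    info = zf.getinfo("data.csv")
    member_codec_is_deflate = info.compress_type == zipfile.ZIP_DEFLATED
    zip_member_size = info.compress_size

# gzip is a thin header/trailer around a DEFLATE stream.
gzip_size = len(gzip.compress(payload))

print("zip member uses DEFLATE:", member_codec_is_deflate)
print("original bytes:", len(payload))
print("zip member bytes:", zip_member_size)
print("gzip bytes:", gzip_size)
```

The two compressed sizes should land in the same ballpark, since both
are DEFLATE at the libraries' default settings. That is the sense in
which GZip inside Parquet gives Zip-like ratios.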
On 2021/06/17 18:59, Leyne, Sean wrote:
Luoc,
Could you please tell me first which case you are talking about? Only
write (CTAS syntax) or read (SELECT)?
Really both, since you need a mechanism to create the zip'd parquet file to
begin with. Having to create a special/side process to zip the file outside of
drill would be ... awkward.
Sean
On 2021/06/16 02:26, Leyne, Sean
<[email protected]> wrote:
All,
The documentation describes gzip/gz compression as supported for
text files, and snappy and gzip as supported for parquet files.
I have also read that zip compression was added (though not
documented) for text files.
But is zip also supported for parquet files?
What about support for other compression algorithms/methods? LZ4?
Bzip2? zstd??
Sean