Zip is a file format, not a codec. Zip archives can use various codecs, most commonly DEFLATE. The Parquet file format supports a different set of codecs, described in https://github.com/apache/parquet-format/blob/master/Compression.md. Since Zip is an archive format rather than a codec, it is neither sensible nor possible to use it inside a Parquet file, so the only way to achieve what you describe would be to embed a Parquet file inside a Zip archive. This would be perverse and misguided, but possibly still queryable, since Drill might transparently do the right thing to decode it anyway.

Using a supported codec within the Parquet file format and forgetting about Zip is certainly a better approach. If you want compression ratios comparable to those found in Zip files, choose gzip and pay for it with CPU cycles. When Drill gains support for Zstandard there will be little reason to choose anything else.
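To illustrate the container-versus-codec distinction, here is a minimal sketch using only Python's standard library (zipfile and zlib). It is not Drill or Parquet code; the member names and sample payload are made up for the example. It writes the same bytes into one archive twice, once stored uncompressed and once with DEFLATE, and then runs the DEFLATE codec on its own with no Zip container involved.

```python
import io
import zipfile
import zlib

# Hypothetical sample data, repetitive enough to compress well.
payload = b"drill,parquet,zip,deflate\n" * 1000


def build_archive(data: bytes) -> bytes:
    """Write the same bytes into one Zip archive under two different methods."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The method is chosen per member: the container does not dictate it.
        zf.writestr("stored.csv", data, compress_type=zipfile.ZIP_STORED)
        zf.writestr("deflated.csv", data, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()


def describe(archive: bytes) -> dict:
    """Map member name -> (method id, compressed size, original size)."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {
            info.filename: (info.compress_type, info.compress_size, info.file_size)
            for info in zf.infolist()
        }


def raw_deflate(data: bytes) -> bytes:
    """Run DEFLATE by itself (negative wbits = raw stream, no container)."""
    codec = zlib.compressobj(9, zlib.DEFLATED, -15)
    return codec.compress(data) + codec.flush()


entries = describe(build_archive(payload))
for name, (method, packed, original) in entries.items():
    print(f"{name}: method={method} {original} -> {packed} bytes")

standalone = raw_deflate(payload)
print(f"raw DEFLATE without any archive: {len(payload)} -> {len(standalone)} bytes")
```

The same separation applies to Parquet: the file format stays the same and the codec is a setting applied to the data inside it.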

On 2021/06/17 18:59, Leyne, Sean wrote:
Luoc,

   Could you please tell me first which case you are talking about? Only
write (CTAS syntax) or read (SELECT)?
Really both, since you need a mechanism to create the zipped Parquet file to
begin with.  Having to create a special/side process to zip the file outside of
Drill would be ... awkward.


Sean

On 2021/06/16 02:26, Leyne, Sean
<[email protected]> wrote:
All,

The documentation describes gzip/gz compression as supported for
text files, and snappy and gzip as supported for Parquet files.
I have also read that zip compression was added (though not
documented) for text files.

But is zip also supported for Parquet files?

What about support for other compression algorithms/methods?  LZ4?
Bzip2? zstd??

Sean



