> Has there been any study of how much compressing Hive Parquet tables with 
> snappy reduces storage space or simply the table size in quantitative terms?

http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20

Since Snappy is essentially an LZ77-style compressor with no entropy-coding 
stage, I would expect it to help most when Parquet leaf columns contain text 
with large repeated substrings (like URLs or log data).

If you want to experiment with that corner case, the L_COMMENT field from TPC-H 
lineitem is a good compression-thrasher.
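
For a rough feel of the effect before touching real tables, here is a minimal 
sketch. It makes several assumptions: it uses zlib from the Python standard 
library as a stand-in for Snappy (zlib is LZ77 plus Huffman coding, so the 
absolute ratios will be better than Snappy's), and both the URL pattern and 
the random-letter "noise" column are made-up data, not TPC-H output. It only 
shows the direction of the difference, not the numbers you would see in Hive.

```python
import random
import string
import zlib


def compression_ratio(values):
    """Return compressed size / raw size for a list of strings."""
    raw = "\n".join(values).encode("utf-8")
    return len(zlib.compress(raw)) / len(raw)


# Fixed seed so repeated runs produce the same data.
rng = random.Random(42)

# Hypothetical URL column: long shared prefix, short varying suffix.
urls = [
    f"http://www.example.com/catalog/products/item?id={rng.randint(0, 99999)}"
    for _ in range(5000)
]

# Hypothetical worst-case text column: random letters, few repeated substrings.
noise = [
    "".join(rng.choices(string.ascii_lowercase + " ", k=60))
    for _ in range(5000)
]

url_ratio = compression_ratio(urls)
noise_ratio = compression_ratio(noise)

print(f"URL-like column:    {url_ratio:.2f} of original size")
print(f"random-text column: {noise_ratio:.2f} of original size")
```

To measure the real thing, write the same table once with and once without 
Snappy and compare the on-disk sizes of the two.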

Cheers,
Gopal

