> Has there been any study of how much compressing Hive Parquet tables with
> snappy reduces storage space or simply the table size in quantitative terms?

http://www.slideshare.net/oom65/file-format-benchmarks-avro-json-orc-parquet/20

Since Snappy is essentially LZ77 with no entropy-coding stage, I would expect it to help most when Parquet leaf columns contain text with large common substrings (URLs or log data, for example).

If you want to experiment with that corner case, the L_COMMENT field from TPC-H lineitem is a good compression-thrasher.

Cheers,
Gopal
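
A rough way to see the "common substrings" effect without a Hive cluster is sketched below. It is only an illustration under stated assumptions: it uses Python's standard-library zlib (DEFLATE, i.e. LZ77 plus Huffman coding) as a stand-in for Snappy, so the absolute ratios will be higher than Snappy would give, and the URL pattern and row count are made up for the example. It does not measure Parquet files or real table sizes.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Return uncompressed size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


# Hypothetical URL-like column values that share long common substrings.
urls = "\n".join(
    f"http://example.com/catalog/items/view?id={i}&ref=home"
    for i in range(10_000)
).encode()

# Random bytes of the same length: no repeated substrings to match against.
noise = os.urandom(len(urls))

url_ratio = compression_ratio(urls)
noise_ratio = compression_ratio(noise)

print(f"URL-like text: {url_ratio:.1f}x")
print(f"random bytes:  {noise_ratio:.2f}x")
```

To try the same comparison with Snappy itself, the `zlib.compress` call could be swapped for a Snappy binding, which is a third-party package and not assumed to be installed here.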