Hi HBase community,

Our official documentation suggests that LZ4 compresses better than SNAPPY.

> Snappy is ... as fast as LZ4 but does not compress quite as well.

Recently, I changed one of our SNAPPY tables to LZ4, performed a major
compaction, and was surprised to find that LZ4 did not provide a better
compression ratio. The size remained almost the same as before – slightly
larger, to be precise.

In fact, I found that LZ4, in some cases, resulted in a much worse
compression ratio than SNAPPY. For instance, using the
PerformanceEvaluation tool:

    bin/hbase pe --nomapred --table=pe-lz4 --compress=LZ4 \
        --blockEncoding=FAST_DIFF --size=2 --presplit=4 sequentialWrite 1
    bin/hbase pe --nomapred --table=pe-snappy --compress=SNAPPY \
        --blockEncoding=FAST_DIFF --size=2 --presplit=4 sequentialWrite 1

    # From HBase Shell
    list.each { |t| flush t; major_compact t }

The resulting SNAPPY table was 467MB, while the LZ4 table was 678MB.

* I tried changing the valueSize from 100 to 10000, but the LZ4 table was
consistently larger by 30 to 40%.
* Removing --blockEncoding didn't make much difference.
* With --size=1024, the LZ4 table was 370GB, but the SNAPPY table was only
255GB.
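
To put the reported sizes in relative terms, here is a minimal sketch of the
arithmetic. The helper name overhead_pct is made up for illustration and is
not part of HBase; it only divides the two sizes quoted above, which must be
in the same unit.

```shell
# Percentage by which size $1 exceeds size $2 (both in the same unit).
# overhead_pct is a hypothetical helper, not an HBase command.
overhead_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a / b - 1) * 100 }'
}

overhead_pct 678 467   # --size=2:    LZ4 678MB vs SNAPPY 467MB
overhead_pct 370 255   # --size=1024: LZ4 370GB vs SNAPPY 255GB
```

In both runs the LZ4 table comes out roughly 45% larger than the SNAPPY one.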

Any idea why the compression ratio of LZ4 is much worse in this particular
case? Given these findings, does it still make sense to suggest in our
official documentation that LZ4 compresses better?

What has been your experience with this?
Thanks,
Junegunn Choi.
