Hi,
Here is something that has been confusing me recently: I have not installed or
built Snappy on my Hadoop cluster, yet when I tested and compared the
compression ratios of the Parquet and ORC storage formats, I was able to set
the compression codec for both formats, for example with "TBLPROPERTIES
("orc.compress"="Snappy"); " or "set parquet.compression=snappy;", and both of
these commands worked. However, when I tried to compress the textfile format
with Snappy compression, it said "can not find or access the snappy
library".
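For reference, here is a minimal sketch of the kind of statements I mean. The
table names (test_orc, test_parquet, test_text, src_table) are hypothetical,
and the two textfile settings are an assumption about a typical way to request
Snappy output, not necessarily the exact ones from my session:

```sql
-- Hypothetical table names; src_table stands for any existing table.

-- ORC with Snappy, requested through a table property (this worked).
CREATE TABLE test_orc STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
AS SELECT * FROM src_table;

-- Parquet with Snappy, requested through a session property (this worked).
SET parquet.compression=SNAPPY;
CREATE TABLE test_parquet STORED AS PARQUET
AS SELECT * FROM src_table;

-- Textfile with Snappy, requested through the Hadoop codec class
-- (assumed settings; this is the case that reports the missing library).
SET hive.exec.compress.output=true;
SET mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.SnappyCodec;
CREATE TABLE test_text STORED AS TEXTFILE
AS SELECT * FROM src_table;
```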
I wonder why this happens, and I really doubt whether the ORC and Parquet
files are actually using "Snappy" compression. However, the stored data really
does become smaller, and its size differs from what I get with "gzip" or
"zlib" compression.
Looking forward to your reply and help.
Best,
Zhefu Peng