Gzip is transparently handled by Hive (more precisely, by the formats that ship with Hive; a custom format has to handle it itself). What format is the table (CSV? JSON?)? Depending on that, you simply choose the corresponding SerDe and it transparently does the decompression. Keep in mind that gzip is not splittable.
Pretty simple:
--1 Move the gz file or files into HDFS. Multiple files can sit in that staging directory:
hdfs dfs -copyFromLocal /*.gz hdfs://rhes564:9000/data/stg/
--2 Create an external table; just one will do:
CREATE EXTERNAL TABLE stg_t2 ... STORED AS TEXTFILE LOCATION '/data/stg/'
--3 Create
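Putting steps 1 and 2 together (the host and staging path come from the example above; the column list is a placeholder, since the real schema isn't shown):

```sql
-- Step 1 (run in a shell, not Hive):
--   hdfs dfs -copyFromLocal /*.gz hdfs://rhes564:9000/data/stg/
--
-- Step 2: external table over the staging directory. The files stay
-- gzipped on HDFS; Hive decompresses them per file at query time.
CREATE EXTERNAL TABLE stg_t2 (
  col1 STRING,   -- placeholder columns; replace with the real schema
  col2 STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/data/stg/';
```

A common follow-up (one possible step 3) is to INSERT ... SELECT from this staging table into a managed table in a columnar format, but that is optional; the external table is already queryable as-is.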
Hi,
I have huge gzip files on HDFS and I'd like to create an external table on top of them.
Any code example? Cheers
PS: I cannot use Snappy or LZO because of some constraints.
--
Kind regards
Mario Amatucci
CG TB PS GDC PRAGUE THINK BIG