Gzip is handled transparently by Hive, at least for the formats that ship with 
Hive; for a custom format it depends on that format. What format is the table 
data in (CSV? JSON?)? Depending on that, you simply choose the corresponding 
SerDe and Hive does the decompression transparently. Keep in mind that gzip is 
not splittable, which means a single gzip file cannot be decompressed in 
parallel. Either switch to bzip2, which is splittable and so allows parallel 
decompression, or split the large file into several smaller gzip files (each at 
least the size of an HDFS block).
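
If you go the splitting route, here is a rough sketch of how that could look in 
Python. It assumes the gzip file is available on a local filesystem and holds 
line-oriented text (e.g. CSV), so it can be cut on line boundaries. The function 
name split_gzip, the part-NNNNN.gz naming and the 128 MB default are my own 
placeholders, not anything Hive requires. Note that max_bytes counts 
uncompressed bytes, so the compressed parts will be smaller than that; tune it 
to your block size.

```python
import gzip
import os


def split_gzip(src_path, out_dir, max_bytes=128 * 1024 * 1024):
    """Split one gzip text file into several gzip parts on line boundaries.

    max_bytes is the approximate *uncompressed* size per part (an assumed
    default, adjust it to your HDFS block size). Returns the part paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    parts = []
    out = None
    written = 0
    with gzip.open(src_path, "rb") as src:
        for line in src:
            # Open a new part when none is open yet or the current one is full.
            if out is None or written >= max_bytes:
                if out is not None:
                    out.close()
                part_path = os.path.join(out_dir, "part-%05d.gz" % len(parts))
                out = gzip.open(part_path, "wb")
                parts.append(part_path)
                written = 0
            # Lines are never cut in half, so every part is a valid text file.
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

The resulting parts can then be uploaded into the directory the external table 
points at, and each part can be read by a separate task.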

> On 19 Jul 2016, at 13:03, Amatucci, Mario, Vodafone Group 
> <[email protected]> wrote:
> 
>  
> Hi I have huge gzip on hdfs and I’d like to create an external table on top 
> of them
> Any code example? Cheers
> Ps
> I cannot use snappy or lzo for some constraints
>  
> --
> Kind regards
> Mario Amatucci
> CG TB PS GDC PRAGUE THINK BIG
>  
