Hi,

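You can use GZip-style compression (Avro's "deflate" codec) inside the Avro
files and still have splittable Avro files.
This is because an Avro container file has an internal block structure, with
a sync marker after each block, and each block is compressed on its own.
The codec is recorded in the file header, so AvroKeyInputFormat reads the
compressed files without any change to your job code.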
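To illustrate why per-block compression keeps a file splittable, here is a toy sketch in plain Java with no Avro or Hadoop dependency. It is NOT the Avro container layout: the class name `BlockCompressionDemo`, the fixed sync marker and the 4-byte length prefix are all made up for this illustration (real Avro files use a random 16-byte sync marker per file and a different block header). Each block is compressed independently with raw DEFLATE, the algorithm behind both gzip and Avro's deflate codec, so a reader handed an arbitrary byte range can skip forward to the next sync marker and start decompressing there.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Toy block-compressed container (NOT the Avro layout): each block is
// written as SYNC marker + 4-byte big-endian length + raw-DEFLATE bytes.
public class BlockCompressionDemo {
    // Hypothetical fixed marker; real Avro files use a random 16-byte marker per file.
    static final byte[] SYNC = "SYNC-MARKER-0001".getBytes(StandardCharsets.US_ASCII);

    public static byte[] write(List<String> blocks) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String block : blocks) {
            byte[] compressed = deflate(block.getBytes(StandardCharsets.UTF_8));
            int n = compressed.length;
            out.write(SYNC, 0, SYNC.length);
            out.write(n >>> 24); out.write(n >>> 16); out.write(n >>> 8); out.write(n);
            out.write(compressed, 0, n);
        }
        return out.toByteArray();
    }

    // Reads every block whose sync marker starts in [start, end); with this rule,
    // adjacent splits together read each block exactly once.
    public static List<String> readSplit(byte[] data, int start, int end) {
        List<String> blocks = new ArrayList<>();
        int pos = indexOfSync(data, start); // skip the tail of a block owned by the previous split
        while (pos >= 0 && pos < end) {
            int p = pos + SYNC.length;
            int n = ((data[p] & 0xff) << 24) | ((data[p + 1] & 0xff) << 16)
                    | ((data[p + 2] & 0xff) << 8) | (data[p + 3] & 0xff);
            byte[] compressed = Arrays.copyOfRange(data, p + 4, p + 4 + n);
            blocks.add(new String(inflate(compressed), StandardCharsets.UTF_8));
            pos = indexOfSync(data, p + 4 + n);
        }
        return blocks;
    }

    static int indexOfSync(byte[] data, int from) {
        for (int i = Math.max(from, 0); i + SYNC.length <= data.length; i++) {
            boolean match = true;
            for (int k = 0; k < SYNC.length && match; k++) {
                match = data[i + k] == SYNC[k];
            }
            if (match) return i;
        }
        return -1;
    }

    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, true); // true = raw DEFLATE, no header
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] inflate(byte[] input) {
        Inflater inflater = new Inflater(true);
        // Raw mode may need one extra dummy input byte (see the Inflater javadoc).
        inflater.setInput(Arrays.copyOf(input, input.length + 1));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or corrupt block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        List<String> blocks = new ArrayList<>();
        for (int b = 0; b < 6; b++) {
            StringBuilder sb = new StringBuilder();
            for (int r = 0; r < 50; r++) sb.append("block ").append(b).append(" record ").append(r).append('\n');
            blocks.add(sb.toString());
        }
        byte[] file = write(blocks);
        int mid = file.length / 2; // arbitrary byte offset, like an HDFS split boundary
        List<String> all = new ArrayList<>(readSplit(file, 0, mid));
        all.addAll(readSplit(file, mid, file.length));
        System.out.println("every block read exactly once: " + all.equals(blocks));
    }
}
```

Wherever the split point falls, the two halves together return every block exactly once, which is the property a splittable input format needs. A single gzip stream has no such resynchronisation points, so it can only be decompressed from the beginning.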

I suggest you simply try it.

Niels


On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov <iva...@vesseltracker.com>
wrote:

> Hi guys,
> I would like to compress the files on HDFS to save some storage.
>
> As far as I can see, bzip2 is the only format that is splittable (and it
> is slow).
>
> The actual files are Avro.
>
> So in my driver class I have:
>
> job.setInputFormatClass(AvroKeyInputFormat.class);
>
> I have a number of jobs processing Avro files, so I would like to
> keep code changes to a minimum.
>
> Is it possible to compress these Avro files with bzip2 and keep the code
> of the MR jobs the same (or with little change)?
> If it is, please give me some hints, as so far I have not been able to
> find any good resources on the Internet.
>
>
> Georgi
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
