Date: Mon, 22 Sep 2014 17:21:29 +0200
From: iva...@vesseltracker.com
To: user@hadoop.apache.org
Subject: Re: Bzip2 files as an input to MR job
Hi guys,
I would like to compress the files on HDFS to save some storage.
As far as I can see, bzip2 is the only format which is splittable (and slow).
The actual files are Avro.
So in my driver class I have:
job.setInputFormatClass(AvroKeyInputFormat.class);
I have a number of jobs running
Hi,
You can use GZip inside the Avro files and still have splittable Avro
files.
This has to do with the fact that there is a block structure inside the
Avro file and these blocks are gzipped.
I suggest you simply try it.
Niels
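
To illustrate why per-block compression keeps a file splittable, here is a
minimal sketch using only java.util.zip. It is not the Avro container format
and not how Avro implements it: the class BlockCompressionSketch and all of
its method names are made up for illustration, and records are plain strings.
The principle is the same, though: records are grouped into blocks and each
block is compressed on its own (Avro calls this codec "deflate", the algorithm
behind gzip), so a reader can decode any block without reading the ones
before it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration only: this is NOT the Avro file format,
// just the same idea of compressing every block independently.
class BlockCompressionSketch {

    // Compress one block with DEFLATE, independently of any other block.
    static byte[] compressBlock(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress one block; needs nothing but that block's own bytes.
    static byte[] decompressBlock(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Group records into blocks of recordsPerBlock and compress each block separately.
    static List<byte[]> writeBlocks(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            int end = Math.min(i + recordsPerBlock, records.size());
            String joined = String.join("\n", records.subList(i, end));
            blocks.add(compressBlock(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    // Read a single block without touching any of the others.
    static String readBlock(List<byte[]> blocks, int index) {
        return new String(decompressBlock(blocks.get(index)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }
        List<byte[]> blocks = writeBlocks(records, 4);
        // A reader assigned only the second block can decode it directly.
        System.out.println(readBlock(blocks, 1));
    }
}
```

Compressing the whole file as one gzip stream would not allow this, because a
gzip stream can only be decoded from its beginning. A real Avro file also
writes a sync marker between blocks so that a reader starting in the middle
of the file can find the next block boundary; the sketch sidesteps that by
keeping the blocks in a list.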
On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov
Hi Niels,
Thanks for the reply.
Changing the Avro files is not really an option for me as it will
require a lot of time (I have a lot).
The Avro files themselves are compressed a bit.
But still bzip2 gives 50% compression on one Avro file.
So what I want is to use a Bzip2-compressed file as an