RE: Bzip2 files as an input to MR job

2014-09-23 Thread java8964
: Mon, 22 Sep 2014 17:21:29 +0200 From: iva...@vesseltracker.com To: user@hadoop.apache.org Subject: Re: Bzip2 files as an input to MR job Hi Niels, Thanks for the reply. Changing the Avro files is not really an option for me, as it would require a lot of time (I

Bzip2 files as an input to MR job

2014-09-22 Thread Georgi Ivanov
Hi guys, I would like to compress the files on HDFS to save some storage. As far as I can see, bzip2 is the only format that is splittable (and slow). The actual files are Avro, so in my driver class I have: job.setInputFormatClass(AvroKeyInputFormat.class); I have a number of jobs running

Re: Bzip2 files as an input to MR job

2014-09-22 Thread Niels Basjes
Hi, You can use GZip inside the Avro files and still have splittable Avro files. This has to do with the fact that there is a block structure inside the Avro file and these blocks are gzipped individually. I suggest you simply try it. Niels On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov
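To illustrate the point about the block structure, here is a minimal sketch using only the JDK. It is not Avro's real on-disk format (a real Avro container file also carries a header and sync markers between blocks), and the class and method names are hypothetical. It only shows the idea that when every block is deflated on its own, a reader can decompress any single block without touching the ones before it, which is what keeps such a file splittable.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration class; not part of Avro or Hadoop.
final class BlockCompressionSketch {

    // Compress one block on its own; no state is shared between blocks.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single block without needing any other block.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or unsupported block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Group records into fixed-size blocks and compress each block separately.
    static List<byte[]> writeBlocks(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            int end = Math.min(i + recordsPerBlock, records.size());
            String joined = String.join("\n", records.subList(i, end));
            blocks.add(deflate(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    // Read the records of one block only.
    static List<String> readBlock(byte[] block) {
        String text = new String(inflate(block), StandardCharsets.UTF_8);
        return Arrays.asList(text.split("\n"));
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }
        List<byte[]> blocks = writeBlocks(records, 4);
        // A "split" that starts at the third block never reads the first two.
        System.out.println(readBlock(blocks.get(2)));
    }
}
```

By contrast, a file that is gzipped as a whole is one single deflate stream, so a reader cannot start in the middle of it. In Avro itself the per-block codec is chosen when the file is written, so existing files would have to be rewritten to change it.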

Re: Bzip2 files as an input to MR job

2014-09-22 Thread Georgi Ivanov
Hi Niels, Thanks for the reply. Changing the Avro files is not really an option for me, as it would require a lot of time (I have a lot). The Avro files themselves are compressed a bit, but bzip2 still gives 50% compression on one Avro file. So what I want is to use a Bzip2 compressed file as an