Hi,

You can use GZip (deflate) compression inside Avro files and still have splittable Avro files. This has to do with the fact that an Avro file has an internal block structure, and it is these blocks that are compressed individually rather than the file as a whole.
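To illustrate why per-block compression keeps a file splittable, here is a minimal sketch in plain Java. It is a toy and not the Avro container format: the class name BlockCompressionSketch and its methods are hypothetical, it uses java.util.zip instead of the Avro library, and the blocks are simply kept in a list. Real Avro files separate blocks with a sync marker so that a reader starting at an arbitrary offset can find the next block boundary.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Illustration only: this is NOT the Avro file format, just the idea behind it.
final class BlockCompressionSketch {

    // Compress one block on its own, so it can be inflated later
    // without reading any other block.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress a single block; no state from earlier blocks is needed.
    static byte[] inflate(byte[] compressed) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && !inflater.finished()
                        && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or corrupt block");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt block", e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Group records into fixed-size blocks and compress each block separately.
    // Toy encoding: records are newline-joined, so they must not contain '\n'.
    static List<byte[]> writeBlocks(List<String> records, int recordsPerBlock) {
        List<byte[]> blocks = new ArrayList<>();
        for (int i = 0; i < records.size(); i += recordsPerBlock) {
            int end = Math.min(i + recordsPerBlock, records.size());
            String joined = String.join("\n", records.subList(i, end));
            blocks.add(deflate(joined.getBytes(StandardCharsets.UTF_8)));
        }
        return blocks;
    }

    // Read only the block at the given index, as a task assigned that split would.
    static List<String> readBlock(List<byte[]> blocks, int index) {
        String joined = new String(inflate(blocks.get(index)), StandardCharsets.UTF_8);
        return Arrays.asList(joined.split("\n"));
    }

    public static void main(String[] args) {
        List<byte[]> blocks = writeBlocks(Arrays.asList("r1", "r2", "r3", "r4", "r5"), 2);
        // Decode the second block without touching the first one.
        System.out.println(readBlock(blocks, 1));
    }
}
```

With whole-file gzip, a reader would have to decompress from the first byte to reach any record. Here each block is an independent compressed unit, which is the property that lets separate map tasks each start in the middle of the file.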
I suggest you simply try it.

Niels

On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov <iva...@vesseltracker.com> wrote:

> Hi guys,
> I would like to compress the files on HDFS to save some storage.
>
> As far as I can see, bzip2 is the only format which is splittable (and slow).
>
> The actual files are Avro.
>
> So in my driver class I have:
>
> job.setInputFormatClass(AvroKeyInputFormat.class);
>
> I have a number of jobs running that process Avro files, so I would like to
> keep the code changes to a minimum.
>
> Is it possible to compress these Avro files with bzip2 and keep the code
> of the MR jobs the same (or with little change)?
> If it is, please give me some hints, as so far I can't seem to find any
> good resources on the Internet.
>
>
> Georgi

--
Best regards / Met vriendelijke groeten,

Niels Basjes