Hi Niels,
Thanks for the reply.
Changing the Avro files is not really an option for me, as it would
require a lot of time (I have a lot of them).
The Avro files themselves are already compressed a bit,
but bzip2 still gives 50% compression on one Avro file.
So what I want is to use a bzip2-compressed file as an input
Hi,
You can use gzip (the deflate codec) inside the Avro files and still have
splittable Avro files.
This has to do with the fact that there is a block structure inside the
Avro file, and these blocks are gzipped individually.
I suggest you simply try it.
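A minimal sketch of the approach described above: keep the Avro container format and enable the deflate codec on the job output, so each Avro block is compressed individually while the file stays splittable. This assumes avro-mapred on the classpath; the job name and schema are placeholders, and `avro.output.codec` is the codec property read by the Avro output formats.

```java
// Sketch only: enabling deflate-compressed (gzipped) blocks inside Avro
// output files from a MapReduce job. Assumes avro-mapred is available;
// the schema below is a placeholder, not the poster's actual schema.
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroDeflateDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Ask the Avro output format to compress each block with deflate (gzip).
    conf.set("avro.output.codec", "deflate");
    Job job = Job.getInstance(conf, "avro-deflate-example"); // placeholder name
    FileOutputFormat.setCompressOutput(job, true);
    job.setOutputFormatClass(AvroKeyOutputFormat.class);
    AvroJob.setOutputKeySchema(job, Schema.create(Schema.Type.STRING)); // placeholder
    // ... input path and mapper/reducer setup elided ...
  }
}
```

Because compression happens per Avro block rather than over the whole file, downstream jobs can still split the output on block boundaries.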
Niels
On Mon, Sep 22, 2014 at 4:40 PM, Georgi Ivanov wrote:
Hi guys,
I would like to compress the files on HDFS to save some storage.
As far as I can see, bzip2 is the only compression format that is splittable (and it is slow).
The actual files are Avro.
So in my driver class I have:
job.setInputFormatClass(AvroKeyInputFormat.class);
I have a number of jobs running, processing
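For reference, a driver along the lines described might look like the sketch below; the class name, reader schema, and path handling are assumptions for illustration, not the poster's actual code.

```java
// Sketch: a driver that reads Avro container files with AvroKeyInputFormat.
// Assumes avro-mapred on the classpath; the reader schema is a placeholder.
import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class AvroReadDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "avro-read-example");
    job.setInputFormatClass(AvroKeyInputFormat.class);
    // Reader schema must be compatible with the writer schema of the files.
    AvroJob.setInputKeySchema(job, Schema.create(Schema.Type.STRING)); // placeholder
    FileInputFormat.addInputPath(job, new Path(args[0]));
    // ... mapper/reducer setup elided ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Note that wrapping a whole Avro container file in bzip2 would hide the Avro block structure from this input format; Avro's own per-block codecs keep both compression and splittability.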
While on the subject,
you can also use the BigPetStore application in Apache Bigtop to do this.
This data is well suited for HBase (semi-structured, transactional, and
featuring some global patterns which can make for meaningful queries, and so on).
Clone apache/bigtop
cd bigtop-bigpetstore
gra
Hi,
I need to generate a large amount of test data (4 TB) in Hadoop. Has anyone used
PDGF to do so? Could you share your cookbook for using PDGF in Hadoop (or HBase)?
Many Thanks
Arthur