As it turns out, this is due to our /tmp partition being too small.
We'll either need to increase it or put hadoop.tmp.dir on a bigger partition.
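For reference, pointing hadoop.tmp.dir at a larger partition is a one-property change in core-site.xml. A minimal sketch, with /data/hadoop/tmp standing in for whatever bigger mount is actually available:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/data/hadoop/tmp</value>
    </property>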
On 2012-01-11, at 4:29 PM, Frank Grimes wrote:
Ok, so I wrote a MapReduce job to merge the files and it appears to be
working with a limited input set.
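A minimal sketch of that kind of merge job (my own illustration, not Frank's actual code) using the org.apache.avro.mapred API: it assumes the log schema is available as an .avsc file passed as the third argument, that the schema has a long "timestamp" field to key on, and it funnels everything through a single reducer so the hour's records land in one Avro output file.

    import java.io.File;
    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.mapred.AvroCollector;
    import org.apache.avro.mapred.AvroJob;
    import org.apache.avro.mapred.AvroMapper;
    import org.apache.avro.mapred.AvroReducer;
    import org.apache.avro.mapred.Pair;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Reporter;

    public class AvroMergeJob {

      // Key each record by its (assumed) "timestamp" field so the merged file stays ordered.
      public static class MergeMapper
          extends AvroMapper<GenericRecord, Pair<Long, GenericRecord>> {
        @Override
        public void map(GenericRecord datum,
                        AvroCollector<Pair<Long, GenericRecord>> collector,
                        Reporter reporter) throws IOException {
          collector.collect(new Pair<Long, GenericRecord>((Long) datum.get("timestamp"), datum));
        }
      }

      // Identity reduce: drop the key and write the records back out.
      public static class MergeReducer
          extends AvroReducer<Long, GenericRecord, GenericRecord> {
        @Override
        public void reduce(Long key, Iterable<GenericRecord> values,
                           AvroCollector<GenericRecord> collector,
                           Reporter reporter) throws IOException {
          for (GenericRecord record : values) {
            collector.collect(record);
          }
        }
      }

      public static void main(String[] args) throws IOException {
        Schema logSchema = new Schema.Parser().parse(new File(args[2]));  // path to the .avsc file

        JobConf conf = new JobConf(AvroMergeJob.class);
        conf.setJobName("avro-log-merge");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));   // directory of small Avro files
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));  // output directory

        AvroJob.setInputSchema(conf, logSchema);
        AvroJob.setMapOutputSchema(conf,
            Pair.getPairSchema(Schema.create(Schema.Type.LONG), logSchema));
        AvroJob.setOutputSchema(conf, logSchema);
        AvroJob.setMapperClass(conf, MergeMapper.class);
        AvroJob.setReducerClass(conf, MergeReducer.class);
        conf.setNumReduceTasks(1);  // one reducer => one merged Avro output file

        JobClient.runJob(conf);
      }
    }

With numReduceTasks set to 1 the whole hour's data flows through one reducer; raising it trades a single output file for more parallelism.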
I would do it by staging the machine data into a temporary directory
and then renaming the directory when it's been verified. So, data
would be written into directories like this:
2012-01/02/00/stage/machine1.log.avro
2012-01/02/00/stage/machine2.log.avro
2012-01/02/00/stage/machine3.log.avro
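A minimal sketch of the rename step in that staging scheme (my own illustration; the absolute paths and the "ready" directory name are placeholders), using the HDFS FileSystem API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PromoteStagedLogs {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path staged = new Path("/logs/2012-01/02/00/stage");  // hypothetical absolute path
        Path ready  = new Path("/logs/2012-01/02/00/ready");  // hypothetical "verified" name

        // Promote the directory only after its contents have been verified;
        // the rename is a single NameNode operation, so readers never see half-written files.
        if (!fs.rename(staged, ready)) {
          throw new IllegalStateException("Could not rename " + staged + " to " + ready);
        }
      }
    }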
Hi Joey,
That's a very good suggestion and might suit us just fine.
However, many of the files will be much smaller than the HDFS block size.
That could affect the performance of the MapReduce jobs, correct?
Also, from my understanding it would put more of a burden on the NameNode (memory usage).
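For a rough sense of scale (illustrative numbers, not from this thread): the NameNode keeps every file, directory, and block in memory at roughly 150 bytes per object, so 100 machines writing one file per hour is about 876,000 files a year; at a file plus at least one block entry each, that is on the order of 250 MB of NameNode heap just for one year of un-merged logs.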
Are you doing this as a MapReduce job, opening the files as
a stream and combining them single threaded, or trying
to do something via command line?
Dave
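For comparison, a minimal sketch of the stream-based, single-threaded alternative Dave mentions (my own illustration, with hypothetical arguments): open each small Avro file from HDFS in turn and re-append its records into one combined file. Every record is decoded and re-encoded, so it is simple but CPU-bound and not parallel.

    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileStream;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamingAvroMerge {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FileStatus[] inputs = fs.listStatus(new Path(args[0]));  // directory holding only small .avro files
        Path merged = new Path(args[1]);                          // single combined output file

        Schema schema = null;
        DataFileWriter<GenericRecord> writer = null;

        for (FileStatus status : inputs) {
          DataFileStream<GenericRecord> reader = new DataFileStream<GenericRecord>(
              fs.open(status.getPath()), new GenericDatumReader<GenericRecord>());
          if (writer == null) {
            // Take the schema from the first file and assume the rest match it.
            schema = reader.getSchema();
            writer = new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
            writer.create(schema, fs.create(merged));
          }
          for (GenericRecord record : reader) {  // decode and re-encode every record
            writer.append(record);
          }
          reader.close();
        }

        if (writer != null) {
          writer.close();
        }
      }
    }

Invocation would be something like (jar name and paths are placeholders): hadoop jar logtools.jar StreamingAvroMerge /logs/2012-01/02/00 /logs/merged/2012-01-02-00.avro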
-----Original Message-----
From: Frank Grimes [mailto:frankgrime...@gmail.com]
Sent: Friday, January 06, 2012 2:56 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Combining AVRO files efficiently within HDFS
I would use a MapReduce job to merge them.
-Joey
On Fri, Jan 6, 2012 at 11:55 AM, Frank Grimes frankgrime...@gmail.com wrote:
Hi Joey,
That's a very good suggestion and might suit us just fine.
However, many of the files will be much smaller than the HDFS block size.
That could affect the performance of the MapReduce jobs, correct?