Re: Combining AVRO files efficiently within HDFS

2012-01-12 Thread Frank Grimes
As it turns out, this is due to our /tmp partition being too small. We'll either need to increase it or put hadoop.tmp.dir on a bigger partition.

On 2012-01-11, at 4:29 PM, Frank Grimes wrote:
> Ok, so I wrote a MapReduce job to merge the files and it appears to be working with a limited input

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Joey Echeverria
I would do it by staging the machine data into a temporary directory and then renaming the directory when it's been verified. So, data would be written into directories like this:

2012-01/02/00/stage/machine1.log.avro
2012-01/02/00/stage/machine2.log.avro
2012-01/02/00/stage/machine3.log.avro
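
The stage-then-rename idea above can be sketched on a local filesystem as follows. This is only an illustration, not code from the thread: the helper names `stage_file` and `publish`, the final directory name `done`, and the verification rule (every expected file present and non-empty) are all assumptions. On HDFS the same pattern would go through the Hadoop filesystem's own rename operation rather than `os.rename`.

```python
import os
import tempfile
from pathlib import Path


def stage_file(hour_dir: Path, name: str, data: bytes) -> Path:
    """Write one machine's file into <hour_dir>/stage/ (hypothetical helper)."""
    stage = hour_dir / "stage"
    stage.mkdir(parents=True, exist_ok=True)
    target = stage / name
    target.write_bytes(data)
    return target


def publish(hour_dir: Path, expected: set, final_name: str = "done") -> Path:
    """Rename stage/ to its final name once the staged data is verified.

    Verification here is an assumption: every expected file must exist
    and be non-empty. The final directory name is also hypothetical.
    """
    stage = hour_dir / "stage"
    present = {p.name for p in stage.iterdir() if p.stat().st_size > 0}
    missing = set(expected) - present
    if missing:
        raise RuntimeError(f"not publishing, missing: {sorted(missing)}")
    final = hour_dir / final_name
    # Readers only ever look at the final directory, so they never see
    # a partially written hour.
    os.rename(stage, final)
    return final


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    hour = root / "2012-01" / "02" / "00"
    machines = {"machine1.log.avro", "machine2.log.avro", "machine3.log.avro"}
    for name in machines:
        stage_file(hour, name, b"placeholder bytes")
    print(publish(hour, machines))
```

Downstream jobs would then read only the renamed directory, so a half-uploaded hour is never picked up.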

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Frank Grimes
Hi Joey,

That's a very good suggestion and might suit us just fine. However, many of the files will be much smaller than the HDFS block size. That could affect the performance of the MapReduce jobs, correct? Also, from my understanding it would put more burden on the NameNode (memory usage)

RE: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Dave Shine
From: Frank Grimes [mailto:frankgrime...@gmail.com]
Sent: Friday, January 06, 2012 2:56 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Combining AVRO files efficiently within HDFS

Hi Joey,

That's a very good suggestion and might suit us just fine. However, many of the files will be much smaller

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Steve Edison
a stream and combining files single threaded or trying to do something via command line.

Dave

-----Original Message-----
From: Frank Grimes [mailto:frankgrime...@gmail.com]
Sent: Friday, January 06, 2012 2:56 PM
To: hdfs-user@hadoop.apache.org
Subject: Re: Combining AVRO files efficiently within

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Joey Echeverria
I would use a MapReduce job to merge them.

-Joey

On Fri, Jan 6, 2012 at 11:55 AM, Frank Grimes frankgrime...@gmail.com wrote:
> Hi Joey,
>
> That's a very good suggestion and might suit us just fine. However, many of the files will be much smaller than the HDFS block size. That could affect the