Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Varene Olivier
In Hadoop 0.20.2, I had to make the same choice. The result: whenever I can work with the new API, I do; if not, I keep working with the old API (as for MultipleOutputFormat). The whole process is thus a mix of new-API jobs and old-API jobs, and it works perfectly. MultipleOutputForm

potential bug in InputSampler, hadoop 0.21.0

2010-11-18 Thread exception
Hi all, I think I have found a bug in InputSampler under hadoop 0.21.0. In the file InputSampler.java under package org.apache.hadoop.mapreduce.lib.partition, inside function getSample, a record reader is created but not initialized. So when trying to use the record reader, an exception will be thro
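The failure mode described above (a reader that is created but used before it is initialized) can be illustrated with a minimal, self-contained sketch. SketchLineReader and countRecords are hypothetical stand-ins written for this illustration; they are not Hadoop's LineRecordReader or InputSampler code, and the sketch only assumes that the real reader likewise leaves its input unset until initialize() is called.

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReaderInitSketch {

    /** Hypothetical stand-in for a record reader; NOT Hadoop's LineRecordReader. */
    static class SketchLineReader {
        private Iterator<String> lines; // stays null until initialize() is called
        private String current;

        void initialize(String data) {
            lines = Arrays.asList(data.split("\n")).iterator();
        }

        boolean nextKeyValue() {
            // Throws NullPointerException if initialize() was skipped.
            if (!lines.hasNext()) {
                return false;
            }
            current = lines.next();
            return true;
        }

        String getCurrentValue() {
            return current;
        }
    }

    /** Counts records; the flag controls whether the reader is initialized first. */
    static int countRecords(String data, boolean initialize) {
        SketchLineReader reader = new SketchLineReader();
        if (initialize) {
            reader.initialize(data);
        }
        int n = 0;
        while (reader.nextKeyValue()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countRecords("a\nb\nc", true));
        try {
            countRecords("a\nb\nc", false);
        } catch (NullPointerException e) {
            System.out.println("reader used before initialize()");
        }
    }
}
```

In the new API, the usual pattern is to call initialize(split, context) on the RecordReader returned by createRecordReader before the first nextKeyValue() call; whether that is the exact fix needed in getSample is not confirmed by this thread.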

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Srihari Anantha Padmanabhan
Thank you. I will check it out. On Nov 18, 2010, at 4:29 PM, Ted Yu wrote: You can get the source code here: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.tar.gz On Thu, Nov 18, 2010 at 4:21 PM, Srihari Anantha Padmanabhan <sriha...@yahoo-inc.com> wrote: I am using Hadoop 0.20.2.

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
You can get the source code here: http://archive.cloudera.com/cdh/3/hadoop-0.20.2+320.tar.gz On Thu, Nov 18, 2010 at 4:21 PM, Srihari Anantha Padmanabhan < sriha...@yahoo-inc.com> wrote: > I am using Hadoop 0.20.2. > > On Nov 18, 2010, at 4:14 PM, Ted Yu wrote: > > > hadoop > >

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Srihari Anantha Padmanabhan
I am using Hadoop 0.20.2. On Nov 18, 2010, at 4:14 PM, Ted Yu wrote: > hadoop

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
Sorry, I should have mentioned that I was searching in the cdh3b2 tree. What distro of hadoop are you using? On Thu, Nov 18, 2010 at 4:06 PM, Srihari Anantha Padmanabhan < sriha...@yahoo-inc.com> wrote: > I think MultipleOutputs.java is a part of mapred and not mapreduce. Please > correct me if I am wrong. > > I

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Srihari Anantha Padmanabhan
I think MultipleOutputs.java is a part of mapred and not mapreduce. Please correct me if I am wrong. I can find only the following classes under mapreduce/lib/output: FileOutputCommitter.java, FileOutputFormat.java, NullOutputFormat.java, SequenceFileOutputFormat.java, TextOutputFormat.java On

Re: Migrating from mapred to mapreduce API

2010-11-18 Thread Ted Yu
Have you looked at src/mapred/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java ? On Thu, Nov 18, 2010 at 3:15 PM, Srihari Anantha Padmanabhan < sriha...@yahoo-inc.com> wrote: > Hi, > > I am working on migrating a mapreduce program from using > org.apache.hadoop.mapred to org.apache.had

Migrating from mapred to mapreduce API

2010-11-18 Thread Srihari Anantha Padmanabhan
Hi, I am working on migrating a mapreduce program from using org.apache.hadoop.mapred to org.apache.hadoop.mapreduce APIs. The program currently uses org.apache.hadoop.mapred.lib.MultipleOutputFormat. I could not find any equivalent class in mapreduce. Can anyone suggest an equivalent class or

Re: Help tuning a cluster - COPY slow

2010-11-18 Thread Tim Robertson
Just to close this thread. Turns out it all came down to mapred.reduce.parallel.copies being overridden to 5 on the Hive submission. Cranking that back up made everything happy again. Thanks for the ideas, Tim On Thu, Nov 18, 2010 at 11:04 AM, Tim Robertson wrote: > Thanks again. > > We

hadoop input sampler

2010-11-18 Thread exception
Hi all, I am trying to sample the key distribution before doing a total sort. But the program failed and threw an exception. This is the stack: Exception in thread "main" java.lang.NullPointerException at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.nextKeyValue(LineRecordRe

Re: Help tuning a cluster - COPY slow

2010-11-18 Thread Tim Robertson
Thanks again. We are getting closer to debugging this. Our reference for all these tests was a simple GroupBy using Hive, but when I do a vanilla MR job on the tab file input to do the same group by, it flies through - almost exactly 2 times quicker. Investigating further as it is not quite a fa

Re: Help tuning a cluster - COPY slow

2010-11-18 Thread Friso van Vollenhoven
Do you have IPv6 enabled on the boxes? If DNS gives both IPv4 and IPv6 results for lookups, Java will try v6 first and then fall back to v4, which is an additional connect attempt. You can force Java to use only v4 by setting the system property java.net.preferIPv4Stack=true. Also, I am not sur
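As a rough illustration of the java.net.preferIPv4Stack setting mentioned above, the following minimal sketch sets the property programmatically and then lists the addresses resolved for a host. The class name PreferIPv4Sketch and the use of "localhost" are assumptions made for the example. The JVM reads this property when its networking stack is first initialized, so passing -Djava.net.preferIPv4Stack=true on the command line is generally the more reliable route; setting it in code only helps if it happens before any networking class is used.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class PreferIPv4Sketch {

    /** Sets the property and returns the value the JVM now reports for it. */
    static String enablePreferIPv4() {
        // Must run before any networking class is touched to have an effect.
        System.setProperty("java.net.preferIPv4Stack", "true");
        return System.getProperty("java.net.preferIPv4Stack");
    }

    public static void main(String[] args) throws UnknownHostException {
        enablePreferIPv4();
        // "localhost" is only an example host; substitute a cluster node name.
        for (InetAddress address : InetAddress.getAllByName("localhost")) {
            System.out.println(address);
        }
    }
}
```

For Hadoop daemons and tasks the flag would normally be supplied through the JVM options rather than in job code; where exactly those options are configured depends on the installation.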