applying patch HADOOP-6835 to hadoop-0.20.2

2010-09-10 Thread Lewis Crawford
Hi, I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2 (64-bit Ubuntu 10.04). I have downloaded the tar.gz and can build the project. I tried to apply the patch from https://issues.apache.org/jira/browse/HADOOP-6835 (specifically

Re: Problem in copyFromLocal

2010-09-10 Thread leibnitz
Is it running in safemode? Hadoop runs in safemode for a moment when it starts.

Custom Key class not working correctly

2010-09-10 Thread Aaron Baff
So I'm pretty new to Hadoop, just learning it for work, and starting to play with some of our data on a VM cluster to see it work, and to make sure it can do what we need to. By and large, very cool, I think I'm getting the hang of it, but when I try and make a custom composite key class, it
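
The message is cut off before it says what goes wrong, so the actual cause is not known here. As a general illustration only, a composite key needs `write`/`readFields` that cover every field in the same order, plus `compareTo`, `equals`, and `hashCode` that agree with each other, since Hadoop sorts map output by `compareTo` and the default partitioner uses `hashCode`. The sketch below is plain Java with hypothetical field names (`group`, `stamp`); in a real job the class would implement `org.apache.hadoop.io.WritableComparable` rather than only `Comparable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical composite key. In a Hadoop job this would declare
// "implements WritableComparable<CompositeKey>"; plain Comparable is used
// here only so the sketch compiles without Hadoop on the classpath.
final class CompositeKey implements Comparable<CompositeKey> {
    private String group = "";  // hypothetical first field
    private long stamp;         // hypothetical second field

    // Hadoop creates key objects reflectively, so a no-arg constructor is needed.
    CompositeKey() {}

    CompositeKey(String group, long stamp) {
        this.group = group;
        this.stamp = stamp;
    }

    // Same shape as Writable.write: serialize every field in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeUTF(group);
        out.writeLong(stamp);
    }

    // Same shape as Writable.readFields: overwrite all state, because
    // Hadoop reuses key instances between records.
    public void readFields(DataInput in) throws IOException {
        group = in.readUTF();
        stamp = in.readLong();
    }

    // Sort order: by group first, then by stamp.
    @Override
    public int compareTo(CompositeKey other) {
        int c = group.compareTo(other.group);
        return c != 0 ? c : Long.compare(stamp, other.stamp);
    }

    // equals and hashCode use exactly the fields that compareTo uses.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof CompositeKey)) {
            return false;
        }
        CompositeKey k = (CompositeKey) o;
        return stamp == k.stamp && group.equals(k.group);
    }

    @Override
    public int hashCode() {
        return 31 * group.hashCode() + Long.hashCode(stamp);
    }

    // Serializes a key to bytes and reads it back into a fresh instance.
    static CompositeKey roundTrip(CompositeKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            CompositeKey copy = new CompositeKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A quick sanity check for any custom key is a round trip: `CompositeKey.roundTrip(k)` should be `equals` to `k`, compare as 0 against it, and have the same `hashCode`.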

Re: Problem in copyFromLocal

2010-09-10 Thread Medha Atre
On Fri, Sep 10, 2010 at 1:08 PM, leibnitz se3g2...@gmail.com wrote: is it running in safemode? hadoop will run in this case when start for a How do I find out if it's running in safemode? After the clue about datanode failures given in earlier replies, I did check the datanode logs and they

Question on classpath

2010-09-10 Thread Mark
If I submit a jar that has a lib directory that contains a bunch of jars, shouldn't those jars be in the classpath and available to all nodes? The reason I ask this is because I am trying to submit a jar myjar.jar that has the following structure --src \ (My source classes) -- lib \

Re: Question on classpath

2010-09-10 Thread Allen Wittenauer
On Sep 10, 2010, at 11:53 AM, Mark wrote: If I submit a jar that has a lib directory that contains a bunch of jars, shouldn't those jars be in the classpath and available to all nodes? Are you using distributed cache?

Re: applying patch HADOOP-6835 to hadoop-0.20.2

2010-09-10 Thread Greg Roelofs
Lewis Crawford wrote: I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2 (64bit Ubuntu 10.04) using ant on the command line I was able to build the project again and generate a new jar hadoop-0.20.3-dev-core.jar which I copied back into the $HADOOP_HOME and started

Re: applying patch HADOOP-6835 to hadoop-0.20.2

2010-09-10 Thread Lewis Crawford
Yes that seems to have done the trick! Thanks Lewis. On 10 September 2010 20:39, Greg Roelofs roel...@yahoo-inc.com wrote: Lewis Crawford wrote: I am trying unsuccessfully to apply a patch (HADOOP-6835) to hadoop-0.20.2 (64bit Ubuntu 10.04) using ant on the command line I was able to

Re: applying patch HADOOP-6835 to hadoop-0.20.2

2010-09-10 Thread Neil Ghosh
Hello, I am new to Hadoop. Can anybody suggest an example or procedure for outputting the top N items with the maximum total count, where the input file has an (item, count) pair on each line? Items can repeat. Thanks, Neil http://neilghosh.com

TOP N items

2010-09-10 Thread Neil Ghosh
Hello, I am new to Hadoop. Can anybody suggest an example or procedure for outputting the top N items with the maximum total count, where the input file has an (item, count) pair on each line? Items can repeat. Thanks, Neil http://neilghosh.com -- Thanks and Regards Neil http://neilghosh.com

Re: TOP N items

2010-09-10 Thread James Seigel
Welcome to the land of the fuzzy elephant! Of course there are many ways to do it. Here is one; it might not be brilliant or the right way, but I am sure you will get more :) Use the identity mapper... job.setMapperClass(Mapper.class); then have one reducer

Re: Question on classpath

2010-09-10 Thread Mark
I don't know. I'm running in a fully distributed environment, i.e. not local or pseudo-distributed. On 9/10/10 12:03 PM, Allen Wittenauer wrote: On Sep 10, 2010, at 11:53 AM, Mark wrote: If I submit a jar that has a lib directory that contains a bunch of jars, shouldn't those jars be in the classpath

Re: Hadoop 0.21.0 release Maven repo

2010-09-10 Thread Tom White
Hi Sonal, The 0.21.0 jars are not available in Maven yet, since the process for publishing them post split has changed. See HDFS-1292 and MAPREDUCE-1929. Cheers, Tom On Fri, Sep 10, 2010 at 1:33 PM, Sonal Goyal sonalgoy...@gmail.com wrote: Hi, Can someone please point me to the Maven repo

Re: TOP N items

2010-09-10 Thread Neil Ghosh
Thanks James. This gives me only N results for sure, but not necessarily the top N. I have used the item as key and the count as value as input to the reducer, and my reducing logic is to sum the count for a particular item. Now my output comes out grouped but not in order. Do I need to use custom

Re: TOP N items

2010-09-10 Thread Neil Ghosh
Thanks Aaron. I employed two jobs and solved the problem. I was just wondering whether there is any way it can be done in a single job, so that disk/network I/O is lower and no temporary storage is required between the first and second jobs. Neil On Sat, Sep 11, 2010 at 4:37 AM, Aaron Baff

Re: TOP N items

2010-09-10 Thread Alex Kozlov
Hi Neil, Uniques and Top N, as well as percentiles, are inherently difficult to distribute/parallelize since you have to have a global view of the dataset. You can optimize the computations given some assumptions about the input (the # of unique values, prevalence of the most frequent value

Re: TOP N items

2010-09-10 Thread Neil Ghosh
Hi Alex, Thanks so much for the reply. As of now I don't have any issue with two jobs. I was just making sure that I am not missing any obvious way of writing the program in one job. I will get back if I need to optimize performance based on a specific pattern of input. Thank you so much you all

Re: Question on classpath

2010-09-10 Thread Mark
If I deploy one jar (that contains a lib directory with all the required dependencies), shouldn't that jar inherently be distributed to all the nodes? On 9/10/10 2:49 PM, Mark wrote: I don't know. I'm running in a fully distributed environment, i.e. not local or pseudo-distributed. On 9/10/10 12:03 PM,

RE: Custom Key class not working correctly

2010-09-10 Thread Kaluskar, Sanjay
Have you considered using something higher-level like PIG or Hive? Are there reasons why you need to process at this low level? -Original Message- From: Aaron Baff [mailto:aaron.b...@telescope.tv] Sent: Friday, September 10, 2010 11:50 PM To: common-user@hadoop.apache.org Subject: Custom

Re: Custom Key class not working correctly

2010-09-10 Thread James Seigel
Is the footer on this email a little rough for content that will be passed around and made indexable on the internets? Just saying :) Cheers James Sent from my mobile. Please excuse the typos. On 2010-09-10, at 8:01 PM, Kaluskar, Sanjay skalus...@informatica.com wrote: Have you considered

Re: Question on classpath

2010-09-10 Thread James Seigel
Are the libs exploded inside the main jar? If not then no it probably won't work. James Sent from my mobile. Please excuse the typos. On 2010-09-10, at 7:43 PM, Mark static.void@gmail.com wrote: If I deploy 1 jar (that contains a lib directory with all the required dependencies)

Re: TOP N items

2010-09-10 Thread Runping Qi
Assuming N is not too large in the sense that your reducers can keep a tree map of N elements, then you can have your reducer maintain the top N elements in a tree-map (or a priority queue, or a heap, whatever), with counts as keys in the tree-map. As the reducers progress, you throw away the
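
The bounded top-N idea described above can be sketched in plain Java, outside Hadoop. This is an illustration under assumptions, not the poster's code: it uses a min-heap (`PriorityQueue`) rather than a tree map so that items with equal counts do not collide, it assumes each input line is an item and a count separated by whitespace, and the class and method names (`TopNSketch`, `sumCounts`, `topN`) are made up for the example. In a job with a single reducer, the same heap could be filled as each item's summed count becomes available and emitted once at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

final class TopNSketch {
    // Sums the counts per item, as a summing reducer would.
    // Assumes each line looks like "item count" (whitespace-separated).
    static Map<String, Long> sumCounts(List<String> lines) {
        Map<String, Long> totals = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
        }
        return totals;
    }

    // Keeps at most n entries in a min-heap ordered by total count.
    // Whenever the heap grows past n, the entry with the smallest count is dropped,
    // so memory stays proportional to n rather than to the number of items.
    static List<String> topN(Map<String, Long> totals, int n) {
        Comparator<Map.Entry<String, Long>> byCount = Map.Entry.comparingByValue();
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(byCount);
        for (Map.Entry<String, Long> entry : totals.entrySet()) {
            heap.add(entry);
            if (heap.size() > n) {
                heap.poll(); // discard the current smallest
            }
        }
        // Order the survivors from largest to smallest total.
        List<Map.Entry<String, Long>> kept = new ArrayList<>(heap);
        kept.sort(byCount.reversed());
        List<String> items = new ArrayList<>();
        for (Map.Entry<String, Long> entry : kept) {
            items.add(entry.getKey());
        }
        return items;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a 3", "b 5", "a 4", "c 1", "d 6");
        // Totals are a=7, b=5, c=1, d=6, so this prints [a, d]
        System.out.println(topN(sumCounts(lines), 2));
    }
}
```

The order among items with equal totals is unspecified in this sketch; a secondary comparison on the item name would make it deterministic.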