Comparing input hdfs file to a distributed cache files

2012-07-20 Thread Shanu Sushmita
Hi, I am trying to solve a problem where I need to compute the frequencies of words occurring in file1 from file2. For example, the text in file1 is: hadoop user hello world and the text in file2 is: hadoop user hello world hadoop hadoop hadoop user world world world hadoop user hello so the output sh
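
The counting described in this post can be sketched in a few lines. This is a minimal illustration in plain Python, not Hadoop code: it assumes file1 is the small word list (the role a distributed cache file would play in a MapReduce job) and file2 is the larger input. The function name `count_listed_words` is hypothetical, and since the expected output in the post is cut off, the counts shown are simply what this sketch produces for the example text.

```python
from collections import Counter


def count_listed_words(word_list_text, input_text):
    """Count how often each word from word_list_text occurs in input_text."""
    # The small file is loaded fully into memory, as a cached
    # side file would be before the main input is scanned.
    listed_words = word_list_text.split()
    lookup = set(listed_words)

    # Scan the large input once, counting only the listed words.
    counts = Counter(word for word in input_text.split() if word in lookup)

    # Report every listed word, including those that never occur (count 0).
    return {word: counts[word] for word in listed_words}


file1 = "hadoop user hello world"
file2 = ("hadoop user hello world hadoop hadoop hadoop user "
         "world world world hadoop user hello")

result = count_listed_words(file1, file2)
for word, count in result.items():
    print(word, count)
# prints:
# hadoop 5
# user 3
# hello 2
# world 4
```

Splitting on whitespace is an assumption; real input may need case folding or punctuation handling.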

Re: Comparing input hdfs file to a distributed cache files

2012-07-20 Thread Shanu Sushmita
hat you want to run your word > frequencies on, to a HDFS file path and then read the job off it. You will > need to configure your MR job to pick input files from your new HDFS > location. > > This would be my approach. Other regulars in the forum will be better able > to help you! > >

Re: Comparing input hdfs file to a distributed cache files

2012-07-20 Thread Shanu Sushmita
ck at something. Logs are observable for each task attempt via the JT Web UI, etc. On Fri, Jul 20, 2012 at 9:43 PM, Shanu Sushmita wrote: Hi, I am trying to solve a problem where I need to compute the frequencies of words occurring in file1 from file2. For example, the text in file1 is: hadoop user

Re: Comparing input hdfs file to a distributed cache files

2012-07-20 Thread Shanu Sushmita
e 0% Why is it taking so much time here? The max heap size looks fine (see the text highlighted in green). I am wondering about the text highlighted in red. Do you think that could be the problem? Sorry for asking such basic questions. I am just clueless right now. SS On Fri, Jul 20, 2012 at 9:46 AM,

Re: Fail to start mapreduce tasks across nodes

2012-07-20 Thread Shanu Sushmita
yes we can see it :-) SS On 20 Jul 2012, at 12:15, Steve Sonnenberg wrote: Sorry this is my first posting and I haven't gotten a copy nor any response. Could someone please respond if you are seeing this? Thanks, Newbie On Fri, Jul 20, 2012 at 12:36 PM, Steve Sonnenberg wrote: I have a 2-