Re: what does job tracker status reduce > copy mean?

2010-09-25 Thread Neil Xu
Hi Vitaliy, a reducer processes data in 3 phases: copy (33%) => sort (66%) => reduce (100%). The status shows that the reducer is in the first phase, pulling data from the mappers. Sometimes the progress may stay at 33% for a long time because of limited network bandwidth when too much data is pulled to a
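A minimal sketch of the arithmetic described above, assuming each of the three phases contributes an equal third of the reported reducer progress. The function names are hypothetical and are not part of any Hadoop API.

```python
PHASES = ("copy", "sort", "reduce")  # order taken from the message above


def reduce_phase(progress: float) -> str:
    """Map an overall reducer progress fraction in [0, 1] to a phase name."""
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must be between 0 and 1")
    if progress < 1 / 3:
        return "copy"
    if progress < 2 / 3:
        return "sort"
    return "reduce"


def overall_progress(phase: str, fraction_done: float) -> float:
    """Combine a phase and its local completion into overall progress."""
    # Each phase is assumed to be worth one third of the total.
    return (PHASES.index(phase) + fraction_done) / len(PHASES)
```

Under this assumption, a reducer that sits near 33% for a long time is still finishing the copy phase, that is, still fetching map output over the network.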

Re: Remote connection bottleneck?

2010-09-25 Thread Mario M
In the ssh I can't execute local files while my session is open... or can I? That is why I use a different shell. Also, all the tutorials related to running Hadoop remotely use this method, though I might be missing something :S 2010/9/25 Ted Yu > > In the new shell, I went to the hadoop/bin d

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
> In the new shell, I went to the hadoop/bin directory in my computer Why didn't you issue the command from the window which had ssh? On Sat, Sep 25, 2010 at 6:53 PM, Mario M wrote: > Hi, > what I did was this: > > I am working with Cygwin in Windows 7. > > - I copied my jar file ITESMCEMdebug.ja

Re: Remote connection bottleneck?

2010-09-25 Thread Mario M
Hi, what I did was this: I am working with Cygwin in Windows 7. - I copied my jar file ITESMCEMdebug.jar to the cluster in the directory /home/mariom . (I then connected with the ssh and confirmed that it is there). - I left the ssh window open and opened another cygwin shell. - In the new shel

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
Mario: Can you show us the error when you run the following ? "hadoop jar " Hello, >> please excuse my ignorance, but how can I run it from there? >> Up to now I've been running the programs with "hadoop jar >> ". >> >> I tried copying the jar to the HDFS and using "hadoop jar >> " but that di

Re: Log file questions

2010-09-25 Thread Mario M
Hi, it doesn't include the time needed to divide the input into splits for each map task, that I can tell you for sure (e.g. my program takes 1 minute processing and 30 minutes dividing the input, but the log only shows one minute). Mario M 2010/9/25 Han Dong > Hi, > > I have a question regardi

Re: Remote connection bottleneck?

2010-09-25 Thread Mario M
Yes, the program does run with "hadoop jar ", and the manifest file declares the main class. The problem is that deciding the input splits for the map phase (with NLineInputFormat) takes more than 30 min over a slow connection to the cluster. It spends half an hour doi
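A rough sketch of why line-based splitting can be slow over a remote link, assuming (as I understand NLineInputFormat to work) that the submitting client has to read the input to find line boundaries before any map task starts. This is illustrative Python, not Hadoop code; `n_line_splits` is a hypothetical name.

```python
import io


def n_line_splits(stream, lines_per_split):
    """Return (start_offset, length) pairs, each covering lines_per_split lines.

    Every byte of the stream is read, because line boundaries cannot be
    known without scanning for newlines.
    """
    if lines_per_split < 1:
        raise ValueError("lines_per_split must be at least 1")
    splits = []
    start = 0   # byte offset where the current split begins
    pos = 0     # byte offset just past the last line read
    count = 0   # lines accumulated in the current split
    for line in stream:  # a binary stream yields one line per iteration
        pos += len(line)
        count += 1
        if count == lines_per_split:
            splits.append((start, pos - start))
            start = pos
            count = 0
    if count:  # trailing partial split
        splits.append((start, pos - start))
    return splits


data = io.BytesIO(b"a\nbb\nccc\ndddd\neeeee\n")
splits = n_line_splits(data, 2)
```

If the client is on a slow connection, scanning a 200 MB file this way would mean pulling the whole file across that connection, which is consistent with the advice elsewhere in the thread to submit the job from a machine inside the cluster.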

Re: Remote connection bottleneck?

2010-09-25 Thread Raja Thiruvathuru
Did you define the Main Class in the manifest file? On Sat, Sep 25, 2010 at 12:27 PM, Mario M wrote: > Hello, > please excuse my ignorance, but how can I run it from there? > Up to now I've been running the programs with "hadoop jar > ". > > I tried copying the jar to the HDFS and using "hadoop

Re: Remote connection bottleneck?

2010-09-25 Thread Mario M
Hello, please excuse my ignorance, but how can I run it from there? Up to now I've been running the programs with "hadoop jar ". I tried copying the jar to the HDFS and using "hadoop jar " but that didn't work (file not found), so I went to the ssh connection and copied the jar to my directory i

Re: Remote connection bottleneck?

2010-09-25 Thread Ted Yu
Mario: Please produce a jar, place it on one of the servers in the cloud and run from there. On Sat, Sep 25, 2010 at 7:46 AM, Raja Thiruvathuru wrote: > MapReduce doesn't download the actual data, but it reads meta-data before > it starts MapReduce job > > > On Sat, Sep 25, 2010 at 7:55 AM, Mario

Re: Remote connection bottleneck?

2010-09-25 Thread Raja Thiruvathuru
MapReduce doesn't download the actual data, but it reads meta-data before it starts the MapReduce job On Sat, Sep 25, 2010 at 7:55 AM, Mario M wrote: > Hello, > I am having a problem that might be expected behaviour. I am using a cloud > with Hadoop remotely through ssh. I have a program that runs f

Re: JobClient using deprecated JobConf

2010-09-25 Thread Martin Becker
Hello David, thanks a lot. Yet I want Java code to submit my application. I do not want to mess with any kind of command line arguments or an executable, neither Java nor Hadoop. I want to write a method that can set up and submit a job to an arbitrary cluster. Something like calling CustomJ

Re: Executing Eclipse Plugin

2010-09-25 Thread Johannes.Lichtenberger
On 09/25/2010 02:52 PM, Johannes.Lichtenberger wrote: > Hi, > > I have a problem using the Eclipse plugin... I've started Hadoop/MapReduce > 0.20.2 with `bin/start-mapred.sh` or `bin/start-all.sh` and then > executed the main class within Eclipse "Run on hadoop", but then > every time it reads a bigg

Executing Eclipse Plugin

2010-09-25 Thread Johannes.Lichtenberger
Hi, I have a problem using the Eclipse plugin... I've started Hadoop/MapReduce 0.20.2 with `bin/start-mapred.sh` or `bin/start-all.sh` and then executed the main class within Eclipse "Run on hadoop", but then every time it reads a bigger XML file (about 9 Gb) it freezes. I've used a StAX parser, in a

Remote connection bottleneck?

2010-09-25 Thread Mario M
Hello, I am having a problem that might be expected behaviour. I am using a cloud with Hadoop remotely through ssh. I have a program that runs for about a minute; it processes a 200 MB file using NLineInputFormat, and the user decides the number of lines to divide the file. However, before the map-r