Re: Performance question

2009-04-20 Jean-Daniel Cryans
Mark, There is a setup cost when using Hadoop: for each task a new JVM must be spawned. At such a small scale, you won't see any benefit from using MR. J-D On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, I ran a Hadoop MapReduce task in the local mode, reading and

Re: Performance question

2009-04-20 Mark Kerzner
Jean-Daniel, I realize that, and my question was whether this is the normal setup/finish-up time, about 2 minutes. If it is, then fine. I would expect that on tasks taking 10-15 minutes, 2 minutes would be totally justified, and I think that this is the guideline: each task should take minutes. Thank
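A rough way to see why a fixed startup cost matters less on longer jobs: treat the overhead as a constant C on top of the useful work W and look at the fraction f of wall-clock time it consumes. Using the figures from this thread (2.5 minutes with MapReduce versus 0.5 minutes without, so C is about 2 minutes) and assuming, without measurement, that the overhead stays roughly constant as jobs grow:

```latex
\[
f = \frac{C}{W + C}, \qquad C \approx 2\ \text{min}
\]
\[
W + C = 2.5\ \text{min}: \quad f = \frac{2}{2.5} = 0.8
\qquad
W + C = 10\ \text{min}: \quad f = \frac{2}{10} = 0.2
\qquad
W + C = 15\ \text{min}: \quad f = \frac{2}{15} \approx 0.13
\]
```

So the same 2 minutes that dominate a 2.5-minute local run would be roughly 13-20% of a 10-15 minute job under this simple model.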

Re: Performance question

2009-04-20 Jean-Daniel Cryans
Mark, Oh, sorry, yes, you should expect that kind of delay. A tip to reduce it on big jobs with lots of tasks is to use JobConf.setNumTasksToExecutePerJvm(int numTasks), which sets how many tasks a single JVM can run (reusing it instead of spawning new ones). Happy Hadooping! J-D On Mon, Apr 20, 2009
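To see why JVM reuse pays off mainly on jobs with many tasks, here is a hypothetical back-of-the-envelope model in plain Java (not Hadoop code). It counts how many JVMs one task slot would start to run N tasks one after another, given a reuse limit k, with -1 meaning "no limit" (the convention documented for JobConf.setNumTasksToExecutePerJvm). The per-JVM startup time is an assumed placeholder, not a measured Hadoop figure.

```java
// Hypothetical back-of-the-envelope model, not Hadoop code.
final class JvmReuseModel {

    // Number of JVMs one task slot starts to run `tasks` tasks one after another,
    // when each JVM may run at most `tasksPerJvm` tasks (-1 = no limit).
    static long jvmsSpawned(long tasks, int tasksPerJvm) {
        if (tasks <= 0) {
            return 0;
        }
        if (tasksPerJvm == -1) {
            return 1; // a single JVM is reused for every task in the slot
        }
        if (tasksPerJvm < 1) {
            throw new IllegalArgumentException("tasksPerJvm must be >= 1 or -1");
        }
        return (tasks + tasksPerJvm - 1) / tasksPerJvm; // ceiling division
    }

    // Total startup cost, given an ASSUMED per-JVM startup time in seconds.
    static double startupOverheadSeconds(long tasks, int tasksPerJvm, double secondsPerJvmStart) {
        return jvmsSpawned(tasks, tasksPerJvm) * secondsPerJvmStart;
    }

    public static void main(String[] args) {
        long tasks = 100;
        double assumedStartupSeconds = 1.0; // placeholder, not a measured value
        for (int k : new int[] {1, 10, -1}) {
            System.out.printf("tasksPerJvm=%d -> %d JVMs, ~%.0f s of startup%n",
                    k, jvmsSpawned(tasks, k),
                    startupOverheadSeconds(tasks, k, assumedStartupSeconds));
        }
    }
}
```

With 100 tasks, the three settings give 100, 10 and 1 JVMs respectively, so the startup cost shrinks in proportion. In a real job the limit is set through JobConf.setNumTasksToExecutePerJvm (the mapred.job.reuse.jvm.num.tasks property); JVMs are reused only within a job, not across jobs.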

Re: Performance question

2009-04-20 Arun C Murthy
On Apr 20, 2009, at 9:56 AM, Mark Kerzner wrote: Hi, I ran a Hadoop MapReduce task in the local mode, reading and writing from HDFS, and it took 2.5 minutes. Essentially the same operations on the local file system without MapReduce took 1/2 minute. Is this to be expected? Hmm...

Re: Performance question

2009-04-20 Mark Kerzner
Arun, thank you very much for the answer. I will turn off the combiner. I am debugging intermediate MR steps now, so I am mostly interested in performance for this, and real tuning will come later, in a cluster. I am running 18.3, but general pointers should be good enough at this stage. I am
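For readers less familiar with combiners: a combiner pre-aggregates a map task's output locally before the shuffle, so it only helps when keys repeat within a single map's output. The hypothetical Java sketch below uses plain collections (not the Hadoop API) to simulate a word-count map task emitting (word, 1) pairs and to measure how many records a summing combiner would leave. Judging a combiner by this ratio is an illustrative assumption: a value near 1.0 suggests it would add work without shrinking the shuffle much.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of one map task's output, not Hadoop code.
final class CombinerSketch {

    // Map phase of a word count: emit (word, 1) for every token.
    static List<Map.Entry<String, Integer>> mapOutput(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Summing combiner: merge values that share a key within this map's output.
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs) {
            combined.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return combined;
    }

    // Records after combining divided by records before (1.0 = no reduction).
    static double reductionRatio(List<Map.Entry<String, Integer>> pairs) {
        return pairs.isEmpty() ? 1.0 : (double) combine(pairs).size() / pairs.size();
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> repetitive = mapOutput("a b a b a b a b");
        List<Map.Entry<String, Integer>> distinct = mapOutput("a b c d e f g h");
        System.out.println(combine(repetitive));        // {a=4, b=4}
        System.out.println(reductionRatio(repetitive)); // 0.25
        System.out.println(reductionRatio(distinct));   // 1.0
    }
}
```

In the repetitive case the combiner cuts eight records to two; in the all-distinct case it only adds a merge pass.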

Performance question

2009-04-19 Mark Kerzner
Hi, I ran a Hadoop MapReduce task in the local mode, reading and writing from HDFS, and it took 2.5 minutes. Essentially the same operations on the local file system without MapReduce took 1/2 minute. Is this to be expected? It seemed that the system lost most of the time in the MapReduce

Re: Hadoop - is it good for me and performance question

2008-07-01 tim robertson
@hadoop.apache.org Subject: RE: Hadoop - is it good for me and performance question Not sure if this will answer your question, but here is a similar thread regarding Hadoop performance: http://www.mail-archive.com/core-user@hadoop.apache.org/msg02878.html Hadoop is good for log processing if you have a lot

RE: Hadoop - is it good for me and performance question

2008-07-01 Haijun Cao
@hadoop.apache.org Subject: RE: Hadoop - is it good for me and performance question Thanks for your reply Haijun, Do you know what makes Hadoop run so slowly? I have been trying to figure it out myself, but I can't imagine anything so complicated that it justifies Hadoop's performance and latency. -Original

RE: Hadoop - is it good for me and performance question

2008-06-30 Haijun Cao
http://www.mail-archive.com/core-user@hadoop.apache.org/msg02906.html -Original Message- From: yair gotdanker [mailto:[EMAIL PROTECTED] Sent: Sunday, June 29, 2008 4:46 AM To: core-user@hadoop.apache.org Subject: Hadoop - is it good for me and performance question Hello all, I am

Hadoop - is it good for me and performance question

2008-06-29 yair gotdanker
Hello all, I am a newbie to Hadoop. The technology seems very interesting, but I am not sure it suits my needs. I would really appreciate your feedback. The problem: I have multiple log servers, each receiving 10-100 mg/minute. The received data is processed to produce aggregated data. The data
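A back-of-the-envelope daily volume per log server helps place this workload. It assumes that "mg" here means megabytes (the post does not say) and uses decimal units; the total would then be multiplied by the number of servers, which the post does not give:

```latex
\[
10\ \tfrac{\text{MB}}{\text{min}} \times 1440\ \tfrac{\text{min}}{\text{day}} = 14{,}400\ \tfrac{\text{MB}}{\text{day}} \approx 14.4\ \tfrac{\text{GB}}{\text{day}}
\]
\[
100\ \tfrac{\text{MB}}{\text{min}} \times 1440\ \tfrac{\text{min}}{\text{day}} = 144{,}000\ \tfrac{\text{MB}}{\text{day}} \approx 144\ \tfrac{\text{GB}}{\text{day}}
\]
```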