RE: do NOT start reduce task until all mappers are finished

2008-11-24 Thread Haijun Cao
Amar, Thanks for the pointer. -----Original Message----- From: Amar Kamat [mailto:[EMAIL PROTECTED] Sent: Monday, November 24, 2008 8:43 PM To: core-user@hadoop.apache.org Subject: Re: do NOT start reduce task until all mappers are finished Haijun Cao wrote: > Hi, > I

do NOT start reduce task until all mappers are finished

2008-11-24 Thread Haijun Cao
fair scheduler useless for my workload. I am wondering if there is a way NOT to start a reduce task until all of its mappers have finished. Thanks, Haijun Cao
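For readers on later releases, a hedged sketch: Hadoop versions newer than the one discussed here (an assumption; neither this property nor Amar's pointer appears in these snippets) provide a slow-start setting, the fraction of a job's maps that must complete before its reduces are scheduled. Setting it to 1.0 should keep reduce tasks from starting until every map has finished.

```xml
<configuration>
  <!-- Assumes a Hadoop release that supports this property; it is not
       mentioned in this 2008 thread. -->
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <!-- Fraction of the job's maps that must complete before reduces
         are scheduled; 1.0 means wait for all of them. -->
    <value>1.0</value>
  </property>
</configuration>
```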

RE: parallel mapping on single server

2008-07-09 Thread Haijun Cao
Set the number of map slots per tasktracker to 8 in order to run 8 map tasks on one machine at the same time (assuming one tasktracker per machine): mapred.tasktracker.map.tasks.maximum = 8 ("The maximum number of map tasks that will be run simultaneously by a task tracker.") -----Original M
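Written out as a hadoop-site.xml entry, the setting above might look like the following sketch (the file name assumes the single site file used by Hadoop releases of this period). Because it is a per-tasktracker limit, it belongs in the configuration of each tasktracker node, and the tasktracker typically has to be restarted to pick it up.

```xml
<configuration>
  <!-- Allow up to 8 concurrent map tasks on this tasktracker
       (one tasktracker per machine assumed, as in the reply). -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>8</value>
    <description>The maximum number of map tasks that will be run
    simultaneously by a task tracker.</description>
  </property>
</configuration>
```

Eight slots only help if the job actually produces at least eight map tasks, i.e. its input yields at least eight splits.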

RE: Hadoop - is it good for me and performance question

2008-07-01 Thread Haijun Cao
-----Original Message----- From: Haijun Cao [mailto:[EMAIL PROTECTED] Sent: Monday, June 30, 2008 9:33 PM To: core-user@hadoop.apache.org Subject: RE: Hadoop - is it good for me and performance question Not sure if this will answer your question, but a similar thread regarding hadoop performa

RE: Hadoop - is it good for me and performance question

2008-06-30 Thread Haijun Cao
http://www.mail-archive.com/core-user@hadoop.apache.org/msg02906.html -----Original Message----- From: yair gotdanker [mailto:[EMAIL PROTECTED] Sent: Sunday, June 29, 2008 4:46 AM To: core-user@hadoop.apache.org Subject: Hadoop - is it good for me and performance question Hello all, I am new

RE: Hadoop - is it good for me and performance question

2008-06-30 Thread Haijun Cao
Not sure if this will answer your question, but a similar thread regarding hadoop performance: http://www.mail-archive.com/core-user@hadoop.apache.org/msg02878.html Hadoop is good for log processing if you have a lot of logs to process and you don't need the result in real time (e.g. you can acc

RE: Sharing Hadoop cluster among multiple users

2008-06-26 Thread Haijun Cao
containing 'username' in them. Is it safe to use some global path, say dropping the username reference from the default values? Thank you, YongChul On Thu, Jun 26, 2008 at 12:19 PM, Haijun Cao <[EMAIL PROTECTED]> wrote: > Is it because you leave the mapred.system.dir as default (see >

RE: Sharing Hadoop cluster among multiple users

2008-06-26 Thread Haijun Cao
Is it because you left mapred.system.dir at its default (see hadoop-default.xml)? Haijun -----Original Message----- From: YongChul Kwon [mailto:[EMAIL PROTECTED] Sent: Thursday, June 26, 2008 12:10 PM To: core-user@hadoop.apache.org Subject: Sharing Hadoop cluster among multiple users Hello
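A sketch of the global path YongChul asks about, under assumptions: in hadoop-default.xml of this period, mapred.system.dir is assumed to default to ${hadoop.tmp.dir}/mapred/system, and hadoop.tmp.dir in turn contains ${user.name}, so a job submitted by a different user can resolve a different system directory than the JobTracker's. Overriding it with one fixed path, identical in the JobTracker's and every client's configuration, removes the username from the path. The path below is hypothetical.

```xml
<configuration>
  <!-- Hypothetical shared path. A fixed value avoids the per-user
       directory that the ${user.name}-based default would produce. -->
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
</configuration>
```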

RE: Question about Hadoop

2008-06-12 Thread Haijun Cao
"/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}". Is this a good location? On Thu, Jun 12, 2008 at 12:59 PM, Haijun Cao <[EMAIL PROTECTED]> wrote: > "While testing I had to delete the temporary "datastore" folder and reformat the file syste

RE: Question about Hadoop

2008-06-12 Thread Haijun Cao
"While testing I had to delete the temporary "datastore" folder and reformat the file system a couple of times." Is it because you left hadoop.tmp.dir and the other *.dir parameters at their defaults? Try setting hadoop.tmp.dir to a directory not under /tmp. The default entry: hadoop.tmp.dir = /tmp/hadoop-${user.name}, description: A base fo
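A hedged sketch of the suggested override, using the location proposed in the follow-up message. In hadoop-default.xml of this period (an assumption worth checking against your release), dfs.name.dir and dfs.data.dir default to subdirectories of hadoop.tmp.dir, so when the system cleans /tmp the NameNode metadata disappears and the file system has to be reformatted. The directory must exist and be writable by the user running the Hadoop daemons.

```xml
<configuration>
  <!-- Move Hadoop's base temporary directory out of /tmp, which many
       systems clean on reboot. Path taken from the follow-up question. -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}</value>
  </property>
</configuration>
```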

RE: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Haijun Cao
e. The summary for us (especially 4-6 months ago when we were deciding) is that cascading is good enough to use now and pig will probably be more useful later. On Wed, Jun 11, 2008 at 4:19 PM, Haijun Cao <[EMAIL PROTECTED]> wrote: > > I find cascading very similar to pig, do you c

RE: does anyone have idea on how to run multiple sequential jobs with bash script

2008-06-11 Thread Haijun Cao
Ted, I find Cascading very similar to Pig; would you care to comment here? If MapReduce programmers are to go to the next level (a scripting/query language), which way should they go? Thanks, Haijun -----Original Message----- From: Ted Dunning [mailto:[EMAIL PROTECTED] Sent: Wednesday, June

local bytes written (high io, low memory usage)

2008-06-05 Thread Haijun Cao
I noticed that the "local bytes written/read" counters in my MapReduce job are really high: 2x, 3x, even 4x the HDFS bytes. When does the Hadoop MapReduce framework write to the local file system? Is it done when JVM memory is insufficient and data is spilled to disk? How can I configure it so that it does not spill to disk?
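Some local I/O is unavoidable: map output is always written to the map task's local disk before reducers fetch it, and reducers may merge fetched data on local disk as well, so local bytes cannot be driven to zero. What can be reduced is the number of extra spill and merge passes. A sketch with illustrative values (assumptions, not recommendations): io.sort.mb sizes the in-memory map-side sort buffer (assumed default of 100 MB in releases of this period), and because that buffer lives in the task JVM heap, mapred.child.java.opts must leave room for it.

```xml
<configuration>
  <!-- Illustrative value: a larger map-side sort buffer means fewer
       spill files, and therefore less merging on local disk. -->
  <property>
    <name>io.sort.mb</name>
    <value>200</value>
  </property>
  <!-- The sort buffer is allocated in the task JVM heap, so the heap
       must be comfortably larger than io.sort.mb. -->
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx512m</value>
  </property>
</configuration>
```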

RE: compressed/encrypted file

2008-06-05 Thread Haijun Cao
HOD may be too heavyweight for us, with a small cluster and a small number of users. At the Hadoop Summit I heard that Kerberos authentication is in the pipeline; is there a place where I can check on its progress? It seems that the authentication/authorization work is from the perspective of the file system, but n

RE: compressed/encrypted file

2008-06-05 Thread Haijun Cao
08, at 3:45 PM, Haijun Cao wrote: >> Miles, thanks. >> "If your inputs to maps are compressed, then you don't get any automatic assignment of mappers to your data: each gzipped file gets assigned a mapper." <--- this is the ca

RE: compressed/encrypted file

2008-06-04 Thread Haijun Cao
ransparent. Miles 2008/6/4 Haijun Cao <[EMAIL PROTECTED]>: > > If a file is compressed and encrypted, then is it still possible to split > it and run mappers in parallel? > > Do people compress their files stored in hadoop? If yes, how do you go > about processing them in p

compressed/encrypted file

2008-06-04 Thread Haijun Cao
If a file is compressed and encrypted, is it still possible to split it and run mappers in parallel? Do people compress the files they store in Hadoop? If so, how do you go about processing them in parallel? Thanks, Haijun
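As Miles points out in the reply above, each gzipped input file is assigned a single mapper, because a gzip stream cannot be split. One commonly used alternative, not proposed in these snippets, is to store data as block-compressed SequenceFiles, whose sync markers between compressed blocks let a later job split a file across several mappers. A sketch of the job properties, assuming the job writes its output with SequenceFileOutputFormat; it does not address encryption.

```xml
<configuration>
  <!-- Compress job output. The BLOCK type only applies when the job
       writes SequenceFiles (SequenceFileOutputFormat assumed). -->
  <property>
    <name>mapred.output.compress</name>
    <value>true</value>
  </property>
  <!-- Compress groups of records together; the sync markers between
       blocks keep the resulting files splittable. -->
  <property>
    <name>mapred.output.compression.type</name>
    <value>BLOCK</value>
  </property>
  <!-- zlib-based codec shipped with Hadoop. -->
  <property>
    <name>mapred.output.compression.codec</name>
    <value>org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>
```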

RE: setrep

2008-06-04 Thread Haijun Cao
Lohit, Thanks for the explanation. If that's the case, then it is not slower than expected. Haijun -----Original Message----- From: lohit [mailto:[EMAIL PROTECTED] Sent: Wed 6/4/2008 2:11 AM To: core-user@hadoop.apache.org Subject: Re: setrep > It seems that setrep won't force replicatio

setrep

2008-06-03 Thread Haijun Cao
It seems that setrep doesn't force the replication change to the specified number immediately; the replication changed really slowly. I'm just wondering: is this the expected behavior? What's the rationale for it? Is there a way to speed it up? Thanks, Haijun

confusing debug message thrown by Configuration class

2008-05-19 Thread Haijun Cao
Hi, I noticed that the org.apache.hadoop.conf.Configuration constructor logs a message like the one below if DEBUG is enabled: 2008-05-19 15:59:43,237 DEBUG [main] conf.Configuration java.io.IOException: config() at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156) The code is

hadoop.mapred.join.Parser does not work with KeyValueTextInputFormat

2008-05-05 Thread Haijun Cao
Hi Chris, thanks for adding the map-side join feature (http://issues.apache.org/jira/browse/HADOOP-2085). I tried the join example with KeyValueTextInputFormat as the input format, but got the following exception: java.lang.NullPointerException at org.apache.hadoop.mapred.KeyValu