Amar, Thanks for the pointer.
-Original Message-
From: Amar Kamat [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2008 8:43 PM
To: core-user@hadoop.apache.org
Subject: Re: do NOT start reduce task until all mappers are finished
Haijun Cao wrote:
> Hi,
>
> I [...] fair scheduler useless for my workload.
I am wondering if there is a way to NOT start reduce tasks
until all their mappers have finished.
Thanks
Haijun Cao
Set the number of map slots per tasktracker to 8 in order to run 8 map tasks
on one machine (assuming one tasktracker per machine) at the same time:

<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.</description>
</property>
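For the original question (holding reducers back until every map is done), later Hadoop releases also expose a dedicated knob; whether it exists depends on your version, so treat this as a hedged sketch rather than the answer given in this thread:

```xml
<!-- Fraction of map tasks that must complete before reduce tasks are
     scheduled. 1.0 = do not launch any reducer until ALL maps finish.
     Assumes a release that ships mapred.reduce.slowstart.completed.maps
     (default 0.05). -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.0</value>
</property>
```

Raising this avoids reducers sitting idle in reduce slots while maps run, at the cost of overlapping the shuffle less with the map phase.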
-----Original Message-----
From: Haijun Cao [mailto:[EMAIL PROTECTED]
Sent: Monday, June 30, 2008 9:33 PM
To: core-user@hadoop.apache.org
Subject: RE: Hadoop - is it good for me and performance question
Not sure if this will answer your question, but a similar thread
regarding hadoop performance:
http://www.mail-archive.com/core-user@hadoop.apache.org/msg02906.html
-Original Message-
From: yair gotdanker [mailto:[EMAIL PROTECTED]
Sent: Sunday, June 29, 2008 4:46 AM
To: core-user@hadoop.apache.org
Subject: Hadoop - is it good for me and performance question
Hello all,
I am new
Not sure if this will answer your question, but a similar thread
regarding hadoop performance:
http://www.mail-archive.com/core-user@hadoop.apache.org/msg02878.html
Hadoop is good for log processing if you have a lot of logs to process
and you don't need the result in real time (e.g. you can acc
containing
'username' in them. Is it safe to use some global path, say dropping
the username reference from the default values?
Thank you,
YongChul
On Thu, Jun 26, 2008 at 12:19 PM, Haijun Cao <[EMAIL PROTECTED]>
wrote:
Is it because you leave the mapred.system.dir as default (see
hadoop-default.xml)?
Haijun
-Original Message-
From: YongChul Kwon [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 26, 2008 12:10 PM
To: core-user@hadoop.apache.org
Subject: Sharing Hadoop cluster among multiple users
Hello
"/usr/local/hadoop/hadoop-datastore/hadoop-${user.name}".
Is
this a good location ?
On Thu, Jun 12, 2008 at 12:59 PM, Haijun Cao <[EMAIL PROTECTED]>
wrote:
"While testing I had to delete the temporary "datastore" folder and
reformat
the file system a couple of times."
Is it because you leave hadoop.tmp.dir and other .dir parameter as
default? Try to set hadoop.tmp.dir to a dir not under /tmp.
<property>
  <name>hadoop.tmp.dir</name>
  <value>/tmp/hadoop-${user.name}</value>
  <description>A base for other temporary directories.</description>
</property>
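A hedged example override for hadoop-site.xml (the path is my own illustration; any persistent local directory works):

```xml
<!-- Move Hadoop's working data off /tmp so a reboot or a tmp-cleaner
     cron job does not wipe the filesystem state and force a reformat.
     /var/hadoop is an assumed path, not a Hadoop default. -->
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/tmp-${user.name}</value>
</property>
```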
The summary for us (especially 4-6 months ago when we were deciding) is that
cascading is good enough to use now and pig will probably be more useful
later.
On Wed, Jun 11, 2008 at 4:19 PM, Haijun Cao <[EMAIL PROTECTED]> wrote:
Ted,
I find cascading very similar to pig, do you care to provide your comment here?
If map reduce programmers are to go to the next level (scripting/query
language), which way to go?
Thanks
Haijun
-Original Message-
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June
I noticed that "local bytes written/read" stat in my map reduce job is
really high, 2x, 3x, 4x of the hdfs bytes.
When does hadoop mapred framework write to local fs? Is it done when the
jvm memory is not enough and data is spill to disk? how I can configure
so that it does not spill to disk?
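For context (not stated in the thread itself): the map side buffers records in an in-memory sort buffer and spills to the local filesystem when it fills, and the reduce side writes locally again during shuffle/merge, so local I/O several times the HDFS I/O is normal and cannot be disabled entirely. A hedged tuning sketch using the 0.1x-era property names (values illustrative, not recommendations):

```xml
<!-- Map-side sort buffer in MB; spilling starts once the buffer is
     io.sort.spill.percent full. A larger buffer means fewer spill files
     to merge, at the cost of JVM heap. -->
<property>
  <name>io.sort.mb</name>
  <value>200</value>
</property>
<property>
  <name>io.sort.spill.percent</name>
  <value>0.80</value>
</property>
```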
HOD may be too heavy weight for us with small cluster and small number of
users.
From the Hadoop Summit, I heard Kerberos authentication is in the pipeline; is
there a place I can check on the progress?
It seems that the authentication/authorization work is from the perspective of
the file system, but n
08, at 3:45 PM, Haijun Cao wrote:
>
>>
>> Mile, Thanks.
>>
>> "If your inputs to maps are compressed, then you don't get any
>> automatic
>> assignment of mappers to your data: each gzipped file gets assigned a
>> mapper." <--- this is the ca
...transparent.
Miles
2008/6/4 Haijun Cao <[EMAIL PROTECTED]>:
If a file is compressed and encrypted, then is it still possible to split it
and run mappers in parallel?
Do people compress their files stored in hadoop? If yes, how do you go about
processing them in parallel?
Thanks
Haijun
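On the splitting question: a stream-compressed format like gzip is not splittable, so each gzipped file is processed by a single mapper, while block-compressed SequenceFiles remain splittable. A hedged config sketch with the old-style property names (these apply when the job's output format is SequenceFileOutputFormat):

```xml
<!-- Write job outputs as block-compressed SequenceFiles: each block is
     compressed independently, so downstream jobs can still split the
     files across many mappers. -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapred.output.compression.type</name>
  <value>BLOCK</value>
</property>
```

Encryption is a separate concern; a mapper can only split input it can read, so encrypted data generally has to be decrypted (or decryptable per split) before parallel processing.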
Lohit,
Thanks for the explanation. If that's the case, then it is not slower than
expected.
Haijun
-Original Message-
From: lohit [mailto:[EMAIL PROTECTED]
Sent: Wed 6/4/2008 2:11 AM
To: core-user@hadoop.apache.org
Subject: Re: setrep
It seems that setrep won't force the replication change to the specified number
immediately; it changes really slowly. Just wondering if this is the expected
behavior? What's the rationale for this behavior? Is there a way to speed it up?
Thanks
Haijun
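For reference, the setrep invocation looks like this; where available, the -w flag blocks until re-replication completes (it waits for the change rather than speeding it up, since the namenode schedules block replication gradually to avoid flooding the cluster):

```shell
# Raise replication of a path to 5 and wait until it takes effect.
# The path is illustrative; -w may not exist in very old releases.
hadoop dfs -setrep -w 5 /user/haijun/data
```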
Hi,
I noticed that the org.apache.hadoop.conf.Configuration constructor will
log a message like below if DEBUG is enabled:
2008-05-19 15:59:43,237 DEBUG [main] conf.Configuration
java.io.IOException: config()
        at org.apache.hadoop.conf.Configuration.<init>(Configuration.java:156)
The code is
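That DEBUG IOException is not a real error: the constructor builds a throwable only so the log records *where* the Configuration was created. A minimal plain-Java sketch of the same call-site trick (class and method names here are mine, not Hadoop's):

```java
// Illustrates logging a call site by constructing a throwable that is
// never thrown -- the trick behind Hadoop's "java.io.IOException: config()"
// DEBUG message.
public class CallSiteDemo {

    // Returns "ClassName.methodName" of whoever called us, taken from the
    // stack trace of a freshly created (never thrown) Throwable.
    static String callSite() {
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        return caller.getClassName() + "." + caller.getMethodName();
    }

    public static void main(String[] args) {
        // Prints the call site of this line: CallSiteDemo.main
        System.out.println(callSite());
    }
}
```

So the message is harmless; it only appears because DEBUG logging is enabled.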
Hi, Chris,
Thanks for adding the map side join feature
(http://issues.apache.org/jira/browse/HADOOP-2085)
I tried the join example with KeyValueTextInputFormat as the input format, but
got the following exception:
java.lang.NullPointerException
        at org.apache.hadoop.mapred.KeyValu