Joey,
Is YARN just a synonym for MRv2? And if so, he would still have to create a
custom ApplicationMaster for his job type, right?
Matt
-----Original Message-----
From: Joey Echeverria [mailto:j...@cloudera.com]
Sent: Tuesday, October 04, 2011 11:06 AM
To: mapreduce-user@hadoop.apache.org
Praveen,
Functionality-wise you don't gain much from using the new API, and most would
actually recommend that you stay with the old API, as it will not be
officially deprecated until 0.22/0.23 (I can't remember which one). If you
want to take a look at the classes, dig into the packages for each: the old
API lives under org.apache.hadoop.mapred and the new API under
org.apache.hadoop.mapreduce.
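A minimal side-by-side sketch of the same trivial mapper against both (class
names are placeholders, and each class would live in its own file):

// Old API: interface-based, with OutputCollector and Reporter.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class OldApiLineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    output.collect(value, key); // emit (line, byte offset)
  }
}

// New API: abstract class, everything routed through a single Context.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NewApiLineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    context.write(value, key); // same output, new plumbing
  }
}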
If you dig into the job history on the web UI, can you confirm whether it is the
same 16 tasktracker slots that are getting the map tasks? Long shot, but it
could be that it is actually distributing across your cluster and there is some
other issue springing up. Also, how long does each
seems to be running two at a time. Map
tasks take 10 seconds from start to finish. Is it possible that they are just
completing faster than they can be created and it just seems to stick around 16?
-- Adam
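If it helps to rule things out, the slot capacity and live usage can also be
pulled programmatically; a quick sketch against the old API:

import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class SlotCheck {
  public static void main(String[] args) throws Exception {
    // Connects to the jobtracker named in the local Hadoop configuration.
    JobClient client = new JobClient(new JobConf());
    ClusterStatus status = client.getClusterStatus();
    // Compare configured capacity against what is actually running.
    System.out.println("Task trackers:   " + status.getTaskTrackers());
    System.out.println("Map slots total: " + status.getMaxMapTasks());
    System.out.println("Maps running:    " + status.getMapTasks());
  }
}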
From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com]
Sent: Thursday
of this. If we take a basic MapReduce job, say word count without a combiner,
what would the percentage distribution of execution time be across the map,
sort/shuffle, and reduce phases?
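For concreteness, by word count without a combiner I mean the stock job with
the combiner simply left unset, something like this (using the library classes
that ship with the old API):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.LongSumReducer;
import org.apache.hadoop.mapred.lib.TokenCountMapper;

public class WordCountNoCombiner {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountNoCombiner.class);
    conf.setJobName("wordcount-no-combiner");
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(LongWritable.class);
    conf.setMapperClass(TokenCountMapper.class); // emits (token, 1)
    conf.setReducerClass(LongSumReducer.class);  // sums the 1s per token
    // No setCombinerClass(...) on purpose: all map output crosses the wire.
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

Without the combiner, every map output record gets sorted, spilled, and
shuffled to the reducers, so the shuffle share of the runtime should be at its
worst in this setup.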
On Wed, Sep 7, 2011 at 10:30 PM, GOEKE, MATTHEW (AG/1000)
<matthew.go...@monsanto.com> wrote:
)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
From: GOEKE, MATTHEW [AG/1000]
Sent: Monday, July 18, 2011 4:54 PM
To: mapreduce-user
Off-the-wall thought, but it might be possible to do this by rolling your
own load manager for the fair scheduler. I know this is how people have set up
custom job distributions based on current cluster utilization.
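I have not tried it myself, so treat the hook below as an assumption: the
0.20-era fair scheduler contrib exposes a LoadManager class (the default
implementation is CapBasedLoadManager, selected via
mapred.fairscheduler.loadmanager), and a custom one would look roughly like
this, with method signatures that may differ in your build:

import org.apache.hadoop.mapred.LoadManager;
import org.apache.hadoop.mapred.TaskTrackerStatus;

public class UtilizationLoadManager extends LoadManager {
  // Called per heartbeat to decide if a tracker should get another map task.
  @Override
  public boolean canAssignMap(TaskTrackerStatus tracker,
                              int totalRunnableMaps, int totalMapSlots) {
    // A custom utilization policy would go here; this just mirrors the cap.
    return tracker.countMapTasks() < tracker.getMaxMapTasks();
  }

  @Override
  public boolean canAssignReduce(TaskTrackerStatus tracker,
                                 int totalRunnableReduces, int totalReduceSlots) {
    return tracker.countReduceTasks() < tracker.getMaxReduceTasks();
  }
}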
Matt
From: Jonathan Zukerman [mailto:zukermanjonat...@gmail.com]
Sent:
Praveen,
David is correct, but we might need to use different terminology. Hadoop looks
at the number of input splits, and if the file is not splittable then yes, it
will only use one mapper for it. In the case of most files (which are splittable),
Hadoop will break them into multiple maps and work
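The splittable decision itself is just a method on the input format that you
can see or override; a minimal sketch against the old API, e.g. to force one
map per file:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

// Declares every input unsplittable, so each file gets exactly one mapper,
// mimicking what Hadoop already does for, e.g., gzipped input.
public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }
}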
(for 5 node cluster). So I believe Hadoop is defaulting to the number of cores
in the cluster, which is 10. That is why I want to set the number of map tasks
to the number of cores as well, so that it matches the maximum number of
concurrent map tasks.
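For what it's worth, on the old API the map count is only ever a hint; a tiny
sketch:

import org.apache.hadoop.mapred.JobConf;

public class MapCountHint {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Only a hint: the actual number of maps is driven by the input splits,
    // while per-node concurrency is capped by
    // mapred.tasktracker.map.tasks.maximum on each tasktracker.
    conf.setNumMapTasks(10);   // e.g. one per core on a 5-node, 10-core cluster
    conf.setNumReduceTasks(2); // reduce count, by contrast, is honored exactly
  }
}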
Praveen
-----Original Message-----
From: ext GOEKE, MATTHEW (AG/1000
Hassen,
If you would like to use Python, I would suggest looking into the
streaming API examples around Hadoop.
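A typical streaming invocation looks something like this (the jar name and
path vary by version and distribution): hadoop jar
$HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar -input <in> -output <out>
-mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py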
Matt
-----Original Message-----
From: Hassen Riahi [mailto:hassen.ri...@cern.ch]
Sent: Monday, June 20, 2011 3:13 PM
To: mapreduce-user@hadoop.apache.org
Subject: mapreduce
You might want to chase down leads around
https://issues.apache.org/jira/browse/MAPREDUCE-606. It looks like there is a
patch for it on JIRA, but I am not quite sure whether it is working. If it is
worth it to you to keep it in Python, then it might be worth tinkering with the
patch...
HTH,
Matt
If you know for certain that it needs to be split into multiple work units, I
would suggest looking into Oozie. Easy to install, lightweight, low learning
curve... for my purposes it's been very helpful so far. I am also fairly
certain you can chain multiple job confs into the same run, but I
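On the chaining point, the simplest version is just two JobConfs run back to
back in one driver; a bare sketch with the per-step configuration elided (for
dependencies between steps, the old API also ships
org.apache.hadoop.mapred.jobcontrol.JobControl):

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class TwoStepDriver {
  public static void main(String[] args) throws Exception {
    JobConf first = new JobConf(TwoStepDriver.class);
    first.setJobName("step-1");
    // ... configure step 1 (input, output, mapper/reducer) ...

    JobConf second = new JobConf(TwoStepDriver.class);
    second.setJobName("step-2");
    // ... configure step 2, typically reading step 1's output ...

    JobClient.runJob(first);  // blocks until step 1 completes
    JobClient.runJob(second); // then runs step 2
  }
}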
Someone might have a more graceful method of determining this, but I've
found that outputting that kind of data to counters is the most effective
way. Otherwise you could use stderr or stdout, but then you would need to
mine the log data on each node to figure it out.
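A sketch of the counter route (the enum and mapper here are just illustrative):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CountingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {

  // Counters are aggregated by the framework and show up on the job page and
  // in the job history, so no per-node log mining is needed.
  enum RecordStats { EMPTY_LINES, PARSED_LINES }

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> output, Reporter reporter)
      throws IOException {
    if (value.toString().trim().isEmpty()) {
      reporter.incrCounter(RecordStats.EMPTY_LINES, 1);
      return;
    }
    reporter.incrCounter(RecordStats.PARSED_LINES, 1);
    output.collect(value, new LongWritable(1));
  }
}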
Matt
From: Jonathan Coveney