RE: Submitting a Hadoop task from within a reducer

2011-10-04 Thread GOEKE, MATTHEW (AG/1000)
Joey, Is YARN just a synonym for MRv2? And if so, he would still have to create a custom application master for his job type, right? Matt -Original Message- From: Joey Echeverria [mailto:j...@cloudera.com] Sent: Tuesday, October 04, 2011 11:06 AM To: mapreduce-user@hadoop.apache.org

RE: How to pull data in the Map/Reduce functions?

2011-09-24 Thread GOEKE, MATTHEW (AG/1000)
Praveen, Functionality-wise you don't gain much from using the new API, and most would actually recommend that you stay with the old API, as it will not be officially deprecated until 0.22 / 0.23 (I can't remember which one). If you want to take a look at the classes, dig into the packages for

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
If you dig into the job history in the web UI, can you confirm whether it is the same 16 tasktracker slots that are getting the map tasks? Long shot, but it could be that it is actually distributing across your cluster and there is some other issue that is springing up. Also, how long does each

RE: FairScheduler Local Task Restriction

2011-09-22 Thread GOEKE, MATTHEW (AG/1000)
seems to be running two at a time. Map tasks take 10 seconds from start to finish. Is it possible that they are just completing faster than they can be created and it just seems to stick around 16? -- Adam From: GOEKE, MATTHEW (AG/1000) [mailto:matthew.go...@monsanto.com] Sent: Thursday

RE: No Mapper but Reducer

2011-09-08 Thread GOEKE, MATTHEW (AG/1000)
of this. If we take a basic map-reduce job, say word count without a combiner, what would the percentage distribution of execution time be across the map, reduce, and sort/shuffle phases? On Wed, Sep 7, 2011 at 10:30 PM, GOEKE, MATTHEW (AG/1000) matthew.go...@monsanto.com
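Not from the thread, but the three phases being asked about can be sketched in plain Python; this is a minimal simulation of a word-count job without a combiner, where every `(word, 1)` pair travels through the shuffle:

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit (word, 1) for every word, like a word-count mapper
    for line in lines:
        for word in line.split():
            yield (word, 1)

def shuffle_phase(pairs):
    # Sort/shuffle: group values by key, like the framework's merge sort;
    # with no combiner, every single 1 emitted by the mappers crosses here
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: sum the grouped counts for each word
    return {word: sum(values) for word, values in groups.items()}

counts = reduce_phase(shuffle_phase(map_phase(["a b a", "b c"])))
print(counts)  # {'a': 2, 'b': 2, 'c': 1}
```

The actual wall-clock split between these phases depends heavily on data volume and cluster I/O, which is why the thread's question has no single answer.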

RE: Distributed cache not working

2011-07-18 Thread GOEKE, MATTHEW (AG/1000)
) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115) at org.apache.hadoop.mapred.Child.main(Child.java:262) From: GOEKE, MATTHEW [AG/1000] Sent: Monday, July 18, 2011 4:54 PM To: mapreduce-user

RE: tasktracker maximum map tasks for a certain job

2011-06-21 Thread GOEKE, MATTHEW (AG/1000)
Off-the-wall thought, but it might be possible to do this by rolling your own load manager using the fair scheduler. I know this is how people have set up custom job distributions based on current cluster utilization. Matt From: Jonathan Zukerman [mailto:zukermanjonat...@gmail.com] Sent:

RE: controlling no. of mapper tasks

2011-06-20 Thread GOEKE, MATTHEW (AG/1000)
Praveen, David is correct but we might need to use different terminology. Hadoop looks at the number of input splits and if the file is not splittable then yes it will only use 1 mapper for it. In the case of most files (which are splittable) Hadoop will break them into multiple maps and work

RE: controlling no. of mapper tasks

2011-06-20 Thread GOEKE, MATTHEW (AG/1000)
(for 5 node cluster). So I believe hadoop is defaulting to no. of cores in the cluster which is 10. That is why I want to choose the map tasks also same as no. of cores so that they match with max concurrent map tasks. Praveen -Original Message- From: ext GOEKE, MATTHEW (AG/1000
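The arithmetic Praveen is describing can be made explicit; assuming the default of two map slots per tasktracker (`mapred.tasktracker.map.tasks.maximum`), a sketch of concurrent map capacity:

```python
def max_concurrent_maps(nodes, map_slots_per_tracker=2):
    # mapred.tasktracker.map.tasks.maximum defaults to 2 per tasktracker,
    # so a 5-node cluster runs at most 10 map tasks at any one time
    return nodes * map_slots_per_tracker

print(max_concurrent_maps(5))  # 10
```

Note this caps *concurrency*, not the total number of map tasks, which is still driven by the number of input splits.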

RE: mapreduce and python

2011-06-20 Thread GOEKE, MATTHEW (AG/1000)
Hassen, If you would like to use Python, I would suggest looking into the streaming API examples around Hadoop. Matt -Original Message- From: Hassen Riahi [mailto:hassen.ri...@cern.ch] Sent: Monday, June 20, 2011 3:13 PM To: mapreduce-user@hadoop.apache.org Subject: mapreduce
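To make the streaming suggestion concrete, here is a minimal sketch of a Streaming word-count mapper; under Hadoop Streaming the mapper is just a script that reads lines and writes tab-separated key/value pairs to stdout (in a real job you would pass `sys.stdin` to the function):

```python
import sys

def stream_map(lines):
    # A Streaming mapper consumes raw text lines and emits "key\tvalue"
    # pairs as text; the framework handles the sort/shuffle between them
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

for pair in stream_map(["the cat sat"]):
    print(pair)  # the\t1  cat\t1  sat\t1, one pair per line
# In a real job: for pair in stream_map(sys.stdin): print(pair)
```

A typical invocation for that era of Hadoop looked roughly like `hadoop jar hadoop-streaming.jar -input /in -output /out -mapper mapper.py -reducer reducer.py -file mapper.py` (the jar path varies by distribution).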

RE: mapreduce and python

2011-06-20 Thread GOEKE, MATTHEW (AG/1000)
You might want to chase down leads around https://issues.apache.org/jira/browse/MAPREDUCE-606. It looks like there is a patch for it on Jira but I am not quite sure if it is working. If it is worth it to you to keep it in python then it might be worth it to tinker with the patch... HTH, Matt

RE: Programming Multiple rounds of mapreduce

2011-06-13 Thread GOEKE, MATTHEW (AG/1000)
If you know for certain that it needs to be split into multiple work units, I would suggest looking into Oozie. Easy to install, lightweight, low learning curve... for my purposes it's been very helpful so far. I am also fairly certain you can chain multiple job confs into the same run, but I
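The chaining idea, stripped of Hadoop specifics, is just feeding one round's output into the next round's input; a plain-Python sketch (the per-round driver here is a simplification, not Hadoop's API):

```python
from collections import defaultdict

def run_job(map_fn, reduce_fn, records):
    # One simplified map-reduce round: map every record, group by key, reduce
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            groups[key].append(value)
    return [(k, reduce_fn(k, vs)) for k, vs in sorted(groups.items())]

# Round 1: word count
counts = run_job(lambda line: [(w, 1) for w in line.split()],
                 lambda k, vs: sum(vs),
                 ["a b a", "c"])
# Round 2: invert to "how many words occur N times", consuming round 1's output
by_freq = run_job(lambda kv: [(kv[1], 1)],
                  lambda k, vs: sum(vs),
                  counts)
print(by_freq)  # [(1, 2), (2, 1)] -> two words occur once, one occurs twice
```

In real Hadoop the same pattern is two `JobConf`s run back to back with the first job's output directory as the second job's input path, which is exactly the dependency Oozie workflows encode.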

RE: Is there a way to see what file killed a mapper?

2011-05-10 Thread GOEKE, MATTHEW [AG/1000]
Someone might have a more graceful method of determining this, but I've found outputting that kind of data to counters is the most effective way. Otherwise you could use stderr or stdout, but then you would need to mine the log data on each node to figure it out. Matt From: Jonathan Coveney
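The counter technique can be sketched language-neutrally; in Java it would be `context.getCounter(group, inputFile).increment(1)`, and this plain-Python stand-in (the file name is a made-up example) shows the same idea of tallying failures per input file:

```python
from collections import Counter

# Stand-in for the job's counter group, keyed by input file name
bad_records = Counter()

def safe_map(record, input_file):
    # On a parse failure, bump a counter named after the input file so the
    # job summary reveals which file produced the bad records
    try:
        return int(record)
    except ValueError:
        bad_records[input_file] += 1
        return None

for rec in ["1", "oops", "3"]:
    safe_map(rec, "part-00007")

print(dict(bad_records))  # {'part-00007': 1}
```

Counters surface in the job's web UI and completion summary, which is why they beat grepping per-node logs for this kind of forensics.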