Memory Manager in Hadoop MR

2010-12-09 Thread Pedro Costa
Hi, 1 - Hadoop MR contains a TaskMemoryManagerThread class that is used to manage memory usage of tasks running under a TaskTracker. Why does Hadoop MR need a class to manage memory? Why can't it rely on the JVM, or is this class here for another purpose? 2 - How does the JT know that a Map or Reduce

Map-Reduce Applicability With All-In Memory Data

2010-12-09 Thread Narinder Kumar
Hi All, We have a problem at hand which we would like to solve using Distributed and Parallel Processing. *Problem context*: We have a Map (Entity, Value). An entity can have a parent, which in turn has its own parent, and so on until we reach the head. I have to traverse this tree and do some

MultipleInputs and Paths Containing Commas

2010-12-09 Thread Ghigliotti, Matthew
Hello. I'm unsure whether this is a bug or an oversight, but since I haven't found any reference to it anywhere, I figured I would bring it to light. I've been using MultipleInputs for several of my MapReduce jobs, where I am joining together different forms of data. However, I have encountered

RE: distcp just fails (was:distcp fails with ConnectException)

2010-12-09 Thread Deepika Khera
Thanks everyone. It turned out that I was using the wrong port. The issue was resolved.

Re: Memory Manager in Hadoop MR

2010-12-09 Thread Greg Roelofs
"2 - How does the JT know that a Map or Reduce Task finished? Is it through the heartbeat?" Exactly. Tasks communicate with their TTs through the umbilical, and each TT communicates with the JT via heartbeat (and heartbeat response). Greg
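The reporting chain Greg describes can be sketched roughly as below. This is only a toy model, not Hadoop code: the classes SketchJobTracker, SketchTaskTracker and HeartbeatSketch, their methods, and the "ack:..." response string are all hypothetical names made up for illustration. It assumes a task reports completion to its own tracker first, and that the JobTracker only learns about it when that tracker's next heartbeat arrives.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the JobTracker: it only learns about task
// completion from what a tracker includes in its heartbeat.
class SketchJobTracker {
    private final Set<String> finished = new HashSet<>();

    // Receives one heartbeat and returns a (trivial) heartbeat response.
    String heartbeat(String trackerName, List<String> completedTaskIds) {
        finished.addAll(completedTaskIds);
        return "ack:" + trackerName + ":" + completedTaskIds.size();
    }

    boolean isFinished(String taskId) {
        return finished.contains(taskId);
    }
}

// Hypothetical stand-in for a TaskTracker.
class SketchTaskTracker {
    private final String name;
    private final List<String> pendingCompletions = new ArrayList<>();

    SketchTaskTracker(String name) {
        this.name = name;
    }

    // Plays the role of the umbilical: a task tells its own tracker it is done.
    void taskDone(String taskId) {
        pendingCompletions.add(taskId);
    }

    // Sends everything reported since the last heartbeat, then clears it.
    String sendHeartbeat(SketchJobTracker jt) {
        String response = jt.heartbeat(name, new ArrayList<>(pendingCompletions));
        pendingCompletions.clear();
        return response;
    }
}

class HeartbeatSketch {
    public static void main(String[] args) {
        SketchJobTracker jt = new SketchJobTracker();
        SketchTaskTracker tt = new SketchTaskTracker("tt1");
        tt.taskDone("map_0001");
        System.out.println(jt.isFinished("map_0001")); // false: no heartbeat sent yet
        System.out.println(tt.sendHeartbeat(jt));      // ack:tt1:1
        System.out.println(jt.isFinished("map_0001")); // true
    }
}
```

The point of the sketch is the delay: between taskDone() and the next sendHeartbeat(), the tracker knows the task finished but the job tracker does not.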

Re: Behaviour of reducer's Iterable in MR unit.

2010-12-09 Thread Aaron Kimball
Hi James, The ReduceDriver is configured to receive a list of inputs because lists have ordering guarantees whereas other Iterables/Collections do not; for determinism's sake, it is best to guarantee that you're calling reduce() with an ordered set of values when testing. It would be stellar if
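To illustrate the determinism point without touching the MRUnit API, here is a minimal plain-Java sketch. The class OrderSensitiveReduce and its reduce() method are hypothetical and exist only for this example; they are not part of MRUnit or Hadoop. It assumes a reduce step whose output depends on the order in which values are iterated, which is the case where feeding a List makes a test repeatable.

```java
import java.util.Arrays;
import java.util.List;

class OrderSensitiveReduce {
    // Hypothetical order-sensitive reduce: joins values in iteration order,
    // so a different iteration order would produce a different result.
    static String reduce(String key, Iterable<String> values) {
        StringBuilder out = new StringBuilder(key).append('=');
        String sep = "";
        for (String v : values) {
            out.append(sep).append(v);
            sep = ",";
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A List iterates in insertion order, so the output is the same on every run.
        List<String> values = Arrays.asList("b", "a", "c");
        System.out.println(reduce("k", values)); // k=b,a,c
    }
}
```

Passing the same values through a collection without a defined iteration order could yield a different string, which is why an ordered input keeps the expected output of a test stable.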

How to share Same Counter in Multiple Jobs?

2010-12-09 Thread Savannah Beckett
Hi, I chain multiple jobs in my program. Job 1's reduce function has a counter. I want Job 3's reduce function to read Job 1's counter. How can I do this? Thanks.

Re: How to share Same Counter in Multiple Jobs?

2010-12-09 Thread Ted Yu
I wrote the following code today. We have our own flow execution logic which calls the following to collect counters.
enum COUNT_COLLECTION {
    LOG,         // log the counters
    ADD_TO_CONF  // add counters to JobConf
}
protected static void

-libjars?

2010-12-09 Thread Vipul Pandey
Disclaimer: a newbie!!! Howdy? Got a quick question. The -libjars option doesn't seem to work for me in pretty much my first (or maybe second) MapReduce job. Here's what I'm doing: $bin/hadoop jar sherlock.jar somepkg.FindSchoolsJob -libjars HStats-1A18.jar input output sherlock.jar has