Re: instantiation of classes in MR

2012-01-09 Thread Anirudh
Harsh, I appreciate you responding to the thread. Sorry, I got a bit held up with work, hence I couldn't reply. I am not sure what you mean by setup/cleanup procedures. In my response I actually meant the latter, i.e. the API hooks 'setup/cleanup'. If the APIs are called per split/partition, why do we need to

Re: instantiation of classes in MR

2012-01-02 Thread Harsh J
Hello Anirudh,
On 02-Jan-2012, at 5:31 AM, Anirudh wrote:
> Any specific reason why setup is called for every task attempt? From an
> optimization point of view, wouldn't it be good if the setup is called only
> once in case of JVM reuse?
Note that the task setup/cleanup procedures are separate fro

Re: instantiation of classes in MR

2012-01-02 Thread Eyal Golan
Thank you very much for the help. I am going to start working on it soon (a few days) and will probably have more questions :)

Re: instantiation of classes in MR

2012-01-01 Thread Anirudh
Any specific reason why setup is called for every task attempt? From an optimization point of view, wouldn't it be good if the setup is called only once in case of JVM reuse? I have not yet looked at the implementation. In case of JVM reuse, is the application Mapper instance reused or a new instance is

Re: instantiation of classes in MR

2012-01-01 Thread Harsh J
You are guaranteed one setup call for every single task attempt. This is regardless of JVM reuse being on or off. JVM reuse will cause no issues with what Eyal is attempting to do.
On Sun, Jan 1, 2012 at 5:49 PM, Anirudh wrote:
> No problems Eyal.
>
> On a second thought, for the JVM re-use the

Re: instantiation of classes in MR

2012-01-01 Thread Anirudh
No problems Eyal. On second thought, with JVM reuse the Mapper/Reducer instances should be reused, and the setup should be called only once. This makes sense too, as the JVM reuse is for the same job. You should be good with class instantiation even if JVM reuse is enabled. On Sat, Dec

Re: instantiation of classes in MR

2011-12-31 Thread Eyal Golan
Thank you very much for the detailed explanation, Anirudh. I think that my question about node / VM was due to some lack of knowledge (I'm just starting to learn the Hadoop environment). Regarding configuration of the nodes and clusters: this is something that I am not doing myself. We have a de

Re: instantiation of classes in MR

2011-12-31 Thread Anirudh
I just wanted to confirm where exactly you were planning to have the instantiation code, as it was not mentioned in your previous post. The location would have made a difference. As you are doing it in the setup of the mapper/reducer, you are good. I was referring to the Task JVM Reuse option: http://ha

Re: instantiation of classes in MR

2011-12-31 Thread Eyal Golan
My idea is to create that class in the setup / configure method (depending on which Mapper / Reducer I will inherit from). I don't understand the 'reuse' option you are referring to. How many map tasks will be created? One per split or one per VM (node)? Are you suggesting that although there would be

Re: instantiation of classes in MR

2011-12-30 Thread Anirudh
Where are you creating this new class? If it is in the map function, then it will create a new object for each record in the split. Also, you may need to see how the JVM reuse option works. I am not too sure of this and you may want to look at the code. If the option for JVM reuse is set, then m
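The difference between creating an object inside the map function and creating it once in setup can be sketched in plain Java. This is only an illustration and does not use the Hadoop API: `ExpensiveHelper`, `PerRecordMapper`, `PerTaskMapper`, and `InstantiationDemo` are hypothetical names, and the loop over a list stands in for one task attempt processing the records of one split.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the counter records how many instances were built.
class ExpensiveHelper {
    static int instancesCreated = 0;

    ExpensiveHelper() {
        instancesCreated++;
    }

    String process(String record) {
        return record.toUpperCase();
    }
}

// Assumption: a mapper-like class that builds its helper inside map(),
// so one object is created per record.
class PerRecordMapper {
    String map(String record) {
        return new ExpensiveHelper().process(record);
    }
}

// Assumption: a mapper-like class that builds its helper once in setup(),
// so one object is created per simulated task attempt.
class PerTaskMapper {
    private ExpensiveHelper helper;

    void setup() {
        helper = new ExpensiveHelper();
    }

    String map(String record) {
        return helper.process(record);
    }
}

class InstantiationDemo {
    // Simulates one task attempt over a split; returns the number of helpers created.
    static int runPerRecord(List<String> split) {
        ExpensiveHelper.instancesCreated = 0;
        PerRecordMapper mapper = new PerRecordMapper();
        for (String record : split) {
            mapper.map(record);
        }
        return ExpensiveHelper.instancesCreated;
    }

    // Same simulation, but setup() is called once before the records are processed.
    static int runPerTask(List<String> split) {
        ExpensiveHelper.instancesCreated = 0;
        PerTaskMapper mapper = new PerTaskMapper();
        mapper.setup();
        for (String record : split) {
            mapper.map(record);
        }
        return ExpensiveHelper.instancesCreated;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("a", "b", "c");
        System.out.println("created in map():   " + runPerRecord(split));
        System.out.println("created in setup(): " + runPerTask(split));
    }
}
```

With a three-record split, the first variant builds three helpers and the second builds one.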

Re: instantiation of classes in MR

2011-12-30 Thread Eyal Golan
Great news! Thanks for the info. So, using reflection, I can inject different implementations of interfaces (services) into the mapper (or reducer). This way I can test a mapper (or reducer) just by loading a stub through reflection instead of the real implementation.
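The idea of swapping a stub for a real implementation by class name could look roughly like the following plain-Java sketch. It makes no claim about how the poster's code or Hadoop itself does this: `LookupService`, `RealLookupService`, `StubLookupService`, `ServiceFactory`, and `SketchMapper` are all hypothetical names, and in a real job the class name would presumably come from the job configuration rather than a method argument.

```java
// Hypothetical service interface that a mapper depends on.
interface LookupService {
    String lookup(String key);
}

// Stand-in for the production implementation.
class RealLookupService implements LookupService {
    public String lookup(String key) {
        return "real:" + key;
    }
}

// Stub used in tests in place of the production implementation.
class StubLookupService implements LookupService {
    public String lookup(String key) {
        return "stub:" + key;
    }
}

class ServiceFactory {
    // Builds a LookupService from a fully qualified class name using reflection.
    static LookupService create(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.asSubclass(LookupService.class)
                      .getDeclaredConstructor()
                      .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot instantiate " + className, e);
        }
    }
}

// Assumption: a mapper-like class that receives the implementation name in setup().
class SketchMapper {
    private LookupService service;

    void setup(String serviceClassName) {
        service = ServiceFactory.create(serviceClassName);
    }

    String map(String key) {
        return service.lookup(key);
    }
}

class ReflectionInjectionDemo {
    public static void main(String[] args) {
        SketchMapper mapper = new SketchMapper();
        // In a test, pass the stub's class name; in production, the real one.
        mapper.setup(StubLookupService.class.getName());
        System.out.println(mapper.map("k1"));
    }
}
```

Because the mapper only sees the `LookupService` interface, a test can pass the stub's class name and exercise `map` without touching the real service.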

Re: instantiation of classes in MR

2011-12-30 Thread Harsh J
Eyal, Yes, it is right to think of each Task attempt being one individual JVM running individually on any added Node. Multiple slots would mean multiple VMs in parallel as well. Yes, your use of reflection to build your objects will work just fine -- it's all user-side Java code that is executed

instantiation of classes in MR

2011-12-30 Thread Eyal Golan
Hi, I want to understand a basic concept in MR. If a mapper creates an instance of some class (using the 'new' operator), then the created instance exists ONCE in the VM of this node, and the same holds for each node. Correct? Now, what if instead of using the 'new' operator, the class is created using reflection. Is