Small doubt in MR

2010-01-02 Thread bharath v
Hi, I want a particular "section of code" to run in only "ONE" of the mappers, so I employed the following procedure. Main-Class { public boolean flag = true; Map-Class { if(flag) { flag=false; /* section of co

Re: Small doubt in MR

2010-01-02 Thread Mark Kerzner
I think you need some kind of semaphore that the first reducer can turn on. For example, allocating a file in HDFS would work, provided you can guarantee that the allocation is an atomic operation (create-if-does-not-exist). Mark On Sat, Jan 2, 2010 at 10:04 PM, bharath v < bharathvissapragada1...@gmail.
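A minimal sketch of the create-if-does-not-exist idea, using the local filesystem as a stand-in for HDFS. The class name `OnceMarker` and the method `tryClaim` are hypothetical names chosen for illustration. `java.nio.file.Files.createFile` performs the existence check and the creation as one atomic step, so only one caller sees `true`. On a cluster the analogous call would presumably be Hadoop's `FileSystem.createNewFile(Path)`, but whether it gives the same guarantee in your setup is an assumption worth verifying.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

class OnceMarker {
    /** Returns true only for the caller that actually created the marker file. */
    static boolean tryClaim(Path marker) {
        try {
            // The existence check and the creation happen as a single atomic operation.
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            // Someone else created the marker first.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("once-marker");
        Path marker = dir.resolve("claimed");
        System.out.println("first claimant wins:  " + tryClaim(marker));
        System.out.println("second claimant wins: " + tryClaim(marker));
    }
}
```

One caveat with this design: if the task that claimed the marker fails and is re-executed, the retry finds the marker already present and skips the section, so the marker would need to be cleaned up or keyed to something repeatable.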

Re: Small doubt in MR

2010-01-02 Thread Matei Zaharia
If you want the code to happen on only one machine, why not run it in your driver program that submits the MapReduce job? You could also create a special input record that tells the mapper who gets that record that it's the chosen one. However, note that that mapper may be run multiple times du
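The special-input-record idea could look roughly like the sketch below. It is plain Java with no Hadoop dependency; the sentinel value `__RUN_ONCE__` and the names `SentinelRecordDemo` and `processRecords` are hypothetical. Each list stands in for the records one mapper receives, and only the mapper whose input contains the sentinel runs the special section.

```java
import java.util.Arrays;
import java.util.List;

class SentinelRecordDemo {
    // Hypothetical marker value; it must never occur in the real data.
    static final String SENTINEL = "__RUN_ONCE__";

    /** Processes one mapper's records and returns how many times the special section ran. */
    static int processRecords(List<String> records) {
        int specialRuns = 0;
        for (String record : records) {
            if (SENTINEL.equals(record)) {
                specialRuns++; // the "section of code" would go here
                continue;
            }
            // Normal map logic for ordinary records would go here.
        }
        return specialRuns;
    }

    public static void main(String[] args) {
        List<String> mapperA = Arrays.asList("a", SENTINEL, "b");
        List<String> mapperB = Arrays.asList("c", "d");
        System.out.println("mapper A special runs: " + processRecords(mapperA));
        System.out.println("mapper B special runs: " + processRecords(mapperB));
    }
}
```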

Re: Small doubt in MR

2010-01-02 Thread brien colwell
Another approach would be to use a custom InputFormat implementation, with the flag as a property of the input split. Consider wrapping your InputFormat with something like 'InputFormatWithFlag', that returns splits that combine the wrapped InputFormat's splits with your flag. Since InputForm
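The wrapping step might be sketched as follows, in plain Java rather than against the real Hadoop InputFormat API. `FlaggedSplits`, `FlaggedSplit`, and `wrap` are hypothetical names, and giving the flag to the first split is an assumption made here for illustration; the message does not say which split should carry it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FlaggedSplits {
    /** Pairs a split from the wrapped input format with a boolean flag. */
    static final class FlaggedSplit<S> {
        final S delegate;
        final boolean flag;

        FlaggedSplit(S delegate, boolean flag) {
            this.delegate = delegate;
            this.flag = flag;
        }
    }

    /** Wraps every split; only the first one gets flag == true (an assumption). */
    static <S> List<FlaggedSplit<S>> wrap(List<S> splits) {
        List<FlaggedSplit<S>> out = new ArrayList<>(splits.size());
        for (int i = 0; i < splits.size(); i++) {
            out.add(new FlaggedSplit<>(splits.get(i), i == 0));
        }
        return out;
    }

    public static void main(String[] args) {
        for (FlaggedSplit<String> s : wrap(Arrays.asList("split-0", "split-1", "split-2"))) {
            System.out.println(s.delegate + " flag=" + s.flag);
        }
    }
}
```

Because the flag is decided once, where the splits are computed, a re-executed task for the flagged split would see the same flag again.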

Re: Small doubt in MR

2010-01-04 Thread Mridul Muralidharan
Off the top of my head, you could set the flag to true based on some globally unique condition, such as a specific file name with start offset 0, for example part-0 at offset 0 (the actual file name could be a jobconf param). Note that the condition should be repeatable - since tasks can get reexe
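A repeatable condition of this kind can be written as a pure function of the split's file name and start offset, so a re-executed task reaches the same decision. The sketch below is plain Java; `ChosenSplit` and `isChosen` are hypothetical names. In a real mapper the inputs would presumably come from the task's file split (for instance `FileSplit.getPath().getName()` and `FileSplit.getStart()`), and the target name from a job configuration parameter.

```java
class ChosenSplit {
    /**
     * True only for the split that starts at offset 0 of the target file.
     * The result depends only on the arguments, so it is the same on every re-execution.
     */
    static boolean isChosen(String splitFileName, long splitStart, String targetFileName) {
        return splitStart == 0L && splitFileName.equals(targetFileName);
    }

    public static void main(String[] args) {
        String target = "part-0"; // would come from a jobconf parameter in practice
        System.out.println(isChosen("part-0", 0L, target));
        System.out.println(isChosen("part-0", 67108864L, target));
        System.out.println(isChosen("part-1", 0L, target));
    }
}
```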