On Tue, Dec 21, 2010 at 7:23 AM, li ping wrote:
> I think the reduce can be started before all of the maps finish.
> See the configuration item in mapred-site.xml:
>
>   <property>
>     <name>mapred.reduce.slowstart.completed.maps</name>
>     <value>0.05</value>
>     <description>Fraction of the number of maps in the job which should be
>     complete before reduces are scheduled for the job.</description>
>   </property>
I don't think reduce tasks can start until all the maps are finished, because
the values are accumulated only at the end of the map stage.
Are you using the default scheduler?
I think the reduce can be started before all of the maps finish.
See the configuration item in mapred-site.xml:

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
  <description>Fraction of the number of maps in the job which should be
  complete before reduces are scheduled for the job.</description>
</property>

Correct me if I'm wrong.
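For illustration, a minimal per-job sketch in Java (the class name and job
name are hypothetical; only the property key and its default come from the
entry above):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch: raise the slow-start threshold for a single job, so reduce
    // tasks are scheduled only once 80% of the maps have completed,
    // instead of the 5% default quoted above.
    public class SlowStartDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.setFloat("mapred.reduce.slowstart.completed.maps", 0.80f);
            Job job = new Job(conf, "slowstart-demo");
            // ...set mapper, reducer, input and output paths as usual, then:
            // job.waitForCompletion(true);
        }
    }

Note that an early-scheduled reduce task only starts copying map outputs; the
reduce() calls themselves still wait until every map has finished, which
reconciles the two answers above.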
Hi,
On Tue, Dec 21, 2010 at 12:03 AM, Pedro Costa wrote:
> 1 - Should a reduce task start only when a map task ends?
Yes, the reduce() function is called only once all the map()s have finished.
>
> --
> Pedro
>
--
Harsh J
www.harshj.com
All,
Not sure if this is the right mailing list for this question. I am using Pig
to do some data analysis, and I am wondering if there is a way to tell Hadoop,
when it encounters a bad log file (whether due to decompression failures or
whatever else caused the job to die), to record the line and if possible th
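One approach (a sketch only, using the new mapreduce API; the class name,
counter group, and tab-separated log format are all hypothetical) is a mapper
that catches parse failures, counts and logs them, and moves on:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a bad-record-tolerant mapper: malformed lines are counted
    // and logged with their byte offset instead of killing the task.
    public class TolerantLogMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            try {
                // Hypothetical log format: url \t status \t bytes
                String[] fields = line.toString().split("\t");
                long bytes = Long.parseLong(fields[2]);
                context.write(new Text(fields[0]), new LongWritable(bytes));
            } catch (RuntimeException e) {
                context.getCounter("TolerantLogMapper", "BAD_RECORDS").increment(1);
                System.err.println("Bad record at offset " + offset + ": " + e);
            }
        }
    }

This only covers malformed lines; a corrupt compressed file fails inside the
record reader before map() is ever called and would need handling in a custom
InputFormat instead. Hadoop also ships a skip mode (see the SkipBadRecords
class in the old mapred API) that can step over records which repeatedly
crash a task.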
This makes sense until you realize:
a) It won't scale.
b) Machines fail.
On Dec 20, 2010, at 5:26 AM, Martin Becker wrote:
> I wrote a little bit much, so I put a summary up front. Sorry about that.
>
> Summary:
> 1) Is there any point in time, where on
1 - Should a reduce task start only when a map task ends?
--
Pedro
I wrote a little bit much, so I put a summary up front. Sorry about that.
Summary:
1) Is there any point in time where one single instance of Hadoop has
access to all the keys that are to be distributed to the nodes, together
with the corresponding data? Or maybe at least nodes could have task
priorities,
The JobTracker wouldn't know what your data is going to be when it
is assigning the reduce tasks.
If you really do need ordering among your reducers, you should
implement a locking mechanism (making sure the dormant reduce tasks
stay alive by sending out status reports).
Although, how is
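A minimal sketch of that idea, assuming an HDFS marker file as the lock and
the new mapreduce API (all class names and paths are hypothetical):

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    // Sketch: each reducer waits for a marker file written by its
    // predecessor. The periodic context.progress() calls keep the
    // framework from timing out the dormant task.
    public class OrderedReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            int partition = context.getTaskAttemptID().getTaskID().getId();
            FileSystem fs = FileSystem.get(context.getConfiguration());
            Path turnMarker = new Path("/locks/reducer-" + (partition - 1) + ".done");
            while (partition > 0 && !fs.exists(turnMarker)) {
                context.progress();  // report status so the task isn't killed
                Thread.sleep(5000);  // poll for the predecessor's marker
            }
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            int partition = context.getTaskAttemptID().getTaskID().getId();
            FileSystem fs = FileSystem.get(context.getConfiguration());
            fs.create(new Path("/locks/reducer-" + partition + ".done")).close();
        }
    }

As pointed out above, though, this serializes the reduce phase and has no
answer for a failed predecessor task, so treat it as a sketch of the idea
rather than something to deploy.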
I just reread my first post. Maybe I was not clear enough:
It is only important to me that the Reduce tasks _start_ in a
specified order based on their key. That is the only additional
constraint I need.
On Mon, Dec 20, 2010 at 9:51 AM, Martin Becker <_martinbec...@web.de> wrote:
> As far as I und
Thank you for your suggestions. In this context I have heard about
ZooKeeper a few times. It seems to be the easiest and most failsafe
solution so far. Another solution mentioned was using some sort of
communication through the file system, which is probably slow and
quite subtle. Of course, I wou
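For comparison, a rough sketch of the ZooKeeper variant (the connection
string, znode paths, and class name are hypothetical, and the /reduce-order
parent znode is assumed to already exist):

    import java.util.concurrent.CountDownLatch;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // Sketch: block until the previous reducer has created its "done"
    // znode, then create our own so the next reducer can proceed.
    public class ZkTurnWaiter {

        public static void waitForTurn(int partition) throws Exception {
            final CountDownLatch myTurn = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);

            if (partition > 0) {
                String predecessor = "/reduce-order/done-" + (partition - 1);
                Watcher watcher = new Watcher() {
                    public void process(WatchedEvent event) {
                        if (event.getType() == Event.EventType.NodeCreated) {
                            myTurn.countDown();  // predecessor finished
                        }
                    }
                };
                // exists() sets the watch atomically, so there is no race
                // with the predecessor's create().
                if (zk.exists(predecessor, watcher) == null) {
                    myTurn.await();
                }
            }

            // ...do the ordered work here, then announce completion:
            zk.create("/reduce-order/done-" + partition, new byte[0],
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
            zk.close();
        }
    }

Unlike the polling sketch earlier, the watch makes the wait event-driven,
and ephemeral znodes could be added to detect crashed predecessors.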
As far as I understood, MapReduce waits for all mappers to finish
before it starts running reduce tasks. Am I mistaken here? If I am
not, then I do not see any more synchrony being introduced than there
already is (no locks required). Of course I am not aware of all the
internals, but MapReduce