Thank you.
Splitting the files leads to multiple MR tasks!
Only changing Hadoop's MR settings did not help. In the future it would
be nice if the drivers would scale themselves and split the data
according to the dataset size and the number of available MR slots.
Cheers
Sebastian
This is really a Hadoop-level thing. I am not sure I have ever
successfully induced M/R to run multiple mappers on less than one
block of data, even with a low max split size. Reducers you can
control.
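For reference, the knobs in question are per-job configuration properties. A minimal sketch using the Hadoop 1.x-era property names (exact names vary by Hadoop version, and as noted above the split-size setting is only a hint for splittable inputs — it rarely forces multiple mappers on less than one HDFS block):

```
# Per-job properties, e.g. passed via -D on the command line.
# Split size is advisory; values here are illustrative.
mapred.max.split.size=16777216
# Reducers, by contrast, can be set directly:
mapred.reduce.tasks=8
```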
On Thu, Mar 28, 2013 at 9:04 AM, Sebastian Briesemeister
This is a longstanding Hadoop issue.
Your suggestion is interesting, but only a few cases would benefit. The
problem is that splitting involves reading from a very small number of
nodes and thus is not much better than just running the program with few
mappers. If the data is large enough to
It would also be very hard to do automatically, as clusters are shared
and a framework cannot know how much of the shared resources (available
map slots) it can take.
On 28.03.2013 10:07, Sean Owen wrote:
This is really a Hadoop-level thing. I am not sure I have ever
successfully induced M/R to
Sebastian,
For CPU-bound problems like matrix factorization with ALS, we have
recently seen good results with multithreaded mappers, where we had the
users specify the number of cores to use per mapper.
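The multithreaded-mapper setup described above can be sketched with Hadoop's `MultithreadedMapper` wrapper; a hedged sketch, assuming the Hadoop mapreduce API is on the classpath (`ComputeMapper` is a hypothetical user mapper class, and the thread count is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

// Wrap a CPU-bound mapper so each map task runs several threads.
Configuration conf = new Configuration();
Job job = new Job(conf, "als-factorization");
job.setMapperClass(MultithreadedMapper.class);
// The real map logic lives in the wrapped class:
MultithreadedMapper.setMapperClass(job, ComputeMapper.class);
// e.g. one thread per core available to the mapper:
MultithreadedMapper.setNumberOfThreads(job, 8);
```

This only helps CPU-bound mappers whose map() is thread-safe, since all threads share one JVM and one record reader.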
On 28.03.2013 10:20, Ted Dunning wrote:
This is a longstanding Hadoop issue.
Your
In my case, each map process requires a lot of memory, and I would like
to distribute this consumption across multiple nodes.
However, I still get out-of-memory exceptions even if I split the input
file into several very small input files? I thought the mapper would
consider only one file at a time.
From what I've seen, even if the mapper does throw an out-of-memory
exception, Hadoop will restart it with increased memory.
There are ways to configure the mapper/reducer JVMs to use more memory by
default through the Configuration, although I don't recall the exact
options. It's probably
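The option being alluded to is most likely the child-JVM heap setting; a hedged sketch for mapred-site.xml (or per job via -D), using the Hadoop 1.x property name — the value is illustrative:

```xml
<!-- Raise the heap for the map/reduce child JVMs. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2048m</value>
</property>
```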
I tried to increase the heap space, but it wasn't enough.
It seems the problem is not the number of mappers. I will start another
thread for this problem with some more details.
Cheers
Sebastian
On 28.03.2013 16:41, Dan Filimon wrote:
From what I've seen, even if the mapper does throw an
Dear all,
I am trying to start the FuzzyKMeansDriver on a Hadoop cluster so that
it starts multiple MapReduce jobs. However, it always starts just a
single MR job?!
I figured it might be caused by the fact that I generated my input data
into a single file using SequenceFile.Writer? Or is there
Do you mean that it starts a single map task?
On Wed, Mar 27, 2013 at 5:10 PM, Sebastian Briesemeister
sebastian.briesemeis...@unister-gmbh.de wrote:
Dear all,
I am trying to start the FuzzyKMeansDriver on a Hadoop cluster so that
it starts multiple MapReduce jobs. However, it always starts
Yes, correct. It currently starts a single Map task.
Ted Dunning ted.dunn...@gmail.com wrote:
Do you mean that it starts a single map task?
On Wed, Mar 27, 2013 at 5:10 PM, Sebastian Briesemeister
sebastian.briesemeis...@unister-gmbh.de wrote:
Dear all,
I am trying to start the
Your idea that this is related to your single input file is the most likely
cause.
If your input file is relatively small then splitting it up to force
multiple mappers is the easiest solution.
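That splitting step can be sketched by round-robining the records of the single SequenceFile into several part files, each of which then becomes its own input split. A minimal sketch, assuming the Hadoop libraries are available — the paths and part count are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

// Split one SequenceFile into N smaller files (old-style Hadoop 1.x API).
int parts = 4;
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
Path in = new Path("input/points.seq"); // illustrative path
SequenceFile.Reader reader = new SequenceFile.Reader(fs, in, conf);

SequenceFile.Writer[] writers = new SequenceFile.Writer[parts];
for (int i = 0; i < parts; i++) {
  writers[i] = SequenceFile.createWriter(fs, conf,
      new Path("input-split/part-" + i),
      reader.getKeyClass(), reader.getValueClass());
}

// Reuse one key/value instance and deal records out round-robin.
Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
Writable val = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
int n = 0;
while (reader.next(key, val)) {
  writers[n++ % parts].append(key, val);
}
reader.close();
for (SequenceFile.Writer w : writers) w.close();
```

Pointing the driver at the `input-split` directory instead of the single file should then yield one mapper per part, subject to the scheduler having free map slots.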
If your input file is larger, then you might be able to convince the
map-reduce framework to use more