Hi Shane,
I can't explain that, but I can say that with 0.19.0 I am using
setNumTasksToExecutePerJvm(-1) and then successfully initializing
statically declared data in the Mapper's configure(). It really is
educated guesswork for the tuning parameters though - I am profiling
the app for memory usage.
Given the goal of shared data accessible across the Map instances,
can someone please explain some of the differences between using:
- setNumTasksToExecutePerJvm() and then having statically declared
data initialised in Mapper.configure(); and
- a MultithreadedMapRunner?
Regards,
Shane
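To make the "statically declared data initialised in Mapper.configure()" option concrete, here is a minimal plain-Java sketch of that pattern (no Hadoop dependency; the class, field, and method names are illustrative, and configure() here stands in for Mapper.configure(JobConf)). With JVM reuse enabled via setNumTasksToExecutePerJvm(-1), configure() can run many times in one JVM, so the load is guarded so it happens only once:

```java
// Sketch (assumptions: names are hypothetical; in a real job this logic
// would live in your Mapper's configure() method).
import java.util.HashMap;
import java.util.Map;

public class SharedLookup {
    // Shared across all Mapper instances in the same JVM.
    private static volatile Map<String, String> LOOKUP = null;
    static int loadCount = 0; // only here to observe how often we load

    // Stand-in for Mapper.configure(JobConf): called once per task, which
    // with setNumTasksToExecutePerJvm(-1) may be many times per JVM.
    public static void configure() {
        if (LOOKUP == null) {                 // cheap check, no lock
            synchronized (SharedLookup.class) {
                if (LOOKUP == null) {         // re-check under the lock
                    Map<String, String> m = new HashMap<>();
                    m.put("key", "value");    // expensive parse/load goes here
                    loadCount++;
                    LOOKUP = m;
                }
            }
        }
    }

    public static String lookup(String k) { return LOOKUP.get(k); }

    public static void main(String[] args) {
        // Simulate three tasks reusing the same JVM.
        configure();
        configure();
        configure();
        System.out.println(lookup("key") + " loads=" + loadCount);
    }
}
```

The difference from a MultithreadedMapRunner, as I understand it, is scope: the static-field approach shares data across *tasks reusing one JVM*, while MultithreadedMapRunner runs several map threads inside a *single task*, so they trivially share the one Mapper instance's state.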
On Wed,
The more I use it, the more I realize Hadoop is not built around shared
memory. For these types of things, use TSpaces (IBM); that way you can
have a flag to load it once and allow for sharing.
Regards
Saptarshi
On Tue, Nov 25, 2008 at 3:42 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
cool. If you need a hand with Cascading stuff, feel free to ping me on
the mailing list or #cascading IRC. Lots of other friendly folk there
already.
ckw
On Nov 25, 2008, at 12:35 PM, tim robertson wrote:
Thanks Chris,
I have a different test running, then will implement that. Might give
cascading a shot for what I am doing.
Cheers
Tim
On Tue, Nov 25, 2008 at 9:24 PM, Chris K Wensel <[EMAIL PROTECTED]> wrote:
Hey Tim
The .configure() method is what you are looking for, I believe.
It is called once per task, which in the default case is once per JVM.
Note that jobs are broken into parallel tasks; each task handles a portion
of the input data. So you may create your map 100 times, because there
are 100 tasks.
Hi Doug,
Thanks - it is not so much that I want to run in a single JVM - I do want a
bunch of machines doing the work; it is just that I want them all to have
this in-memory lookup index, which is configured once per job. Is
there some hook somewhere that I can trigger a read from the
distributed cache, or
tim robertson wrote:
> Thanks Alex - this will allow me to share the shapefile, but I need to
> read it, parse it, and store the objects in the index "one time only
> per job per JVM".
> Is Mapper.configure() the best place to do this? E.g. will it
> only be called once per job?
In 0.19, with HADOOP-
Hi
Thanks Alex - this will allow me to share the shapefile, but I need to
read it, parse it, and store the objects in the index "one time only
per job per JVM".
Is Mapper.configure() the best place to do this? E.g. will it
only be called once per job?
Thanks
Tim
On Tue, Nov 25, 2008 at 8:1
You should use the DistributedCache:
<http://www.cloudera.com/blog/2008/11/14/sending-files-to-remote-task-nodes-with-hadoop-mapreduce/>
and
<http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache>
Hope this helps!
Alex
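Once the DistributedCache has localized the file onto a task node (in real code the local path comes from DistributedCache.getLocalCacheFiles(conf), typically read inside Mapper.configure()), the remaining work is just parsing the local file into a lookup map. A self-contained sketch of that parsing step, with a temp file standing in for the localized cache file and a made-up tab-separated format:

```java
// Sketch (assumptions: tab-separated key/value format and all names here
// are illustrative; a temp file stands in for the DistributedCache path).
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class CacheFileLookup {
    static Map<String, String> parse(Path localFile) throws IOException {
        Map<String, String> lookup = new HashMap<>();
        for (String line : Files.readAllLines(localFile)) {
            String[] kv = line.split("\t", 2);   // key<TAB>value per line
            if (kv.length == 2) lookup.put(kv[0], kv[1]);
        }
        return lookup;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the path DistributedCache would hand the task.
        Path f = Files.createTempFile("cache", ".tsv");
        Files.write(f, Arrays.asList("DE\tGermany", "FR\tFrance"));
        Map<String, String> lookup = parse(f);
        System.out.println(lookup.get("DE")); // prints Germany
    }
}
```

Note the cache only distributes the raw file; the once-per-JVM parse-and-index step still has to be guarded in configure(), which is what the rest of the thread is about.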
On Tue, Nov 25, 2008 at 11:09 AM, tim robert
Hi all,
If I want to have an in-memory "lookup" HashMap that is available in
my Map class, where is the best place to initialise this please?
I have a shapefile with polygons, and I wish to create the polygon
objects in memory on each node's JVM and have the map able to pull
back the objects by i