I do not totally understand the job you are running, but if each simulation
can run independently of the others, then you could run a map reduce job that
spreads the simulations over many servers, so that each server runs one or
more at the same time. This will also give you a level of protection against
server failure.
This can work pretty well if you just use the list of parameter settings as
input. The map task would run your simulation and output the data. You may
not even need a reducer, although parallelized summary of output might be
very nice to have. Because each of your sims takes a long time to run,
Here is an informal description of the map/reduce model:
In the map/reduce paradigm the input data usually consists of a
(very large) number of records.
The paradigm assumes that you want to do some computation on each input
record separately (without simultaneous access to other records).
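To make that concrete, below is a minimal sketch of such a map-only job,
assuming a recent Hadoop release with the org.apache.hadoop.mapreduce API.
SimulationDriver, SimulationMapper and runSimulation() are made-up names
standing in for your own driver and simulation code, not anything the
framework provides.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SimulationDriver {

    public static class SimulationMapper
            extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        protected void map(LongWritable offset, Text paramLine, Context context)
                throws IOException, InterruptedException {
            // Each input record is one parameter setting, e.g. "alpha=0.1 beta=42".
            String params = paramLine.toString().trim();
            if (params.isEmpty()) {
                return;
            }
            // Hypothetical call: replace with however your simulation is invoked
            // (an external executable via ProcessBuilder, a library call, etc.).
            String result = runSimulation(params);
            // Emit "parameters -> result"; with zero reducers this goes straight
            // into the output files, one per map task.
            context.write(new Text(params), new Text(result));
        }

        private String runSimulation(String params) {
            // Placeholder so the sketch compiles; the real work goes here.
            return "not-implemented";
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "parameter sweep");
        job.setJarByClass(SimulationDriver.class);
        job.setMapperClass(SimulationMapper.class);

        // Map-only job: no reducer needed, each simulation is independent.
        job.setNumReduceTasks(0);

        // One parameter line per map task, since each simulation runs for a long time.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 1);

        job.setOutputFormatClass(TextOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        NLineInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With one parameter setting per line in the input file, NLineInputFormat hands
exactly one line to each map task, so every simulation gets its own task and
the framework will rerun just that task if a machine fails.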
Thank you for your comment; it did confirm my suspicions.
You framed the problem correctly. I will probably invest a bit of time
studying the framework anyway, to see whether a rewrite is worthwhile, since
we have hit scaling limitations with our Agent scheduler framework. Our main
computational load is
I am new to Hadoop, so take this information with a grain of salt.
The power of Hadoop lies in breaking a big problem down into small pieces and
spreading them across many (thousands of) machines, in effect creating a
massively parallel processing engine.
But in order to take advantage of that functionality
Hello list,
We will be getting access to a cluster soon, and I was wondering whether
I should use Hadoop for this? Or am I better off with the usual batch
schedulers such as ProActive etc.? I am not a CS/CE person, and from
reading the website I cannot get a sense of whether Hadoop is for me.
A