On Wed, Jan 28, 2015 at 1:44 PM, Matan Safriel wrote:
> So I assume I can safely run a function F of mine within the spark driver
> program, without dispatching it to the cluster (?), thereby sticking to one
> piece of code for both a real cluster run over big data, and for small
> on-demand runs
Thanks!

So I assume I can safely run a function *F* of mine within the Spark driver
program, without dispatching it to the cluster (?), thereby sticking to one
piece of code for *both* a real cluster run over big data and for small
on-demand runs for a single input (now and then), both scenarios
Processing one object isn't a distributed operation, and doesn't
really involve Spark. Just invoke your function on your object in the
driver; there's no magic at all to that.
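A minimal sketch of that point, assuming a hypothetical function `enrich` standing in for your function F:

```python
# Hypothetical stand-in for your function F; any plain function works the
# same way here.
def enrich(record):
    return {**record, "processed": True}

# For a single on-demand input, Spark is not involved at all:
# just call the function directly in the driver process.
result = enrich({"id": 1})
```

The function is ordinary code; nothing about it needs a SparkContext until you actually hand it to a distributed operation.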
You can make an RDD of one object and invoke a distributed Spark
operation on it, but assuming you mean you have it on the
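For completeness, a hedged sketch of the RDD-of-one variant next to the driver-side call (assumes a live `SparkContext` named `sc`; `enrich` is a placeholder for your function F):

```python
# Placeholder for your function F.
def enrich(record):
    return {**record, "processed": True}

def run_on_cluster(sc, record):
    # Wrap the single object in a one-element RDD so the very same
    # function goes through Spark's distributed machinery.
    return sc.parallelize([record]).map(enrich).collect()[0]

def run_in_driver(record):
    # Equivalent result for a single input, with no cluster dispatch.
    return enrich(record)
```

Both paths apply the same `enrich`, which is the "one piece of code for both scenarios" the question asks about; only the entry point differs.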
Hi,

How would I run a given function in Spark over a single input object?
Would I first add the input to the file system, then somehow invoke the
Spark function on just that input? Or should I rather twist the Spark
Streaming API for it?

Assume I'd like to run a piece of computation that normally