Hi Mukhtaj,
In Spark, parallelization is abstracted behind the DataFrame API. Anything you
run directly on the driver executes locally; to have the work run in parallel
across the cluster, you need to express it through DataFrame operations.
You may want to check maxpumperla/elephas.
Hi Diwakar,
A YARN cluster without Hadoop is a fuzzy concept, since YARN is itself a Hadoop
component. What you can certainly do is keep Hadoop but skip MapReduce and use
Spark instead; that is the main reason to run Spark on a Hadoop cluster anyway.
On the other hand it is highly probable you may want to use HDFS
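As a hedged sketch of that combination, this is roughly what submitting a Spark job to an existing YARN cluster while reading from HDFS looks like (the script name and HDFS path below are placeholders, not from this thread):

```shell
# HADOOP_CONF_DIR must point at the cluster's Hadoop config directory
# so Spark can find the YARN ResourceManager and the HDFS NameNode.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  my_app.py hdfs:///data/input   # hypothetical script and input path
```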
em between threads, so that is what I am
trying to eliminate.
Regards,
Antonin
On 04/07/2020 17:49, Juan Martín Guillén wrote:
> Hi Antonin.
>
> It seems you are confusing Standalone with Local mode. They are 2
> different modes.
>
> From Spark in Action book: "In local mode, th
Hi Antonin.
It seems you are confusing Standalone with Local mode. They are two different
modes.
From the Spark in Action book: "In local mode, there is only one executor in the
same client JVM as the driver, but this executor can spawn several threads to
run tasks."
In local mode, Spark uses your
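In local mode, the number of task threads that single executor spawns is controlled by the master URL. A sketch of the standard variants (the script name is a placeholder):

```shell
# All of these run Spark in local mode, inside a single JVM:
spark-submit --master local      my_app.py   # 1 task thread
spark-submit --master "local[4]" my_app.py   # 4 task threads
spark-submit --master "local[*]" my_app.py   # one thread per available core
```

Standalone mode, by contrast, starts separate master and worker daemons and runs executors in their own JVMs, even on a single machine.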