Hi,
as far as I understand, you shouldn't send the data to the driver. Suppose
your input is a file in HDFS/S3 or a partitioned Cassandra table: you
should design your job so that every Spark executor/worker handles part of
the input, transforms and filters it, and at the end writes its results
back to Cassandra as output. (Again, every executor/core inside a worker
writes part of the output; in your case, each one writes part of the
report.)
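As a rough illustration of that pattern, here is a plain-Python analogy (not Spark code). Each worker processes its own partition and writes its own slice of the report, and the driver only gets back tiny summaries, never the report rows. A thread pool stands in for the executors and local `part-NNNNN` files stand in for Cassandra. The names `run_report` and `process_partition`, the sample rows, and the filter/transform logic are all hypothetical. In real Spark the equivalent is a transformation chain that ends in a distributed write (e.g. `saveAsTextFile`, or the Spark Cassandra Connector's `saveToCassandra`) rather than a `collect()` to the driver.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: each inner list stands in for one partition of a
# Cassandra table or an HDFS/S3 file.
PARTITIONS = [
    [("acct-1", 120.0), ("acct-2", -5.0)],
    [("acct-3", 300.0), ("acct-4", 42.5)],
    [("acct-5", 0.0)],
]


def process_partition(index, rows, out_dir):
    """Plays a 'worker': filter and transform this partition, then write
    its slice of the report straight to storage."""
    kept = [(key, amount * 1.2) for key, amount in rows if amount > 0]
    path = os.path.join(out_dir, f"part-{index:05d}")
    with open(path, "w") as f:
        for key, value in kept:
            f.write(f"{key},{value:.2f}\n")
    # Only a small summary (file path, row count) goes back to the 'driver'.
    return path, len(kept)


def run_report(partitions, out_dir, workers=3):
    """Plays the 'driver': schedules one task per partition and gathers
    only the per-partition summaries, never the report rows themselves."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(process_partition, i, rows, out_dir)
            for i, rows in enumerate(partitions)
        ]
        return [f.result() for f in futures]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as out_dir:
        for path, count in run_report(PARTITIONS, out_dir):
            print(os.path.basename(path), count)
```

The point of the design is that the amount of data flowing back to the driver grows with the number of partitions, not with the size of the report.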

In general, I find that submitting multiple jobs to the same Spark context
(aka driver) performs better, because you don't pay the startup/shutdown
cost for every job. For this reason, some people use a REST server to
submit jobs to a long-running Spark context (driver).
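To make the startup-cost argument concrete, here is another plain-Python analogy (again, not Spark's API). The hypothetical `Context` class stands in for a driver/SparkContext, and creating one is counted as the expensive step. One variant starts a context per job. The other shares one long-running context and submits all jobs to it concurrently. In Spark itself, jobs submitted from separate threads against a single SparkContext can run concurrently, and `spark.scheduler.mode=FAIR` makes them share executors more evenly. The REST front-end mentioned above is left out here.

```python
from concurrent.futures import ThreadPoolExecutor


class Context:
    """Hypothetical stand-in for a driver/SparkContext. Creating one is the
    expensive part (e.g. JVM launch, registering with the cluster manager),
    so each creation is counted."""

    startups = 0

    def __init__(self, workers=4):
        Context.startups += 1
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, job, data):
        # Jobs from different callers share this context's workers,
        # much as concurrent jobs share one SparkContext's executors.
        return self._pool.submit(job, data)

    def stop(self):
        self._pool.shutdown(wait=True)


def report_job(rows):
    # Hypothetical report: sum of the positive values.
    return sum(r for r in rows if r > 0)


def one_context_per_job(datasets):
    results = []
    for rows in datasets:
        ctx = Context()  # startup cost paid for every job
        results.append(ctx.submit(report_job, rows).result())
        ctx.stop()
    return results


def shared_context(datasets):
    ctx = Context()  # startup cost paid once
    try:
        futures = [ctx.submit(report_job, rows) for rows in datasets]
        return [f.result() for f in futures]
    finally:
        ctx.stop()


if __name__ == "__main__":
    data = [[1, -2, 3], [10, 20], [-1]]
    Context.startups = 0
    print(one_context_per_job(data), Context.startups)
    Context.startups = 0
    print(shared_context(data), Context.startups)
```

Both variants produce the same results. Only the number of context startups differs (one per job versus one in total).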

I'm not sure you can run multiple concurrent drivers, because of port
conflicts.

On 4 June 2015 at 17:30, Giuseppe Sarno <giuseppesa...@fico.com> wrote:

>  Hello,
>
> I am relatively new to Spark and I am currently trying to understand how
> to scale large numbers of jobs with Spark.
>
> I understand that the Spark architecture is split into “Driver”, “Master”
> and “Workers”. The Master has a standby node in case of failure, and
> workers can scale out.
>
> All the examples I have seen show Spark being able to distribute the load
> to the workers and return small amounts of data to the Driver. In my case,
> I would like to explore the scenario where I need to generate a large
> report on data stored in Cassandra, and understand how the Spark
> architecture handles this case when multiple report jobs are running in
> parallel.
>
> According to this presentation
> https://trongkhoanguyenblog.wordpress.com/2015/01/07/understand-the-spark-deployment-modes/
> responses from workers go through the Master and finally to the Driver.
> Does this mean that the Driver and/or the Master is a single point through
> which all the responses coming back from workers must pass?
>
> Is it possible to start multiple concurrent Drivers?
>
> Regards,
>
> Giuseppe.