Out of curiosity: are there functional limitations in Spark Standalone
that are of concern? YARN is more configurable for running non-Spark
workloads and for controlling how multiple Spark jobs run in parallel. But
for a single Spark job, standalone seems to launch more quickly and does
not appear to miss any features. Are there specific limitations you are
aware of or have run into?
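
To make the comparison concrete: for a single job the submission side is
essentially the same either way; a rough PySpark sketch (the master host,
port and memory setting below are made-up placeholders):

from pyspark.sql import SparkSession

# Standalone: point the application directly at the standalone master.
spark = (
    SparkSession.builder
    .appName("single-job-example")
    .master("spark://standalone-master:7077")   # placeholder host:port
    .config("spark.executor.memory", "4g")      # placeholder sizing
    .getOrCreate()
)

# The same application on YARN would normally be launched with
# spark-submit --master yarn, where YARN queues and schedulers decide
# how many concurrent jobs share the cluster.
print(spark.range(1000).count())
spark.stop()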

stephen b

On Mon, 21 Nov 2022 at 09:01, Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Hi,
>
> I have not tested this myself but Google have brought up *Dataproc Serverless
> for Spar*k. in a nutshell Dataproc Serverless lets you run Spark batch
> workloads without requiring you to provision and manage your own cluster.
> Specify workload parameters, and then submit the workload to the Dataproc
> Serverless service. The service will run the workload on a managed compute
> infrastructure, autoscaling resources as needed. Dataproc Serverless
> charges apply only to the time when the workload is executing. Google
> Dataproc is similar to Amazon EMR
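>
> For illustration, submitting such a batch programmatically would look
> roughly like this (just a sketch, assuming the google-cloud-dataproc
> Python client; project, region and bucket names are placeholders):
>
> from google.cloud import dataproc_v1
>
> # Regional Batches endpoint (region is a placeholder).
> client = dataproc_v1.BatchControllerClient(
>     client_options={"api_endpoint": "europe-west2-dataproc.googleapis.com:443"}
> )
>
> # Describe the workload: a PySpark batch whose main file lives in GCS.
> batch = dataproc_v1.Batch(
>     pyspark_batch=dataproc_v1.PySparkBatch(
>         main_python_file_uri="gs://my-bucket/jobs/etl_job.py"
>     )
> )
>
> # Submit the batch; the service provisions and autoscales the compute.
> operation = client.create_batch(
>     parent="projects/my-project/locations/europe-west2",
>     batch=batch,
> )
> print(operation.result().state)  # result() waits for the batch to finish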
>
> So in short you don't need to provision your own Dataproc cluster etc. One
> thing I noticed from the release doc
> <https://cloud.google.com/dataproc-serverless/docs/overview> is that the
> resource management is *Spark based* as opposed to standard Dataproc,
> which is YARN based. It is available for Spark 3.2. My assumption is that
> by Spark based it means that Spark is running in standalone mode. Has
> there been much improvement in release 3.2 for standalone mode?
>
> Thanks
>
>
> view my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> https://en.everybodywiki.com/Mich_Talebzadeh
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.