Out of curiosity: are there functional limitations in Spark Standalone that are of concern? YARN is more configurable for running non-Spark workloads and for running multiple Spark jobs in parallel, but for a single Spark job, standalone seems to launch more quickly and does not appear to be missing any features. Are there specific limitations you are aware of or have run into?
stephen b

On Mon, 21 Nov 2022 at 09:01, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Hi,
>
> I have not tested this myself, but Google has brought out *Dataproc
> Serverless for Spark*. In a nutshell, Dataproc Serverless lets you run
> Spark batch workloads without requiring you to provision and manage your
> own cluster. You specify workload parameters and then submit the workload
> to the Dataproc Serverless service. The service runs the workload on a
> managed compute infrastructure, autoscaling resources as needed. Dataproc
> Serverless charges apply only to the time when the workload is executing.
> Google Dataproc is similar to Amazon EMR.
>
> So in short, you don't need to provision your own Dataproc cluster. One
> thing I noticed from the release doc
> <https://cloud.google.com/dataproc-serverless/docs/overview> is that the
> resource management is *Spark based*, as opposed to standard Dataproc,
> which is YARN based. It is available for Spark 3.2. My assumption is that
> "Spark based" means Spark is running in standalone mode. Has there been
> much improvement in release 3.2 for standalone mode?
>
> Thanks
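For reference, submitting a batch workload to Dataproc Serverless looks roughly like the sketch below (a hedged example via the `gcloud dataproc batches submit` command; the project, region, bucket, and script names are placeholders I introduced, not values from this thread):

```shell
# Sketch: submit a PySpark batch workload to Dataproc Serverless.
# All identifiers here (my-project, us-central1, gs://my-bucket, job.py,
# my-batch-id) are placeholders; substitute your own project, region,
# and staging bucket.
gcloud dataproc batches submit pyspark job.py \
    --project=my-project \
    --region=us-central1 \
    --deps-bucket=gs://my-bucket \
    --batch=my-batch-id \
    -- arg1 arg2
```

Note that no cluster is created or referenced anywhere in the command: the service provisions and autoscales the compute itself, which is what makes the "Spark based rather than YARN based" resource management question above interesting.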