Dataproc serverless for Spark

Mich Talebzadeh Mon, 21 Nov 2022 09:01:36 -0800

Hi,

I have not tested this myself, but Google has introduced *Dataproc Serverless
for Spark*. In a nutshell, Dataproc Serverless lets you run Spark batch
workloads without having to provision and manage your own cluster. You
specify the workload parameters and then submit the workload to the Dataproc
Serverless service. The service runs the workload on managed compute
infrastructure, autoscaling resources as needed. Dataproc Serverless
charges apply only while the workload is executing. Google Dataproc is
similar to Amazon EMR.
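
As a rough illustration of "specify workload parameters, then submit", here
is a minimal Python sketch that assembles a `gcloud dataproc batches submit
pyspark` command line. I have not run this against the service. The helper
name `build_batch_submit_command` and the bucket, project and region values
are made up for illustration; only the gcloud command and its `--region`,
`--project`, `--deps-bucket` and `--properties` flags come from the gcloud
CLI.

```python
import shlex


def build_batch_submit_command(main_file, region, project,
                               deps_bucket=None, properties=None):
    """Assemble a `gcloud dataproc batches submit pyspark` argument list.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = [
        "gcloud", "dataproc", "batches", "submit", "pyspark", main_file,
        f"--region={region}",
        f"--project={project}",
    ]
    if deps_bucket:
        # Bucket used to stage local dependencies for the batch.
        cmd.append(f"--deps-bucket={deps_bucket}")
    if properties:
        # Spark properties are passed as a comma-separated key=value list.
        props = ",".join(f"{k}={v}" for k, v in sorted(properties.items()))
        cmd.append(f"--properties={props}")
    return cmd


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    cmd = build_batch_submit_command(
        "gs://example-bucket/job.py",
        region="europe-west2",
        project="example-project",
        deps_bucket="gs://example-deps",
        properties={"spark.executor.cores": "4"},
    )
    print(shlex.join(cmd))
```

The printed line can be pasted into a shell, or the list can be handed to
`subprocess.run(cmd, check=True)` once the gcloud CLI is installed and
authenticated.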


So, in short, you don't need to provision your own Dataproc cluster. One
thing I noticed from the release doc
<https://cloud.google.com/dataproc-serverless/docs/overview> is that
resource management is *Spark based*, as opposed to standard Dataproc, which
is YARN based. It is available for Spark 3.2. My assumption is that
"Spark based" means that Spark is running in standalone mode. Has there
been much improvement in release 3.2 for standalone mode?
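
One way to test that assumption would be to look at the master URL a job
actually sees. Below is a minimal sketch that maps Spark's documented master
URL forms to a cluster manager name. The function name `classify_master` is
my own, and I do not know what value Dataproc Serverless reports, so treat
this as a way to check rather than an answer.

```python
def classify_master(master_url):
    """Map a Spark master URL to the cluster manager it implies.

    Hypothetical helper based on the master URL forms in the Spark docs.
    """
    if master_url == "local" or master_url.startswith("local["):
        return "local"
    if master_url.startswith("spark://"):
        # spark://HOST:PORT is the standalone cluster manager.
        return "standalone"
    if master_url == "yarn":
        return "yarn"
    if master_url.startswith("k8s://"):
        return "kubernetes"
    if master_url.startswith("mesos://"):
        return "mesos"
    return "unknown"
```

Inside a PySpark job this could be called as
`classify_master(spark.sparkContext.master)` and the result written to the
job output.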

Thanks




View my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>


 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
