Thanks, I'll have a closer look at GKE and compare it with what some other
sites running workloads similar to ours have used (OpenStack).
Well, no, I don't envisage any public cloud integration. There is no plan
to use Hive, just PySpark with HDFS.
On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh
wrote:
Just to clarify, it should say "The current Spark Kubernetes model ..."
You will also need to build or obtain the Spark Docker image that you are
going to use in your k8s clusters, based on the Spark version, Java version,
Scala version, OS and so forth. Are you going to use Hive as your main
storage?
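For building that image, recent Spark distributions ship a helper script, `docker-image-tool.sh`, which bakes in the Spark/Java/Scala versions of the release you unpack. A minimal sketch follows; the registry name, tag, and Spark 3.2.0 release directory are illustrative placeholders, not values from this thread:

```shell
# Sketch: build a PySpark-capable image from an unpacked Spark release.
# Run from the root of the Spark distribution directory.
cd spark-3.2.0-bin-hadoop3.2

# Build the base JVM image plus the Python-bindings image (spark-py).
# -r sets the image repository prefix, -t the tag.
./bin/docker-image-tool.sh \
  -r myregistry.local/spark \
  -t 3.2.0 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile \
  build

# Push the images so the k8s worker nodes can pull them.
./bin/docker-image-tool.sh -r myregistry.local/spark -t 3.2.0 push
```

The base image's OS and Java version come from the Dockerfile bundled with the release; if you need a different OS base or extra libraries, you would customize that Dockerfile before building.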
OK, to your point below:
"... We are going to deploy 20 physical Linux servers for use as an
on-premise Spark & HDFS on Kubernetes cluster ..."
Kubernetes is really a cloud-native technology. However, the cloud-native
concept does not exclude the use of on-premises infrastructure in cases
where it ...
We are going to deploy 20 physical Linux servers for use as an on-premise
Spark & HDFS on Kubernetes cluster. My question is: within this
architecture, is it best to have the pods run directly on bare metal, under
VMs, under system containers like LXC, and/or under an on-premise instance
of something ...
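Whichever layer the nodes run on (bare metal, VMs, or LXC), the Spark-on-Kubernetes model itself is the same: the driver and executors run as pods, launched by pointing spark-submit at the k8s API server. A hedged sketch, in which the API server URL, namespace, image name, HDFS namenode, and job.py path are all assumed placeholders:

```shell
# Sketch: submit a PySpark job to a Kubernetes cluster in cluster mode.
# The k8s scheduler creates the driver pod, which in turn requests
# executor pods; the application script is read from HDFS.
spark-submit \
  --master k8s://https://k8s-apiserver.local:6443 \
  --deploy-mode cluster \
  --name pyspark-hdfs-job \
  --conf spark.kubernetes.namespace=spark \
  --conf spark.kubernetes.container.image=myregistry.local/spark/spark-py:3.2.0 \
  --conf spark.executor.instances=4 \
  hdfs://namenode.local:8020/jobs/job.py
```

The bare-metal-versus-VM question then becomes one of operational trade-offs (isolation, node sizing, upgrade workflow) rather than anything Spark itself can see.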