Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-25 Thread JHI Star
Thanks, I'll have a closer look at GKE and compare it with what some other sites running setups similar to ours have used (OpenStack). Well, no, I don't envisage any public cloud integration. There is no plan to use Hive, just PySpark using HDFS! On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh wrote:

Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-24 Thread Mich Talebzadeh
Just to clarify, it should say "The current Spark Kubernetes model ..." You will also need to build or get the Spark Docker image that you are going to use in k8s clusters, based on the Spark version, Java version, Scala version, OS and so forth. Are you going to use Hive as your main storage?
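As a rough illustration of the point about image versions, here is a minimal Python sketch that derives an image tag from the Spark, Scala, Java and OS versions and assembles a spark-submit command line for a Kubernetes master. The tag layout, the registry name, the API server address and the application path are all hypothetical placeholders, not anything taken from this thread; only the `k8s://` master prefix and the `spark.kubernetes.container.image` and `spark.executor.instances` settings are standard Spark options.

```python
def spark_image_tag(spark_version, scala_version, java_version, os_name):
    """Build an image tag that records every version the image depends on.

    The tag layout is an assumption made for this sketch, not a Spark convention.
    """
    return f"spark-{spark_version}-scala_{scala_version}-java{java_version}-{os_name}"


def build_submit_command(api_server, image, app_path, executors=2):
    """Return the spark-submit argument list for a cluster-mode run on Kubernetes."""
    return [
        "spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes API server URL
        "--deploy-mode", "cluster",
        "--conf", f"spark.kubernetes.container.image={image}",
        "--conf", f"spark.executor.instances={executors}",
        app_path,
    ]


# Hypothetical values, for illustration only.
tag = spark_image_tag("3.1.1", "2.12", "8", "buster")
image = f"registry.example.internal/spark:{tag}"
command = build_submit_command(
    "https://k8s-api.example.internal:6443",
    image,
    "local:///opt/spark/work-dir/job.py",
    executors=4,
)
print(" ".join(command))
```

Keeping every version in the tag makes it easier to tell which image a given job was built against when several Spark or Java versions coexist on the same cluster.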

Re: Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-23 Thread Mich Talebzadeh
OK, to your point below: "... We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster." Kubernetes is really a cloud-native technology. However, the cloud-native concept does not exclude the use of on-premises infrastructure in cases where it

Choosing architecture for on-premise Spark & HDFS on Kubernetes cluster

2021-11-23 Thread JHI Star
We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster. My question is: within this architecture, is it best to have the pods run directly on bare metal or under VMs or system containers like LXC and/or under an on-premise instance of something