OK, to your point below: "... We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster ..."
Kubernetes is really a cloud-native technology. However, the cloud-native concept does not exclude the use of on-premises infrastructure where it makes sense. So the question is: are you going to use a mesh structure to integrate these microservices together, both on-premise and in the cloud?

You now have 20 tin boxes on-prem that you want to use to build your Spark & HDFS stack. You will benefit from Kubernetes and microservices by simplifying deployment, decoupling dependencies and abstracting the infrastructure away, with the ability to port the whole stack elsewhere later. Since you already have the hardware (your Linux servers), running k8s on bare metal will give you native hardware performance. However, with 20 Linux servers you may limit your scalability (your number of k8s nodes). If you go this way, you will need to invest in a bare-metal automation platform such as Platform9 <https://platform9.com/bare-metal/>. The likelihood is that you will decide to move to the public cloud at some point, or at least integrate with it, so my advice would be to look at something like GKE on-prem <https://cloud.google.com/anthos/clusters/docs/on-prem/1.3/overview>.

Back to Spark: Kubernetes works on the basis of the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>. In practice this means the Spark driver runs in its own pod and each executor runs in its own pod, and Kubernetes schedules those pods across your cluster nodes (more than one executor pod can land on the same node). There is a minimal spark-submit sketch at the end of this message.

My other question would be: will you be integrating with a public cloud (AWS, GCP etc.) at some point? In that case you should look at mesh technologies like Istio <https://cloud.google.com/learn/what-is-istio>.

HTH

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Tue, 23 Nov 2021 at 14:09, JHI Star <jhistarl...@gmail.com> wrote:

> We are going to deploy 20 physical Linux servers for use as an on-premise
> Spark & HDFS on Kubernetes cluster. My question is: within this
> architecture, is it best to have the pods run directly on bare metal or
> under VMs or system containers like LXC and/or under an on-premise instance
> of something like OpenStack - or something else altogether?
>
> I am looking to garner any experience around this question relating
> directly to the specific use case of Spark & HDFS on Kubernetes - I know
> there are also general points to consider regardless of the use case.
>
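P.S. For reference, a minimal sketch of submitting a Spark job to a Kubernetes cluster in cluster mode looks something like the below. The API server address, service account name, container image and jar path are placeholders you would replace with your own values; the SparkPi example class is just used for illustration.

  # Submits SparkPi in cluster mode: the driver gets its own pod and each
  # executor gets its own pod, scheduled by Kubernetes across your nodes.
  # <k8s-apiserver>, <spark-image> and <spark-version> are placeholders.
  spark-submit \
    --master k8s://https://<k8s-apiserver>:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.12-<spark-version>.jar

Note that with the one-pod-per-executor model, spark.executor.instances does not have to match your node count; Kubernetes will happily co-locate several executor pods on the same node, subject to the resource requests you configure.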