OK, to your point below: "... We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster ..."
Kubernetes is really a cloud-native technology. However, the cloud-native concept does not exclude the use of on-premises infrastructure where it makes sense. So the question is: are you going to use a mesh structure to integrate these microservices together, both on-premise and in the cloud?

You now have 20 tin boxes on-prem that you want to use to build your Spark & HDFS stack. You will benefit from Kubernetes and microservices by simplifying deployment, decoupling dependencies and abstracting the infrastructure away, with the ability to port the whole stack elsewhere later. Since you already have the hardware (your Linux servers), running k8s on bare metal will give you native hardware performance. However, with 20 Linux servers you may limit your scalability (your number of k8s nodes). If you go this way, you will need to invest in a bare-metal automation platform such as Platform9 <https://platform9.com/bare-metal/>. The likelihood is that you will decide to move to the public cloud at some point, or at least integrate with it, so my advice would be to look at something like GKE on-prem <https://cloud.google.com/anthos/clusters/docs/on-prem/1.3/overview>.

Back to Spark: Kubernetes works on the basis of the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>. In practice this means the Spark driver runs in its own pod and each executor runs in its own pod, and Kubernetes schedules those pods across your cluster nodes (more than one executor pod can land on the same node). There is a minimal spark-submit sketch at the end of this message.

My other question would be: will you be integrating with a public cloud (AWS, GCP etc.) at some point? In that case you should look at mesh technologies like Istio <https://cloud.google.com/learn/what-is-istio>.

HTH

view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On Tue, 23 Nov 2021 at 14:09, JHI Star <jhistarl...@gmail.com> wrote:

> We are going to deploy 20 physical Linux servers for use as an on-premise
> Spark & HDFS on Kubernetes cluster. My question is: within this
> architecture, is it best to have the pods run directly on bare metal or
> under VMs or system containers like LXC and/or under an on-premise instance
> of something like OpenStack - or something else altogether?
>
> I am looking to garner any experience around this question relating
> directly to the specific use case of Spark & HDFS on Kubernetes - I know
> there are also general points to consider regardless of the use case.
>
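P.S. For reference, a minimal sketch of submitting a Spark job to a Kubernetes cluster in cluster mode looks something like the below. The API server address, service account name, container image and jar path are placeholders you would replace with your own values; the SparkPi example class is just used for illustration.

  # Submits SparkPi in cluster mode: the driver gets its own pod and each
  # executor gets its own pod, scheduled by Kubernetes across your nodes.
  # <k8s-apiserver>, <spark-image> and <spark-version> are placeholders.
  spark-submit \
    --master k8s://https://<k8s-apiserver>:6443 \
    --deploy-mode cluster \
    --name spark-pi \
    --class org.apache.spark.examples.SparkPi \
    --conf spark.executor.instances=5 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=<spark-image> \
    local:///opt/spark/examples/jars/spark-examples_2.12-<spark-version>.jar

Note that with the one-pod-per-executor model, spark.executor.instances does not have to match your node count; Kubernetes will happily co-locate several executor pods on the same node, subject to the resource requests you configure.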