Thanks, I'll have a closer look at GKE and compare it with what some other sites running similar use cases have used (OpenStack).
Well, no, I don't envisage any public cloud integration. There is no plan to use Hive, just PySpark using HDFS!

On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Just to clarify, it should say ... The current Spark Kubernetes model ...
>
> You will also need to build or get the Spark Docker image that you are going to use in k8s clusters, based on Spark version, Java version, Scala version, OS and so forth. Are you going to use Hive as your main storage?
>
> HTH
>
> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Tue, 23 Nov 2021 at 19:39, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> OK, to your point below:
>>
>> "... We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster ..."
>>
>> Kubernetes is really a cloud-native technology. However, the cloud-native concept does not exclude the use of on-premises infrastructure in cases where it makes sense. So the question is: are you going to use a mesh structure to integrate these microservices together, including on-premise and in-cloud?
>>
>> Now you have 20 tin boxes on-prem that you want to deploy for building your Spark & HDFS stack on top of them. You will gain benefit from Kubernetes and your microservices by simplifying the deployment, decoupling the dependencies and abstracting your infrastructure away, with the ability to port these infrastructures. As you have your own hardware (your Linux servers), running k8s on bare metal will give you native hardware performance.
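[Editorial note: the image build Mich mentions above can be sketched with the `docker-image-tool.sh` script that ships in the Spark binary distribution. This is an illustrative command sequence only; the registry address, tag, and `SPARK_HOME` path are hypothetical placeholders, and the PySpark bindings Dockerfile path assumes a Spark 3.x distribution layout.]

```shell
# Sketch: build and publish a PySpark-capable Spark image for Kubernetes.
# Assumptions: Spark distribution unpacked at /opt/spark, Docker daemon
# running, and a reachable registry at myregistry.local:5000 (hypothetical).
SPARK_HOME=/opt/spark
cd "$SPARK_HOME"

# Build the base JVM image plus the Python-bindings image (-p selects the
# PySpark Dockerfile bundled with the distribution).
./bin/docker-image-tool.sh -r myregistry.local:5000 -t 3.2.0 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# Push the images so every k8s node can pull them.
./bin/docker-image-tool.sh -r myregistry.local:5000 -t 3.2.0 push
```

The tag (here `3.2.0`) is worth pinning to the exact Spark/Java/Scala combination in use, since driver and executors must run the same build.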
However, with 20 Linux servers you may limit your scalability (your number of k8s nodes). If you go this way, you will need to invest in a bare-metal automation platform such as Platform9 <https://platform9.com/bare-metal/>. The likelihood is that you may decide to move to the public cloud at some point, or integrate with the public cloud. My advice would be to look at something like GKE on-prem <https://cloud.google.com/anthos/clusters/docs/on-prem/1.3/overview>.
>>
>> Back to Spark: the current Spark Kubernetes model works on the basis of the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>, meaning that for each application one pod runs the driver and each executor runs in its own pod. My question would be: will you be integrating with a public cloud (AWS, GCP, etc.) at some point? In that case you should look at mesh technologies like Istio <https://cloud.google.com/learn/what-is-istio>.
>>
>> HTH
>>
>> On Tue, 23 Nov 2021 at 14:09, JHI Star <jhistarl...@gmail.com> wrote:
>>
>>> We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster. My question is: within this architecture, is it best to have the pods run directly on bare metal, or under VMs or system containers like LXC, and/or under an on-premise instance of something like OpenStack - or something else altogether?
>>> I am looking to garner any experience around this question relating directly to the specific use case of Spark & HDFS on Kubernetes - I know there are also general points to consider regardless of the use case.
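[Editorial note: the one-container-per-Pod model discussed in the thread translates into a `spark-submit` invocation along these lines. This is a sketch only; the API server address, image name, service account, and HDFS paths are hypothetical placeholders, and the options shown (`--master k8s://...`, `spark.kubernetes.container.image`, etc.) are the standard Spark-on-Kubernetes configuration keys.]

```shell
# Sketch: submit a PySpark job to an on-prem Kubernetes cluster in
# cluster mode. All hostnames and paths below are illustrative.
./bin/spark-submit \
  --master k8s://https://k8s-apiserver.local:6443 \
  --deploy-mode cluster \
  --name pyspark-hdfs-job \
  --conf spark.executor.instances=19 \
  --conf spark.kubernetes.container.image=myregistry.local:5000/spark-py:3.2.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  hdfs://namenode.local:8020/jobs/etl.py
```

With 20 nodes, 19 executor instances leaves one pod for the driver, matching the driver-pod-plus-executor-pods layout described above; in practice the executor count is tuned to cores and memory per node rather than node count alone.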