Thanks, I'll have a closer look at GKE and compare it with what some other sites running similar use cases have used (OpenStack).
Well, no, I don't envisage any public cloud integration. There is no plan to use Hive, just PySpark using HDFS!

On Wed, Nov 24, 2021 at 10:31 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

> Just to clarify, it should say ... The current Spark Kubernetes model ...
>
> You will also need to build or get the Spark Docker image that you are going to use in k8s clusters, based on Spark version, Java version, Scala version, OS and so forth. Are you going to use Hive as your main storage?
>
> HTH
>
> view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.
>
> On Tue, 23 Nov 2021 at 19:39, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>
>> OK, to your point below:
>>
>> "... We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster ..."
>>
>> Kubernetes is really a cloud-native technology. However, the cloud-native concept does not exclude the use of on-premises infrastructure in cases where it makes sense. So the question is: are you going to use a mesh structure to integrate these microservices together, including on-premise and in-cloud?
>>
>> Now you have 20 tin boxes on-prem that you want to deploy for building your Spark & HDFS stack on top of them. You will gain benefit from Kubernetes and your microservices by simplifying the deployment, decoupling the dependencies and abstracting your infrastructure away, with the ability to port these infrastructures. As you have your own hardware (your Linux servers), running k8s on bare metal will give you native hardware performance.
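[Editorial note: the image build Mich mentions above can be sketched with the `docker-image-tool.sh` script that ships in the Spark binary distribution. This is an illustrative command sequence only; the registry address, tag, and `SPARK_HOME` path are hypothetical placeholders, and the PySpark bindings Dockerfile path assumes a Spark 3.x distribution layout.]

```shell
# Sketch: build and publish a PySpark-capable Spark image for Kubernetes.
# Assumptions: Spark distribution unpacked at /opt/spark, Docker daemon
# running, and a reachable registry at myregistry.local:5000 (hypothetical).
SPARK_HOME=/opt/spark
cd "$SPARK_HOME"

# Build the base JVM image plus the Python-bindings image (-p selects the
# PySpark Dockerfile bundled with the distribution).
./bin/docker-image-tool.sh -r myregistry.local:5000 -t 3.2.0 \
  -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build

# Push the images so every k8s node can pull them.
./bin/docker-image-tool.sh -r myregistry.local:5000 -t 3.2.0 push
```

The tag (here `3.2.0`) is worth pinning to the exact Spark/Java/Scala combination in use, since driver and executors must run the same build.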
However, with 20 Linux servers you may limit your scalability (your number of k8s nodes). If you go this way, you will need to invest in a bare-metal automation platform such as Platform9 <https://platform9.com/bare-metal/>. The likelihood is that you may decide to move to the public cloud at some point, or integrate with the public cloud. My advice would be to look at something like GKE on-prem <https://cloud.google.com/anthos/clusters/docs/on-prem/1.3/overview>.
>>
>> Back to Spark: the current Spark Kubernetes model works on the basis of the "one-container-per-Pod" model <https://kubernetes.io/docs/concepts/workloads/pods/>, meaning that for each application one pod runs the driver and each executor runs in its own pod. My question would be: will you be integrating with a public cloud (AWS, GCP, etc.) at some point? In that case you should look at mesh technologies like Istio <https://cloud.google.com/learn/what-is-istio>.
>>
>> HTH
>>
>> On Tue, 23 Nov 2021 at 14:09, JHI Star <jhistarl...@gmail.com> wrote:
>>
>>> We are going to deploy 20 physical Linux servers for use as an on-premise Spark & HDFS on Kubernetes cluster. My question is: within this architecture, is it best to have the pods run directly on bare metal, or under VMs or system containers like LXC, and/or under an on-premise instance of something like OpenStack - or something else altogether?
>>> I am looking to garner any experience around this question relating directly to the specific use case of Spark & HDFS on Kubernetes - I know there are also general points to consider regardless of the use case.
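[Editorial note: the one-container-per-Pod model discussed in the thread translates into a `spark-submit` invocation along these lines. This is a sketch only; the API server address, image name, service account, and HDFS paths are hypothetical placeholders, and the options shown (`--master k8s://...`, `spark.kubernetes.container.image`, etc.) are the standard Spark-on-Kubernetes configuration keys.]

```shell
# Sketch: submit a PySpark job to an on-prem Kubernetes cluster in
# cluster mode. All hostnames and paths below are illustrative.
./bin/spark-submit \
  --master k8s://https://k8s-apiserver.local:6443 \
  --deploy-mode cluster \
  --name pyspark-hdfs-job \
  --conf spark.executor.instances=19 \
  --conf spark.kubernetes.container.image=myregistry.local:5000/spark-py:3.2.0 \
  --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
  hdfs://namenode.local:8020/jobs/etl.py
```

With 20 nodes, 19 executor instances leaves one pod for the driver, matching the driver-pod-plus-executor-pods layout described above; in practice the executor count is tuned to cores and memory per node rather than node count alone.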