Is this a Docker-specific decision or a Zeppelin-on-Docker decision? I am
curious about the amount of network traffic Zeppelin actually generates. I
could be wrong, but I made the assumption that most of the network traffic
with Zeppelin is the results coming back from the various endpoints (Spark,
JDBC, Elasticsearch, etc.) and not heavy-lifting activity.


John
On Apr 12, 2016 5:03 PM, "vincent gromakowski" <
vincent.gromakow...@gmail.com> wrote:

> We decided not to use Docker because of network performance in production
> flows, not because of deployment. Virtualising the network brings a 50%
> decrease in performance. That may change with Calico, because it abstracts
> the network with routing instead of virtualizing it the way Flannel does.
> On Apr 12, 2016 2:22 PM, "John Omernik" <j...@omernik.com> wrote:
>
>> On 2, I had some thoughts there. How "expensive" would it be for
>> Zeppelin to run a timer of sorts that can be accessed via a specific URL?
>> Basically, this URL would return the idle time. The thing that best knows
>> whether Zeppelin has activity is Zeppelin itself. So any action within
>> Zeppelin would reset this timer: changing notebooks, opening, closing,
>> moving notes around, running notes, adding new notes, changing interpreter
>> settings. Any request handled by Zeppelin through the UI would reset said
>> timer. A request to the "timer" URL obviously would NOT reset the timer;
>> basically, if nothing user-actionable was run (we'd have to separate
>> user-actionable items from automated API requests), the timer would not get
>> reset. This would allow those of us running Zeppelin in a
>> multi-user/multi-tenant environment to monitor for idle instances and take
>> action when they occur. (Ideally, we could, through an authenticated API,
>> issue a "save" of all notebooks before taking said action.)
>>
>> So, to summarize:
>>
>> An API that reports the seconds since the last human action.
>>
>> A monitor watches that API; when the seconds since the last human action
>> exceed an enterprise threshold, the monitor can issue a "safe save all" to
>> Zeppelin, which will go ahead and do a save. (Additional point: the timer
>> API could return both the seconds since last human use and a boolean
>> "all saved" flag. If normal Zeppelin processes have already saved all human
>> work, the API could indicate that. Then, when the timer check hits the API,
>> it knows: the seconds are past the threshold and Zeppelin reports all
>> saved, so we can issue a termination. If it's not all safe, it can issue
>> the "save all" command and wait for it to be safe. If something is keeping
>> Zeppelin from being in a safe condition for shutdown, the API would reflect
>> this and prevent a shutdown.)
>>
>> Then, once the API's idle seconds exceed the enterprise threshold, we can
>> safely shut down the instance of Zeppelin, returning its resources to the
>> cluster.
>>
>> Would love discussion here...
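>>
>> A rough sketch of what the external monitor could look like, in Python,
>> assuming Zeppelin exposed a hypothetical /api/idle endpoint returning
>> {"secondsIdle": N, "allSaved": true/false} and a hypothetical
>> /api/notebook/saveAll endpoint (neither exists today; that is exactly the
>> proposal), with the per-user instance managed as a Marathon app:
>>
>>     import time
>>     import requests
>>
>>     ZEPPELIN = "http://zeppelin.marathon.mesos:8080"  # example instance URL
>>     MARATHON = "http://marathon.mesos:8080"           # Marathon REST API
>>     APP_ID = "/zeppelin/jomernik"                     # made-up per-user app id
>>     THRESHOLD = 3600                                  # enterprise idle threshold (s)
>>
>>     while True:
>>         # Hypothetical endpoint from the proposal: seconds since the last
>>         # human action plus an "all saved" flag maintained by Zeppelin itself.
>>         status = requests.get(ZEPPELIN + "/api/idle").json()
>>         if status["secondsIdle"] > THRESHOLD:
>>             if not status["allSaved"]:
>>                 # Hypothetical "safe save all"; recheck on the next poll.
>>                 requests.post(ZEPPELIN + "/api/notebook/saveAll")
>>             else:
>>                 # Idle and saved: scale the per-user app to zero instances
>>                 # via Marathon, returning its resources to the cluster.
>>                 requests.put(MARATHON + "/v2/apps" + APP_ID,
>>                              json={"instances": 0})
>>         time.sleep(60)
>>
>> (A DELETE on /v2/apps/<id> would destroy the app entirely instead of just
>> scaling it to zero; either could work depending on how fast you want the
>> instance back.)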
>>
>> On Tue, Apr 12, 2016 at 1:57 AM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> 1. I am using Ansible to deploy Zeppelin on all slaves and to launch a
>>> Zeppelin instance for one user. So if the Zeppelin binaries are already
>>> deployed, the launch through Marathon is very quick (1 or 2 sec). Looking
>>> for a velocity solution (based on JFrog) on Mesos to manage binaries and
>>> artifacts with versioning, rights... No use of Docker because of network
>>> performance constraints.
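>>>
>>> Not our actual Ansible playbook, but the Marathon call that launch step
>>> boils down to is roughly this (Python sketch; the app id, path, port and
>>> memory figure are made up for illustration):
>>>
>>>     import requests
>>>
>>>     MARATHON = "http://marathon.mesos:8080"
>>>
>>>     # Minimal per-user app definition: the binaries are already deployed
>>>     # on every slave, so Marathon only has to start the daemon, which is
>>>     # why the launch takes only a second or two.
>>>     app = {
>>>         "id": "/zeppelin/vincent",
>>>         "cmd": "/opt/zeppelin/bin/zeppelin.sh",
>>>         "cpus": 1,
>>>         "mem": 4096,
>>>         "env": {"ZEPPELIN_PORT": "8090"},
>>>         "instances": 1,
>>>     }
>>>     requests.post(MARATHON + "/v2/apps", json=app)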
>>>
>>> 2. Same answer as John: it stays running. I will test dynamic resource
>>> allocation for the Spark interpreter, but the Zeppelin daemon will still
>>> be up and taking 4 GB.
>>>
>>> 3. I have a service discovery layer that authenticates the user and
>>> routes him to his instance (and only his instance). Right now it is based
>>> on a simple shell script polling Marathon through its API and updating an
>>> Apache configuration file every 15s; the username is in the Marathon
>>> task. We will replace this with a fully industrialized solution (Consul?
>>> HAProxy? ...)
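>>>
>>> The real thing is a shell script, but the logic is roughly this (Python
>>> sketch; app ids of the form /zeppelin/<user>, the Apache include path and
>>> the reverse-proxy layout are assumptions, and the authentication part is
>>> left out):
>>>
>>>     import time
>>>     import requests
>>>
>>>     MARATHON = "http://marathon.mesos:8080"
>>>
>>>     while True:
>>>         tasks = requests.get(MARATHON + "/v2/tasks",
>>>                              headers={"Accept": "application/json"}).json()["tasks"]
>>>         rules = []
>>>         for t in tasks:
>>>             if t["appId"].startswith("/zeppelin/"):
>>>                 user = t["appId"].rsplit("/", 1)[-1]  # username encoded in the app id
>>>                 backend = "http://%s:%d/" % (t["host"], t["ports"][0])
>>>                 # One reverse-proxy rule per user, so each user is only
>>>                 # routed to his own instance.
>>>                 rules.append("ProxyPass /%s/ %s" % (user, backend))
>>>                 rules.append("ProxyPassReverse /%s/ %s" % (user, backend))
>>>         with open("/etc/httpd/conf.d/zeppelin-users.conf", "w") as f:
>>>             f.write("\n".join(rules) + "\n")
>>>         # reload Apache here (e.g. "apachectl graceful") before sleeping
>>>         time.sleep(15)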
>>>
>>>
>>> 2016-04-12 2:37 GMT+02:00 Johnny W. <jzw.ser...@gmail.com>:
>>>
>>>> Thanks John for your insights.
>>>>
>>>> For 2., one solution we have experimented with is Spark dynamic resource
>>>> allocation. We could define a timer to scale down. Hope that helps.
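>>>>
>>>> In Zeppelin these would normally be set in the Spark interpreter
>>>> settings; the PySpark form below is just to show the properties involved
>>>> (the idle-timeout value is only an example):
>>>>
>>>>     from pyspark import SparkConf, SparkContext
>>>>
>>>>     conf = (SparkConf()
>>>>             .setAppName("zeppelin-dynamic-allocation")
>>>>             # Release executors automatically when the notebook goes quiet.
>>>>             .set("spark.dynamicAllocation.enabled", "true")
>>>>             .set("spark.dynamicAllocation.minExecutors", "0")
>>>>             .set("spark.dynamicAllocation.executorIdleTimeout", "300s")
>>>>             # Dynamic allocation requires the external shuffle service.
>>>>             .set("spark.shuffle.service.enabled", "true"))
>>>>     sc = SparkContext(conf=conf)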
>>>>
>>>> J.
>>>>
>>>> On Mon, Apr 11, 2016 at 4:24 PM, John Omernik <j...@omernik.com> wrote:
>>>>
>>>>> 1. Things launch pretty fast for me; however, it depends on whether the
>>>>> Docker container I run Zeppelin in is cached on the node Mesos wants to
>>>>> run it on. If not, it pulls from a local Docker registry, so worst case
>>>>> it takes up to a minute to get things running if the image isn't cached.
>>>>>
>>>>> 2. No, if the user logs out it stays running. Ideally I would want to
>>>>> set up some sort of timer that could scale down an instance if it is
>>>>> left unused. I have some ideas here, but haven't put them into practice
>>>>> yet. I wanted to play with Nginx to see if I could do something there
>>>>> (lack of activity causes Nginx to shut down Zeppelin, for example). With
>>>>> Spark resources, one thing I wanted to try is fine-grained scaling with
>>>>> Mesos, to only use resources when queries are actually running. Lots of
>>>>> tools could fit the bill here; I just need to identify the right ones.
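>>>>>
>>>>> For the fine-grained idea, the switch involved is roughly this (a
>>>>> sketch, not something I've put into practice yet; the Mesos master URL
>>>>> is just an example):
>>>>>
>>>>>     from pyspark import SparkConf, SparkContext
>>>>>
>>>>>     conf = (SparkConf()
>>>>>             .setMaster("mesos://zk://master.mesos:2181/mesos")
>>>>>             .setAppName("zeppelin-fine-grained")
>>>>>             # Fine-grained mode: each Spark task runs as its own Mesos
>>>>>             # task, so CPU is only held while a query is executing.
>>>>>             .set("spark.mesos.coarse", "false"))
>>>>>     sc = SparkContext(conf=conf)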
>>>>>
>>>>> 3. DNS resolution is handled for me with Mesos-DNS. Each instance has
>>>>> its own ID, and the DNS name auto-updates in Mesos-DNS based on the
>>>>> Mesos tasks, so I always know where Zeppelin is running.
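>>>>>
>>>>> Concretely, the lookup ends up looking like this (Python sketch;
>>>>> "zeppelin-jomernik" is a made-up per-user Marathon app id):
>>>>>
>>>>>     import socket
>>>>>
>>>>>     # Mesos-DNS publishes Marathon tasks as <app>.<framework>.<domain>,
>>>>>     # and the A record follows the task, so this always points at the
>>>>>     # node currently running the instance.
>>>>>     host = socket.gethostbyname("zeppelin-jomernik.marathon.mesos")
>>>>>     print(host)
>>>>>     # The service port comes from the matching SRV record
>>>>>     # (_zeppelin-jomernik._tcp.marathon.mesos) or from the Marathon API.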
>>>>>
>>>>> On Monday, April 11, 2016, Johnny W. <jzw.ser...@gmail.com> wrote:
>>>>>
>>>>>> John & Vincent, I am interested in the per-instance-per-user approach.
>>>>>> I have some questions about it:
>>>>>> --
>>>>>> 1. how long does it take to launch a Zeppelin instance (and initialize
>>>>>> the SparkContext) when a user logs in?
>>>>>> 2. is the instance destroyed when the user logs out? If not, how do
>>>>>> you deal with the resources assigned to Zeppelin/SparkContext?
>>>>>> 3. for auto failover through Marathon, how do you deal with DNS
>>>>>> resolution for clients?
>>>>>>
>>>>>> Thanks!
>>>>>> J.
>>>>>>
>>>>>> On Fri, Apr 8, 2016 at 10:09 AM, John Omernik <j...@omernik.com>
>>>>>> wrote:
>>>>>>
>>>>>>> So for us, we are doing something similar to Vincent; however,
>>>>>>> instead of Gluster, we are using MapR-FS and its NFS mount. Basically,
>>>>>>> this gives us a shared filesystem that is running on all nodes, with
>>>>>>> strong security (filesystem ACEs for fine-grained permissions),
>>>>>>> built-in auditing, POSIX compliance, true random read/write (as
>>>>>>> opposed to HDFS), snapshots, and cluster-to-cluster replication. There
>>>>>>> are also some neat things we are doing with volumes and volume
>>>>>>> placement. That provides our storage layer. Then we use Docker for
>>>>>>> actually running Zeppelin, and since it's an instance per user, that
>>>>>>> helps organize who has access to what (still hashing out the details
>>>>>>> on that). Marathon on Mesos is how we ensure that Zeppelin is actually
>>>>>>> available, and when it comes to Spark, we are just submitting to
>>>>>>> Mesos, which is right there. Since everything is on one cluster, each
>>>>>>> user has a home directory (on a volume) where I store all the configs
>>>>>>> for each instance of Zeppelin, and they can also put ad hoc data in
>>>>>>> their home directory. Spark and Apache Drill can both query anything
>>>>>>> in MapR-FS, making it a pretty powerful combination.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <
>>>>>>> vincent.gromakow...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Using it for 3 months without any incident
>>>>>>>> On Apr 8, 2016 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Sounds great. How long have you been using GlusterFS in prod, and
>>>>>>>>> have you encountered any challenges? The only difficulty for me in
>>>>>>>>> using it would be a lack of expertise to fix broken things, so I
>>>>>>>>> hope its stability isn't something to be concerned about.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashish
>>>>>>>>>
>>>>>>>>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <
>>>>>>>>> vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Use the FUSE interface. The Gluster volume is directly accessible
>>>>>>>>>> as local storage on all nodes, but performance is only 200 Mb/s.
>>>>>>>>>> More than enough for notebooks. For data, prefer Tachyon/Alluxio
>>>>>>>>>> on top of Gluster...
>>>>>>>>>> On Apr 8, 2016 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Thanks Eran and Vincent.
>>>>>>>>>>> Eran, I would definitely like to try it out, since it won't add
>>>>>>>>>>> to the complexity of my deployment. I would look at the S3
>>>>>>>>>>> implementation to figure out how complex it would be.
>>>>>>>>>>>
>>>>>>>>>>> Vincent,
>>>>>>>>>>> I haven't explored GlusterFS at all. Would it also require
>>>>>>>>>>> writing an implementation of the storage interface, or can
>>>>>>>>>>> Zeppelin work with it out of the box?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ashish
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <
>>>>>>>>>>> vincent.gromakow...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> For 1, Marathon on Mesos restarts the Zeppelin daemon in case
>>>>>>>>>>>> of failure.
>>>>>>>>>>>> For 2, a GlusterFS FUSE mount allows sharing the notebooks
>>>>>>>>>>>> across all Mesos nodes.
>>>>>>>>>>>> For 3, it is not available right now in our design, but a manual
>>>>>>>>>>>> restart in the Zeppelin config page is acceptable for us.
>>>>>>>>>>>> On Apr 6, 2016 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Yes this is correct.
>>>>>>>>>>>>> For HA disk: if you don't have HA storage and no access to S3,
>>>>>>>>>>>>> then AFAIK you don't have another option at the moment.
>>>>>>>>>>>>> If you would like to save notebooks to Elasticsearch, then I
>>>>>>>>>>>>> suggest you look at the storage interface and the
>>>>>>>>>>>>> implementations for Git and S3, and implement that yourself. It
>>>>>>>>>>>>> does sound like an interesting feature.
>>>>>>>>>>>>> Best
>>>>>>>>>>>>> Eran
>>>>>>>>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin,
>>>>>>>>>>>>>> and hopefully 1 only means running "zeppelin-daemon.sh start"
>>>>>>>>>>>>>> on a slave machine when the master becomes inaccessible. Is
>>>>>>>>>>>>>> that correct?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My main concern still remains on the storage front, and I
>>>>>>>>>>>>>> don't really have high-availability disks or even HDFS in my
>>>>>>>>>>>>>> setup. I have been using an Elasticsearch cluster for data
>>>>>>>>>>>>>> high availability, but I was hoping that Zeppelin could save
>>>>>>>>>>>>>> notebooks to Elasticsearch (like Kibana does) or maybe a
>>>>>>>>>>>>>> document store.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Any idea if anything is planned in that direction? I don't
>>>>>>>>>>>>>> want to fall back to 'rsync'-like options.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <
>>>>>>>>>>>>>> eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For 1, you need to have both Zeppelin web HA and Zeppelin
>>>>>>>>>>>>>>> daemon HA.
>>>>>>>>>>>>>>> For 2, I guess you can use HDFS if you implement the storage
>>>>>>>>>>>>>>> interface for HDFS, but I am not sure.
>>>>>>>>>>>>>>> For 3, I mean that if you connect to an external cluster, for
>>>>>>>>>>>>>>> example a Spark cluster, you need to make sure your Spark
>>>>>>>>>>>>>>> cluster is HA. Otherwise you will have Zeppelin running, but
>>>>>>>>>>>>>>> your notebooks will fail because no Spark cluster is
>>>>>>>>>>>>>>> available.
>>>>>>>>>>>>>>> HTH
>>>>>>>>>>>>>>> Eran
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <
>>>>>>>>>>>>>>> dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>>>>>>>>> For 1) I am assuming that it would be similar to HA of any
>>>>>>>>>>>>>>>> other web application, i.e. running multiple instances and
>>>>>>>>>>>>>>>> switching to the backup server when the master is down. Is
>>>>>>>>>>>>>>>> that not the case?
>>>>>>>>>>>>>>>> For 2) is it also possible to save them on HDFS?
>>>>>>>>>>>>>>>> Can you please explain 3? Are you referring to the
>>>>>>>>>>>>>>>> interpreter config? If I am using the Spark interpreter and
>>>>>>>>>>>>>>>> submitting jobs to it, and the Zeppelin master node goes
>>>>>>>>>>>>>>>> down, what would prevent the slave node from pointing to the
>>>>>>>>>>>>>>>> same cluster and submitting jobs?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <
>>>>>>>>>>>>>>>> eranwit...@gmail.com> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>>>>>>>>> For 2, any HA storage will do: S3 or any HA externally
>>>>>>>>>>>>>>>>> mounted disk.
>>>>>>>>>>>>>>>>> For 3, it is up to the interpreter and your big data HA
>>>>>>>>>>>>>>>>> solution.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <
>>>>>>>>>>>>>>>>> dceash...@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in
>>>>>>>>>>>>>>>>>> high-availability mode? The only option I could find was
>>>>>>>>>>>>>>>>>> saving notebooks to S3. Are there any options if one is
>>>>>>>>>>>>>>>>>> not using AWS?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>>>>>>> Ashish
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Sent from my iThing
>>>>>
>>>>
>>>>
>>>
>>
