Re: HA for Zeppelin

John Omernik Fri, 08 Apr 2016 10:10:59 -0700

So for us, we are doing something similar to Vincent; however, instead of
Gluster, we are using MapR-FS and its NFS mount. Basically, this gives us a
shared filesystem that runs on all nodes, with strong security (filesystem
ACEs for fine-grained permissions), built-in auditing, POSIX compliance,
true random read/write (as opposed to HDFS), snapshots, and
cluster-to-cluster replication. We are also doing some neat things with
volumes and volume placement. That provides our storage layer. Then we use
Docker for actually running Zeppelin, and since it's one instance per user,
that helps organize who has access to what (we're still hashing out the
details on that). Marathon on Mesos is how we ensure that Zeppelin is
actually available, and when it comes to Spark, we just submit to Mesos,
which is right there. Since everything is on one cluster, each user has a
home directory (on a volume) where I store all the configs for that user's
Zeppelin instance, and they can also put ad hoc data in their home
directory. Spark and Apache Drill can both query anything in MapR-FS,
making it a pretty powerful combination.
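To make the Marathon part more concrete, here is a hypothetical sketch. It is not the setup described above. The Docker image name, the `/mapr/...` home-directory layout, the resource sizes and the health-check timings are all assumptions. The code builds a per-user Marathon app definition as a Python dict, and the resulting JSON could be POSTed to Marathon's `/v2/apps` endpoint. Marathon relaunches the task if it exits. It also replaces the task once `maxConsecutiveFailures` HTTP health checks fail in a row. Setting `ZEPPELIN_NOTEBOOK_DIR` to a path on the shared filesystem keeps the notebooks outside the container.

```python
import json


def zeppelin_marathon_app(user: str, image: str, home_root: str) -> dict:
    """Build a per-user Marathon app definition for a Dockerized Zeppelin.

    Assumptions (hypothetical): `image` contains Zeppelin listening on its
    default port 8080, and `home_root/<user>` is a shared-filesystem path
    (e.g. an NFS-mounted MapR-FS volume) visible on every Mesos agent.
    """
    home = f"{home_root}/{user}"  # hypothetical per-user home directory
    return {
        "id": f"/zeppelin/{user}",  # one Marathon app per user
        "instances": 1,  # Marathon keeps exactly one task running
        "cpus": 1.0,
        "mem": 4096,
        "env": {
            # Keep notebooks on the shared filesystem, not in the container,
            # so a relaunched task on any node sees the same notebooks.
            "ZEPPELIN_NOTEBOOK_DIR": f"{home}/zeppelin/notebook",
        },
        "container": {
            "type": "DOCKER",
            "docker": {
                "image": image,
                "network": "BRIDGE",
                # hostPort 0 lets Marathon pick a free port on the agent
                "portMappings": [{"containerPort": 8080, "hostPort": 0}],
            },
            "volumes": [
                {"containerPath": home, "hostPath": home, "mode": "RW"},
            ],
        },
        "healthChecks": [
            {
                "protocol": "HTTP",
                "path": "/",
                "portIndex": 0,  # first entry in portMappings
                "gracePeriodSeconds": 300,  # allow time for Zeppelin to start
                "intervalSeconds": 30,
                "maxConsecutiveFailures": 3,  # then Marathon kills and replaces it
            }
        ],
    }


if __name__ == "__main__":
    app = zeppelin_marathon_app("alice", "example/zeppelin:latest", "/mapr/my.cluster/user")
    print(json.dumps(app, indent=2))
```

Each user gets their own Marathon app (`/zeppelin/<user>`), which matches the one-instance-per-user model. Because the notebook directory is on the shared mount, it makes no difference which node Marathon restarts the container on.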




On Fri, Apr 8, 2016 at 6:33 AM, vincent gromakowski <
vincent.gromakow...@gmail.com> wrote:

> I've been using it for 3 months without any incidents.
> On 8 Apr 2016 9:09 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>
>> Sounds great. How long have you been using GlusterFS in prod, and have
>> you encountered any challenges? The only difficulty for me in using it
>> would be a lack of expertise to fix broken things, so I hope its
>> stability isn't something to be concerned about.
>>
>> Regards,
>> Ashish
>>
>> On Fri, Apr 8, 2016 at 12:20 PM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Use the FUSE interface. The Gluster volume is directly accessible as
>>> local storage on all nodes, but performance is only 200 Mb/s. That's more
>>> than enough for notebooks. For data, prefer Tachyon/Alluxio on top of
>>> Gluster...
>>> On 8 Apr 2016 6:35 AM, "ashish rawat" <dceash...@gmail.com> wrote:
>>>
>>>> Thanks Eran and Vincent.
>>>> Eran, I would definitely like to try it out, since it won't add to the
>>>> complexity of my deployment. I'll look at the S3 implementation to
>>>> figure out how complex it would be.
>>>>
>>>> Vincent,
>>>> I haven't explored GlusterFS at all. Would it also require writing an
>>>> implementation of the storage interface, or can Zeppelin work with it
>>>> out of the box?
>>>>
>>>> Regards,
>>>> Ashish
>>>>
>>>> On Wed, Apr 6, 2016 at 12:53 PM, vincent gromakowski <
>>>> vincent.gromakow...@gmail.com> wrote:
>>>>
>>>>> For 1: Marathon on Mesos restarts the Zeppelin daemon in case of
>>>>> failure.
>>>>> For 2: a GlusterFS FUSE mount lets us share notebooks across all Mesos
>>>>> nodes.
>>>>> For 3: not available in our design right now, but a manual restart in
>>>>> the Zeppelin config page is acceptable for us.
>>>>> On 6 Apr 2016 8:18 AM, "Eran Witkon" <eranwit...@gmail.com> wrote:
>>>>>
>>>>>> Yes, this is correct.
>>>>>> For HA disk, if you don't have HA storage and no access to S3, then
>>>>>> AFAIK you don't have any other option at the moment.
>>>>>> If you'd like to save notebooks to Elasticsearch, I suggest you look
>>>>>> at the storage interface and its Git and S3 implementations and
>>>>>> implement that yourself. It does sound like an interesting feature.
>>>>>> Best
>>>>>> Eran
>>>>>> On Wed, 6 Apr 2016 at 08:57 ashish rawat <dceash...@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks Eran. So 3 seems to be something external to Zeppelin, and
>>>>>>> hopefully 1 only means running "zeppelin-daemon.sh start" on a slave
>>>>>>> machine when the master becomes inaccessible. Is that correct?
>>>>>>>
>>>>>>> My main concern still remains on the storage front. I don't really
>>>>>>> have high-availability disks or even HDFS in my setup. I have been
>>>>>>> using an Elasticsearch cluster for data high availability, but was
>>>>>>> hoping that Zeppelin could save notebooks to Elasticsearch (like
>>>>>>> Kibana) or maybe a document store.
>>>>>>>
>>>>>>> Any idea if anything is planned in that direction? I don't want to
>>>>>>> fall back to rsync-like options.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Ashish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Apr 5, 2016 at 11:17 PM, Eran Witkon <eranwit...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> For 1, you need both Zeppelin web HA and Zeppelin daemon HA.
>>>>>>>> For 2, I guess you can use HDFS if you implement the storage
>>>>>>>> interface for HDFS, but I am not sure.
>>>>>>>> For 3, I mean that if you connect to an external cluster, for
>>>>>>>> example a Spark cluster, you need to make sure your Spark cluster is
>>>>>>>> HA. Otherwise you will have Zeppelin running, but your notebook will
>>>>>>>> fail because no Spark cluster is available.
>>>>>>>> HTH
>>>>>>>> Eran
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, 5 Apr 2016 at 20:20 ashish rawat <dceash...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks Eran for your reply.
>>>>>>>>> For 1), I am assuming it would be similar to HA for any other web
>>>>>>>>> application, i.e. running multiple instances and switching to the
>>>>>>>>> backup server when the master is down. Is that not the case?
>>>>>>>>> For 2), is it also possible to save notebooks on HDFS?
>>>>>>>>> Can you please explain 3? Are you referring to the interpreter
>>>>>>>>> config? If I am using the Spark interpreter and submitting jobs to
>>>>>>>>> it, and the Zeppelin master node goes down, what could be the
>>>>>>>>> problem with a slave node pointing to the same cluster and
>>>>>>>>> submitting jobs?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Ashish
>>>>>>>>>
>>>>>>>>> On Tue, Apr 5, 2016 at 10:08 PM, Eran Witkon <eranwit...@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I would say you need to account for these things:
>>>>>>>>>> 1) availability of the Zeppelin daemon
>>>>>>>>>> 2) availability of the notebook files
>>>>>>>>>> 3) availability of the interpreters used
>>>>>>>>>>
>>>>>>>>>> For 1, I don't know of an out-of-the-box solution.
>>>>>>>>>> For 2, any HA storage will do: S3 or any HA externally mounted disk.
>>>>>>>>>> For 3, it is up to the interpreter and your big data HA solution.
>>>>>>>>>>
>>>>>>>>>> On Tue, 5 Apr 2016 at 19:29 ashish rawat <dceash...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Is there a suggested architecture for running Zeppelin in high
>>>>>>>>>>> availability mode? The only option I could find was saving
>>>>>>>>>>> notebooks to S3. Are there any options if one is not using AWS?
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Ashish
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>>
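Eran's suggestion was to write an Elasticsearch storage backend modeled on the Git and S3 implementations. The sketch below shows roughly what that could look like. It is a hypothetical Python illustration of the pattern only. Zeppelin's real storage interface is the Java `NotebookRepo`, and its method names and signatures differ. `NotebookStore` and `DocumentStoreNotebookStore` are illustrative names, and a plain dict stands in for the document store.

```python
from __future__ import annotations

import json
from abc import ABC, abstractmethod


class NotebookStore(ABC):
    """Hypothetical storage interface (NOT Zeppelin's real NotebookRepo API)."""

    @abstractmethod
    def save(self, note_id: str, note: dict) -> None: ...

    @abstractmethod
    def get(self, note_id: str) -> dict: ...

    @abstractmethod
    def list_ids(self) -> list[str]: ...

    @abstractmethod
    def remove(self, note_id: str) -> None: ...


class DocumentStoreNotebookStore(NotebookStore):
    """Stores each notebook as one JSON document keyed by its note id.

    A dict stands in for the document store. A real backend would replace
    these dict operations with index/get/search/delete calls against a
    replicated cluster such as Elasticsearch.
    """

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def save(self, note_id: str, note: dict) -> None:
        # Serialize to JSON, the form a document store would receive.
        self._docs[note_id] = json.dumps(note)

    def get(self, note_id: str) -> dict:
        if note_id not in self._docs:
            raise KeyError(f"no notebook with id {note_id!r}")
        return json.loads(self._docs[note_id])

    def list_ids(self) -> list[str]:
        return sorted(self._docs)

    def remove(self, note_id: str) -> None:
        # Removing a missing note is a no-op.
        self._docs.pop(note_id, None)
```

With this layout, each notebook maps to one document keyed by its note id. Zeppelin instances then hold no notebook state of their own, and high availability for notebooks becomes the job of the document store's own replication.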
