Re: HA for Zeppelin

2016-04-13 Thread vincent gromakowski
It's a global decision for our SMACK stack platform, but we may go with Docker for applications only (Spark clients), for DevOps. For Zeppelin I don't see the need (no DevOps). On Apr 13, 2016 4:05 PM, "John Omernik" wrote: > Is this a specific Docker decision or a Zeppelin on Docker decisi

Re: HA for Zeppelin

2016-04-13 Thread John Omernik
Is this a specific Docker decision or a Zeppelin on Docker decision? I am curious about the amount of network traffic Zeppelin actually generates. I could be wrong, but I assumed that most of Zeppelin's network traffic is results coming back from the various endpoints (Spark, JDBC, Elastic Sea

Re: HA for Zeppelin

2016-04-12 Thread vincent gromakowski
We decided not to use Docker in production data flows because of network performance, not because of deployment. Network virtualization brings a 50% decrease in performance. That may change with Calico, because it abstracts the network through routing rather than virtualizing it the way Flannel does. On Apr 12, 2016 2:22 PM, "John Omernik" wrote

Re: HA for Zeppelin

2016-04-12 Thread John Omernik
On 2, I had some thoughts. How "expensive" would it be for Zeppelin to run a timer of sorts that can be accessed via a specific URL? Basically, this URL would return the idle time. The thing that knows best whether Zeppelin has activity is Zeppelin itself. So, any action within Zeppelin would reset
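John's idle-time URL idea can be sketched as a small timer that every user action resets, plus an HTTP handler that reports the idle time to an external reaper. This is a minimal Python illustration under stated assumptions: the `/idle` path, the `IdleTracker` class, and the `should_reap` policy are hypothetical names, and nothing in the thread says Zeppelin exposes such an endpoint.

```python
import json
import threading
import time
from http.server import BaseHTTPRequestHandler
from typing import Callable


class IdleTracker:
    """Hypothetical idle timer: any user action calls touch()."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._lock = threading.Lock()
        self._last_activity = clock()

    def touch(self) -> None:
        # Reset the timer on any notebook activity (run, edit, save, ...).
        with self._lock:
            self._last_activity = self._clock()

    def idle_seconds(self) -> float:
        with self._lock:
            return self._clock() - self._last_activity


def should_reap(idle_seconds: float, max_idle_seconds: float) -> bool:
    # Reaper policy: stop the per-user instance once idle time hits the limit.
    return idle_seconds >= max_idle_seconds


def make_handler(tracker: IdleTracker):
    """Build a request handler that serves the idle time at the hypothetical /idle path."""

    class IdleHandler(BaseHTTPRequestHandler):
        def do_GET(self) -> None:
            if self.path != "/idle":
                self.send_error(404)
                return
            body = json.dumps({"idle_seconds": tracker.idle_seconds()}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return IdleHandler
```

To serve it, one could run `HTTPServer(("0.0.0.0", 8099), make_handler(tracker)).serve_forever()` from `http.server` (port 8099 is arbitrary). The clock is injectable so the timer can be tested without sleeping, and the reaper (for example a cron job or a Marathon-side script) stays outside Zeppelin, which only has to report its idle time.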

Re: HA for Zeppelin

2016-04-12 Thread John Omernik
Vincent - On 1, I am curious about the Docker/network performance issues. We are running, granted, some fat pipes between nodes on our cluster, and our Docker registry is actually on the cluster too (backed by MapR FS on all nodes). Most launches of Zeppelin take under 20 seconds for us, because the ru

Re: HA for Zeppelin

2016-04-11 Thread vincent gromakowski
1. I am using Ansible to deploy Zeppelin on all slaves and to launch a Zeppelin instance for a given user. So if the Zeppelin binaries are already deployed, the launch through Marathon is very quick (1 or 2 sec). Looking at the Velocity solution (based on JFrog) on Mesos to manage binaries and artifacts with ver
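Vincent's per-user launch through Marathon could be driven by generating one Marathon app definition per user. The sketch below only builds the JSON; the `id`, `cmd`, `cpus`, `mem`, `instances`, and `env` fields are standard Marathon app fields, and `ZEPPELIN_PORT` and `ZEPPELIN_NOTEBOOK_DIR` are Zeppelin environment settings, but the install path, the shared notebook root, the per-user subdirectory, and the resource sizes are hypothetical. It also assumes `bin/zeppelin.sh` runs Zeppelin in the foreground, which is what Marathon needs to supervise the process.

```python
import json
import re

# Simplified check for one Marathon app ID segment: lowercase letters,
# digits and inner hyphens (Marathon also allows dots, omitted here).
_SEGMENT = re.compile(r"^[a-z0-9]([a-z0-9-]*[a-z0-9])?$")


def zeppelin_app_for_user(user: str, port: int,
                          zeppelin_home: str = "/opt/zeppelin",
                          notebook_root: str = "/mnt/shared/notebooks") -> dict:
    """Build a Marathon app definition for one user's Zeppelin instance.

    zeppelin_home and notebook_root are hypothetical: binaries are assumed
    pre-deployed on every agent (e.g. by Ansible) and notebook_root is
    assumed to be a shared mount visible from all nodes.
    """
    if not _SEGMENT.match(user):
        raise ValueError(f"user {user!r} is not a valid Marathon app ID segment")
    return {
        "id": f"/zeppelin/{user}",
        # Foreground launcher, assumed; the daemon script backgrounds itself.
        "cmd": f"{zeppelin_home}/bin/zeppelin.sh",
        "cpus": 1.0,
        "mem": 2048,  # MB
        "instances": 1,
        "env": {
            "ZEPPELIN_PORT": str(port),
            "ZEPPELIN_NOTEBOOK_DIR": f"{notebook_root}/{user}",
        },
    }


if __name__ == "__main__":
    print(json.dumps(zeppelin_app_for_user("alice", 8081), indent=2))
```

The resulting JSON could then be POSTed to Marathon's `/v2/apps` endpoint; because the binaries are already on every agent, the launch cost is mostly JVM startup, consistent with the 1 or 2 seconds reported above.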

Re: HA for Zeppelin

2016-04-11 Thread Johnny W.
Thanks John for your insights. For 2, one solution we have experimented with is Spark dynamic resource allocation. We can define a timer to scale down. Hope that helps. J. On Mon, Apr 11, 2016 at 4:24 PM, John Omernik wrote: > 1. Things launch pretty fast for me; however, it depends on whether the docke
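The "timer to scale down" in Spark dynamic allocation corresponds to `spark.dynamicAllocation.executorIdleTimeout`, after which idle executors are released. The sketch below renders the relevant real Spark properties as `spark-defaults.conf` lines; the timeout and executor counts are hypothetical values. Dynamic allocation also needs the external shuffle service (`spark.shuffle.service.enabled`), which on Mesos means running Spark's Mesos shuffle service on each agent. Note that this frees executors, not the Zeppelin instance or its driver.

```python
from typing import Dict


def dynamic_allocation_conf(idle_timeout_s: int = 600, min_executors: int = 0,
                            max_executors: int = 20) -> Dict[str, str]:
    """Spark properties for dynamic allocation (values are illustrative)."""
    if idle_timeout_s <= 0:
        raise ValueError("idle_timeout_s must be positive")
    if not 0 <= min_executors <= max_executors:
        raise ValueError("need 0 <= min_executors <= max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Executors can only be removed safely if shuffle files outlive them.
        "spark.shuffle.service.enabled": "true",
        # The scale-down "timer": idle executors are released after this.
        "spark.dynamicAllocation.executorIdleTimeout": f"{idle_timeout_s}s",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_spark_defaults(conf: Dict[str, str]) -> str:
    # spark-defaults.conf format: one "key value" pair per line.
    return "\n".join(f"{k} {v}" for k, v in sorted(conf.items())) + "\n"
```

With `min_executors=0`, an idle user's Spark application shrinks to just its driver, which addresses resource usage while a per-user instance sits unused.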

Re: HA for Zeppelin

2016-04-11 Thread John Omernik
1. Things launch pretty fast for me; however, it depends on whether the Docker image I am running Zeppelin in is cached on the node Mesos wants to run it on. If not, it is pulled from a local Docker registry, so in the worst case, if the image isn't cached, it takes up to a minute to get things running. 2. No, if the user

Re: HA for Zeppelin

2016-04-11 Thread Johnny W.
John & Vincent, I am interested in the one-instance-per-user approach. I have some questions about it: 1. How long does it take to launch a Zeppelin instance (and initialize the SparkContext) when a user logs in? 2. Will the instance be destroyed when the user logs out? If not, how do you deal wi

Re: HA for Zeppelin

2016-04-10 Thread ashish rawat
Thanks Vincent and John for providing these viable options. On Fri, Apr 8, 2016 at 10:39 PM, John Omernik wrote: > So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem that is

Re: HA for Zeppelin

2016-04-08 Thread John Omernik
So for us, we are doing something similar to Vincent; however, instead of Gluster, we are using MapR-FS and the NFS mount. Basically, this gives us a shared filesystem that runs on all nodes, with strong security (filesystem ACEs for fine-grained permissions), built-in auditing, POSIX complian

Re: HA for Zeppelin

2016-04-08 Thread vincent gromakowski
We have been using it for 3 months without any incident. On Apr 8, 2016 9:09 AM, "ashish rawat" wrote: > Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so hope

Re: HA for Zeppelin

2016-04-08 Thread ashish rawat
Sounds great. How long have you been using GlusterFS in prod, and have you encountered any challenges? The only difficulty for me in using it would be a lack of expertise to fix broken things, so I hope its stability isn't something to be concerned about. Regards, Ashish On Fri, Apr 8, 2016 at 12:2

Re: HA for Zeppelin

2016-04-07 Thread vincent gromakowski
Use the FUSE interface. The Gluster volume is directly accessible as local storage on all nodes, but performance is only 200 Mb/s. That is more than enough for notebooks. For data, prefer Tachyon/Alluxio on top of Gluster... On Apr 8, 2016 6:35 AM, "ashish rawat" wrote: > Thanks Eran and Vincent. > Eran, I woul

Re: HA for Zeppelin

2016-04-07 Thread ashish rawat
Thanks Eran and Vincent. Eran, I would definitely like to try it out, since it won't add to the complexity of my deployment. I will look at the S3 implementation to figure out how complex it would be. Vincent, I haven't explored GlusterFS at all. Would it also require writing an implementation of sto

Re: HA for Zeppelin

2016-04-06 Thread vincent gromakowski
For 1, Marathon on Mesos restarts the Zeppelin daemon in case of failure. For 2, a GlusterFS FUSE mount allows sharing notebooks across all Mesos nodes. For 3, it's not available right now in our design, but a manual restart in the Zeppelin config page is acceptable for us. On Apr 6, 2016 8:18 AM, "Eran Witkon" wrote
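Marathon relaunches a task when its process exits; adding an HTTP health check also lets it replace a Zeppelin instance that is still running but no longer answering. The sketch below attaches a health check to a Marathon app definition. The field names (`protocol`, `path`, `portIndex`, `gracePeriodSeconds`, `intervalSeconds`, `timeoutSeconds`, `maxConsecutiveFailures`) are Marathon health-check fields; the default values are hypothetical, and `portIndex` assumes the app declares at least one port.

```python
def with_http_health_check(app: dict, path: str = "/", port_index: int = 0,
                           grace_s: int = 300, interval_s: int = 30,
                           timeout_s: int = 10, max_failures: int = 3) -> dict:
    """Return a copy of a Marathon app definition with an HTTP health check.

    Marathon kills a task after max_failures consecutive failed checks and
    launches a replacement; the grace period gives Zeppelin time to start.
    """
    if timeout_s >= interval_s:
        raise ValueError("timeout_s should be shorter than interval_s")
    out = dict(app)  # shallow copy: the caller's dict is left untouched
    out["healthChecks"] = [{
        "protocol": "HTTP",
        "path": path,
        "portIndex": port_index,
        "gracePeriodSeconds": grace_s,
        "intervalSeconds": interval_s,
        "timeoutSeconds": timeout_s,
        "maxConsecutiveFailures": max_failures,
    }]
    return out
```

Because the notebooks live on the shared GlusterFS mount, a replacement task started on any node sees the same notebooks; only in-flight interpreter state is lost, which matches the manual-restart tolerance described above.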

Re: HA for Zeppelin

2016-04-05 Thread Eran Witkon
Yes, this is correct. For HA disk, if you don't have HA storage and no access to S3, then AFAIK you don't have any other option at the moment. If you would like to save notebooks to Elastic, then I suggest you look at the storage interface and its implementations for Git and S3 and implement that yourself. It does s
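The pattern Eran describes, one storage interface with several backends, can be illustrated in Python. This is only a sketch of the idea: Zeppelin's real notebook storage interface is a Java API whose methods are not reproduced here, and `NoteStore` and `DirectoryNoteStore` are hypothetical names. The directory backend stores each note as `<root>/<id>/note.json`, so pointing `root` at a shared mount (GlusterFS FUSE, MapR NFS) gives every node the same notes; an Elasticsearch or HDFS backend would be another subclass.

```python
import json
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict, List


class NoteStore(ABC):
    """Hypothetical pluggable notebook storage (not Zeppelin's Java API)."""

    @abstractmethod
    def list_ids(self) -> List[str]: ...

    @abstractmethod
    def get(self, note_id: str) -> Dict: ...

    @abstractmethod
    def save(self, note_id: str, note: Dict) -> None: ...

    @abstractmethod
    def remove(self, note_id: str) -> None: ...


class DirectoryNoteStore(NoteStore):
    """Stores each note as <root>/<id>/note.json on a (shared) filesystem."""

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _dir(self, note_id: str) -> str:
        return os.path.join(self.root, note_id)

    def _path(self, note_id: str) -> str:
        return os.path.join(self._dir(note_id), "note.json")

    def list_ids(self) -> List[str]:
        return sorted(d for d in os.listdir(self.root)
                      if os.path.isfile(self._path(d)))

    def get(self, note_id: str) -> Dict:
        with open(self._path(note_id), encoding="utf-8") as f:
            return json.load(f)

    def save(self, note_id: str, note: Dict) -> None:
        os.makedirs(self._dir(note_id), exist_ok=True)
        # Write a temp file, then rename over the target: on a local POSIX
        # filesystem readers never see a half-written note (network mounts
        # may give weaker guarantees).
        fd, tmp = tempfile.mkstemp(dir=self._dir(note_id))
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(note, f)
        os.replace(tmp, self._path(note_id))

    def remove(self, note_id: str) -> None:
        os.remove(self._path(note_id))
        os.rmdir(self._dir(note_id))
```

Keeping callers on the abstract interface is what makes swapping S3, Git, or a shared directory a configuration choice rather than a code change.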

Re: HA for Zeppelin

2016-04-05 Thread ashish rawat
Thanks Eran. So 3 seems to be something external to Zeppelin, and hopefully 1 only means running "zeppelin-daemon.sh start" on a slave machine when the master becomes inaccessible. Is that correct? My main concern still remains on the storage front. And I don't really have high availability disks or

Re: HA for Zeppelin

2016-04-05 Thread Eran Witkon
For 1, you need to have both Zeppelin web HA and Zeppelin daemon HA. For 2, I guess you can use HDFS if you implement the storage interface for HDFS, but I am not sure. For 3, I mean that if you connect to an external cluster, for example a Spark cluster, you need to make sure your Spark cluster is HA. O

Re: HA for Zeppelin

2016-04-05 Thread ashish rawat
Thanks Eran for your reply. For 1), I am assuming that it would be similar to HA for any other web application, i.e. running multiple instances and switching to the backup server when the master is down. Is that not the case? For 2), is it also possible to save notebooks on HDFS? Can you please explain 3, are you ref
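The "switch to the backup server when the master is down" model from 1) can be sketched as an active/standby selector: keep the current instance while it is healthy, otherwise fail over to the first healthy one in priority order. The function names and URLs are hypothetical, and the sketch assumes notebooks are on shared storage and that only the active instance receives traffic, since two Zeppelin instances writing the same notebooks at once may conflict. Interpreter sessions on the failed instance are not carried over.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Sequence


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Health probe: True if the URL answers with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # HTTPError (4xx/5xx), connection errors, timeouts, bad URLs.
        return False


def pick_active(instances: Sequence[str], is_healthy: Callable[[str], bool],
                current: Optional[str] = None) -> Optional[str]:
    """Keep the current instance while healthy (avoids flapping);
    otherwise return the first healthy instance in priority order."""
    if current is not None and is_healthy(current):
        return current
    for url in instances:
        if url != current and is_healthy(url):
            return url
    return None  # nothing healthy: alert, or start a fresh instance
```

A small watchdog could call `pick_active([...], http_ok, current)` periodically and repoint a proxy or DNS entry when the result changes; the alternative raised above, starting `zeppelin-daemon.sh` on a standby machine only after the master fails, fits the same loop by launching an instance when `None` is returned.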

Re: HA for Zeppelin

2016-04-05 Thread Eran Witkon
I would say you need to account for these things: 1) availability of the Zeppelin daemon, 2) availability of the notebook files, 3) availability of the interpreters used. For 1, I don't know of an out-of-the-box solution. For 2, any HA storage will do: S3 or any HA externally mounted disk. For 3, it is up to the

HA for Zeppelin

2016-04-05 Thread ashish rawat
Hi, is there a suggested architecture for running Zeppelin in high availability mode? The only option I could find was saving notebooks to S3. Are there any options if one is not using AWS? Regards, Ashish