For issue (1) you might want to try Amazon EFS. While EFS is designed for “big 
data”, you can use it for other concurrency use cases. You need to pay very 
close attention to your storage size/utilization ratios, as EFS can completely 
choke off bandwidth. I would also look at using EFS for notebook storage, as 
this will help raise the storage size/utilization ratios and *may* improve 
performance. In any case, EFS could provide a solid concurrency solution, and 
it costs little to test the concept.
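
As a rough illustration of the concurrency concern in (1), here is a minimal 
sketch of serializing read-modify-write updates to a shared JSON file with an 
advisory lock. The helper name update_shared_json is hypothetical, Zeppelin 
does not do this itself, and the sketch assumes the shared mount honours 
advisory locks, which is worth verifying on your own EFS/NFS setup.

```python
import fcntl
import json
import os


def update_shared_json(path, mutate):
    """Apply mutate(dict) to the JSON file at path under an exclusive lock.

    Hypothetical helper, not part of Zeppelin. Assumes the file system
    honours advisory locks taken with fcntl.lockf.
    """
    # Open read/write, creating the file if it does not exist yet.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+") as f:
        # Blocks until no other process holds the lock on this file.
        fcntl.lockf(f, fcntl.LOCK_EX)
        try:
            raw = f.read()
            data = json.loads(raw) if raw.strip() else {}
            mutate(data)  # caller edits the dict in place
            # Rewrite the whole file while still holding the lock.
            f.seek(0)
            f.truncate()
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())
        finally:
            fcntl.lockf(f, fcntl.LOCK_UN)
    return data
```

Every writer would have to go through the same lock for this to help; a 
process that writes the file without taking the lock can still clobber it.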

 

Note: we have a similar scenario to yours on our roadmap. Our approach relies 
on F5 (and SAML for authentication). We have not yet gotten to Zeppelin in our 
SSO/SAML integration roadmap.

 

Patrick Maroney

Principal Engineer – Data Sciences & Analytics

Wapack Labs

609-841-5104

pmaro...@wapacklabs.com

http://pgp.mit.edu/pks/lookup?op=get&search=0x7C810C9769BD29AF

http://www.wapacklabs.com

 

From: "Tan, Jialiang" <j...@ea.com>
Reply-To: <users@zeppelin.apache.org>
Date: Tuesday, October 17, 2017 at 3:14 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: notebook-authorization.json file makes Zeppelin not scalable

 

We want to have a Zeppelin service that serves over 200 people in our company, 
so we plan to have around 10-15 Zeppelin instances behind an ELB. We use S3 
as notebook storage, and hence all our Zeppelin instances refer to the same S3 
location for notebooks. But there is one thing that breaks the whole setup: 
Zeppelin stores the notebook authorization information in a LOCAL file called 
notebook-authorization.json. To solve the problem, we set up an NFS-like shared 
mount so that every Zeppelin instance refers to the same configuration 
location. This method has the following problems:
1. We cannot handle concurrency conditions where multiple Zeppelin instances 
   are editing the files at the same time. Some unexpected behaviors will 
   happen.
2. I found out that Zeppelin only reads the notebook-authorization.json file 
   to memory on startup. After startup, it only treats the authorization in 
   memory as the source of truth. Zeppelin will never read that file anymore 
   unless you restart it. It only writes to it, from memory. Therefore even 
   without the concurrency problem described in (1), it is not able to get the 
   correct authorization for notebooks after other Zeppelin instances change 
   the authorization file.
I know the reasons for keeping authorizations separate from the notebooks, but 
it actually causes more serious problems like this one. Any ideas how to tackle 
this problem and make Zeppelin scalable?
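
The stale-read behaviour described in (2) can be illustrated, purely as a 
sketch, by a cache that re-reads the file whenever its modification time 
changes instead of reading it once at startup. ReloadingJsonCache is a 
hypothetical name and this is not how Zeppelin is implemented; on network 
mounts the reported modification time may also lag because of attribute 
caching.

```python
import json
import os


class ReloadingJsonCache:
    """Keep a JSON file in memory, reloading it when its mtime changes.

    Hypothetical illustration only, not Zeppelin code.
    """

    def __init__(self, path):
        self.path = path
        self._mtime_ns = None  # mtime of the copy currently held in memory
        self._data = {}

    def get(self):
        # Compare the on-disk mtime with the one recorded at the last load.
        mtime_ns = os.stat(self.path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self.path) as f:
                self._data = json.load(f)
            self._mtime_ns = mtime_ns
        return self._data
```

With something like this, a change written by one instance would be picked up 
by the others on their next lookup rather than only after a restart.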

 
