That is correct. In fact, I think the scenario is very general as long as we 
want Zeppelin to be scalable. Unifying storage is not going to be that useful 
without support for updating a single record. Without that, running multiple 
Zeppelin instances in parallel would not be viable.

From: Jeff Zhang <zjf...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, October 17, 2017 at 1:07 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: notebook-authorization.json file makes Zeppelin not scalable


Unifying storage could be done in 0.8.0. But for your scenario, it's not about 
storage, it's about how the storage is updated. For now, each change requires 
the whole file to be rewritten. Your scenario needs each change to update only 
one record.
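
To illustrate the difference between rewriting the whole file and updating one 
record, here is a minimal sketch. It is not Zeppelin code: the function names, 
the `note_auth` table, and the use of SQLite are all assumptions made for the 
example. Two instances each hold their own in-memory snapshot; with whole-file 
rewrites the second writer erases the first writer's change, while per-record 
updates keep both.

```python
import json
import os
import sqlite3
import tempfile


def whole_file_update(path, snapshot, note_id, owners):
    """Rewrite the entire file from one instance's in-memory snapshot."""
    snapshot[note_id] = owners
    with open(path, "w") as f:
        json.dump(snapshot, f)


def record_update(conn, note_id, owners):
    """Touch only the row for one note (hypothetical schema)."""
    conn.execute(
        "INSERT OR REPLACE INTO note_auth (note_id, owners) VALUES (?, ?)",
        (note_id, json.dumps(owners)),
    )
    conn.commit()


def demo():
    # Whole-file approach: instances A and B each loaded an empty file at startup.
    path = os.path.join(tempfile.mkdtemp(), "auth.json")
    snapshot_a, snapshot_b = {}, {}
    whole_file_update(path, snapshot_a, "note1", ["alice"])
    whole_file_update(path, snapshot_b, "note2", ["bob"])  # overwrites A's change
    with open(path) as f:
        file_result = json.load(f)

    # Per-record approach: each change only writes the affected row.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE note_auth (note_id TEXT PRIMARY KEY, owners TEXT)")
    record_update(conn, "note1", ["alice"])
    record_update(conn, "note2", ["bob"])
    rows = conn.execute(
        "SELECT note_id, owners FROM note_auth ORDER BY note_id"
    ).fetchall()
    conn.close()
    db_result = {note_id: json.loads(owners) for note_id, owners in rows}
    return file_result, db_result
```

Calling `demo()` returns the file contents and the table contents side by side: 
the file ends up with only `note2`, while the table holds both notes.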


Tan, Jialiang <j...@ea.com> wrote on Tuesday, October 17, 2017 at 3:47 PM:
I went through those tickets. Are there any plans for those improvements? 
Approximately when will the storage layer unification be done?

From: Jeff Zhang <zjf...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, October 17, 2017 at 12:45 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: notebook-authorization.json file makes Zeppelin not scalable


Still in file format.


Tan, Jialiang <j...@ea.com> wrote on Tuesday, October 17, 2017 at 3:38 PM:
Thanks for such a quick reply. Does the MongoDB storage in Zeppelin 
0.8.0-SNAPSHOT store authorizations in the database, or still in that JSON file?

From: Jeff Zhang <zjf...@gmail.com>
Reply-To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Date: Tuesday, October 17, 2017 at 12:28 AM
To: "users@zeppelin.apache.org" <users@zeppelin.apache.org>
Subject: Re: notebook-authorization.json file makes Zeppelin not scalable


There's one ticket for unifying the Zeppelin storage layer: 
https://issues.apache.org/jira/browse/ZEPPELIN-2742

But for your case of sharing notebook-authorization across multiple Zeppelin 
instances, I think this ticket is not enough; it would require deeper 
integration with Shiro's authorization.

Tan, Jialiang <j...@ea.com> wrote on Tuesday, October 17, 2017 at 3:14 PM:
We want to run a Zeppelin service that serves over 200 people in our company, 
so we plan to have around 10 to 15 Zeppelin instances behind an ELB. We use S3 
as notebook storage, so all our Zeppelin instances refer to the same S3 
location for notebooks. But one thing breaks the whole setup: Zeppelin stores 
the notebook authorization information in a LOCAL file called 
notebook-authorization.json. To work around this, we set up an NFS-like shared 
mount so that every Zeppelin instance refers to the same configuration 
location. This approach has the following problems:

1. We cannot handle concurrent edits, where multiple Zeppelin instances write 
to the file at the same time. Unexpected behavior will result.

2. I found out that Zeppelin reads the notebook-authorization.json file into 
memory only on startup. After startup, it treats the authorization data in 
memory as the source of truth. Zeppelin never reads that file again unless you 
restart it; it only writes to it, from memory. Therefore, even without the 
concurrency problem described in (1), an instance cannot get the correct 
authorization for notebooks after other Zeppelin instances change the 
authorization file.

I know the reasons for keeping authorizations separate from notebooks, but it 
actually introduces more serious problems like this one. Any ideas on how to 
tackle this problem and make Zeppelin scalable?
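
As a rough illustration of problem (2), the sketch below re-reads the shared 
file whenever its on-disk signature changes, instead of loading it once at 
startup. This is not how Zeppelin works, and `ReloadingAuthStore` and its 
methods are hypothetical names. Writes go through a temporary file and an 
atomic rename so a reader never sees a half-written file. It does not solve 
problem (1): two instances writing at nearly the same time can still lose an 
update, which needs locking or per-record storage.

```python
import json
import os
import tempfile


class ReloadingAuthStore:
    """Hypothetical store that reloads the shared file when it changes on disk."""

    def __init__(self, path):
        self.path = path
        self._signature = None  # stat signature of the last version loaded
        self._data = {}

    def _stat_signature(self):
        st = os.stat(self.path)
        return (st.st_mtime_ns, st.st_size, st.st_ino)

    def _refresh(self):
        try:
            signature = self._stat_signature()
        except FileNotFoundError:
            self._signature, self._data = None, {}
            return
        if signature != self._signature:
            with open(self.path) as f:
                self._data = json.load(f)
            self._signature = signature

    def owners(self, note_id):
        """Return the owners of a note, reloading first if the file changed."""
        self._refresh()
        return self._data.get(note_id, [])

    def set_owners(self, note_id, owners):
        """Merge one change into the latest on-disk state and write atomically."""
        self._refresh()
        updated = dict(self._data)
        updated[note_id] = owners
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(updated, f)
        os.replace(tmp_path, self.path)  # atomic rename on the same filesystem
```

With two stores pointing at the same path, a change written by one becomes 
visible to the other on its next read. On an NFS mount, client-side attribute 
caching may delay when the change is detected, so treat this as a sketch of 
the idea rather than a fix.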
