Re: Zeppelin distributed architecture design

liuxun Fri, 10 Aug 2018 19:37:33 -0700

Hi,

After two weeks of development, I have finished migrating the Raft library 
from copycat to atomix.
Most of the extra work went into resolving netty package conflicts. The 
atomix-based version is now running on our internal company clusters.


atomix depends on the 4.1.27.Final version of the netty JAR.
If this newer netty JAR is placed directly in ./zeppelin/lib or under the 
./zeppelin/interpreter path, it conflicts with the netty version that spark 
uses, and the spark interpreter fails.
The atomix netty JAR therefore has to be loaded through a custom classloader 
in both zeppelin-server and the interpreter process, so that it stays isolated 
from the netty package already on the classpath.
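
The isolation idea can be illustrated with a minimal sketch. This is not the 
code from the Zeppelin change; it only assumes a child-first classloader that 
looks in its own JARs first for classes under one package prefix. The names 
NettyIsolationSketch and ChildFirstClassLoader are hypothetical, and the 
"io.netty." prefix and the io.netty.util.Version sample class are chosen only 
for illustration.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Paths;

final class NettyIsolationSketch {

    /** Child-first loader: classes under isolatedPrefix are searched in the given JARs first. */
    static final class ChildFirstClassLoader extends URLClassLoader {
        private final String isolatedPrefix;

        ChildFirstClassLoader(URL[] jars, ClassLoader parent, String isolatedPrefix) {
            super(jars, parent);
            this.isolatedPrefix = isolatedPrefix;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null && name.startsWith(isolatedPrefix)) {
                    try {
                        // Look in the isolated JARs before asking the parent.
                        c = findClass(name);
                    } catch (ClassNotFoundException e) {
                        // Not in the isolated JARs; fall back to the parent below.
                    }
                }
                if (c == null) {
                    // Everything else uses the normal parent-first delegation.
                    c = super.loadClass(name, false);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: NettyIsolationSketch <netty-jar> [more-jars...]");
            return;
        }
        URL[] jars = new URL[args.length];
        for (int i = 0; i < args.length; i++) {
            jars[i] = Paths.get(args[i]).toUri().toURL();
        }
        try (ChildFirstClassLoader loader = new ChildFirstClassLoader(
                jars, NettyIsolationSketch.class.getClassLoader(), "io.netty.")) {
            // Assumption: the supplied JARs contain io.netty.util.Version.
            Class<?> version = loader.loadClass("io.netty.util.Version");
            System.out.println(version.getName() + " loaded by " + version.getClassLoader());
        }
    }
}
```

With a loader like this, code that is itself loaded through the custom loader 
sees the netty classes from the supplied JARs, while the rest of the process 
keeps whatever netty version is already on its classpath.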

I have updated the atomix section of the system design document; please 
follow the link below to read it.

Atomix Raft algorithm library
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8
 

I will submit a new pull request with the code. Please help review and merge 
it, thank you.

Thanks,
Xun Liu

> On Jul 24, 2018, at 12:57 PM, liuxun <[email protected]> wrote:
> 
> @Jongyoul Lee:
> Thank you for your attention.
> 
> Indeed, as you said, the `Copycat` project has been discontinued and has 
> been migrated to `https://github.com/atomix/atomix`.
> 
> I also considered this issue during development.
> At the time, `Copycat` was sufficient to implement Raft, and I did not think 
> about the longer term.
> 
> Today I looked at the atomix documentation, 
> https://atomix.io/docs/latest/user-manual/, 
> which describes many features, such as broadcasting messages in the cluster 
> and detecting cluster events.
> For zeppelin's long-term development, atomix is the better choice.
> So I will switch the Raft library to atomix; the change is not difficult.
> 
> Struggle for zeppelin!!! :-)
> 
> 
>> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee <[email protected]> wrote:
>> 
>> First of all, thank you for your effort and contribution.
>> 
>> I read it carefully today, and personally, it's a very nice feature and
>> idea.
>> 
>> Let's discuss it and improve more concretely. I also left comments on the
>> doc.
>> 
>> And I have a simple question.
>> 
>> `Copycat`, which you used to implement it, has been deprecated by its
>> owner[1] and moved under https://github.com/atomix/atomix/. That worries
>> me. Do you have any reason to use this library? It is even a SNAPSHOT
>> version.
>> 
>> Regards,
>> JL
>> 
>> [1]: https://github.com/atomix/copycat
>> 
>> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <[email protected]> wrote:
>> 
>>> Hi,
>>> 
>>> To show more intuitively how a distributed zeppelin cluster is used in
>>> practice, I updated the design document. Starting on page 16, I added
>>> 2 GIF animations that are screen recordings of the zeppelin cluster we are
>>> running now.
>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>>> 
>>> Distributed clustered zeppelin is already in use at our company, and the
>>> recordings are all real.
>>> The first GIF shows how to create a cluster of three zeppelin servers:
>>> 1. Add 234, 235, and 236 to the zeppelin.cluster.addr property in
>>> zeppelin-site.xml to create the cluster.
>>> 2. Start these 3 servers at the same time.
>>> 3. Open the web pages of these 3 servers and get ready for notebook
>>> operations.
>>> 
>>> 
>>> The second GIF shows how an interpreter process is created in the cluster:
>>> 1. Create a notebook on host234 and execute it. This creates an
>>> interpreter process on a server in the cluster that has free resources.
>>> 2. Continue editing this notebook on host235 and execute it. Results
>>> come back immediately, without waiting for an interpreter process to be
>>> created.
>>> 3. Likewise, continue editing this notebook on host236 and execute it.
>>> Results again come back immediately.
>>> The same notebook reuses the interpreter process that was created first, so
>>> you get the execution result immediately on any server.
>>> If you look at the background server processes, you will find that host234,
>>> host235, and host236 use the same interpreter process for the same notebook.
>>> 
>>> Originally, I also wanted to record an interpreter process failure, with
>>> the cluster re-creating the interpreter process on an idle server, but I
>>> am too tired now.
>>> I will record it later when I have time.
>>> 
>>> 
>>>> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>>> 
>>>> Thank you liuxun,
>>>> 
>>>> I left a couple of comments in that google document.
>>>> 
>>>> --
>>>> Ruslan Dautkhanov
>>>> 
>>>> 
>>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <[email protected]> wrote:
>>>> Hi Ruslan Dautkhanov,
>>>> 
>>>> Thank you very much for your question. Following your advice, I added
>>>> 3 diagrams to illustrate:
>>>> 1. Distributed Zeppelin deployment architecture diagram.
>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>> 3. Distributed Zeppelin server & intp process fault tolerance diagram.
>>>> 
>>>> 
>>>> The email attachment exceeded the size limit, so I reorganized the
>>>> document and put the updated version on Google Docs.
>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>>>> 
>>>> 
>>>>> On Jul 18, 2018, at 1:03 PM, liuxun <[email protected]> wrote:
>>>>> 
>>>>> Hi Ruslan Dautkhanov,
>>>>> 
>>>>> Thank you very much for your question. Following your advice, I added
>>>>> 3 diagrams to illustrate:
>>>>> 1. Zeppelin cluster architecture diagram.
>>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>>> 3. Distributed Zeppelin server & intp process fault tolerance diagram.
>>>>> 
>>>>> Later, I will merge the diagrams into the system design document.
>>>>> 
>>>>> <Zeppelin system architecture diagram00.png>
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>>> 
>>>>> 
>>>>> 
>>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>>> 
>>>>> 
>>>>> 
>>>>>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>>>>> 
>>>>>> Nice.
>>>>>> 
>>>>>> Thanks for sharing.
>>>>>> 
>>>>>> Can you explain how users are routed to a particular zeppelin server
>>>>>> instance? I've seen nginx on top of them, but I don't think the
>>>>>> document covers the details. If one zeppelin server goes down or
>>>>>> becomes unhealthy, is nginx supposed to detect that (if so, how?) and
>>>>>> reroute users to a surviving instance?
>>>>>> 
>>>>>> Thanks,
>>>>>> Ruslan Dautkhanov
>>>>>> 
>>>>>> 
>>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <[email protected]> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Our company has deployed many zeppelin instances for data analysis.
>>>>>>> The single-server version of zeppelin could not meet our application
>>>>>>> scenarios, so we turned zeppelin into a clustered service that
>>>>>>> supports distributed deployment, with a unified entry point, high
>>>>>>> availability, and high server resource utilization. The email
>>>>>>> attachment is the entire design document. I would be very happy to
>>>>>>> contribute our modified code back to the community.
>>>>>>> 
>>>>>>> 
>>>>>>> This is the JIRA issue I submitted to the community:
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>>>>>>> 
>>>>>>> 
>>>>>>> Since the design document exceeds the mail attachment size limit, I
>>>>>>> am sending links to the documents instead.
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>>>>>>> 
>>>>>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>>>>>>> 
>>>>>>> 
>>>>>>> liuxun
>>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
>> 
>> -- 
>> 이종열, Jongyoul Lee, 李宗烈
>> http://madeng.net
>
