Hi,

I have submitted the first module of the Zeppelin cluster upgrade. Please help me review the code, thank you!
https://github.com/apache/zeppelin/pull/3156
I updated the Atomix algorithm library module in the system design documentation; please click the link below to browse it.
https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8

> On Aug 11, 2018, at 10:36 AM, liuxun <[email protected]> wrote:
>
> Hi,
>
> After 2 weeks of development, I have finished upgrading from Copycat to
> the Atomix algorithm library.
> The workload grew because I had to resolve netty package conflicts. The
> Atomix-based version is now running on our company's internal clusters.
>
> Atomix uses version 4.1.27.Final of the netty JAR package.
> If you put this newer netty package directly in ./zeppelin/lib or the
> ./zeppelin/interpreter path, it conflicts with the netty package version
> used by Spark and causes the spark-interpreter to fail.
> The two therefore need to be isolated in zeppelin-server and the
> interpreter process: a custom classloader loads the Atomix netty JAR
> separately from the netty package on the classpath.
>
> I updated the Atomix algorithm library module in the system design
> documentation; please click the link below to browse it.
>
> Atomix Raft algorithm library
> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#heading=h.qbcgqhd0wwh8
>
> I will submit the new code as a pull request; please help me merge it, thank you.
>
> Thanks,
> Xun Liu
>
>> On Jul 24, 2018, at 12:57 PM, liuxun <[email protected]> wrote:
>>
>> @Jongyoul Lee:
>> Thank you for your attention.
>>
>> Indeed, as you said, the `Copycat` project has been closed and has been
>> migrated to `https://github.com/atomix/atomix`.
>>
>> I also considered this issue during development.
>> The main reason was that `Copycat` was enough to implement Raft at the
>> time, and I did not think about the long term.
>>
>> Today, I took a look at the Atomix documentation,
>> https://atomix.io/docs/latest/user-manual/ .
>> Atomix has a lot of features, such as broadcasting messages in the
>> cluster and detecting cluster events.
>> From the perspective of Zeppelin's long-term development, it is better to
>> use Atomix.
>> So I will switch the Raft protocol algorithm library to Atomix, which is
>> not a difficult change.
>>
>> Struggle for zeppelin!!! :-)
>>
>>
>>> On Jul 24, 2018, at 9:35 AM, Jongyoul Lee <[email protected]> wrote:
>>>
>>> First of all, thank you for your effort and contribution.
>>>
>>> I read it carefully today, and personally, I think it's a very nice
>>> feature and idea.
>>>
>>> Let's discuss it and improve it more concretely. I also left comments on
>>> the doc.
>>>
>>> And I have a simple question.
>>>
>>> `Copycat`, which you used to implement it, has been deprecated by its
>>> owner[1] and moved under https://github.com/atomix/atomix/. That worries
>>> me. Do you have any reason to use this library? It's even a SNAPSHOT
>>> version.
>>>
>>> Regards,
>>> JL
>>>
>>> [1]: https://github.com/atomix/copycat
>>>
>>> On Sat, Jul 21, 2018 at 2:07 AM, liuxun <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> To show more intuitively how the distributed Zeppelin cluster is
>>>> actually used, I updated the design document. Starting on page 16, I
>>>> added 2 GIF animations: screen recordings of the Zeppelin cluster we
>>>> are using now.
>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit#
>>>>
>>>> Distributed clustered Zeppelin is already in use at our company, and the
>>>> screen recordings are all real.
>>>> The first GIF shows the following:
>>>> Create a cluster of three Zeppelin servers
>>>> Add 234, 235, 236 to the zeppelin.cluster.addr attribute in
>>>> zeppelin-site.xml to create a cluster
>>>> Start these 3 servers at the same time
>>>> Open the web pages of these 3 servers and prepare for the notebook
>>>> operation.
>>>>
>>>>
>>>> The second GIF shows the following:
>>>> Create an interpreter process in the cluster
>>>> Create a notebook on host234 and execute it. This action creates an
>>>> interpreter process on the server in the cluster that has free resources.
>>>> You can then continue editing this notebook on host235 and execute it.
>>>> Results come back immediately, without waiting for an interpreter
>>>> process to be created.
>>>> Again, you can continue to edit this notebook on host236 and execute it.
>>>> Results come back immediately, without waiting for an interpreter
>>>> process to be created.
>>>> The same notebook reuses the interpreter process that was created first,
>>>> so you get the execution result immediately on any server.
>>>> By looking at the background server processes, you will find that
>>>> host234, host235, and host236 use the same interpreter process for the
>>>> same notebook.
>>>>
>>>> Originally, I also wanted to record what happens when an interpreter
>>>> process fails and the cluster re-creates it on an idle server, but I am
>>>> too tired now.
>>>> I will record that later when there is time.
>>>>
>>>>
>>>>> On Jul 19, 2018, at 7:36 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>>>>
>>>>> Thank you liuxun,
>>>>>
>>>>> I left a couple of comments in that Google document.
>>>>>
>>>>> --
>>>>> Ruslan Dautkhanov
>>>>>
>>>>>
>>>>> On Tue, Jul 17, 2018 at 11:30 PM liuxun <[email protected]> wrote:
>>>>> Hi Ruslan Dautkhanov,
>>>>>
>>>>> Thank you very much for your question. Following your advice, I added
>>>>> 3 schematics to illustrate:
>>>>> 1. Distributed Zeppelin deployment architecture diagram.
>>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>>> 3. Distributed Zeppelin server & intp process fault tolerance diagram.
>>>>>
>>>>>
>>>>> The email attachment exceeded the size limit, so I reorganized the
>>>>> document and updated it in Google Docs.
>>>>> https://docs.google.com/document/d/1a8QLSyR3M5AhlG1GIYuDTj6bwazeuVDKCRRBm-Qa3Bw/edit?usp=sharing
>>>>>
>>>>>
>>>>>> On Jul 18, 2018, at 1:03 PM, liuxun <[email protected]> wrote:
>>>>>>
>>>>>> Hi Ruslan Dautkhanov,
>>>>>>
>>>>>> Thank you very much for your question. Following your advice, I
>>>>>> added 3 schematics to illustrate:
>>>>>> 1. Zeppelin cluster architecture diagram.
>>>>>> 2. Distributed Zeppelin server fault tolerance diagram.
>>>>>> 3. Distributed Zeppelin server & intp process fault tolerance diagram.
>>>>>>
>>>>>> Later, I will merge the schematics into the system design document.
>>>>>>
>>>>>> <Zeppelin system architecture diagram00.png>
>>>>>>
>>>>>>
>>>>>> <Distributed zeppelin Server fault tolerance diagram 1.png>
>>>>>>
>>>>>>
>>>>>>
>>>>>> <Distributed zeppelin Server fault tolerance diagram 2.png>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On Jul 18, 2018, at 1:16 AM, Ruslan Dautkhanov <[email protected]> wrote:
>>>>>>>
>>>>>>> Nice.
>>>>>>>
>>>>>>> Thanks for sharing.
>>>>>>>
>>>>>>> Can you explain how users are routed to a particular Zeppelin server
>>>>>>> instance? I've seen nginx on top of them, but I don't think the
>>>>>>> document covers the details. If one Zeppelin server goes down or
>>>>>>> becomes unhealthy, is nginx supposed to detect that (if so, how?) and
>>>>>>> reroute users to a surviving instance?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ruslan Dautkhanov
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jul 17, 2018 at 2:46 AM liuxun <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Our company has installed and deployed many Zeppelin instances for
>>>>>>>> data analysis.
>>>>>>>> The single-server version of Zeppelin could not meet our application
>>>>>>>> scenarios, so we transformed Zeppelin into a clustered service that
>>>>>>>> supports distributed deployment, with a unified entrance, high
>>>>>>>> availability, and high server resource utilization. The email
>>>>>>>> attachment is the entire design document. I am very happy to
>>>>>>>> contribute our modified code back to the community.
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the JIRA I submitted in the community:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/ZEPPELIN-3471
>>>>>>>>
>>>>>>>>
>>>>>>>> Since the design document exceeds the mail attachment size limit,
>>>>>>>> I am sending links to the documents instead.
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/secure/attachment/12931896/Zeppelin%20distributed%20architecture%20design.pdf
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/secure/attachment/12931895/zepplin%20Cluster%20Sequence%20Diagram.png
>>>>>>>>
>>>>>>>>
>>>>>>>> liuxun
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> 이종열, Jongyoul Lee, 李宗烈
>>> http://madeng.net
>>
>
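
For readers following the netty discussion earlier in this thread: isolating the Atomix netty JAR from the netty package that Spark brings onto the classpath is commonly done with a "child-first" classloader. The following is only a rough sketch under assumptions, not the code in the pull request. The class name IsolatedClassLoader, the isIsolated helper, and the package prefixes passed to it are all hypothetical.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical child-first classloader (an illustration, not Zeppelin's code).
 * Classes whose names start with one of the isolated prefixes are looked up
 * in this loader's own JARs before the parent is consulted; everything else
 * follows the normal parent-first delegation.
 */
public class IsolatedClassLoader extends URLClassLoader {
    private final String[] isolatedPrefixes;

    public IsolatedClassLoader(URL[] isolatedJars, ClassLoader parent,
                               String... isolatedPrefixes) {
        super(isolatedJars, parent);
        this.isolatedPrefixes = isolatedPrefixes.clone();
    }

    /** Returns true if the class should be loaded from the isolated JARs first. */
    public boolean isIsolated(String className) {
        for (String prefix : isolatedPrefixes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve)
            throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null && isIsolated(name)) {
                try {
                    // Child-first: search our own JARs before asking the parent.
                    loaded = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not in the isolated JARs; fall back to the parent below.
                }
            }
            if (loaded == null) {
                // Normal parent-first delegation for all other classes.
                return super.loadClass(name, resolve);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
}
```

With a loader like this, the server would pass the URLs of the Atomix and netty 4.1.27.Final JARs as isolatedJars and prefixes such as "io.netty." and "io.atomix." (again, assumed values), so that Spark continues to see its own netty version through the regular classpath.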
