Thanks for sharing. It would be nice if you could write a blog post to share it with the wider Zeppelin user community.
Fabrizio Fab <fabrizio.dagost...@tiscali.it> wrote on Thu, Oct 28, 2021 at 4:29 PM:

> Yeah! Thank you very much, Philipp: tonight I explored the source code
> carefully and discovered the two-Thrift-servers setup.
>
> Therefore I solved my problem. Here is the solution I adopted, which may
> be useful for other people.
>
> CONTEXT
> My Zeppelin server installation is located on a LAN where a K8s cluster
> is available, and I want to submit notes in cluster mode to the K8s
> cluster.
>
> SOLUTION
> - The driver pod must have its address exposed on the LAN network;
>   otherwise the Zeppelin server cannot connect to the interpreter Thrift
>   server. I suppose there are several ways of doing this, but I am not a
>   K8s expert, so I simply created a basic driver-pod.template.yaml with a
>   "hostNetwork" spec and referenced it via the
>   "spark.kubernetes.driver.podTemplateFile" interpreter setting.
>
> At this point, the two servers can talk to each other.
>
> NOTES
> 1) Do not set the Zeppelin run mode to "k8s". It must be "local" (or the
>    default "auto").
> 2) An NFS share (or other shared persistent volume) is required in order
>    to upload the required JARs and to easily access the driver logs after
>    the driver shuts down:
>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.options.server=<your server>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.options.path=<local path>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.mount.path=<mount path>
>
> On 2021/10/28 06:48:54, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> > Hi Fabrizio,
> >
> > We have two connections. First, the Zeppelin interpreter opens a
> > connection to the Zeppelin server to register and to send back the
> > interpreter output. The Zeppelin server is the CALLBACK_HOST, and the
> > PORT indicates where the Zeppelin server opened the Thrift service for
> > the Zeppelin interpreter.
> > An important part of the registration is that the Zeppelin interpreter
> > tells the Zeppelin server where the interpreter pod has an open Thrift
> > server port. This information can be found in the Zeppelin server log
> > output. Be on the lookout for this message:
> > https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
> > Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your
> > Zeppelin server reach the Zeppelin interpreter in K8s.
> >
> > > the 1st "spark-submit" in "cluster mode" is started from the client
> > > (on the Zeppelin host, in our case), then the 2nd "spark-submit" in
> > > "client mode" is started by the "/opt/entrypoint.sh" script inside
> > > the standard Spark docker image.
> >
> > Are you sure you are using the K8s launcher? As you can see in this
> > part of the code
> > (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
> > Zeppelin always uses client mode.
> >
> > The architecture is quite simple:
> >
> > Zeppelin server -> Zeppelin interpreter (with Spark in client mode) on
> > K8s -> x Spark executors (based on your config)
> >
> > Best regards
> > Philipp
> >
> > On 27.10.21 at 15:19, Fabrizio Fab wrote:
> > > Hi Philipp, okay, I realized just now my HUGE misunderstanding!
> > >
> > > The "double spark-submit" pattern is just the standard Spark-on-K8s
> > > way of running Spark applications in cluster mode: the 1st
> > > "spark-submit" in "cluster mode" is started from the client (on the
> > > Zeppelin host, in our case), then the 2nd "spark-submit" in "client
> > > mode" is started by the "/opt/entrypoint.sh" script inside the
> > > standard Spark docker image.
> > > At this point I can ask a more precise question:
> > >
> > > I see that interpreter.sh starts the RemoteInterpreterServer with, in
> > > particular, the following parameters: CALLBACK_HOST / PORT. They
> > > refer to the Zeppelin host and RPC port.
> > >
> > > Moreover, when the interpreter starts, it runs a Thrift server on
> > > some random port.
> > >
> > > So I ask: which communications are supposed to happen, in order to
> > > correctly set up my firewall/routing rules?
> > >
> > > 1. Must the Zeppelin server connect to the interpreter Thrift server?
> > > 2. Must the interpreter Thrift server connect to the Zeppelin server?
> > > 3. Both?
> > >
> > > Which ports must the Zeppelin server / the Thrift server find open on
> > > the other server?
> > >
> > > Thank you, everybody!
> > >
> > > Fabrizio
> > >
> > > On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> > > > Hi Fabrizio,
> > > >
> > > > At the moment I think Zeppelin does not support running Spark jobs
> > > > in cluster mode. But in fact, K8s mode simulates cluster mode,
> > > > because the Zeppelin interpreter is already started as a pod in
> > > > K8s, just as a manual spark-submit execution would do in cluster
> > > > mode.
> > > >
> > > > spark-submit is called only once, during the start of the Zeppelin
> > > > interpreter. You will find the call in these lines:
> > > > https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> > > >
> > > > Best regards
> > > > Philipp
> > > >
> > > > On 25.10.21 at 21:58, Fabrizio Fab wrote:
> > > > > Dear all, I have been struggling for more than a week with the
> > > > > following problem. My Zeppelin server is running outside the K8s
> > > > > cluster (there is a reason for this), and I am able to run Spark
> > > > > Zeppelin notes in client mode but not in cluster mode.
> > > > > I see that, at first, a pod for the interpreter
> > > > > (RemoteInterpreterServer) is created on the cluster by
> > > > > spark-submit from the Zeppelin host, with deployMode=cluster (and
> > > > > this happens without errors); then the interpreter itself runs
> > > > > another spark-submit (this time from the pod) with
> > > > > deployMode=client.
> > > > >
> > > > > Specifically, the following is the command line submitted by the
> > > > > interpreter from its pod:
> > > > >
> > > > > /opt/spark/bin/spark-submit \
> > > > >   --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> > > > >   --deploy-mode client \
> > > > >   --properties-file /opt/spark/conf/spark.properties \
> > > > >   --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> > > > >   spark-internal \
> > > > >   <ZEPPELIN_HOST> \
> > > > >   <ZEPPELIN_SERVER_RPC_PORT> \
> > > > >   <interpreter_name>-<user name>
> > > > >
> > > > > At this point, the interpreter pod remains in "Running" state,
> > > > > while the Zeppelin note remains "Pending" forever.
> > > > >
> > > > > The log of the interpreter (level = DEBUG) at the end only says:
> > > > >
> > > > > INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip address of the interpreter pod>:<random port>
> > > > > INFO [2021-10-25 18:16:58,229] ({RegisterThread} RemoteInterpreterServer.java[run]:592) Start registration
> > > > > INFO [2021-10-25 18:16:58,332] ({RegisterThread} RemoteInterpreterServer.java[run]:606) Registering interpreter process
> > > > > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:608) Registered interpreter process
> > > > > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:629) Registration finished
> > > > >
> > > > > (I replaced the real IP and port with placeholders to make the
> > > > > log clearer for you.)
> > > > >
> > > > > I am stuck at this point....
> > > > > Can anyone help me? Thank you in advance.
> > > > > Fabrizio

--
Best Regards

Jeff Zhang
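[Editor's note] A minimal sketch of the driver pod template Fabrizio describes: "hostNetwork: true" puts the driver on the node's LAN address so the Zeppelin server can connect to its Thrift port. The file name, the `dnsPolicy` line, and the example path are assumptions, not taken from the thread; only `spark.kubernetes.driver.podTemplateFile` and `hostNetwork` come from the messages above.

```shell
# Write a basic driver pod template with hostNetwork enabled
# (dnsPolicy is an assumed companion setting commonly used with hostNetwork).
cat > driver-pod.template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
EOF

# Then point the Spark interpreter setting at it, e.g.:
#   spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod.template.yaml
echo "wrote driver-pod.template.yaml"
```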
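[Editor's note] Philipp's answer implies two connections that the firewall must allow: interpreter pod to the Zeppelin server's callback port, and Zeppelin server to the interpreter's (random) Thrift port reported in the server log. A quick way to probe both is `nc -z`; in this sketch all hosts and ports are placeholders, shown against 127.0.0.1 only to illustrate the syntax.

```shell
# Probe a host:port and report reachability (assumes nc with -z/-w support).
check() {  # usage: check <host> <port> <label>
  if nc -z -w 3 "$1" "$2"; then
    echo "$3 reachable"
  else
    echo "$3 NOT reachable"
  fi
}

# Connection 1: interpreter pod -> Zeppelin server (CALLBACK_HOST / PORT)
check 127.0.0.1 12320 "zeppelin callback"
# Connection 2: Zeppelin server -> interpreter Thrift port (see server log)
check 127.0.0.1 34567 "interpreter thrift"
```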