Yeah! Thank you very much, Philipp: tonight I carefully explored the source code and figured out how the two Thrift servers work.
Therefore I solved my problem. Here is the solution I adopted, which may be useful for other people.

CONTEXT
My Zeppelin server installation is located on a LAN, where a K8s cluster is available, and I want to submit notes in cluster mode to the K8s cluster.

SOLUTION
The driver pod must have its address exposed on the LAN network, otherwise the Zeppelin server cannot connect to the interpreter Thrift server. I suppose there are several ways of doing this, but I am not a K8s expert, so I simply created a basic driver-pod.template.yaml with a "hostNetwork" spec and referenced it via the "spark.kubernetes.driver.podTemplateFile" interpreter setting. At this point, the two servers can talk to each other.

NOTE
1) Do not set the Zeppelin run mode to "k8s". It must be "local" (or the default, "auto").
2) An NFS share (or another shared persistent volume) is required in order to upload the required JARs and to easily access the driver logs after the driver shuts down:

spark.kubernetes.driver.volumes.nfs.<whichever name>.options.server=<your server>
spark.kubernetes.driver.volumes.nfs.<whichever name>.options.path=<local path>
spark.kubernetes.driver.volumes.nfs.<whichever name>.mount.path=<mount path>

On 2021/10/28 06:48:54, Philipp Dallig <[email protected]> wrote:
> Hi Fabrizio,
>
> We have two connections. First, the Zeppelin interpreter opens a
> connection to the Zeppelin server to register and to send back the
> interpreter output. The Zeppelin server is the CALLBACK_HOST and the
> PORT indicates where the Zeppelin server opened the Thrift service for
> the Zeppelin interpreter.
>
> An important part of the registration is that the Zeppelin interpreter
> tells the Zeppelin server where the interpreter pod has an open Thrift
> server port. This information can be found in the Zeppelin server log
> output. Be on the lookout for this message.
> https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
>
> Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your
> Zeppelin server to reach the Zeppelin interpreter in K8s.
>
> > the 1st "spark-submit" in "cluster mode" is started from the client
> > (in the zeppelin host, in our case), then the 2nd "spark-submit" in
> > "client mode" is started by the "/opt/entrypoint.sh" script inside the
> > standard spark docker image.
>
> Are you sure you are using the K8s launcher? As you can see in this part
> of the code
> (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
> Zeppelin always uses client mode.
>
> The architecture is quite simple:
>
> Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on
> K8s -> x-Spark-executors (based on your config)
>
> Best Regards
> Philipp
>
> Am 27.10.21 um 15:19 schrieb Fabrizio Fab:
> > Hi Philipp, okay, I realized just now my HUGE misunderstanding!
> >
> > The "double-spark-submit" pattern is just the standard spark-on-k8s way
> > of running spark applications in cluster mode:
> > the 1st "spark-submit" in "cluster mode" is started from the client (in
> > the zeppelin host, in our case), then the 2nd "spark-submit" in "client
> > mode" is started by the "/opt/entrypoint.sh" script inside the standard
> > spark docker image.
> >
> > At this point I can ask a more precise question:
> >
> > I see that interpreter.sh starts the RemoteInterpreterServer with, in
> > particular, the following parameters: CALLBACK_HOST / PORT.
> > They refer to the Zeppelin host and RPC port.
> >
> > Moreover, when the interpreter starts, it runs a Thrift server on some
> > random port.
> >
> > So, I ask: which communications are supposed to happen, in order to
> > correctly set up my firewall/routing rules?
> >
> > 1) Must the Zeppelin server connect to the interpreter Thrift server?
> > 2) Must the interpreter Thrift server connect to the Zeppelin server?
> > 3) Both?
> >
> > Which ports must the Zeppelin server / the Thrift server find open on
> > the other server?
> >
> > Thank you everybody!
> >
> > Fabrizio
> >
> > On 2021/10/26 11:40:24, Philipp Dallig <[email protected]> wrote:
> >> Hi Fabrizio,
> >>
> >> At the moment I think Zeppelin does not support running Spark jobs in
> >> cluster mode. But in fact K8s mode simulates cluster mode, because the
> >> Zeppelin interpreter is already started as a pod in K8s, as a manual
> >> spark-submit execution would do in cluster mode.
> >>
> >> spark-submit is called only once, during the start of the Zeppelin
> >> interpreter. You will find the call in these lines:
> >> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> >>
> >> Best Regards
> >> Philipp
> >>
> >> Am 25.10.21 um 21:58 schrieb Fabrizio Fab:
> >>> Dear All, I have been struggling for more than a week with the
> >>> following problem. My Zeppelin server is running outside the K8s
> >>> cluster (there is a reason for this) and I am able to run Spark
> >>> Zeppelin notes in client mode but not in cluster mode.
> >>>
> >>> I see that, at first, a pod for the interpreter
> >>> (RemoteInterpreterServer) is created on the cluster by spark-submit
> >>> from the Zeppelin host, with deployMode=cluster (and this happens
> >>> without errors), then the interpreter itself runs another spark-submit
> >>> (this time from the pod) with deployMode=client.
> >>>
> >>> Precisely, the following is the command line submitted by the
> >>> interpreter from its pod:
> >>>
> >>> /opt/spark/bin/spark-submit \
> >>>   --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> >>>   --deploy-mode client \
> >>>   --properties-file /opt/spark/conf/spark.properties \
> >>>   --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> >>>   spark-internal \
> >>>   <ZEPPELIN_HOST> \
> >>>   <ZEPPELIN_SERVER_RPC_PORT> \
> >>>   <interpreter_name>-<user name>
> >>>
> >>> At this point, the interpreter pod remains in the "Running" state,
> >>> while the Zeppelin note remains "Pending" forever.
> >>>
> >>> The log of the interpreter (level = DEBUG) at the end only says:
> >>>
> >>> INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread}
> >>> RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip
> >>> address of the interpreter pod>:<random port>
> >>> INFO [2021-10-25 18:16:58,229] ({RegisterThread}
> >>> RemoteInterpreterServer.java[run]:592) Start registration
> >>> INFO [2021-10-25 18:16:58,332] ({RegisterThread}
> >>> RemoteInterpreterServer.java[run]:606) Registering interpreter process
> >>> INFO [2021-10-25 18:16:58,356] ({RegisterThread}
> >>> RemoteInterpreterServer.java[run]:608) Registered interpreter process
> >>> INFO [2021-10-25 18:16:58,356] ({RegisterThread}
> >>> RemoteInterpreterServer.java[run]:629) Registration finished
> >>>
> >>> (I replaced the real IP and port with placeholders to make the log
> >>> clearer for you.)
> >>>
> >>> I am stuck at this point...
> >>> Can anyone help me? Thank you in advance. Fabrizio
> >>>
>
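P.S. In case a concrete starting point helps: the driver pod template I describe above can be as minimal as the sketch below. This is not my exact file, just the essential idea; the dnsPolicy line is an optional extra that is commonly paired with hostNetwork so that in-cluster DNS still resolves.

```yaml
# Minimal driver-pod.template.yaml (sketch): hostNetwork exposes the driver's
# address on the node's LAN network, so the Zeppelin server outside the
# cluster can reach the interpreter's Thrift server.
# Reference this file via the interpreter setting
# spark.kubernetes.driver.podTemplateFile.
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true
  # Optional: keep cluster DNS working while on the host network.
  dnsPolicy: ClusterFirstWithHostNet
```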
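P.P.S. While debugging the two connections Philipp describes (server-to-interpreter and interpreter-to-server), a plain TCP check from each host saved me a lot of time. This is nothing Zeppelin-specific, just a generic reachability probe; the host and port in the comment are placeholders for the values reported in the Zeppelin server log.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholders): the interpreter pod's address and the Thrift port
# that the Zeppelin server log reports during registration.
# print(can_connect("10.0.0.42", 12321))
```

Run it once from the Zeppelin host against the interpreter's Thrift port, and once from inside the pod against the Zeppelin server's RPC port, to see which direction is blocked.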
