Yeah! Thank you very much Philipp: tonight I explored the source code carefully 
and figured out how the two Thrift servers work.

So I solved my problem. Here is the solution I adopted, which may be useful 
for other people.

CONTEXT
My Zeppelin server is installed on a LAN where a K8s cluster is available, and 
I want to submit notes to the K8s cluster in cluster mode.

SOLUTION
- the driver pod must expose its address on the LAN, otherwise the Zeppelin 
server cannot connect to the interpreter's Thrift server. I suppose there are 
several ways of doing this, but I am not a K8s expert, so I simply created a 
basic driver-pod.template.yaml with a "hostNetwork" spec and referenced it 
through the "spark.kubernetes.driver.podTemplateFile" interpreter setting 
(a minimal sketch follows below).
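
For reference, this is roughly what I mean by the template: a minimal sketch, 
where the file path and the dnsPolicy line are my own assumptions and only the 
hostNetwork spec is the essential part. As far as I understand, Spark adds its 
own driver container on top of the template, so no containers section should 
be needed here.

# driver-pod.template.yaml  (hypothetical path: /opt/zeppelin/k8s/driver-pod.template.yaml)
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true                    # expose the driver on the node/LAN network
  dnsPolicy: ClusterFirstWithHostNet   # usually recommended together with hostNetwork

and the corresponding interpreter setting:

spark.kubernetes.driver.podTemplateFile=/opt/zeppelin/k8s/driver-pod.template.yaml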
 
At this point, the two servers can talk to each other.

NOTE
1) do not set the Zeppelin run mode to "k8s": it must be "local" (or the 
default "auto").
2) an NFS share (or another shared persistent volume) is required in order to 
upload the required JARs and to easily access the driver logs after the driver 
shuts down; a concrete example follows the placeholder settings below:

spark.kubernetes.driver.volumes.nfs.<volume name>.options.server=<your NFS server>
spark.kubernetes.driver.volumes.nfs.<volume name>.options.path=<exported path on the NFS server>
spark.kubernetes.driver.volumes.nfs.<volume name>.mount.path=<mount path inside the driver pod>
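
For example, with a hypothetical volume name "shared", an NFS server at 
nfs.mylan.local exporting /export/zeppelin, mounted at /opt/shared inside the 
driver pod (all host names and paths here are placeholders, not my real ones):

spark.kubernetes.driver.volumes.nfs.shared.options.server=nfs.mylan.local
spark.kubernetes.driver.volumes.nfs.shared.options.path=/export/zeppelin
spark.kubernetes.driver.volumes.nfs.shared.mount.path=/opt/shared

(As for note 1: if I remember correctly, the run mode is controlled by the 
"zeppelin.run.mode" property in zeppelin-site.xml, or by the ZEPPELIN_RUN_MODE 
environment variable.)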

On 2021/10/28 06:48:54, Philipp Dallig <philipp.dal...@gmail.com> wrote: 
> Hi Fabrizio,
> 
> We have two connections. First, the Zeppelin interpreter opens a 
> connection to the Zeppelin server to register and to send back the 
> interpreter output. The Zeppelin server is the CALLBACK_HOST and the 
> PORT indicates where the Zeppelin server opened the Thrift service for 
> the Zeppelin interpreter.
> 
> An important part of the registration is that the Zeppelin interpreter 
> tells the Zeppelin server where the interpreter pod has an open Thrift 
> server port. This information can be found in the Zeppelin server log 
> output. Be on the lookout for this message. 
> https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
> Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your 
> Zeppelin server to reach the Zeppelin interpreter in K8s.
> 
>  > the 1st "spark-submit" in "cluster mode" is started from the client 
> (in the zeppelin host, in our case), then the 2nd "spark-submit" in 
> "client mode" is started by the "/opt/entrypoint.sh" script inside the 
> standard spark docker image.
> 
> Are you sure you are using the K8s launcher? As you can see in this part 
> of the code 
> (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
>  
> Zeppelin always uses client mode.
> 
> The architecture is quite simple:
> 
> Zeppelin-Server -> Zeppelin-Interpreter (with Spark in client mode) on 
> K8s -> x-Spark-executors (based on your config)
> 
> Best Regards
> Philipp
> 
> 
> Am 27.10.21 um 15:19 schrieb Fabrizio Fab:
> 
> > Hi Philipp, okay, I just now realized my HUGE misunderstanding!
> >
> > The "double-spark-submit" patter is just the standard spark-on-k8s way of 
> > running spark applications in cluster mode:
> > the 1st "spark-submit" in "cluster mode" is started from the client (in the 
> > zeppelin host, in our case), then the 2nd "spark-submit" in "client mode" 
> > is started by the "/opt/entrypoint.sh" script inside the standard spark 
> > docker image.
> >
> > At this point I can make a more precise question:
> >
> > I see that interpreter.sh starts the RemoteInterpreterServer with, in 
> > particular, the following parameters: CALLBACK_HOST / PORT.
> > These refer to the Zeppelin host and RPC port.
> >
> > Moreover, when the interpreter starts, it runs a Thrift server on some 
> > random port.
> >
> > So, I ask: which communications are supposed to happen, in order to 
> > correctly set-up my firewall/routing rules ?
> >
> > -1 Must the Zeppelin server connect to the Interpreter Thrift server ?
> > -2 Must the Interpreter Thrift server connect to the Zeppelin server?
> > -3 Both ?
> >
> > - Which ports must the Zeppelin server / the Thrift server find open on the 
> > other server?
> >
> > Thank you everybody!
> >
> > Fabrizio
> >
> >
> >
> >
> > On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> >> Hi Fabrizio,
> >>
> >> At the moment I think Zeppelin does not support running Spark jobs in
> >> cluster mode. But in fact K8s mode simulates cluster mode, because the
> >> Zeppelin interpreter is already started as a pod in K8s, just as a manual
> >> spark-submit in cluster mode would do.
> >>
> >> Spark-submit is called only once during the start of the Zeppelin
> >> interpreter. You will find the call in these lines:
> >> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> >>
> >> Best Regards
> >> Philipp
> >>
> >>
> >> Am 25.10.21 um 21:58 schrieb Fabrizio Fab:
> >>> Dear All, I have been struggling for more than a week with the following problem.
> >>> My Zeppelin Server is running outside the k8s cluster (there is a reason 
> >>> for this) and I am able to run Spark zeppelin notes in Client mode but 
> >>> not in Cluster mode.
> >>>
> >>> I see that, at first, a pod for the interpreter (RemoteInterpreterServer) 
> >>> is created on the cluster by spark-submit from the Zeppelin host, with 
> >>> deployMode=cluster (and this happens without errors), then the 
> >>> interpreter itself runs another spark-submit  (this time from the Pod) 
> >>> with deployMode=client.
> >>>
> >>> Specifically, this is the command line submitted by the interpreter 
> >>> from its pod:
> >>>
> >>> /opt/spark/bin/spark-submit \
> >>> --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> >>> --deploy-mode client \
> >>> --properties-file /opt/spark/conf/spark.properties \
> >>> --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> >>> spark-internal \
> >>> <ZEPPELIN_HOST> \
> >>> <ZEPPELIN_SERVER_RPC_PORT> \
> >>> <interpreter_name>-<user name>
> >>>
> >>> At this point, the interpreter Pod remains in "Running" state, while the 
> >>> Zeppelin note remains in "Pending" forever.
> >>>
> >>> The log of the Interpreter (level = DEBUG) at the end only says:
> >>>    INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} 
> >>> RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip 
> >>> address of the interpreter pod>:<random port>
> >>>    INFO [2021-10-25 18:16:58,229] ({RegisterThread} 
> >>> RemoteInterpreterServer.java[run]:592) Start registration
> >>>    INFO [2021-10-25 18:16:58,332] ({RegisterThread} 
> >>> RemoteInterpreterServer.java[run]:606) Registering interpreter process
> >>>    INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
> >>> RemoteInterpreterServer.java[run]:608) Registered interpreter process
> >>>    INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
> >>> RemoteInterpreterServer.java[run]:629) Registration finished
> >>> (I replaced the true ip and port with a placeholder to make the log more 
> >>> clear for you)
> >>>
> >>> I am stuck at this point...
> >>> Can anyone help me? Thank you in advance. Fabrizio
> >>>
> 
