Hi Philipp, okay, I just realized my HUGE misunderstanding! The "double-spark-submit" pattern is simply the standard Spark-on-k8s way of running Spark applications in cluster mode: the first spark-submit, in cluster mode, is started from the client (the Zeppelin host, in our case); then the second spark-submit, in client mode, is started by the /opt/entrypoint.sh script inside the standard Spark Docker image.
At this point I can ask a more precise question: I see that interpreter.sh starts the RemoteInterpreterServer with, in particular, the following parameters: CALLBACK_HOST / PORT. These refer to the Zeppelin host and its RPC port. Moreover, when the interpreter starts, it runs a Thrift server on some random port.

So, I ask: which communications are supposed to happen, so that I can correctly set up my firewall/routing rules?
1. Must the Zeppelin server connect to the interpreter's Thrift server?
2. Must the interpreter's Thrift server connect to the Zeppelin server?
3. Both?
And which ports must the Zeppelin server / the Thrift server find open on the other host?

Thank you everybody!
Fabrizio

On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> Hi Fabrizio,
>
> At the moment I think Zeppelin does not support running Spark jobs in
> cluster mode. But in fact K8s mode simulates cluster mode, because the
> Zeppelin interpreter is already started as a pod in K8s, as a manual
> spark-submit execution would do in cluster mode.
>
> Spark-submit is called only once during the start of the Zeppelin
> interpreter. You will find the call in these lines:
> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
>
> Best Regards
> Philipp
>
> On 25.10.21 at 21:58, Fabrizio Fab wrote:
> > Dear All, I have been struggling for more than a week with the following problem.
> > My Zeppelin server is running outside the k8s cluster (there is a reason
> > for this) and I am able to run Spark Zeppelin notes in client mode but not
> > in cluster mode.
> >
> > I see that, at first, a pod for the interpreter (RemoteInterpreterServer)
> > is created on the cluster by spark-submit from the Zeppelin host, with
> > deployMode=cluster (and this happens without errors); then the interpreter
> > itself runs another spark-submit (this time from the pod) with
> > deployMode=client.
> >
> > Exactly, the following is the command line submitted by the interpreter
> > from its pod:
> >
> > /opt/spark/bin/spark-submit \
> >   --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> >   --deploy-mode client \
> >   --properties-file /opt/spark/conf/spark.properties \
> >   --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> >   spark-internal \
> >   <ZEPPELIN_HOST> \
> >   <ZEPPELIN_SERVER_RPC_PORT> \
> >   <interpreter_name>-<user name>
> >
> > At this point, the interpreter pod remains in "Running" state, while the
> > Zeppelin note remains in "Pending" forever.
> >
> > The log of the interpreter (level = DEBUG) at the end only says:
> >
> > INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip address of the interpreter pod>:<random port>
> > INFO [2021-10-25 18:16:58,229] ({RegisterThread} RemoteInterpreterServer.java[run]:592) Start registration
> > INFO [2021-10-25 18:16:58,332] ({RegisterThread} RemoteInterpreterServer.java[run]:606) Registering interpreter process
> > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:608) Registered interpreter process
> > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:629) Registration finished
> >
> > (I replaced the true IP and port with placeholders to make the log clearer for you.)
> >
> > I am stuck at this point...
> > Can anyone help me? Thank you in advance. Fabrizio
> >
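P.S. While setting up the firewall rules, a quick way to probe each direction is a plain TCP connect. Below is a minimal, language-neutral sketch (Python here just for illustration); the hostnames and port numbers in the comments are hypothetical placeholders, to be replaced with the CALLBACK_HOST/PORT values and the Thrift host:port printed in the interpreter log.

```python
import socket

def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical values -- substitute the real ones from your deployment/logs:
# Direction 1: run from inside the interpreter pod, against the Zeppelin
#              host's RPC callback port:
#   can_connect("zeppelin.example.com", 12320)
# Direction 2: run from the Zeppelin host, against the interpreter pod's
#              Thrift server port:
#   can_connect("10.42.0.17", 34567)
```

If one direction returns False, that is the path the firewall/routing rules need to open.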