Thank you, Philipp, for your answer.

interpreter.sh is the shell script run by the Zeppelin Server; in particular, 
the line you highlighted starts the interpreter in CLUSTER MODE in my case:

INTERPRETER_RUN_COMMAND+=("${SPARK_SUBMIT}" "--class" "${ZEPPELIN_SERVER}" 
"--driver-class-path" 
"${ZEPPELIN_INTP_CLASSPATH_OVERRIDES}:${ZEPPELIN_INTP_CLASSPATH}" 
"--driver-java-options" "${JAVA_INTP_OPTS}" "${SPARK_SUBMIT_OPTIONS_ARRAY[@]}" 
"${ZEPPELIN_SPARK_CONF_ARRAY[@]}" "${SPARK_APP_JAR}" "${CALLBACK_HOST}" 
"${PORT}" "${INTP_GROUP_ID}" "${INTP_PORT}")

At this point, I can see that the interpreter is started on the cluster, in a 
pod, as expected.
But then THE INTERPRETER ITSELF runs a SECOND spark-submit.
To be clear: this time it is NOT the Zeppelin Server which runs the 
spark-submit, and it is not the "interpreter.sh" script which is called; 
this time it is the interpreter pod which runs the following:

/opt/spark/bin/spark-submit \
 --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
 --deploy-mode client \
 --properties-file /opt/spark/conf/spark.properties \
 --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
 spark-internal \
 <ZEPPELIN_HOST> \
 <ZEPPELIN_SERVER_RPC_PORT> \
 <interpreter_name>-<user name>

As you can see, the format of this second spark-submit is quite different from 
the first:

1) deploy-mode = CLIENT  (not cluster)
2) the resource name is "spark-internal", not a jar file.

It seems that the 1st instance of the Spark interpreter (run by the Zeppelin 
server) should work as a bridge between the Zeppelin server and the 2nd 
instance of the Spark interpreter (run by this 1st instance), which should 
perform its ordinary duty of getting paragraphs from Zeppelin, running them, 
and sending the output back to Zeppelin.

In my case, what is not working as expected is the communication between these 
3 processes:

1) zeppelin server
2) "bridge" interpreter 
3) "true" interpreter

Is it possible to get some more low-level technical information on the 
interconnection flows between the 2 interpreter instances and the Zeppelin 
server?
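In the meantime, to rule out a plain network problem between the processes, this is the kind of TCP reachability check I can run from inside the interpreter pod towards the Zeppelin server RPC port (or, in the other direction, from the Zeppelin host towards the interpreter pod's Thrift port). This is only a minimal debugging sketch of mine, not part of Zeppelin; the host and port arguments are placeholders for the actual values:

```python
#!/usr/bin/env python3
"""Quick TCP reachability check between two of the processes above
(a debugging sketch, not part of Zeppelin)."""
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Usage: check_rpc.py <ZEPPELIN_HOST> <ZEPPELIN_SERVER_RPC_PORT>
    host, port = sys.argv[1], int(sys.argv[2])
    print("reachable" if can_connect(host, port) else "NOT reachable")
```

If the pod-to-server direction succeeds (which the "Registration finished" log suggests) but the server cannot reach the interpreter's Thrift port, that would point to the server-to-pod leg as the broken one.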

Many thanks again.
Fabrizio


On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote: 
> Hi Fabrizio,
> 
> At the moment I think zeppelin does not support running spark jobs in 
> cluster mode. But in fact K8s mode simulates cluster mode. Because the 
> Zeppelin interpreter is already started as a pod in K8s, as a manual 
> Spark submit execution would do in cluster mode.
> 
> Spark-submit is called only once during the start of the Zeppelin 
> interpreter. You will find the call in these lines: 
> https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> 
> Best Regards
> Philipp
> 
> 
> Am 25.10.21 um 21:58 schrieb Fabrizio Fab:
> > Dear All, I am struggling since more than a week on the following problem.
> > My Zeppelin Server is running outside the k8s cluster (there is a reason 
> > for this) and I am able to run Spark zeppelin notes in Client mode but not 
> > in Cluster mode.
> >
> > I see that, at first, a pod for the interpreter (RemoteInterpreterServer) 
> > is created on the cluster by spark-submit from the Zeppelin host, with 
> > deployMode=cluster (and this happens without errors), then the interpreter 
> > itself runs another spark-submit  (this time from the Pod) with 
> > deployMode=client.
> >
> > Exactly, the following is the command line submitted by the interpreter 
> > from its pod
> >
> > /opt/spark/bin/spark-submit \
> > --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> > --deploy-mode client \
> > --properties-file /opt/spark/conf/spark.properties \
> > --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> > spark-internal \
> > <ZEPPELIN_HOST> \
> > <ZEPPELIN_SERVER_RPC_PORT> \
> > <interpreter_name>-<user name>
> >
> > At this point, the interpreter Pod remains in "Running" state, while the 
> > Zeppelin note remains in "Pending" forever.
> >
> > The log of the Interpreter (level = DEBUG) at the end only says:
> >   INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} 
> > RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip 
> > address of the interpreter pod>:<random port>
> >   INFO [2021-10-25 18:16:58,229] ({RegisterThread} 
> > RemoteInterpreterServer.java[run]:592) Start registration
> >   INFO [2021-10-25 18:16:58,332] ({RegisterThread} 
> > RemoteInterpreterServer.java[run]:606) Registering interpreter process
> >   INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
> > RemoteInterpreterServer.java[run]:608) Registered interpreter process
> >   INFO [2021-10-25 18:16:58,356] ({RegisterThread} 
> > RemoteInterpreterServer.java[run]:629) Registration finished
> > (I replaced the true ip and port with a placeholder to make the log more 
> > clear for you)
> >
> > I am stuck at this point....
> > Anyone can help me ? Thank you in advance. Fabrizio
> >
> 
