Thanks for sharing. It would be nice if you could write a blog post to share it with the wider Zeppelin user community.
Fabrizio Fab <fabrizio.dagost...@tiscali.it> wrote on Thu, Oct 28, 2021 at 4:29 PM:

> Yeah! Thank you very much, Philipp: tonight I explored the source code
> carefully and discovered the two-Thrift-servers setup.
>
> Therefore I solved my problem. Here is the solution I adopted, which may
> be useful for other people.
>
> CONTEXT
> My Zeppelin server installation is located on a LAN where a K8s cluster
> is available, and I want to submit notes in cluster mode to the K8s
> cluster.
>
> SOLUTION
> - The driver pod must have its address exposed on the LAN network;
>   otherwise the Zeppelin server cannot connect to the interpreter Thrift
>   server. I suppose there are several ways of doing this, but I am not a
>   K8s expert, so I simply created a basic driver-pod.template.yaml with a
>   "hostNetwork" spec and referenced it via the
>   "spark.kubernetes.driver.podTemplateFile" interpreter setting.
>
> At this point, the two servers can talk to each other.
>
> NOTES
> 1) Do not set the Zeppelin run mode to "k8s". It must be "local" (or the
>    default "auto").
> 2) An NFS share (or other shared persistent volume) is required in order
>    to upload the required JARs and to easily access the driver logs after
>    the driver shuts down:
>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.options.server=<your server>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.options.path=<local path>
>    spark.kubernetes.driver.volumes.nfs.<whichever name>.mount.path=<mount path>
>
> On 2021/10/28 06:48:54, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> > Hi Fabrizio,
> >
> > We have two connections. First, the Zeppelin interpreter opens a
> > connection to the Zeppelin server to register and to send back the
> > interpreter output. The Zeppelin server is the CALLBACK_HOST, and the
> > PORT indicates where the Zeppelin server opened the Thrift service for
> > the Zeppelin interpreter.
> > An important part of the registration is that the Zeppelin interpreter
> > tells the Zeppelin server where the interpreter pod has an open Thrift
> > server port. This information can be found in the Zeppelin server log
> > output. Be on the lookout for this message:
> > https://github.com/apache/zeppelin/blob/master/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L483
> > Also note the function ZEPPELIN_K8S_PORTFORWARD, which should help your
> > Zeppelin server reach the Zeppelin interpreter in K8s.
> >
> > > the 1st "spark-submit" in "cluster mode" is started from the client
> > > (on the Zeppelin host, in our case), then the 2nd "spark-submit" in
> > > "client mode" is started by the "/opt/entrypoint.sh" script inside
> > > the standard Spark docker image.
> >
> > Are you sure you are using the K8s launcher? As you can see in this
> > part of the code
> > (https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/zeppelin-plugins/launcher/k8s-standard/src/main/java/org/apache/zeppelin/interpreter/launcher/K8sRemoteInterpreterProcess.java#L411),
> > Zeppelin always uses client mode.
> >
> > The architecture is quite simple:
> >
> > Zeppelin server -> Zeppelin interpreter (with Spark in client mode) on
> > K8s -> x Spark executors (based on your config)
> >
> > Best regards
> > Philipp
> >
> > On 27.10.21 at 15:19, Fabrizio Fab wrote:
> > > Hi Philipp, okay, I realized just now my HUGE misunderstanding!
> > >
> > > The "double spark-submit" pattern is just the standard Spark-on-K8s
> > > way of running Spark applications in cluster mode: the 1st
> > > "spark-submit" in "cluster mode" is started from the client (on the
> > > Zeppelin host, in our case), then the 2nd "spark-submit" in "client
> > > mode" is started by the "/opt/entrypoint.sh" script inside the
> > > standard Spark docker image.
> > > At this point I can ask a more precise question:
> > >
> > > I see that interpreter.sh starts the RemoteInterpreterServer with, in
> > > particular, the following parameters: CALLBACK_HOST / PORT. They
> > > refer to the Zeppelin host and RPC port.
> > >
> > > Moreover, when the interpreter starts, it runs a Thrift server on
> > > some random port.
> > >
> > > So I ask: which communications are supposed to happen, in order to
> > > correctly set up my firewall/routing rules?
> > >
> > > 1. Must the Zeppelin server connect to the interpreter Thrift server?
> > > 2. Must the interpreter Thrift server connect to the Zeppelin server?
> > > 3. Both?
> > >
> > > Which ports must the Zeppelin server / the Thrift server find open on
> > > the other server?
> > >
> > > Thank you, everybody!
> > >
> > > Fabrizio
> > >
> > > On 2021/10/26 11:40:24, Philipp Dallig <philipp.dal...@gmail.com> wrote:
> > > > Hi Fabrizio,
> > > >
> > > > At the moment I think Zeppelin does not support running Spark jobs
> > > > in cluster mode. But in fact, K8s mode simulates cluster mode,
> > > > because the Zeppelin interpreter is already started as a pod in
> > > > K8s, just as a manual spark-submit execution would do in cluster
> > > > mode.
> > > >
> > > > spark-submit is called only once, during the start of the Zeppelin
> > > > interpreter. You will find the call in these lines:
> > > > https://github.com/apache/zeppelin/blob/2f55fe8ed277b28d71f858633f9c9d76fd18f0c3/bin/interpreter.sh#L303-L305
> > > >
> > > > Best regards
> > > > Philipp
> > > >
> > > > On 25.10.21 at 21:58, Fabrizio Fab wrote:
> > > > > Dear all, I have been struggling for more than a week with the
> > > > > following problem. My Zeppelin server is running outside the K8s
> > > > > cluster (there is a reason for this), and I am able to run Spark
> > > > > Zeppelin notes in client mode but not in cluster mode.
> > > > > I see that, at first, a pod for the interpreter
> > > > > (RemoteInterpreterServer) is created on the cluster by
> > > > > spark-submit from the Zeppelin host, with deployMode=cluster (and
> > > > > this happens without errors); then the interpreter itself runs
> > > > > another spark-submit (this time from the pod) with
> > > > > deployMode=client.
> > > > >
> > > > > Specifically, the following is the command line submitted by the
> > > > > interpreter from its pod:
> > > > >
> > > > > /opt/spark/bin/spark-submit \
> > > > >   --conf spark.driver.bindAddress=<ip address of the interpreter pod> \
> > > > >   --deploy-mode client \
> > > > >   --properties-file /opt/spark/conf/spark.properties \
> > > > >   --class org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer \
> > > > >   spark-internal \
> > > > >   <ZEPPELIN_HOST> \
> > > > >   <ZEPPELIN_SERVER_RPC_PORT> \
> > > > >   <interpreter_name>-<user name>
> > > > >
> > > > > At this point, the interpreter pod remains in "Running" state,
> > > > > while the Zeppelin note remains "Pending" forever.
> > > > >
> > > > > The log of the interpreter (level = DEBUG) at the end only says:
> > > > >
> > > > > INFO [2021-10-25 18:16:58,229] ({RemoteInterpreterServer-Thread} RemoteInterpreterServer.java[run]:194) Launching ThriftServer at <ip address of the interpreter pod>:<random port>
> > > > > INFO [2021-10-25 18:16:58,229] ({RegisterThread} RemoteInterpreterServer.java[run]:592) Start registration
> > > > > INFO [2021-10-25 18:16:58,332] ({RegisterThread} RemoteInterpreterServer.java[run]:606) Registering interpreter process
> > > > > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:608) Registered interpreter process
> > > > > INFO [2021-10-25 18:16:58,356] ({RegisterThread} RemoteInterpreterServer.java[run]:629) Registration finished
> > > > >
> > > > > (I replaced the real IP and port with placeholders to make the
> > > > > log clearer for you.)
> > > > >
> > > > > I am stuck at this point....
> > > > > Can anyone help me? Thank you in advance.
> > > > > Fabrizio

--
Best Regards

Jeff Zhang
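[Editor's note] A minimal sketch of the driver pod template Fabrizio describes: "hostNetwork: true" puts the driver on the node's LAN address so the Zeppelin server can connect to its Thrift port. The file name, the `dnsPolicy` line, and the example path are assumptions, not taken from the thread; only `spark.kubernetes.driver.podTemplateFile` and `hostNetwork` come from the messages above.

```shell
# Write a basic driver pod template with hostNetwork enabled
# (dnsPolicy is an assumed companion setting commonly used with hostNetwork).
cat > driver-pod.template.yaml <<'EOF'
apiVersion: v1
kind: Pod
spec:
  hostNetwork: true
  dnsPolicy: ClusterFirstWithHostNet
EOF

# Then point the Spark interpreter setting at it, e.g.:
#   spark.kubernetes.driver.podTemplateFile=/path/to/driver-pod.template.yaml
echo "wrote driver-pod.template.yaml"
```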
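[Editor's note] Philipp's answer implies two connections that the firewall must allow: interpreter pod to the Zeppelin server's callback port, and Zeppelin server to the interpreter's (random) Thrift port reported in the server log. A quick way to probe both is `nc -z`; in this sketch all hosts and ports are placeholders, shown against 127.0.0.1 only to illustrate the syntax.

```shell
# Probe a host:port and report reachability (assumes nc with -z/-w support).
check() {  # usage: check <host> <port> <label>
  if nc -z -w 3 "$1" "$2"; then
    echo "$3 reachable"
  else
    echo "$3 NOT reachable"
  fi
}

# Connection 1: interpreter pod -> Zeppelin server (CALLBACK_HOST / PORT)
check 127.0.0.1 12320 "zeppelin callback"
# Connection 2: Zeppelin server -> interpreter Thrift port (see server log)
check 127.0.0.1 34567 "interpreter thrift"
```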