Oh, this issue is actually fairly straightforward to solve, at least in Spark 3.5.2.
Just download the `spark-connect` Maven jar and place it in
`$SPARK_HOME/jars`, then rebuild the Docker image. I see that I had posted a
comment on that JIRA as well. At least for a standalone cluster, I was able
to fix this that way.

On Mon, Sep 9, 2024 at 7:04 PM Nagatomi Yasukazu <yassan0...@gmail.com> wrote:

> Hi Prabodh,
>
> Thank you for your response.
>
> As you can see from the following JIRA issue, it is possible to run the
> Spark Connect driver on Kubernetes:
>
> https://issues.apache.org/jira/browse/SPARK-45769
>
> However, that issue describes a problem that occurs when the driver and
> executors run on different nodes. This could be the reason why only
> standalone mode is currently supported, but I am not certain about it.
>
> Thank you for your attention.
>
>
> On Mon, Sep 9, 2024 at 12:40 Prabodh Agarwal <prabodh1...@gmail.com> wrote:
>
>> My 2 cents on my experience with using Spark Connect in cluster mode:
>>
>> 1. Create a Spark cluster of two or more nodes, with one node as the
>> master and the others as workers, then deploy Spark Connect pointing to
>> the master node. This works well. The approach is not well documented,
>> but I was able to figure it out by trial and error.
>> 2. On Kubernetes, we can get the executors to run on Kubernetes itself
>> by default. That is fairly straightforward, but the driver continues to
>> run on a local machine. I agree, though, that making the driver run on
>> Kubernetes as well would be slick.
>>
>> Thank you.
>>
>>
>> On Mon, Sep 9, 2024 at 6:17 AM Nagatomi Yasukazu <yassan0...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Why is it not possible to specify cluster as the deploy mode for Spark
>>> Connect?
>>>
>>> As discussed in the following thread, it appears that there is an
>>> "arbitrary decision" within spark-submit that "Cluster mode is not
>>> applicable" to Spark Connect.
>>>
>>> GitHub issue comment:
>>> https://github.com/kubeflow/spark-operator/issues/1801#issuecomment-2000494607
>>>
>>> > This will circumvent the submission error you may have gotten if you
>>> > tried to just run the SparkConnectServer directly. From my
>>> > investigation, that looks to be an arbitrary decision within
>>> > spark-submit that Cluster mode is "not applicable" to SparkConnect.
>>> > Which is sort of true except when using this operator :)
>>>
>>> I have reviewed the following commit and pull request, but I could not
>>> find any discussion or reason explaining why cluster mode is not
>>> available:
>>>
>>> Related commit:
>>> https://github.com/apache/spark/commit/11260310f65e1a30f6b00b380350e414609c5fd4
>>>
>>> Related pull request:
>>> https://github.com/apache/spark/pull/39928
>>>
>>> This restriction poses a significant obstacle when trying to use Spark
>>> Connect with the Spark Operator. If there is a technical reason for it,
>>> I would like to know more about it. Additionally, if this issue is
>>> being tracked on JIRA or elsewhere, I would appreciate a link.
>>>
>>> Thank you in advance.
>>>
>>> Best regards,
>>> Yasukazu Nagatomi
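The jar fix described at the top of the thread (put the `spark-connect` Maven jar into `$SPARK_HOME/jars` and rebuild the image) might be sketched roughly as follows. The Spark/Scala versions (3.5.2 / 2.12) and the image tag are assumptions matching the thread, not something the thread spells out; adjust them to your build:

```shell
# Assumed versions -- adjust to match your Spark distribution.
SPARK_VERSION=3.5.2
SCALA_VERSION=2.12

# Fetch the spark-connect jar from Maven Central into the distribution's
# jars/ directory so spark-submit can find it without --packages.
curl -fLo "$SPARK_HOME/jars/spark-connect_${SCALA_VERSION}-${SPARK_VERSION}.jar" \
  "https://repo1.maven.org/maven2/org/apache/spark/spark-connect_${SCALA_VERSION}/${SPARK_VERSION}/spark-connect_${SCALA_VERSION}-${SPARK_VERSION}.jar"

# Rebuild the image so the jar is baked in (image name is illustrative).
docker build -t my-spark-connect:3.5.2 .
```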
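Prabodh's first point (Spark Connect deployed against a standalone master) could look something like this; the hostnames are placeholders and the version is assumed from the thread:

```shell
# Assumes a standalone cluster is already up, e.g. started with:
#   $SPARK_HOME/sbin/start-master.sh                            # on the master node
#   $SPARK_HOME/sbin/start-worker.sh spark://master-host:7077   # on each worker
# ("master-host" is a placeholder hostname.)

# Start the Spark Connect server pointing at the standalone master.
"$SPARK_HOME/sbin/start-connect-server.sh" \
  --master spark://master-host:7077 \
  --packages org.apache.spark:spark-connect_2.12:3.5.2
```

A client can then connect with, for example, `SparkSession.builder.remote("sc://connect-host:15002")` in PySpark; 15002 is the default Spark Connect port.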
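Prabodh's second point (executors as Kubernetes pods while the driver, i.e. the Connect server JVM, stays on the local machine) corresponds roughly to client mode against a Kubernetes API server. A sketch, with the API server address, namespace, and image name as illustrative placeholders:

```shell
# Executors run as pods on the cluster; the Connect server / driver
# stays on the machine where this command is run (client mode).
"$SPARK_HOME/sbin/start-connect-server.sh" \
  --master k8s://https://kubernetes.example.com:6443 \
  --conf spark.kubernetes.namespace=default \
  --conf spark.kubernetes.container.image=my-spark:3.5.2 \
  --packages org.apache.spark:spark-connect_2.12:3.5.2
```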