Re: Re: Running Spark Connect Server in Cluster Mode on Kubernetes

[email protected] Wed, 18 Oct 2023 23:30:13 -0700

Hi all,
    Has support for running the Spark Connect server on Kubernetes (k8s) been implemented?

From: Nagatomi Yasukazu
Date: 2023-09-05 17:51
To: user
Subject: Re: Running Spark Connect Server in Cluster Mode on Kubernetes
Dear Spark Community,

I've been exploring the capabilities of the Spark Connect Server and 
encountered an issue when trying to launch it in a cluster deploy mode with 
Kubernetes as the master.

While initiating the `start-connect-server.sh` script with the `--conf` 
parameter for `spark.master` and `spark.submit.deployMode`, I was met with an 
error message:

```
Exception in thread "main" org.apache.spark.SparkException: Cluster deploy mode is not applicable to Spark Connect server.
```

This error message can be traced back to Spark's source code here:
https://github.com/apache/spark/blob/6c885a7cf57df328b03308cff2eed814bda156e4/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L307
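
Judging by the error message, the check targets the cluster deploy mode specifically. As a minimal sketch, assuming client deploy mode with a Kubernetes master is an acceptable alternative (nothing in this thread confirms that), the arguments for `start-connect-server.sh` could be assembled as below. The API server URL, image name, driver host, and executor count are hypothetical placeholders, and the `--packages` option for the Spark Connect artifact would still need to be appended.

```python
import shlex


def connect_server_args(k8s_api_url, image, driver_host, executors=2):
    """Build spark-submit style arguments for client deploy mode on Kubernetes.

    Every value passed in is a placeholder for illustration only.
    """
    conf = {
        "spark.submit.deployMode": "client",
        "spark.kubernetes.container.image": image,
        # Executors need a routable address to reach the driver (the server pod).
        "spark.driver.host": driver_host,
        "spark.executor.instances": str(executors),
    }
    args = ["--master", f"k8s://{k8s_api_url}"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical values; replace with those of the target cluster.
args = connect_server_args(
    "https://kubernetes.default.svc:443",
    "example.registry/spark:3.4.1",
    "spark-connect.default.svc",
)
print("/opt/spark/sbin/start-connect-server.sh " + shlex.join(args))
```

In this arrangement the pod running the server would act as the driver, and executors would be requested as separate pods.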

Given my observations, I'm curious about the Spark Connect Server roadmap:

Is there a plan or current conversation to enable Kubernetes as a master in 
Spark Connect Server's cluster deploy mode?

I have tried to gather information from existing JIRA tickets, but have not 
been able to get a definitive answer:

https://issues.apache.org/jira/browse/SPARK-42730
https://issues.apache.org/jira/browse/SPARK-39375
https://issues.apache.org/jira/browse/SPARK-44117

Any thoughts, updates, or references to similar conversations or initiatives 
would be greatly appreciated.

Thank you for your time and expertise!

Best regards,
Yasukazu

On Tue, 5 Sep 2023 at 12:09, Nagatomi Yasukazu <[email protected]> wrote:
Hello Mich,
Thank you for your questions. Here are my responses:

> 1. What investigation have you done to show that it is running in local mode?

I have verified through the History Server's Environment tab that:
- "spark.master" is set to local[*]
- "spark.app.id" begins with local-xxx
- "spark.submit.deployMode" is set to local
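
The same check can be sketched in code. This is a minimal illustration that classifies a set of environment properties, such as those copied from the History Server's Environment tab; the function name `describe_mode` is hypothetical and the rules are only the heuristics listed above, not an official Spark API.

```python
def describe_mode(props):
    """Classify where an application ran, based on its Spark properties."""
    master = props.get("spark.master", "")
    app_id = props.get("spark.app.id", "")
    # Local runs use a master of local/local[N]/local[*] and a "local-" app id.
    if master.startswith("local") or app_id.startswith("local-"):
        return "local"
    if master.startswith("k8s://"):
        return "kubernetes"
    return "other"


# Values as observed in the Environment tab.
observed = {
    "spark.master": "local[*]",
    "spark.app.id": "local-xxx",
    "spark.submit.deployMode": "local",
}
print(describe_mode(observed))  # prints "local"
```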

> 2. who has configured this kubernetes cluster? Is it supplied by a cloud 
> vendor?

Our Kubernetes cluster was set up in an on-prem environment using RKE2 
(https://docs.rke2.io/).

> 3. Confirm that you have configured Spark Connect Server correctly for 
> cluster mode. Make sure you specify the cluster manager (e.g., Kubernetes) 
> and other relevant Spark configurations in your Spark job submission.

Based on the Spark Connect documentation I've read, there doesn't seem to be 
any specific settings for cluster mode related to the Spark Connect Server.

Configuration - Spark 3.4.1 Documentation
https://spark.apache.org/docs/3.4.1/configuration.html#spark-connect

Quickstart: Spark Connect — PySpark 3.4.1 documentation
https://spark.apache.org/docs/latest/api/python/getting_started/quickstart_connect.html

Spark Connect Overview - Spark 3.4.1 Documentation
https://spark.apache.org/docs/latest/spark-connect-overview.html

The documentation only suggests running ./sbin/start-connect-server.sh 
--packages org.apache.spark:spark-connect_2.12:3.4.0, leaving me at a loss.

> 4. Can you provide a full spark submit command

Given the nature of Spark Connect, I don't use the spark-submit command. 
Instead, as per the documentation, I can execute workloads using only a Python 
script. For the Spark Connect Server, I have a Kubernetes manifest executing 
"/opt/spark/sbin/start-connect-server.sh --packages 
org.apache.spark:spark-connect_2.12:3.4.0".

> 5. Make sure that the Python client script connecting to Spark Connect Server 
> specifies the cluster mode explicitly, like using --master or --deploy-mode 
> flags when creating a SparkSession.

The Spark Connect Server operates as a Driver, so it isn't possible to specify 
the --master or --deploy-mode flags in the Python client script. If I try, I 
encounter a RuntimeError like this:

RuntimeError: Spark master cannot be configured with Spark Connect server; 
however, found URL for Spark Connect [sc://.../]
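
For reference, a minimal sketch of how a Python client addresses the server: only an `sc://` URL is given, with no master or deploy mode. The host name below is a hypothetical placeholder, 15002 is the default Spark Connect port, and `open_session` assumes PySpark 3.4+ with the Connect dependencies installed; it is defined but not called here.

```python
def connect_url(host, port=15002):
    """Build a Spark Connect URL for the given host and port."""
    return f"sc://{host}:{port}"


def open_session(host, port=15002):
    """Open a remote session; requires pyspark with Spark Connect support."""
    from pyspark.sql import SparkSession  # imported lazily on purpose

    # The client passes only the remote URL; cluster settings live on the server.
    return SparkSession.builder.remote(connect_url(host, port)).getOrCreate()


# Hypothetical host name, for illustration only.
print(connect_url("spark-connect.example.internal"))
```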

> 6. Ensure that you have allocated the necessary resources (CPU, memory etc) 
> to Spark Connect Server when running it on Kubernetes.

Resources are ample, so that shouldn't be the problem.

> 7. Review the environment variables and configurations you have set, 
> including the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are 
> not conflicting with cluster mode settings.

I'm unsure if SPARK_NO_DAEMONIZE=1 conflicts with cluster mode settings. But 
without it, the process goes to the background when executing 
start-connect-server.sh, causing the Pod to terminate prematurely.
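
A minimal sketch of that foreground launch, assuming a Python entrypoint wrapper is used instead of setting the variable in the manifest. The helper names are hypothetical; `run_in_foreground` replaces the current process, so it is defined here but not called.

```python
import os


def foreground_env(base_env=None):
    """Return a copy of the environment with SPARK_NO_DAEMONIZE set."""
    env = dict(os.environ if base_env is None else base_env)
    # With this variable set, the Spark daemon scripts stay in the foreground,
    # so the container's main process does not exit right after startup.
    env["SPARK_NO_DAEMONIZE"] = "1"
    return env


def run_in_foreground(script, args):
    """Replace the current process with the server script (does not return)."""
    os.execve(script, [script, *args], foreground_env())


print(foreground_env({})["SPARK_NO_DAEMONIZE"])  # prints "1"
```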

> 8. Are you using the correct spark client version that is fully compatible 
> with your spark on the server?

Yes, I have verified that without using Spark Connect (e.g., using Spark 
Operator), Spark applications run as expected.

> 9. check the kubernetes error logs

The Kubernetes logs don't show any errors, and jobs are running in local mode.

> 10. Insufficient resources can lead to the application running in local mode

I wasn't aware that insufficient resources could lead to local mode execution. 
Thank you for pointing it out.

Best regards,
Yasukazu

On Tue, 5 Sep 2023 at 1:28, Mich Talebzadeh <[email protected]> wrote:

Personally, I have not used this feature myself. However, here are some points:

1. What investigation have you done to show that it is running in local mode?
2. Who has configured this kubernetes cluster? Is it supplied by a cloud 
   vendor?
3. Confirm that you have configured Spark Connect Server correctly for cluster 
   mode. Make sure you specify the cluster manager (e.g., Kubernetes) and other 
   relevant Spark configurations in your Spark job submission.
4. Can you provide a full spark submit command
5. Make sure that the Python client script connecting to Spark Connect Server 
   specifies the cluster mode explicitly, like using --master or --deploy-mode 
   flags when creating a SparkSession.
6. Ensure that you have allocated the necessary resources (CPU, memory etc) to 
   Spark Connect Server when running it on Kubernetes.
7. Review the environment variables and configurations you have set, including 
   the SPARK_NO_DAEMONIZE=1 variable. Ensure that these variables are not 
   conflicting with cluster mode settings.
8. Are you using the correct spark client version that is fully compatible with 
   your spark on the server?
9. Check the kubernetes error logs
10. Insufficient resources can lead to the application running in local mode

HTH

Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom

 https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction. 

On Mon, 4 Sept 2023 at 04:57, Nagatomi Yasukazu <[email protected]> wrote:
Hi Cley,

Thank you for taking the time to respond to my query. Your insights on Spark 
cluster deployment are much appreciated.

However, I'd like to clarify that my specific challenge is related to running 
the Spark Connect Server on Kubernetes in Cluster Mode. While I understand the 
general deployment strategies for Spark on Kubernetes, I am seeking guidance 
particularly on the Spark Connect Server aspect.

cf. Spark Connect Overview - Spark 3.4.1 Documentation
    https://spark.apache.org/docs/latest/spark-connect-overview.html

To reiterate, when I connect from an external Python client and execute 
scripts, the server operates in Local Mode instead of the expected Kubernetes 
Cluster Mode (with master as k8s://... and deploy-mode set to cluster).

If I've misunderstood your initial response and it was indeed related to Spark 
Connect, I sincerely apologize for the oversight. In that case, could you 
please expand a bit on the Spark Connect-specific aspects?

Do you, or anyone else in the community, have experience with this specific 
setup or encountered a similar issue with Spark Connect Server on Kubernetes? 
Any targeted advice or guidance would be invaluable.

Thank you again for your time and help.

Best regards,
Yasukazu

On Mon, 4 Sep 2023 at 0:23, Cleyson Barros <[email protected]> wrote:
Hi Nagatomi,
Use the Apache images, then run your master node, and then start your worker 
nodes. You can add a command line in the Dockerfiles to reach the master using 
the Docker container names in your service composition. If you wish to run two 
masters, active and standby, follow the instructions in the Apache docs for 
that configuration. The recipe is the same except for when you start the 
masters and how you expect your cluster to behave.
I hope it helps. 
Have a nice day :) 
Cley

On Sat, 2 Sep 2023 at 15:37, Nagatomi Yasukazu <[email protected]> wrote:
Hello Apache Spark community,

I'm currently trying to run Spark Connect Server on Kubernetes in Cluster Mode 
and facing some challenges. Any guidance or hints would be greatly appreciated.

## Environment:
Apache Spark version: 3.4.1
Kubernetes version:  1.23
Command executed: 
 /opt/spark/sbin/start-connect-server.sh \
   --packages org.apache.spark:spark-connect_2.13:3.4.1,org.apache.iceberg:iceberg-spark-runtime-3.4_2.13:1.3.1...
Note that I'm running it with the environment variable SPARK_NO_DAEMONIZE=1.

## Issue:
When I connect from an external Python client and run scripts, it operates in 
Local Mode instead of the expected Cluster Mode.

## Expected Behavior:
When connecting from a Python client to the Spark Connect Server, I expect it 
to run in Cluster Mode.

If anyone has any insights, advice, or has faced a similar issue, I'd be 
grateful for your feedback.
Thank you in advance.
