One thing to point out is that you never bundle the Spark client with your code. You compile against a Spark version. You bundle your code (without the Spark jars) in an uber jar and deploy that uber jar to Spark. Spark already ships with the jars required to submit jobs to the scheduler. At runtime, your code will use the jars bundled in the Spark instance your application is running in.
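For illustration only (the version numbers below are placeholders, not from this thread), a typical Maven setup marks the Spark artifacts as "provided" so they are compiled against but not packaged, and uses the maven-shade-plugin to build the uber jar:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.2.0</version>
    <!-- provided: compiled against, but NOT packaged; the cluster supplies the Spark jars at runtime -->
    <scope>provided</scope>
</dependency>

<build>
    <plugins>
        <!-- builds the uber jar from your code and its non-provided dependencies -->
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>3.2.4</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>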

Spark is backward compatible; i.e., a jar compiled against 3.1.x will run in a Spark 3.2.0 cluster.
As Sean mentioned, Spark is not guaranteed to be forward compatible; i.e., a jar compiled against 3.2.1 may not run in a Spark 2.4.0 cluster. It might work if the functions your code calls are available in 2.4.0, but it will fail if you call an API that was introduced after 2.4.0.
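As an illustrative sketch only (the property names and version numbers are placeholders), keeping the version you compile against at or below the cluster's version within the same major line looks like this in a pom:

<!-- Cluster runs Spark 3.2.0: compiling against 3.1.2 is the safe, backward-
     compatible direction. The reverse (compiling against 3.2.x and running on
     an older cluster) may fail if the code uses APIs the cluster doesn't have. -->
<properties>
    <spark.version>3.1.2</spark.version>
    <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
    <scope>provided</scope>
</dependency>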

So, the question of “Can I use an older version of the client to submit jobs to a newer version of Spark?” is moot. You never do that.

From: Amin Borjian <borjianami...@outlook.com>
Date: Wednesday, November 24, 2021 at 2:44 PM
To: Sean Owen <sro...@gmail.com>
Cc: "user@spark.apache.org" <user@spark.apache.org>
Subject: RE: [EXTERNAL] [Spark] Does Spark support backward and forward 
compatibility?


Thanks again for the reply.

Personally, I think the whole cluster should have a single version. What mattered most to me was how much the version of the client that submits jobs to the scheduler matters, and whether we can expect everything to work well across small version changes (i.e., changes smaller than a major version).

From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, November 24, 2021 10:48 PM
To: Amin Borjian <borjianami...@outlook.com>
Cc: user@spark.apache.org
Subject: Re: [Spark] Does Spark support backward and forward compatibility?

I think/hope that it goes without saying you can't mix Spark versions within a 
cluster.
Forward compatibility is something you don't generally expect as a default from any piece of software, so I'm not sure there is something to document explicitly.
Backward compatibility is important, and the cases where it doesn't hold are documented extensively in the Spark docs and release notes.


On Wed, Nov 24, 2021 at 1:16 PM Amin Borjian <borjianami...@outlook.com> wrote:
Thank you very much for the reply. It would be great if these points were mentioned in the Spark documentation (for example, on the download page or somewhere similar).

If I understand correctly, it means that within the same major version we can compile the client (for example, in Java) against a newer version (for example 3.2.0) while the server is older (for example 3.1.x) and not see any problem in most cases. Am I right? (Because backward compatibility can be described from both the server's and the client's point of view, I repeated the sentence to make sure I got it right.)

But what happens if we update the server to 3.2.x while our client is still on 3.1.x? Can the client work with the newer cluster version because it only uses old features of the server? (Maybe this is what you meant, and my previous sentence was wrong and I misunderstood.)

From: Sean Owen <sro...@gmail.com>
Sent: Wednesday, November 24, 2021 5:38 PM
To: Amin Borjian <borjianami...@outlook.com>
Cc: user@spark.apache.org
Subject: Re: [Spark] Does Spark support backward and forward compatibility?

Can you mix different Spark versions on the driver and executors? No.
Can you compile against a different version of Spark than you run on? That typically works within a major release, though forward compatibility may not work (you can't use a feature that doesn't exist in the version on the cluster). Compiling against 3.2.0 and running on 3.1.x, for example, should work fine in 99% of cases.

On Wed, Nov 24, 2021 at 8:04 AM Amin Borjian <borjianami...@outlook.com> wrote:

I have a simple question about using Spark. Although most tools answer this kind of question explicitly (in prominent text, such as a dedicated section or a separate page), I did not find it anywhere. Maybe my search was not thorough enough, but I thought it would be good to ask here in the hope that the answer will benefit other people as well.

The Spark binary is usually downloaded from the following link and then installed and configured on the cluster: Download Apache Spark <https://spark.apache.org/downloads.html>

If, for example, we program in Java (though any other supported language would do), we need the following dependencies to communicate with Spark:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.2.0</version>
</dependency>

As is clear, both the Spark cluster (the Spark binary) and the dependencies used on the application side have a specific version. Obviously, if the same version is used on both the application side and the server side, everything will most likely work ideally, without any problems.

But the question is, what if the two versions are not the same? Is compatibility between the server and the application possible under certain conditions (such as not changing the major version)? For example, is it a problem if the client is always ahead? Or if the server is always ahead?

The motivation is that there may be a library I did not write that is built against an old version, while I want to update my cluster (the server version). Or it may not be possible for me to update the server version and all the application versions at the same time, so I want to update each one separately. As a result, the application and server versions will differ for a period of time (maybe short, maybe long). I want to know exactly how Spark behaves in this situation.

