Yay!
On Sun, 7 Sept 2025 at 13:54, Dongjoon Hyun wrote:
> We are happy to announce the availability of Apache Spark 4.0.1!
>
> Spark 4.0.1 is the first maintenance release based on the branch-4.0
> maintenance branch of Spark. It contains many fixes including security and
> correctness domains.
We are happy to announce the availability of Apache Spark 4.0.1!
Spark 4.0.1 is the first maintenance release based on the branch-4.0
maintenance branch of Spark. It contains many fixes including security and
correctness domains. We strongly recommend all 4.0 users to upgrade to this
stable release.
Hello! This is my first time using a mailing list like this; apologies if
I’ve missed any of its conventions.
I’m using the Java API to interact with a data source that’s column-based,
and expensive to request entire rows from. However, using the interface
that my Table needs to
Hi everyone,
I hope this message finds you well.
We have several use cases involving Spark Structured Streaming that would
benefit from auto-scaling. We understand that Dynamic Resource Allocation
does not work optimally with Spark Structured Streaming, so we are
exploring alternative solutions
>> option("path", "s3://bucketname")
Shouldn’t the scheme prefix be s3a instead of s3?
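A minimal sketch of the corrected option, assuming the Hadoop S3A connector (hadoop-aws) is on the classpath; the bucket and prefix here are placeholders:

    # s3a:// maps to Hadoop's S3A filesystem; plain s3:// typically has no
    # registered implementation in open-source Spark/Hadoop builds.
    df = (spark.read
          .format("parquet")
          .option("path", "s3a://bucketname/some/prefix/")
          .load())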
From: 刘唯
Sent: Tuesday, August 5, 2025 5:34 PM
To: Kleckner, Jade
Cc: user@spark.apache.org
Subject: Re: [PySpark] [Beginner] [Debug] Doe
This is not necessarily about the readStream / read API. As long as you
correctly import the needed dependencies and set up the Spark config, you
should be able to readStream from an S3 path.
See
https://stackoverflow.com/questions/46740670/no-filesystem-for-scheme-s3-with-pyspark
Kleckner, Jade wrote:
Hello all,
I'm developing a pipeline to possibly read a stream from a MinIO bucket. I
have no issues setting Hadoop s3a variables and reading files but when I try to
create a bucket for Spark to use as a readStream location it produces the
following errors:
Example code: i
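A rough sketch of the kind of setup usually needed for this, assuming a hadoop-aws version matching your Hadoop build; the endpoint, credentials, bucket, schema, and format below are placeholders:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("minio-readstream-sketch")
             # hadoop-aws provides the s3a:// filesystem implementation
             .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4")
             .config("spark.hadoop.fs.s3a.endpoint", "http://minio.example.local:9000")
             .config("spark.hadoop.fs.s3a.access.key", "ACCESS_KEY")
             .config("spark.hadoop.fs.s3a.secret.key", "SECRET_KEY")
             .config("spark.hadoop.fs.s3a.path.style.access", "true")
             .getOrCreate())

    # File-based streaming sources require an explicit schema.
    stream_df = (spark.readStream
                 .schema("id INT, payload STRING")
                 .format("json")
                 .load("s3a://my-bucket/incoming/"))

Note the s3a:// scheme and path-style access, which MinIO-style endpoints usually require.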
I am trying to convert an application to Spark and need to find out all
the serialization issues. Unfortunately, the SerializationDebugger
appears to no longer work with Java 21 and presumably also Java 17. The
problem is reflective access to sun.security.action.GetBooleanAction,
which is
Hi Folks,
Manas here from the Data Platform Team at CRED <https://cred.club/>.
We have been running Spark Connect 4.0.0 in production and are facing the
following issue:
A gRPC connection failure occurs when executing a PySpark DataFrame action
that involves a complex, dynamically generated
Hi Spark team,
We encountered a `NoSuchMethodError` when running a PySpark application
with Spark 4.0.0 and kafka-clients-4.0.0:
java.lang.NoSuchMethodError:
org.apache.kafka.clients.admin.DescribeTopicsResult.all()
This appears to be due to Spark’s Kafka integration module still calling
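One possible mitigation, purely as a sketch and not an official fix, is to avoid pinning kafka-clients 4.0.0 yourself and let the connector resolve the kafka-clients version it was built against; the broker and topic names are placeholders:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             # Pull in the Kafka connector and its own transitive kafka-clients
             .config("spark.jars.packages",
                     "org.apache.spark:spark-sql-kafka-0-10_2.13:4.0.0")
             .getOrCreate())

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "my_topic")
          .load())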
Dear Spark Community,
Why does the Python Data Source API (pyspark.sql.datasource.Datasource) not
use the "spark.sql.execution.pyspark.python" config, while UDFs do?
Datasource:
1) the executor always looks for "python3", ignoring the
"spark.sql.execution.pyspark.python" config
2) so pr
Hello. In Spark 4, loading a dataframe from a path that contains a wildcard
produces a warning and a stack trace that doesn't happen in Spark 3.
>>> spark.read.load('s3a://ullswater-dev/uw01/temp/test_parquet/*.parquet'
Hi Team,
I am trying to get the property "spark.api.mode" in the PySpark console, but
it is not working.
I have installed pyspark, pyspark-connect, and other dependencies, set up
Spark 4.0.0, and started a PySpark session from the command line. But it is
not working and states that this property is not
Sounds super interesting ...
On Thu, 17 Jul 2025 at 14:17, Hitesh Vaghela wrote:
> Hi Spark community! I’ve posted a detailed question on Stack Overflow
> regarding a persistent issue where my Spark job remains in an “Active”
> state even after successful dataset processing. No error
Hi Spark community! I’ve posted a detailed question on Stack Overflow
regarding a persistent issue where my Spark job remains in an “Active”
state even after successful dataset processing. No errors in logs, and
attempts to kill the job fail. I’d love your insights on root causes and
how to
Dear Spark Community,
I’m currently managing a data platform that uses Trino with Hive Metastore
integration. Our Hive Metastore contains a mix of legacy Hive tables and
views, alongside newer views created via Trino.
As expected, Trino stores views in the metastore with viewOriginalText
Hi Nimrod,
I am also interested in your first point: what exactly does "false alarm" mean?
Today I had the following scenario, which in my opinion is a false alarm.
Following example:
- Topic contains 'N' Messages
- Spark Streaming application consumed all 'N' messages
>> https://spark.apache.org/docs/latest/streaming/structured-streaming-kafka-integration.html
>> ):
>>
>> "latest" for streaming, "earliest" for batch
>>
>>
>> On Thu, 10 Jul 2025, 11:04 Nimrod Ofek, wrote:
>>
>>> Hi everyone,
>>>
>>> I'm currently working wit
> https://spark.apache.org/docs/latest/streaming/structured-streaming-kafka-integration.html
> ):
>
> "latest" for streaming, "earliest" for batch
>
>
> On Thu, 10 Jul 2025, 11:04 Nimrod Ofek, wrote:
>
>> Hi everyone,
>>
>> I'm currently working with Spark Structured Streaming
https://spark.apache.org/docs/latest/streaming/structured-streaming-kafka-integration.html
):
"latest" for streaming, "earliest" for batch
On Thu, 10 Jul 2025, 11:04 Nimrod Ofek, wrote:
> Hi everyone,
>
> I'm currently working with Spark Structured Streaming integrated w
Hi everyone,
I'm currently working with Spark Structured Streaming integrated with Kafka
and had some questions regarding the failOnDataLoss option.
The current documentation states:
*"Whether to fail the query when it's possible that data is lost (e.g.,
topics are deleted, or
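For reference, this is where the option is applied; a minimal sketch with placeholder broker and topic names:

    df = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "events")
          # Fail the query when expected offsets are missing (e.g. the topic was
          # deleted or data aged out) instead of silently skipping the gap.
          .option("failOnDataLoss", "true")
          .load())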
Hi, Spark friends
This is Yifan. I am a software developer from Workday. I am not very familiar
with Spark and I have a question about the Tag in TreeNode. We have a use case
where we will add some information to Tag and we hope the tag will be persisted
in Spark. But I noticed that the tag is
Hi
I am new to Apache Spark and I created a Spark job that reads data from
a MySQL database, does some processing on it, and then commits it to
another table.
The odd thing I faced was that Spark reads all the data from the table when
I use
`sparkSession.read.jdbc` and `sparkDf.rdd.map
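A rough sketch of the usual way to avoid a full-table scan with the JDBC source: stay in the DataFrame API so filters can be pushed down to MySQL, and partition the read; converting to an RDD with sparkDf.rdd.map disables pushdown. The connection details, table, and column names below are placeholders:

    jdbc_df = (spark.read
               .format("jdbc")
               .option("url", "jdbc:mysql://db-host:3306/mydb")
               .option("dbtable", "source_table")
               .option("user", "app_user")
               .option("password", "secret")
               # Split the read into parallel range queries instead of one scan
               .option("partitionColumn", "id")
               .option("lowerBound", "1")
               .option("upperBound", "1000000")
               .option("numPartitions", "8")
               .load())

    # Filters on the DataFrame are pushed down to the database; doing the same
    # filtering after .rdd forces Spark to pull every row first.
    subset = jdbc_df.filter("updated_at >= '2025-01-01'")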
Hi All.
We are happy to announce the availability of Apache Spark Kubernetes
Operator 0.4.0!
- Website
* https://s.apache.org/spark-kubernetes-operator/
- Artifact Hub
*
https://artifacthub.io/packages/helm/spark-kubernetes-operator/spark-kubernetes-operator/
- Release Note
* https
Hello,
We have published a follow-up blog that compares the latest versions: 1)
Trino 476, 2) Spark 4.0.0, 3) Hive 4 on MR3 2.1. At the end, we discuss MPP
and MapReduce.
https://mr3docs.datamonad.com/blog/2025-07-02-performance-evaluation-2.1
--- Sungwoo
On Tue, Apr 22, 2025 at 7:08 PM
Hi,
Starting from Spark 4.0.0, we support multiple stateful operators in append
mode. You can perform a chain of stream-stream joins.
One thing you need to be careful about is that the output of a stream-stream
join will have two different event-time columns, which is ambiguous w.r.t.
which column has to
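To make that concrete, a rough sketch (the stream, key, and timestamp column names are made up) of dropping one of the two event-time columns before chaining the next stream-stream join:

    from pyspark.sql import functions as F

    a = stream_a.withWatermark("ts_a", "10 minutes")
    b = stream_b.withWatermark("ts_b", "10 minutes")

    ab = a.join(b, F.expr("""
        a_key = b_key AND
        ts_b BETWEEN ts_a - INTERVAL 5 MINUTES AND ts_a + INTERVAL 5 MINUTES
    """))

    # The join output carries both ts_a and ts_b; keep only the column that
    # should drive the watermark for the next join to avoid the ambiguity.
    ab = ab.drop("ts_b")

    c = stream_c.withWatermark("ts_c", "10 minutes")
    abc = ab.join(c, F.expr("""
        a_key = c_key AND
        ts_c BETWEEN ts_a - INTERVAL 5 MINUTES AND ts_a + INTERVAL 5 MINUTES
    """))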
Dear [Team / Support / Apache Spark Community],
I hope this message finds you well.
I'm reaching out to inquire about the support for *user impersonation* in
the *Spark Thrift Server* across different versions of Apache Spark,
specifically from *Spark 1.x through Spark 4.x*.
We are curr
Hi,
Given two Spark Structured Streaming streams, using them as described in
https://spark.apache.org/docs/3.5.6/structured-streaming-programming-guide.html#inner-joins-with-optional-watermarking
just works.
Now if I want to join three streams using the same technique, Spark
complains about multiple possible
hello
The Spark Streaming task runs for a long time and needs to dynamically
adjust the log level (monitorInterval=30). It is more convenient to modify
the log4j configuration through a ConfigMap.
Dear community,
I am working on a popular open-source connector that provides a custom Data
Source V2 Strategy, a useful planning extension to Spark, yet I can't seem
to reconcile the API updates in Spark 4 in relation to adding extensions.
We add a custom planner str
Hi All.
We are happy to announce the availability of Apache Spark Kubernetes
Operator 0.3.0!
- Notable Changes
* Built and tested with Apache Spark 4.0 and Spark Connect Swift Client
* Running on Java 24
* Promoting CRDs to v1beta1 from v1alpha1
- Website
* https://s.apache.org/spark
Hi All.
We are happy to announce the availability of Apache Spark Connect Swift
Client 0.3.0!
This is the first release tested with the official Apache Spark 4.0.0.
Website
- https://apache.github.io/spark-connect-swift/
Release Note
- https://github.com/apache/spark-connect-swift/releases
Dear Apache Spark Community/Development Team,
I was wondering whether you had a chance to take a look at my previous
email. I would appreciate any and all information which you could provide
on the aforementioned points.
I hope all is well on your end and do thank you for your time
and
Dear Apache Spark Community/Development Team,
I hope this message finds you well.
I am writing to inquire about the roadmap and future plans for extending
Spark ML support through Spark Connect to the Scala API in a manner
analogous to SPARK-50812. Specifically, my team is very interested in
Hi All,
We are happy to announce the availability of *Apache Spark 4.0.0*!
Apache Spark 4.0.0 is the first release of the 4.x line. This release
resolves more than 5100 tickets with contributions from more than 390
individuals.
To download Spark 4.0.0, head over to the download page:
https
tting
> generated successfully but the debug log is showing the unexpected
> response. I tried from managed identity using python to access the storage
> account. It is able to access the storage account without any issue but
> from spark i am getting the following error.
>
> full log gist:
account. It is able to access the storage account without any issue but
from Spark I am getting the following error.
full log gist: Full Log
<https://gist.github.com/akramshaik541/e231d578403f795adff5e6ecd493d445>
Spark version using 3.5.5
Hadoop-azure 3.4.1
Hadoop-common 3.4.1
25/05/21 18
Hi All.
We are happy to announce the availability of Apache Spark Kubernetes
Operator 0.2.0!
- Website
* https://s.apache.org/spark-kubernetes-operator/
- Artifact Hub
*
https://artifacthub.io/packages/helm/spark-kubernetes-operator/spark-kubernetes-operator/
- Release Note
* https
Hi All.
We are happy to announce the availability of Apache Spark Connect Swift
Client 0.2.0!
Website
- https://apache.github.io/spark-connect-swift/
Release Note
- https://github.com/apache/spark-connect-swift/releases/tag/0.2.0
- https://s.apache.org/spark-connect-swift-0.2.0
Swift
To answer the question on the configuration of Spark 4.0.0-RC2, this
is the spark-defaults.conf used in the benchmark. Any suggestions on adding
or changing configuration values will be appreciated.
spark.driver.cores=36
spark.driver.maxResultSize=0
spark.driver.memory=196g
Dear Spark users and developers,
As you know, the Apache Software Foundation takes our users' security
seriously, and defines sensible release and security processes to make sure
potential security issues are dealt with responsibly. These indirectly also
protect our committers, shie
I had not checked the release.
The release notes mention that Apache Spark 4.0 is supported - which has
not yet been released.
While I don’t expect drastic changes - and most likely the support will
continue to work - the messaging is not accurate.
- Mridul
On Wed, May 7, 2025 at 8:54 PM
Hi All.
We are happy to announce the availability of Apache Spark Connect Swift
Client 0.1.0!
Release Note
- https://github.com/apache/spark-connect-swift/releases/tag/v0.1.0
- https://s.apache.org/spark-connect-swift-0.1.0
Swift Package Index
- https://swiftpackageindex.com/apache/spark
Hi All.
We are happy to announce the availability of Apache Spark Kubernetes
Operator 0.1.0!
- Release Note:
* https://github.com/apache/spark-kubernetes-operator/releases/tag/v0.1.0
* https://s.apache.org/spark-kubernetes-operator-0.1.0
- Published Docker Image:
* apache/spark-kubernetes
|Database    |db1                          |
|Table       |table1                       |
|Owner       |root                         |
|Created Time|Tue Apr 15 15:30:00 UTC 2025 |
|Last Access |UNKNOWN                      |
|Created By  |Spark 3.5.3                  |
|Type        |EXTERNAL                     |
|Provider    |                             |
Hello,
I am trying to deploy a Spark streaming application using the Spark
Kubernetes Operator, but the application crashes after a while.
After describing CRD using *kubectl -n my-namespace describe
sparkapplication my-app,* I see the following -
Qos Class: Guaranteed
Hello,
We published a blog that reports the performance evaluation of Trino 468,
Spark 4.0.0-RC2, and Hive 4 on Tez/MR3 2.0 using the TPC-DS Benchmark, 10TB
scale factor. Hope you find it useful.
https://mr3docs.datamonad.com/blog/2025-04-18-performance-evaluation-2.0
--- Sungwoo
@Ángel Álvarez Pascua
Thanks, however I am thinking of some other solution which does not involve
saving the dataframe result. Will update this thread with details soon.
@daniel williams
Thanks, I will surely check spark-testing-base out.
Regards,
Abhishek Singla
On Thu, Apr 17, 2025 at 11
I have not. Most of my work and development on Spark has been on the scala
side of the house and I've built a suite of tools for Kafka integration
with Spark for stream analytics along with spark-testing-base
<https://github.com/holdenk/spark-testing-base>
On Thu, Apr 17, 2025 at 12:
Have you used the new equality functions introduced in Spark 3.5?
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.testing.assertDataFrameEqual.html
El jue, 17 abr 2025, 13:18, daniel williams
escribió:
> Good call out. Yeah, once you take your work out of Spark it’s
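For anyone who hasn't tried that helper yet, a minimal sketch of how it is used:

    from pyspark.testing import assertDataFrameEqual

    expected = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
    actual = spark.createDataFrame([(2, "b"), (1, "a")], ["id", "value"])

    # Raises a descriptive AssertionError on mismatch; row order is ignored
    # unless checkRowOrder=True is passed.
    assertDataFrameEqual(actual, expected)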
Good call out. Yeah, once you take your work out of Spark it’s all on you.
Any partition-level operations (e.g. map, flatMap, foreach) end up as a
lambda in Catalyst. I’ve found, however, not using explode and doing things
procedurally at this point with a sufficient amount of unit testing
Just a quick note on working at the RDD level in Spark — once you go down
to that level, it’s entirely up to you to handle everything. You gain more
control and flexibility, but Spark steps back and hands you the steering
wheel. If tasks fail, it's usually because you're allowing them t
failures. I wanted to know if there is an
existing way in Spark batch to checkpoint already processed rows of a
partition if using foreachPartition or mapPartitions, so that they are not
processed again on rescheduling of task due to failure or retriggering of
job due to failures.
Regards,
Abhish
o with task/job failures. I wanted to know if there is an
> existing way in spark batch to checkpoint already processed rows of a
> partition if using foreachPartition or mapPartitions, so that they are not
> processed again on rescheduling of task due to failure or retriggering of
> job due to f
cribió:
> Hi Team,
>
> We are using foreachPartition to send dataset row data to a third-party
> system via an HTTP client. The operation is not idempotent. I want to
> ensure that in case
> of failures the previously processed dataset should not get processed
> again.
>
> Is there a way to
client. The operation is not idempotent. I want to ensure that in case
> of failures the previously processed dataset should not get processed
> again.
>
> Is there a way to checkpoint in Spark batch
> 1. checkpoint processed partitions so that if there are 1000 partitions
> and 100 were p
Hi Team,
We are using foreachPartition to send dataset row data to a third-party
system via an HTTP client. The operation is not idempotent. I want to
ensure that in case
of failures the previously processed dataset should not get processed
again.
Is there a way to checkpoint in Spark batch
1. checkpoint
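There is no built-in batch checkpoint for this, as far as I know; a common workaround, sketched below, keys idempotency on the partition id so a retried task can skip partitions it already sent. is_done, mark_done, http_client, and job_run_id are hypothetical stand-ins for your own external tracking store and client:

    from pyspark import TaskContext

    def send_partition(rows):
        part_id = TaskContext.get().partitionId()
        if is_done(job_run_id, part_id):   # hypothetical lookup in an external store
            return                         # retry of an already-sent partition
        for row in rows:
            http_client.send(row)          # hypothetical non-idempotent call
        mark_done(job_run_id, part_id)     # hypothetical commit marker

    df.foreachPartition(send_partition)

Note this only de-duplicates at whole-partition granularity; a partition that fails midway would still be resent, which is exactly the gap this thread is about.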
Tim,
Yes, you can use Java for your Spark workloads just fine.
Cheers
Jules
Excuse the thumb typos
On Fri, 04 Apr 2025 at 12:53 AM, tim wade wrote:
> Hello
>
> I am just a newbie to Spark. I am programming with Java mainly, knowing
> very little Scala.
>
> Can I just write code
I have a Spark streaming dataset that is a union of 12 datasets (for 12
different S3 buckets). On startup, it takes nearly 18-20 mins for the Spark
Streaming Job to show up on the Spark Streaming UI and an additional 18-20 mins
for the job to even start. When looking at the logs I see
Hi Tim,
We have a large ETL project comprising about forty individual Apache Spark
applications, all built exclusively in Java.
They are executed on three different Spark clusters built on AWS EC2 instances.
The applications are built in Java 17 for Spark 3.5.x.
Cheers,
Steve C
> On 4
One issue I've seen is that after about 24 hours, the SparkApplication job
pods seem to be getting evicted. I've installed the Spark History Server
and am verifying the case.
It could be due to resource constraints; checking this.
Please note: the Kubeflow Spark Operator is installed in
Thanks, Megh!
I did some research and realized the same - PVC is not a good option for
Spark shuffle, primarily due to latency issues.
The same is the case with S3 or MinIO.
I've implemented option 2, and am testing this out currently: storing data
in host path is possible.
Regards,
Karan
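For anyone finding this later, a sketch of what the host-path approach can look like using Spark's Kubernetes volume configs; the volume name must keep the spark-local-dir- prefix for Spark to use it as scratch space, and the paths below are placeholders:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             # Mount a hostPath volume as executor local storage for shuffle/spill
             .config("spark.kubernetes.executor.volumes.hostPath"
                     ".spark-local-dir-1.options.path", "/mnt/fast-disk/spark")
             .config("spark.kubernetes.executor.volumes.hostPath"
                     ".spark-local-dir-1.mount.path", "/tmp/spark-local")
             .config("spark.kubernetes.executor.volumes.hostPath"
                     ".spark-local-dir-1.mount.readOnly", "false")
             .getOrCreate())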
t
> me know.
>
> thanks!
>
>
> On Mon, Mar 31, 2025 at 1:58 PM karan alang wrote:
>
>> hello all - checking to see if anyone has any input on this
>>
>> thanks!
>>
>>
>> On Tue, Mar 25, 2025 at 12:22 PM karan alang
>> wrote:
>
Hello,
I'm trying to run a simple Python client against a Spark Connect server
running in Kubernetes as a proof-of-concept. The client writes a couple
of records to a local Iceberg table. The Iceberg runtime is provisioned
using the "--packages" argument to the "start-connect-
Java is very much supported in Spark. In our open source project, we
haven’t done spark connect yet but we do a lot of transformations, ML and
graph stuff using Java with Spark. Never faced the language barrier.
Cheers,
Sonal
https://github.com/zinggAI/zingg
On Sat, 5 Apr 2025 at 4:42 PM
I think you have more limitations using Spark Connect than Spark from Java.
I used RDD, registered UDFs, ... from Java without any problems.
On Sat, 5 Apr 2025 at 9:50, tim wade wrote:
> Hello
>
> I only know Java programming. If I use Java to communicate with the
> Spark API and
Hello
I only know Java programming. If I use Java to communicate with the
Spark API and submit tasks to the Spark API from Java, I'm not sure what
disadvantages this might have. I see other people writing tasks in
Scala, then compiling them and submitting to Spark using spark-submit.
T
I think I did that some years ago in Spark 2.4 on a Hortonworks cluster
with SSL and Kerberos enabled. It worked, but never went into production.
On Fri, 4 Apr 2025 at 9:54, tim wade wrote:
> Hello
>
> I am just newbie to spark. I am programming with Java mainly, knowing
> sc
Hey Tim!
What are you aiming to achieve exactly?
Regards,
Jevon C
> On Apr 4, 2025, at 3:54 AM, tim wade wrote:
>
> Hello
>
> I am just a newbie to Spark. I am programming with Java mainly, knowing
> very little Scala.
>
> Can I just write code with java to talk
Hello folks,
My colleague has posted this issue on Github:
https://github.com/kubeflow/spark-operator/issues/2491
I'm wondering whether anyone here is using the Kubeflow Spark Operator and
could provide any insight into what's happening here. I know he's been
stumped for a
Hi Spark Dev Team,
I believe I've encountered a potential bug in Spark 3.5.1 concerning the
UNIX_SECONDS function when used with TO_UTC_TIMESTAMP.
When converting a timestamp from a specific timezone (e.g.,
'Europe/Amsterdam') to UTC and then getting its Unix seconds, the result
Hello
I am just a newbie to Spark. I am programming with Java mainly, knowing
very little Scala.
Can I just write code in Java against Spark's Java API for
submitting jobs? (The main job is a Structured Streaming job.)
T
Yes, Apache Celeborn may be useful. You need to do some research though.
https://celeborn.apache.org/
Have a look at this link as well: Spark Executor Shuffle Storage Options
<https://iomete.com/resources/k8s/spark-executor-shuffle-storage-options>
HTH
Dr Mich Talebzadeh,
Architect | Data S
wrote:
>
>> hello All,
>>
>> I have kubeflow Spark Operator installed on k8s and from what i
>> understand - Spark Shuffle is not officially supported on kubernetes.
>>
>> Looking for feedback from the community on what approach is being taken
>> t
hello all - checking to see if anyone has any input on this
thanks!
On Tue, Mar 25, 2025 at 12:22 PM karan alang wrote:
> hello All,
>
> I have kubeflow Spark Operator installed on k8s and from what i understand
> - Spark Shuffle is not officially supported on kubernetes.
>
Howdy All,
The Spark 3.3 documentation states that it is Java 8/11/17 compatible, but
I'm having a hard time finding an existing code base that is using JDK 17
for the userland compilation. Even the Spark 3.3 branch doesn't appear to
compile/test with JDK 17 in the GitHub Actions for
Dear Apache Foundation Team,
I hope this email finds you well. My name is Juan, and I am a co-organizer of
two Apache Spark user groups: Apache Spark
Bogotá<https://www.meetup.com/es/Apache-Spark-Bogota> and Apache Spark
Mexico<https://www.meetup.com/es/apache-spark-mexicocity
Hello Team,
I was working with Spark 3.2 and Hadoop 2.7.6 and writing to MinIO object
storage. It was slower when compared to writing to MapR FS with the above
tech stack. Then I moved on to an upgraded version of Spark 3.5.2 and
Hadoop 3.4.1, which started writing to MinIO with V2
hello All,
I have kubeflow Spark Operator installed on k8s and from what i understand
- Spark Shuffle is not officially supported on kubernetes.
Looking for feedback from the community on what approach is being taken to
handle this issue - especially since dynamicAllocation cannot be
enabled
Just one more variable: Spark 3.5.2 runs on Kubernetes and Spark 3.2.0 runs on YARN. It seems Kubernetes can be a cause of slowness too.
Sent from my iPhone
On Mar 24, 2025, at 7:10 PM, Prem Gmail wrote:
Hello Spark Dev/users,
Anyone have any clue why and how a better version has a performance
Hello Spark Dev/users,
Anyone have any clue why and how a better version has a performance issue? I will be happy to raise a JIRA.
Sent from my iPhone
On Mar 24, 2025, at 4:20 PM, Prem Sahoo wrote:
The problem is on the writer's side. It takes longer to write to MinIO with Spark 3.5.2 and Hadoop
The problem is on the writer's side. It takes longer to write to MinIO with
Spark 3.5.2 and Hadoop 3.4.1, so it seems there are some tech changes
between Hadoop 2.7.6 and 3.4.1 which made the write process slower.
On Sun, Mar 23, 2025 at 12:09 AM Ángel Álvarez Pascua <
angel.alv
@Prem Sahoo , could you test both versions of
Spark+Hadoop by replacing your "write to MinIO" statement with
write.format("noop")? This would help us determine whether the issue lies
on the reader side or the writer side.
On Sun, 23 Mar 2025 at 4:53, Prem Gmail ()
wrote:
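For completeness, a tiny sketch of that suggestion (the read path is a placeholder; keep the read side identical between the two runs):

    df = spark.read.parquet("s3a://source-bucket/data/")

    # The "noop" sink executes the full plan but discards the output, so the
    # elapsed time isolates the read/compute cost from the MinIO write cost.
    df.write.format("noop").mode("overwrite").save()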
The V2 writer in 3.5.2 and Hadoop 3.4.1 should be much faster than Spark 3.2.0 and Hadoop 2.7.6, but that's not the case. I tried the magic committer option, which is again slower. So internally something changed which made this slow. May I know?
Sent from my iPhone
On Mar 22, 2025, at 11:05 PM
Seems like the Jackson version hasn't changed since Spark 1.4 (pom.xml
<https://github.com/apache/spark/blob/branch-1.4/pom.xml>). Even Spark 4 is
still using this super old (2013) version. Maybe it's time ...
On Tue, 18 Mar 2025 at 16:05, Mohammad, Ejas Ali
() wrote:
>
Hi Spark Community,
I hope you are doing well.
We have identified high and critical CVEs related to the jackson-mapper-asl
package used in Apache Spark 3.5.5. We would like to understand if there are
any official fixes or recommended mitigation steps available for these
vulnerabilities.
| CVE
Hi Team,
Can you please help with a date when the community plans to release a
stable PROD ready version for spark-kubernetes-operator
<https://github.com/apache/spark-kubernetes-operator> ?
Does Spark recommend using the kubeflow/spark-operator
<https://github.com/kubeflow/spark
Two things come to mind as low-hanging fruit: update to Spark 3.5, which
should reduce the CVEs. Alternatively, consider using Spark Connect, where
you can address the client-side vulnerabilities yourself.
Best Regards
Soumasish Goswami
in: www.linkedin.com/in/soumasish
# (415) 530-0405
-
On
Hi Spark Community,
I am using the official Docker image `apache/spark-py:v3.4.0` and installing
`pyspark==3.4.0` on top of it. However, I have encountered multiple security
vulnerabilities related to outdated dependencies in the base image.
Issues:
1. Security Concerns:
- Prisma scan
Hi spark users,
A few years back I created a Java implementation of the HNSW algorithm in
my spare time. HNSW is an algorithm to do k-nearest neighbour search, or as
people tend to refer to it now: vector search.
It can be used to implement things like recommendation systems, image
search
Hello everyone,
I noticed that a recent PR appears to disable the start of Spark Connect
when the deployment mode is set to "cluster".
PR: [SPARK-42371][CONNECT] Add scripts to start and stop Spark Connect
server by HyukjinKwon · Pull Request #39928 · apache/spark · GitHub
https://
We are happy to announce the availability of Apache Spark 3.5.5!
Spark 3.5.5 is the fifth maintenance release based on the
branch-3.5 maintenance branch of Spark. It contains many fixes
including security and correctness domains. We strongly
recommend all 3.5 users to upgrade to this stable release.
> Thanks Mich
>>>
>>> > created on driver memory
>>>
>>> That I hadn't anticipated. Are you sure?
>>> I understood that caching a table pegged the RDD partitions into the
>>> memory of the executors holding the partition.
>>>
>>>
emory
>>
>> That I hadn't anticipated. Are you sure?
>> I understood that caching a table pegged the RDD partitions into the
>> memory of the executors holding the partition.
>>
>>
>>
>>
>> On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh <
tition.
>
>
>
>
> On Sun, Feb 16, 2025 at 11:17 AM Mich Talebzadeh <
> mich.talebza...@gmail.com> wrote:
>
>> yep. created on driver memory. watch for OOM if the size becomes too large
>>
>> spark-submit --driver-memory 8G ...
>>
>> HTH
>
mory. watch for OOM if the size becomes too large
>
> spark-submit --driver-memory 8G ...
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
>view my Linkedin profile
> <https://www.linkedin.com/in/mich-tal
yep. created on driver memory. watch for OOM if the size becomes too large
spark-submit --driver-memory 8G ...
HTH
Dr Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-520
occurrence_svampe");
On Sun, Feb 16, 2025 at 10:05 AM Tim Robertson
wrote:
> Hi folks
>
> Is it possible to cache a table for shared use across sessions with spark
> connect?
> I'd like to load a read only table once that many sessions will then
> query to improve per
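The truncated snippet above appears to register and cache a table on the server; a hedged sketch of that pattern (catalog and view names other than occurrence_svampe are made up), with the caveat that whether the cached data is really shared across separate Spark Connect sessions is exactly the open question in this thread:

    # Register a global temp view backed by the source table and cache it.
    spark.sql("""
        CREATE OR REPLACE GLOBAL TEMP VIEW occurrence AS
        SELECT * FROM my_catalog.db.occurrence_svampe
    """)
    spark.sql("CACHE TABLE global_temp.occurrence")

    # Other sessions on the same server would reference it via global_temp.
    spark.sql("SELECT count(*) FROM global_temp.occurrence").show()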