Hi Spark Team,
I am using Spark version 3.4.0 in my application, which is used to consume
messages from Kafka topics.
I have the following queries:
1. Does DStream support pausing/resuming message consumption at runtime
on a particular condition? If yes, please provide details.
2. I tried to revoke
Congratulations!
At 2023-12-01 01:23:55, "Dongjoon Hyun" wrote:
We are happy to announce the availability of Apache Spark 3.4.2!
Spark 3.4.2 is a maintenance release containing many fixes, including in the
security and correctness domains. This release is based on the
branch-3.4 maintenance branch of Spark. We strongly
recommend that all 3.4 users upgrade
Hi,
I am seeking advice on measuring the performance of each QueryStage (QS) when
AQE is enabled in Spark SQL. Specifically, I need help automatically mapping a
QS to its corresponding jobs (or stages) to get the QS runtime metrics.
I recorded the QS structure via a customized injected Query
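For anyone exploring this, a rough building block (not the approach from the
thread) is a SparkListener that records per-stage runtimes; correlating stages
back to a QS still needs something like the spark.sql.execution.id local
property that Spark sets per query execution:
```
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

// Sketch: print each stage's wall-clock runtime as it completes.
class StageTimeListener extends SparkListener {
  override def onStageCompleted(ev: SparkListenerStageCompleted): Unit = {
    val info = ev.stageInfo
    val runtimeMs = for {
      start <- info.submissionTime
      end   <- info.completionTime
    } yield end - start
    println(s"stage ${info.stageId}: ${runtimeMs.getOrElse(-1L)} ms")
  }
}

spark.sparkContext.addSparkListener(new StageTimeListener)
```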
Team,
Do we have any update on when the Spark 4.x version will be released, in order
to address the issues below related to java.lang.NoClassDefFoundError:
javax/servlet/Servlet
Thanks and Regards,
Guru
On 2023/10/05 17:19:51 Angshuman Bhattacharya wrote:
> Thanks Ahmed. I am trying to bring t
Hey Pasha,
Is your suggestion directed at the Spark team? I can make use of the plugin
system on the driver side of Spark, but considering Spark is distributed,
I believe the executor side of Spark needs to adapt to the pf4j framework
too.
Thanks
Faiz
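For context, Spark's own plugin API already has this driver/executor split; a
hedged sketch (class name hypothetical, enabled via the spark.plugins conf)
showing both halves:
```
import java.util.{Collections, Map => JMap}
import org.apache.spark.SparkContext
import org.apache.spark.api.plugin.{DriverPlugin, ExecutorPlugin, PluginContext, SparkPlugin}

// One SparkPlugin supplies a driver-side and an executor-side component.
class MyPlugin extends SparkPlugin {
  override def driverPlugin(): DriverPlugin = new DriverPlugin {
    override def init(sc: SparkContext, ctx: PluginContext): JMap[String, String] = {
      println(s"driver plugin up for app ${sc.applicationId}")
      Collections.emptyMap[String, String] // extra conf passed to executor plugins
    }
  }
  override def executorPlugin(): ExecutorPlugin = new ExecutorPlugin {
    override def init(ctx: PluginContext, extraConf: JMap[String, String]): Unit =
      println(s"executor plugin up on ${ctx.executorID()}")
  }
}
```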
On Tue, Nov 28, 2023, 16:57 Pasha Finkelshtein
Thanks Holden,
So you're saying even Spark Connect is not going to provide that guarantee?
The code referred to above is taken from the Spark Connect implementation.
Could you explain which parts are tricky to get right? Just to be well
prepared for the consequences.
On Tue, Nov 28, 2023, 01:30
So I don’t think we make any particular guarantees around class path
isolation there, so even if it does work it’s something you’d need to pay
attention to on upgrades. Class path isolation is tricky to get right.
On Mon, Nov 27, 2023 at 2:58 PM Faiz Halde wrote:
Hello,
We are using Spark 3.5.0 and were wondering if the following is achievable
using spark-core.
Our use case involves spinning up a Spark cluster where the driver
application loads user jars containing Spark transformations at runtime. A
single Spark application can load multiple user jars.
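A minimal sketch of one way to do this (paths and class names are hypothetical,
not from the thread): load the jar on the driver with a URLClassLoader and ship
it to executors with addJar:
```
import java.net.{URL, URLClassLoader}

// Driver side: load the user jar with a child classloader; addJar
// distributes the same jar to executors so task closures referencing
// its classes can deserialize there.
val jarPath = "/opt/user-jars/transforms-1.0.jar"
spark.sparkContext.addJar(jarPath)
val loader = new URLClassLoader(Array(new URL(s"file:$jarPath")), getClass.getClassLoader)
val userClass = loader.loadClass("com.example.UserTransform") // hypothetical class
val transform = userClass.getDeclaredConstructor().newInstance()
```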
The feature was added in Spark 3.0. BTW, you may want to check out the EOL
dates for Apache Spark releases - https://endoflife.date/apache-spark - 2.x is
already EOL.
On Fri, Nov 24, 2023 at 11:13 PM mallesh j
wrote:
> Hi Team,
>
> I am trying to test the performance of a spark
Hi, all
I ran the ANALYZE TABLE command from Spark on a Hive table.
Question:
Before running the 'ANALYZE TABLE' command on the Spark SQL client, I ran the
'ANALYZE TABLE' command on the Hive client, and the wrong statistics info
showed up.
For example:
1. run the analyze table command on the hive client
- create table
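For reference, a minimal sequence to recompute and inspect statistics from the
Spark side (the table name t is a placeholder); Hive and Spark record
statistics under different table properties, which is a common source of such
mismatches:
```
// Recompute table-level stats from Spark, then inspect what was recorded.
spark.sql("ANALYZE TABLE t COMPUTE STATISTICS")
spark.sql("DESCRIBE TABLE EXTENDED t").show(100, truncate = false)
```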
Hi,
How are you submitting your spark job from your client?
Your files can either be on HDFS or an HCFS such as gs, s3, etc.
With reference to '--py-files hdfs://yarn-master-url hdfs://foo.py', I
assume you want your
spark-submit --verbose \
  --deploy-mode cluster \
  --master yarn \
  --py-files hdfs://yarn-master-url/foo.py
Perhaps you also need to upgrade Scala?
Clay Stevens
From: Hanyu Huang
Sent: Wednesday, 15 November, 2023 1:15 AM
To: user@spark.apache.org
Subject: The job failed when we upgraded from spark 3.3.1 to spark3.4.1
I am not 100% sure, but I do not think this works - the driver would need
access to HDFS. What you could try (I have not tested it in your scenario):
- use Spark Connect: https://spark.apache.org/docs/latest/spark-connect-overview.html
- host the zip file on an https server and use that url (I
Hi Eugene,
As the logs indicate, when executing spark-submit, Spark will package and
upload spark/conf to HDFS, along with uploading spark/jars. These files are
uploaded to HDFS unless you specify uploading them to another object storage
service (OSS). To do so, you'll need to modify the configuration in hdfs
It seems that the issue might be due to insufficient disk space.
eabour
From: Eugene Miretsky
Date: 2023-11-16 05:31
To: user
Subject: Spark-submit without access to HDFS
Hey All,
We are running Pyspark spark-submit from a client outside the cluster. The
client has network connectivity only to the Yarn Master, not the HDFS
Datanodes. How can we submit the jobs? The idea would be to preload all the
dependencies (job code, libraries, etc) to HDFS, and just submit
Hi Team,
I am working on a basic streaming aggregation where I have one file stream
source and two write sinks (Hudi tables). The only difference between the two
is the aggregation performed; hence I am using the same Spark
session to perform both operations.
(File Source)
--> Agg1 ->
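A runnable sketch of this shape (the schema, paths, and console sink are
stand-ins for the Hudi setup described above); note that each query needs its
own checkpoint location:
```
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("two-sink-demo").getOrCreate()
import spark.implicits._

// One file source feeding two independent streaming aggregations/queries.
val src = spark.readStream.schema("id INT, ts TIMESTAMP, v DOUBLE").json("/data/in")

val q1 = src.groupBy($"id").agg(sum($"v").as("total"))
  .writeStream.outputMode("complete")
  .option("checkpointLocation", "/chk/agg1") // each query gets its own
  .format("console").start()

val q2 = src.groupBy(window($"ts", "10 minutes")).agg(avg($"v").as("mean"))
  .writeStream.outputMode("complete")
  .option("checkpointLocation", "/chk/agg2")
  .format("console").start()

spark.streams.awaitAnyTermination()
```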
Our job originally ran on Spark 3.3.1 and Apache Iceberg 1.2.0, but since we
upgraded to Spark 3.4.1 and Apache Iceberg 1.3.1, jobs have started to fail
frequently. We tried upgrading only Iceberg without upgrading Spark, and the
job did not report an error.
Detailed description
may be more straightforward to upgrade the
> library that brings it in, assuming a later version brings in a later okio.
> You can also manage up the version directly with a new entry in
>
>
> However, does this affect Spark? All else equal it doesn't hurt to
> upgrade, but wonde
Greetings,
tl;dr there must have been a regression in spark *connect*'s ability to
retrieve data, more details in linked issues
https://issues.apache.org/jira/browse/SPARK-45598
https://issues.apache.org/jira/browse/SPARK-45769
we have projects that depend on spark connect 3.5 and we'd
>
> On Thu, 12 Oct, 2023, 7:46 pm Suyash Ajmera,
> wrote:
>
>> I have upgraded my spark job from spark 3.3.1 to spark 3.5.0, I am
>> querying to Mysql Database and applying
>>
>> `UPPER(col) = UPPER(value)` in the subsequent sql query. It is working
>>
Hi,
Spark standalone mode does not use or rely on ZooKeeper by default. The
Spark master and workers communicate directly with each other without using
ZooKeeper. However, it appears that in your case you are relying on
ZooKeeper to provide high availability for your standalone cluster
I am using Spark 3.4.1. I have a setup with three ZooKeeper servers. The Spark
master shuts down when a ZooKeeper instance is down; a new master is elected
as leader and the cluster comes up, but the original master that was down
never comes back up. Can you please help me with this issue?
Stack Overflow link
Hi,
Our company is currently introducing the Spark Connect server to
production.
Most of the issues have been solved, yet I don't know how to configure
authentication from a PySpark client to the Spark Connect server.
I noticed that there are some interceptor configs on the Scala client side
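One hedged option (assuming the Spark 3.5 Scala client; the endpoint and token
are placeholders): the Spark Connect connection string accepts use_ssl and
token parameters, though validating the token still requires a server-side
gRPC interceptor:
```
import org.apache.spark.sql.SparkSession

// Placeholder endpoint/token; the server does not authenticate by itself,
// so an interceptor must check the bearer token on the server side.
val spark = SparkSession.builder()
  .remote("sc://connect.example.com:15002/;use_ssl=true;token=my-secret")
  .getOrCreate()
```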
Hi all,
Wondering if anyone has run into this as I can't find any similar issues in
JIRA, mailing list archives, Stack Overflow, etc. I had a query that was
running successfully, but the query planning time was extremely long (4+
hours). To fix this I added `checkpoint()` calls earlier in the
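A sketch of that workaround (names and paths are placeholders): an eager
checkpoint() materializes the intermediate result and truncates the logical
plan, so later planning starts from a short lineage:
```
// checkpoint() is eager by default: it runs the plan, writes the result,
// and returns a DataFrame whose lineage starts at the checkpoint.
spark.sparkContext.setCheckpointDir("hdfs:///tmp/chk")
val trimmed = hugeIntermediateDf.checkpoint()
val result = trimmed.groupBy("key").count()
```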
Thanks Alonso,
I think this gives me some ideas.
My code is written in Python, and I use spark-submit to submit it.
I am not sure what code is written in Scala - maybe the Phoenix driver, based
on the stack trace?
How do I tell which version of Scala it was compiled against?
Is there a jar
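Two quick checks, for what they're worth: the artifact-name suffix (_2.11 /
_2.12 / _2.13) of a jar indicates the Scala version it was built against, and
in spark-shell you can print the Scala version Spark itself runs on:
```
// Run in spark-shell: the Scala version of the running Spark build.
println(scala.util.Properties.versionString)
```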
The error message Caused by: java.lang.ClassNotFoundException:
scala.Product$class indicates that the Spark job is trying to load a class
that is not available in the classpath. This can happen if the Spark job is
compiled with a different version of Scala than the version of Scala used by
the Spark build (the `Foo$class` pattern comes from the trait encoding of
Scala 2.11 and earlier, so a library compiled for Scala 2.11 running on a
Scala 2.12+ build is the usual cause).
I am getting the error below when I try to run a Spark job connecting to
Phoenix. It seems like I have an incorrect Scala version that some part of
the code is expecting.
I am using spark 3.5.0, and I have copied these phoenix jars into the spark lib
phoenix-server-hbase-2.5-5.1.3.jar
I was thinking along the lines of elasticity and autoscaling for Spark in the
context of Kubernetes. My experience with Kubernetes and Spark on the
so-called autopilot has not been that great. This is mainly from my experience
that in autopilot you let the choice of nodes be decided by the vendor's
Hi, eabour
Thank you for the insights.
Based on the information you provided, along with the PR
[SPARK-42371][CONNECT] that adds the "./sbin/start-connect-server.sh" script,
I'll experiment with launching the Spark Connect Server in Cluster Mode on
Kubernetes.
[SPARK-42371][CONNECT] A
Hi all.
I read the source code at spark/python/pyspark/sql/connect/session.py at
master · apache/spark (github.com), and the comment for the "stop" method
reads as follows:
def stop(self) -> None:
# Stopping the session will only close the connection to the cur
hi all,
i noticed a weird behavior when spark parses nested json with schema
conflicts.
i also just noticed that spark "fixed" this in the most recent release,
3.5.0, but since i'm working with AWS services, being:
* EMR 6: spark 3.3.*, spark 3.4.*
* Glue 3: spark 3.1.1
* Glue 4: spark 3
Hi Team.
I use Spark 3.5.0 to start a Spark cluster with start-master.sh and
start-worker.sh. When I use ./bin/spark-shell --master
spark://LAPTOP-TC4A0SCV.:7077 I get the following error logs:
```
23/10/24 12:00:46 ERROR TaskSchedulerImpl: Lost an executor 1 (already
removed): Command exited with code
code
left join item I
  on rev.sys = I.sys
  and rev.custumer_id = I.custumer_id
  and rev.scode = I.scode;
Thanks,
Sadha
Hi Meena,
It's not impossible, but it's unlikely that there's a bug in Spark SQL
randomly duplicating rows. The most likely explanation is there are more
records in the item table that match your sys/custumer_id/scode criteria
than you expect.
In your original query, try changing select rev
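A diagnostic along those lines (using the table and column names from the
thread): count item rows per join key; any key with more than one match
multiplies the rev rows it joins to, which shows up as "redundant" output:
```
// Keys with n > 1 in item explain duplicated rows after the left join.
spark.sql("""
  SELECT sys, custumer_id, scode, COUNT(*) AS n
  FROM item
  GROUP BY sys, custumer_id, scode
  HAVING COUNT(*) > 1
""").show()
```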
Hello all:
I am using spark sql to join two tables. To my surprise I am
getting redundant rows. What could be the cause?
select rev.* from rev
inner join customer c
  on rev.custumer_id = c.id
inner join product p
  on rev.sys = p.sys
  and rev.prin = p.prin
  and rev.scode = p.bcode
left join item I
  on rev.sys
Hi,
my code:
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://172.29.190.147").getOrCreate()
import pandas as pd
# create a pandas DataFrame
pdf = pd.DataFrame({
"name": ["Alice", "Bob", "Charlie"],
"ag
SparkConnectService.
org.apache.spark.sql.connect.SparkConnectPlugin : To enable Spark Connect,
simply make sure that the appropriate JAR is available in the CLASSPATH and the
driver plugin is configured to load this class.
org.apache.spark.sql.connect.SimpleSparkConnectService : A simple main class
method
Hi all,
Has the functionality for running the Spark Connect server on k8s been
implemented?
From: Nagatomi Yasukazu
Date: 2023-09-05 17:51
To: user
Subject: Re: Running Spark Connect Server in Cluster Mode on Kubernetes
Dear Spark Community,
I've been exploring the capabilities of the Spark
I used Ambari to configure and install Hive and Spark. I want to insert into a
Hive table using the Spark execution engine, but I am facing this weird error.
The error is:
Job failed with java.lang.ClassNotFoundException:
ive_20231017100559_301568f9-bdfa-4f7c-89a6-f69a65b30aaf:1
2023-10-17 10:07:42,972
compare to
HPC local mode. They tested with some complex data science scripts using
spark and other data science projects. The cluster is really stable and
very performant.
I enabled dynamic allocation and capped the memory and cpu accordingly in
spark-defaults.conf and in our spark framework code. So its
This issue is related to the CharVarcharCodegenUtils readSidePadding method,
which appends white spaces while reading ENUM data from MySQL,
causing issues in querying and in writing the same data to Cassandra.
On Thu, 12 Oct, 2023, 7:46 pm Suyash Ajmera,
wrote:
> I have upgraded my spark job from sp
I have upgraded my spark job from Spark 3.3.1 to Spark 3.5.0. I am querying
a MySQL database and applying
`UPPER(col) = UPPER(value)` in the subsequent sql query. It is working as
expected in Spark 3.3.1, but not working with 3.5.0.
Where Condition :: `UPPER(vn) = 'ERICSSON' AND (upper(st
This has been brought up a few times. I will focus on Spark Structured
Streaming
Autoscaling does not support Spark Structured Streaming (SSS). Why? Because
streaming jobs are typically long-running jobs that need to maintain state
across micro-batches. Autoscaling is designed to scale up and down
Hello Experts
Is there any true autoscaling option for Spark? The dynamic autoscaling
works only for batch. Any guidelines on Spark streaming autoscaling and
how that will be tied to any cluster-level autoscaling solutions?
Thanks
Your mileage varies. Often there is a flavour of cloud data warehouse
already there: CDWs like BigQuery, Redshift, Snowflake and so forth. They
can all do a good job to varying degrees.
- Use efficient data types. Choose data types that are efficient for
Spark to process. For example, use
Thank you for your feedback Mich.
In general, how can one optimise the cloud data warehouses (the sink part) to
handle streaming Spark data efficiently, avoiding the bottlenecks discussed?
AK
On Monday, 9 October 2023 at 11:04:41 BST, Mich Talebzadeh wrote:
Hi,
Please see my
Hi All,
We are trying to send the spark logs using fluent-bit. We validated that
fluent-bit is able to move logs of all other pods except the driver/executor
pods.
It would be great if someone can guide us on where to look for Spark logs in
Spark on Kubernetes with client/cluster mode.
Hi,
Please see my responses below:
1) In Spark Structured Streaming does commit mean streaming data has been
delivered to the sink like Snowflake?
No. A commit does not refer to data being delivered to a sink like
Snowflake or BigQuery. The term commit refers to Spark Structured Streaming
(SS
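A small sketch of where that bookkeeping lives (the path and console sink are
placeholders): each query records its progress under its checkpoint location,
one entry per completed micro-batch:
```
// "Commit" here is the bookkeeping Spark writes after a micro-batch
// completes, not a sink-side acknowledgement of the data.
val query = df.writeStream
  .option("checkpointLocation", "/chk/my-sink")
  .format("console")
  .start()
// After some batches, /chk/my-sink contains offsets/ and commits/ entries.
```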
Hello team
1) In Spark Structured Streaming does commit mean streaming data has been
delivered to the sink like Snowflake?
2) if sinks like Snowflake cannot absorb or digest streaming data in a timely
manner, will there be an impact on spark streaming itself?
Thanks
AK
You might be affected by this issue:
https://github.com/apache/iceberg/issues/8601
It was already patched but it isn't released yet.
On Thu, Oct 5, 2023 at 7:47 PM Prashant Sharma wrote:
> Hi Sanket, more details might help here.
>
> How does your spark configuration look like?
Thanks Ahmed. I am trying to bring this up with the Spark DE community
On Thu, Oct 5, 2023 at 12:32 PM Ahmed Albalawi <
ahmed.albal...@capitalone.com> wrote:
> Hello team,
>
> We are in the process of upgrading one of our apps to Spring Boot 3.x
> while using Spark, and we have en
I think we already updated this in Spark 4. However for now you would have
to also include a JAR with the jakarta.* classes instead.
You are welcome to try Spark 4 now by building from master, but it's far
from release.
On Thu, Oct 5, 2023 at 11:53 AM Ahmed Albalawi
wrote:
> Hello team,
>
Hello team,
We are in the process of upgrading one of our apps to Spring Boot 3.x while
using Spark, and we have encountered an issue with Spark compatibility,
specifically with Jakarta Servlet. Spring Boot 3.x uses Jakarta Servlet
while Spark uses Javax Servlet. Can we get some guidance on how
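Following the reply above about including a JAR with the jakarta.* classes, a
hedged build.sbt sketch (the versions are assumptions, not a vetted fix):
```
// Bring the jakarta servlet API onto the classpath next to Spark.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"           % "3.5.0" % Provided,
  "jakarta.servlet"   % "jakarta.servlet-api" % "5.0.0"
)
```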
Hi Sanket, more details might help here.
How does your spark configuration look like?
What exactly was done when this happened?
On Thu, 5 Oct, 2023, 2:29 pm Agrawal, Sanket,
wrote:
> Hello Everyone,
>
>
>
> We are trying to stream the changes in our Iceberg tables stored
Hello Everyone,
We are trying to stream the changes in our Iceberg tables stored in AWS S3. We
are achieving this through the Spark-Iceberg connector and using JAR files for
Spark-AWS. Suddenly we have started receiving the error "Connection pool shut
down".
Spark Version: 3.4.1
Iceberg:
Hello,
Due to the way Spark implements shuffle, the loss of an executor sometimes
results in the recomputation of partitions that were lost.
The definition of a *partition* is the tuple (RDD-ids, partition id), where
RDD-ids is a sequence of RDD ids.
In our system, we define the unit of work performed
Dear Jörn Franke, Jayabindu Singh and Spark Community members,
Thank you profoundly for your initial insights. I feel it's necessary to
provide more precision on our setup to facilitate a deeper understanding.
We're interfacing with S3 Compatible storages, but our operational context
is somewhat
Identity federation may ease this compared to a secret store.
On 01.10.2023 at 08:27, Jon Rodríguez Aranguren wrote:
Dear Jörn Franke, Jayabindu Singh and Spark Community members, Thank you
profoundly for your initial insights. I feel it's necessary to provide more
precision on our setup to facilitate
Hi Jon,
Using IAM as suggested by Jorn is the best approach.
We recently moved our spark workload from HDP to Spark on K8 and utilizing
IAM.
It will save you from secret management headaches and also allows a lot
more flexibility on access control and option to allow access to multiple
S3 buckets
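A hedged sketch of what that looks like with S3A (assuming a web-identity IAM
role such as IRSA on EKS; the provider class comes from the AWS SDK v1 used by
hadoop-aws):
```
import org.apache.spark.sql.SparkSession

// No access keys anywhere: the provider picks up the pod's IAM role via
// its web identity token.
val spark = SparkSession.builder()
  .config("spark.hadoop.fs.s3a.aws.credentials.provider",
    "com.amazonaws.auth.WebIdentityTokenCredentialsProvider")
  .getOrCreate()
```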
Dear Spark Community Members,
I trust this message finds you all in good health and spirits.
I'm reaching out to the collective expertise of this esteemed community
with a query regarding Spark on Kubernetes. As a newcomer, I have always
admired the depth and breadth of knowledge shared within
Hello Everyone,
We have setup spark and setup Iceberg-Glue connectors as mentioned at
https://iceberg.apache.org/docs/latest/aws/ to integrate Spark, Iceberg, and
AWS Glue Catalog. We are able to read tables through this but we are unable to
read data through views. PFB, the error
Hello,
What would be the right way, if any, to inject a runtime variable into Spark
logs? So that, for example, if Spark (driver/worker) logs some
info/warning/error message, the variable will be output there (in order to
help filter logs for the sake of monitoring and troubleshooting
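One option worth checking: Spark's MDC support for task logs. A sketch (the
jobId name is an arbitrary example):
```
// Local properties prefixed with "mdc." become MDC keys in task logs,
// which the log4j2 pattern can reference as %X{mdc.jobId}.
spark.sparkContext.setLocalProperty("mdc.jobId", "nightly-load-42")
```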
Hi,
From Spark Connect's official site's image, it mentions the "Multi-tenant
Application Gateway" on the driver. Are there any more documents about it? Can
I know how users can utilize such a feature?
Thanks,
Kezhi
Hi all,
This week, I tried upgrading to Spark 3.5.0, as it contained some fixes
for spark-protobuf that I need for my project. However, my code is no
longer running under Spark 3.5.0.
My build.sbt file is configured as follows:
val sparkV = "3.5.0"
val hadoopV
Hello everyone,
I'm using scala and spark with the version 3.4.1 in Windows 10. While streaming
using Spark, I give the `cleanSource` option as "archive" and the
`sourceArchiveDir` option as "archived" as in the code below.
```
spark.readStream
  .option("cleanSource", "archive")
  .option("sourceArchiveDir", "archived")
```
Multiple applications can run at once, but you need to either configure
Spark or your applications to allow that. In stand-alone mode, each
application attempts to take all resources available by default. This
section of the documentation has more details:
https://spark.apache.org/docs/latest
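A minimal sketch of the standalone-mode knob involved (the value is just an
example): capping spark.cores.max per application leaves cores free for other
applications to run concurrently:
```
import org.apache.spark.sql.SparkSession

// Without a cap, a standalone app grabs all available cores by default.
val spark = SparkSession.builder()
  .config("spark.cores.max", "4")
  .getOrCreate()
```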
already.
3. We will stick with NFS for now and standalone; then maybe we will explore
HDFS and YARN.
Can you please confirm whether multiple users can run Spark jobs at the
same time?
If so, I will start working on it and let you know how it goes.
Mich, the link to Hadoop is not working. Can you
Hello all,
I know that these parameters exist for shuffle tuning:
spark.shuffle.io.serverThreads
spark.shuffle.io.clientThreads
spark.shuffle.io.threads
But we also have:
spark.rpc.io.serverThreads
spark.rpc.io.clientThreads
spark.rpc.io.threads
So specifically talking about shuffling,
use Hive Metastore called Derby :( ) is something respectable like
> Postgres DB that can handle multiple concurrent spark jobs
>
> HTH
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>
multiple concurrent spark jobs
HTH
Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom
view my Linkedin profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
https://en.everybodywiki.com/Mich_Talebzadeh
*Disclaimer:* Use
Yes, should work fine, just set up according to the docs. There needs to be
network connectivity between whatever the driver node is and these 4 nodes.
On Thu, Sep 14, 2023 at 11:57 PM Ilango wrote:
>
> Hi all,
>
> We have 4 HPC nodes and installed spark individually in all node
I use Spark in standalone mode. It works well, and the instructions on the
site are accurate for the most part. The only thing that didn't work for me
was the start-all.sh script. Instead, I use a simple script that starts the
master node, then uses SSH to connect to the worker machines and start
Hi all,
We have 4 HPC nodes and installed spark individually in all nodes.
Spark is used in local mode (each driver/executor will have 8 cores and 65
GB) in Sparklyr/pyspark using RStudio/Posit Workbench. Slurm is used as the
scheduler.
As this is local mode, we are facing a performance issue (as only
This is absolutely awesome! Thank you so much for dedicating your time to
this project!
On Wed, Sep 13, 2023 at 6:04 AM Holden Karau wrote:
That’s so cool! Great work y’all :)
On Tue, Sep 12, 2023 at 8:14 PM bo yang wrote:
Hello Spark community
Can anyone direct me to a simple graph/chart that shows Apache Spark
adoption, preferably one that includes recent years? Of less importance, a
similar Databricks plot?
An internet search gave me plots only up to 2015. I also searched
spark.apache.org and databricks.com
Hi Spark Friends,
Anyone interested in using Golang to write Spark application? We
created a Spark
Connect Go Client library <https://github.com/apache/spark-connect-go>.
Would love to hear feedback/thoughts from the community.
Please see the quick start guide
<https://github.com/apa
Hi Yasukazu,
I tried replacing the jar; though the Spark code didn't work, the
vulnerability was removed. But I agree that even 3.1.3 has other
vulnerabilities listed on the Maven page; those are medium-level
vulnerabilities, though. We are currently targeting Critical and High
vulnerabilities
@Alfie Davidson : Awesome, it worked with
"org.elasticsearch.spark.sql".
But as soon as I switched to elasticsearch-spark-20_2.12, "es" also
worked.
On Fri, Sep 8, 2023 at 12:45 PM Dipayan Dev wrote:
>
> Let me try that and get back. Just wondering, if there a ch
Let me try that and get back. Just wondering, is there a change in the way
we pass the format to the connector from Spark 2 to 3?
On Fri, 8 Sep 2023 at 12:35 PM, Alfie Davidson
wrote:
> I am pretty certain you need to change the write.format from “es” to
> “org.elasticsearch.spark.sql”
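For completeness, a sketch matching this thread (df, host, and index are
placeholders); with the Spark 3 connector artifact, both the full format name
and the short alias "es" should work:
```
// Write a DataFrame to Elasticsearch via the elasticsearch-spark-30
// connector; es.nodes / es.resource are standard connector options.
df.write
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "localhost:9200")
  .option("es.resource", "my-index")
  .save()
```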
Hi, I tried replacing just this JAR but I am getting errors.
From: Nagatomi Yasukazu
Sent: Friday, September 8, 2023 9:35 AM
To: Agrawal, Sanket
Cc: Chao Sun ; Yeachan Park ;
user@spark.apache.org
Subject: [EXT] Re: Spark 3.4.1 and Hive 3.1.3
Hi Sanket,
While migrating to Hive 3.1.3 may resolve
> *Sent:* Thursday, September 7, 2023 10:23 PM
> *To:* Agrawal, Sanket
> *Cc:* Yeachan Park ; user@spark.apache.org
> *Subject:* [EXT] Re: Spark 3.4.1 and Hive 3.1.3
>
>
>
> Hi Sanket,
>
>
>
> Spark 3.4.1 currently only works with Hive 2.3.9, and it would require a
> lot of
Hi Sean,
Removed the provided thing, but still the same issue.
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-30_${scala.compat.version}</artifactId>
  <version>7.12.1</version>
</dependency>
On Fri, Sep 8, 2023 at 4:41 AM Sean Owen wrote:
> By marking it provided, you are not including this dependency with your
> app. If it i