Hello,
Is it possible to run SparkML using Spark Connect 3.5.0? So far I've had no
success setting up a Connect client that uses the ML package.
The ML package depends on spark-core/spark-sql AFAIK, which seem to be
shadowing the Spark Connect client classes.
Do I have to exclude any dependencies from spark-mllib?
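In case it helps: Spark 3.5 ships a limited, Connect-compatible ML API under
pyspark.ml.connect (a subset of classic pyspark.ml; I believe it needs torch
installed and algorithm coverage is narrow). A rough sketch, assuming the
Connect server is reachable at sc://localhost:15002 and that
LogisticRegression is among the supported estimators in your build:

from pyspark.sql import SparkSession
from pyspark.ml.connect.classification import LogisticRegression

# Connect-client session; no spark-core/sql needed on the client classpath.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

# pyspark.ml.connect works with plain array<double> feature columns.
train = spark.createDataFrame(
    [([1.0, 2.0], 1), ([2.0, -1.0], 0)],
    ["features", "label"],
)
model = LogisticRegression(maxIter=10).fit(train)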
Vajiha filed a spark-rapids discussion here:
https://github.com/NVIDIA/spark-rapids/discussions/7205, so if you are
interested, please follow it there.
On Wed, Nov 30, 2022 at 7:17 AM Vajiha Begum S A <
vajihabegu...@maestrowiz.com> wrote:
Hi,
I'm using an Ubuntu system with an NVIDIA Quadro K1200 with 20GB GPU memory.
Installed: CUDF 22.10.0 jar file, rapids-4-spark_2.12-22.10.0 jar file,
CUDA Toolkit 11.8.0 (Linux version), Java 8.
I'm running only a single server; the master is localhost.
I'm trying to run pyspark code through spark-submit:
spark-submit /home/mwadmin/Documents/test.py
22/11/30 14:59:32 WARN Utils: Your hostname, mwadmin-HP-Z440-Workstation
resolves to a loopback address: 127.0.1.1; using ***.***.**.** instead (on
interface eno1)
22/11/30 14:59:32 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
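For reference, those jars are usually wired in through the RAPIDS plugin
config rather than the bare classpath. A sketch (jar paths are placeholders;
versions follow the message above, so adjust to your install):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.jars",
            "/opt/sparkRapidsPlugin/cudf-22.10.0.jar,"
            "/opt/sparkRapidsPlugin/rapids-4-spark_2.12-22.10.0.jar")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # enables the GPU SQL plan
    .getOrCreate()
)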
Oh, I got it. I thought Spark would pick up the local Scala version.
----- Original Message -----
From: Sean Owen
To: ckgppl_...@sina.cn
Cc: user
Subject: Re: Spark got incorrect scala version while using spark 3.2.1 and spark 3.2.2
Date: 2022-08-26 21:08
Good answer, nice to know too.
Sean Owen wrote:
> Spark is built with and ships with a copy of Scala. It doesn't use your
> local version.
Spark is built with and ships with a copy of Scala. It doesn't use your
local version.
Hi all,
I found a strange thing. I have run the Spark 3.2.1 prebuilt in local mode. My
OS Scala version is 2.13.7. But when I ran spark-submit and checked the
SparkUI, the web page showed that my Scala version is 2.13.5. I used
spark-shell; it also showed my Scala version as 2.13.5. Then I tried
Following up on this in case anyone runs across it in the archives in the
future.
From reading through the config docs and trying various combinations, I've
discovered that:
- You don't want to disable codegen. This roughly doubled the time to
perform simple, few-column/few-row queries from basic
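For the archives, the knob in question is spark.sql.codegen.wholeStage
(default true). A quick sketch of toggling it, in PySpark:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Whole-stage codegen is on by default; disabling it is what caused the
# roughly 2x slowdown on simple queries described above.
spark.conf.set("spark.sql.codegen.wholeStage", "true")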
Hi all,
I've not got much experience with Spark, but have been reading the Catalyst
and
Datasources V2 code/tests to try to get a basic understanding.
I'm interested in trying Catalyst's query planner and optimizer for queries
spanning one or more JDBC sources.
Somewhat unusually, I'd like to do
Hello all,
I am trying to run a minimal example in my k8s cluster.
First, I cloned the petastorm GitHub repo: https://github.com/uber/petastorm
Second, I created a Docker image as follows:
FROM ubuntu:20.04
RUN apt-get update -qq
RUN apt-get install -qq -y software-properties-common
RUN
Subject: [EXTERNAL] Re: Unable to access Google buckets using spark-submit
Hi Gaurav, All,
I'm doing a spark-submit from my local system to a GCP Dataproc c
Put the GS access jar with your Spark jars; that's what the
class-not-found exception is pointing you towards.
Thanks, Mich - will check this and update.
Regards,
Karan Alang
And I quote from Stack Overflow: "I'm trying to access google buckets when
using spark-submit and running into issues. What needs to be done to
debug/fix this?"
Hence the approach adopted is correct. He has created a bucket in GCP
called gs://spark-jars-karan/ and wants to ac
BTW I also answered you in stackoverflow:
https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
HTH
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "gs"
I tried adding --conf
spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem
to the spark-submit command, but am getting a ClassNotFoundException.
Details are in stackoverflow:
https://stackoverflow.com/questions/71088934/unable-to-access-google-buckets-using-spark-submit
Any ideas on how to fix this?
tia!
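For the archives, the combination that usually resolves both errors is
shipping the shaded GCS connector jar and registering both filesystem
implementations. A sketch (jar path and bucket are placeholders):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # The shaded connector bundles its own Guava etc.; path is a placeholder.
    .config("spark.jars", "/opt/jars/gcs-connector-hadoop3-latest.jar")
    .config("spark.hadoop.fs.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem")
    .config("spark.hadoop.fs.AbstractFileSystem.gs.impl",
            "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS")
    .getOrCreate()
)
df = spark.read.json("gs://some-bucket/some-prefix/")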
coordinates? So that we run something like pip install or download from the
PyPI index?
From: Mich Talebzadeh
Sent: Wednesday, 24 November 2021 18:28
Cc: user@spark.apache.org
Subject: Re: [issue] not able to add external libs to pyspark job while
using spark-submit
The easiest way to set this up is to create a dependencies.zip file.
Assuming that you have a virtual environment already set up, where there is a
directory called site-packages, go to that directory and just create a minimal
shell script, say package_and_zip_dependencies.sh, to do
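Once dependencies.zip exists, it can be passed with --py-files or attached
from inside the job. A sketch of the latter (the zip path is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Ships the zip to every executor and adds it to their sys.path, so
# `import configparser` resolves from the bundled site-packages.
spark.sparkContext.addPyFile("/path/to/dependencies.zip")

import configparser  # import only after addPyFile has run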
That's not how you add a library. From th
Dear Spark team,
Hope my email finds you well.
I am using pyspark 3.0 and facing an issue with adding an external library
[configparser] while running the job using [spark-submit] & [yarn].
Issue:
import configparser
ImportError: No module named configparser
21/11/24 08:54:38
Hi,
I have a pod on OpenShift 4.6 running a Jupyter notebook with Spark 3.1.1 and
Python 3.7 (based on Open Data Hub; I tweaked the Dockerfile because I wanted
this specific Python version).
I'm trying to run Spark in client mode using the image of Google's Spark
operator
Issue: We are using the wholeTextFile() API to read files from S3, but this API
is extremely SLOW due to reasons mentioned below. The question is how to fix
this issue?
Here is our analysis so far:
The issue is that we are using Spark's wholeTextFile API to read S3 files. The
wholeTextFile API works in two step
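If whole-file-per-record semantics are required, one alternative worth
testing is the DataFrame text source with its wholetext option (a sketch;
the path is a placeholder):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Each row holds one entire file, like wholeTextFiles, but read through the
# DataFrame reader path instead of the RDD API.
df = spark.read.option("wholetext", True).text("s3a://bucket/prefix/")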
BTW, what assumption is there that the thread owner is writing to the
cluster? The thrift server is running locally on localhost:1. I concur
that JDBC to a remote Hive is needed. However, this is not the impression I
get here.
df.write
.format("jdbc")
.option("url",
From the Cloudera Documentation:
https://docs.cloudera.com/documentation/other/connectors/hive-jdbc/latest/Cloudera-JDBC-Driver-for-Apache-Hive-Install-Guide.pdf
UseNativeQuery
1: The driver does not transform the queries emitted by applications, so
the native query is used.
0: The driver
Insert mode is "overwrite", so it shouldn't matter whether the table
already exists or not. The JDBC driver should be based on the Cloudera Hive
version; we can't know which CDH version he's using.
The driver is fine and the latest, and it should work.
I have asked the thread owner to send the DDL of the table and how the
table is created. In this case, JDBC from Spark expects the table to be
there.
The error below:
java.sql.SQLException: [Cloudera][HiveJDBCDriver](500051) ERROR processing
Badrinath is trying to write to a Hive in a cluster where he doesn't have
permission to submit Spark jobs; he doesn't have Hive/Spark metadata
access.
The only way to communicate with this third-party Hive cluster is through
the JDBC protocol.
[ Cloudera Data Hub - Hive Server] <-> [Spark Standalone]
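For context, the pattern being attempted looks roughly like this (a sketch
with placeholder host, credentials, and table names; as the replies note,
Spark's JDBC writer is a poor fit for HiveServer2, and saveAsTable is
preferred whenever the metastore is reachable):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a")], ["id", "val"])  # placeholder data

# The Cloudera Hive JDBC driver jar must be on the driver/executor classpath.
(df.write
    .format("jdbc")
    .option("url", "jdbc:hive2://hive-host:10000/default")
    .option("driver", "com.cloudera.hive.jdbc41.HS2Driver")
    .option("dbtable", "mydb.mytable")
    .option("user", "someuser")
    .option("password", "somepassword")
    .mode("overwrite")
    .save())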
As Mich mentioned, there is no need to use the JDBC API; using the
DataFrameWriter's saveAsTable method is the way to go. The JDBC driver is for
a JDBC client (a Java client, for instance) to access the Hive tables in
Spark via the Thrift server interface.
-- ND
On 7/19/21 2:42 AM, Badrinath Patchikolla
I have been trying to create a table in Hive from Spark itself.
In local mode it works; what I am trying here is, from Spark standalone, to
create a managed table in Hive (another Spark cluster, basically CDH) using
JDBC mode.
When I try that, below are the errors I am facing.
On Thu, 15
Your driver seems to be OK:
hive_driver: com.cloudera.hive.jdbc41.HS2Driver
However, this is the SQL error you are getting:
Caused by: com.cloudera.hiveserver2.support.exceptions.GeneralException:
[Cloudera][HiveJDBCDriver](500051) ERROR processing query/statement. Error
Code: 4, SQL state:
Have you created that table in Hive, or are you trying to create it from
Spark itself?
Your Hive is local. In this case you don't need a JDBC connection. Have you
tried:
df2.write.mode("overwrite").saveAsTable("mydb.mytable")
HTH
Hi,
I am trying to write data from Spark to Hive in JDBC mode; below is the sample
code:
spark standalone 2.4.7 version
21/07/15 08:04:07 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
Setting default log level to
'.' + tableName
user = self.config['OracleVariables']['oracle_user']
password = self.config['OracleVariables']['oracle_password']
driver = self.config['OracleVariables']['oracle_driver']
fetchsize = self.config['OracleVariables']['fetchsize']
read_df = s.loadTableFromJDBC(self.spark, oracle_url, fullyQualifiedTableName, user, password, driver, fetchsize)
# check that all rows are there
if df2.
... loaded to Oracle table, quitting")
sys.exit(1)
In the statement where it says
option("dbtable", tableName). \
you can replace tableName with the equivalent SQL insert statement.
You will need a JDBC driver for Oracle, say ojdbc6.jar, set in
$SPARK_HOME/conf/spark-defaults.conf:
spark.driver.extraClassPath /home/hduser/jars/jconn4.jar:/home/hduser/jars/ojdbc6.jar
HTH
On Fri, 18 Jun 2021 at 20:49, Anshul Kala wrote:
Hi All,
I am using Spark to ingest data from a file into a database Oracle table. For
one of the fields, the value to be populated is generated from a function
that is written in the database.
The input to the function is one of the fields of the data frame.
I wanted to use Spark's JDBC write to perform
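A sketch of the dbtable trick referenced in the replies, on the read side
(the function and column names here are made-up placeholders; dbtable accepts
any parenthesized subquery, which Oracle evaluates, so the database function
runs inside Oracle):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
read_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/SERVICE")
    # placeholder subquery: my_db_function is evaluated by Oracle, not Spark
    .option("dbtable", "(SELECT id, my_db_function(payload) AS enriched FROM src_table) t")
    .option("user", "scott")
    .option("password", "tiger")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)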
Definitely not a Spark task.
Moving files within the same filesystem is merely a linking exercise; you
don't have to actually move any data. Write a shell script creating hard
links in the new location (see the sketch below); once you're satisfied,
remove the old links, profit.
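A sketch of that approach in plain Python (the pattern and directories are
placeholders; os.link only works within one filesystem):

import os
import re

SRC, DST = "/data/incoming", "/data/sales"  # placeholder directories
pattern = re.compile(r"^sales_.*\.csv$")    # placeholder file-name pattern

os.makedirs(DST, exist_ok=True)
with os.scandir(SRC) as entries:  # scandir scales to millions of files
    for entry in entries:
        if pattern.match(entry.name):
            # hard link: no data is copied, only a new directory entry
            os.link(entry.path, os.path.join(DST, entry.name))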
Hello,
I know this might not be a valid use case for Spark, but I have millions of
files in a single folder. The file names follow a pattern; based on the
pattern I want to move them to different directories.
Can you please suggest what can be done?
Thanks,
rajat
Hi Friends,
I’d like to publish a document to Medium about data lakes using Spark.
Its latter parts include info that is not widely known, unless you have
experience with data lakes.
https://github.com/borislitvak/datalake-article/blob/initial_comments/Building%20a%20Real%20Life%20Data%20Lake
Hi Team,
We have lots of complex Oracle views (containing multiple tables, joins,
analytical and aggregate functions, subqueries, etc.) and we are wondering
if Spark can help us execute those views faster.
Also, we want to know if those complex views can be implemented using Spark
SQL.
Thanks and regards,
Gaurav Singh
+91 8600852256
balancer will shift the traffic to the healthy node until the crashed node
recovers.
From: Sergey Oboguev
Date: Friday, March 12, 2021 at 2:53 PM
To: User
Subject: [EXTERNAL] Using Spark as a fail-over platform for Java app
I have an existing plain-Java (non-Spark) application that needs to run in
a fault-tolerant way, i.e. if the node crashes then the application is
restarted on another node, and if the application crashes because of
internal fault, the application is restarted too.
Normally I would run it in a
I think a lot will depend on what the scripts do. I've seen some legacy
Hive scripts which were written in an awkward way (e.g. lots of subqueries,
nested explodes) because pre-Spark it was the only way to express certain
logic. For fairly straightforward operations I expect Catalyst would reduce
My 2 cents is that this is a complicated question, since I'm not confident
that Spark is 100% compatible with Hive in terms of query language. I have
an unanswered question on this list about this:
Hi All,
Not sure if I need to ask this question of the Spark community or the Hive
community.
We have a set of Hive scripts that run on EMR (Tez engine). We would like to
experiment by moving some of them onto Spark. We are planning to experiment
with two options.
1. Use the current code based on
Hi Spark Users,
I am trying to execute a bash script from my Spark app. I can run the below
command without issues from spark-shell; however, when I use it in the Spark
app and submit with spark-submit, the container is not able to find the
directories.
val result = "export LD_LIBRARY_PATH=/
Are local paths not exposed in containers?
Thanks,
Nasrulla
From: Nasrulla Khan Haris
Sent: Thursday, July 23, 2020 6:13 PM
To: user@spark.apache.org
Subject: Unable to run bash script when using spark-submit in cluster mode.
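For what it's worth, a sketch of one way to see what the executor container
actually has on its filesystem (the command is a placeholder; in cluster
mode, local driver-machine paths are not visible unless shipped with
--files or baked into the image):

import subprocess
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

def run_on_executor(_):
    # Runs inside the executor container, so paths must exist there.
    out = subprocess.run(
        ["bash", "-c", "ls -ld /my/expected/dir"],  # placeholder command
        capture_output=True, text=True,
    )
    yield out.stdout + out.stderr

print(spark.sparkContext.parallelize([0], 1).mapPartitions(run_on_executor).collect())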
Please try the maxBytesPerTrigger option; probably the files are big enough to
crash the JVM.
Please give some info on the executors and the files (size, etc.).
Regards,
..Piyush
Can you reduce maxFilesPerTrigger further and see if the OOM still persists?
If it does, then the problem may be somewhere else.
Please provide logs and a dump file for the OOM case - otherwise no one could
say what's the cause.
Add JVM options to driver/executor => -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath="...dir..."
Issue: I am trying to process 5000+ gzipped json files periodically from S3
using Structured Streaming code.
Here are the key steps:
- Read the json schema and broadcast it to the executors
- Read the stream:
Dataset inputDS = sparkSession.readStream() .format("text")
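Combining the suggestions above into one sketch (the values and paths are
placeholders; maxFilesPerTrigger caps how many files each micro-batch of the
file source picks up):

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # dump the executor heap on OOM so the cause can be inspected
    # (driver-side JVM options must go on spark-submit in client mode)
    .config("spark.executor.extraJavaOptions",
            "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps")
    .getOrCreate()
)
inputDS = (
    spark.readStream.format("text")
    .option("maxFilesPerTrigger", 100)  # placeholder cap per micro-batch
    .load("s3a://bucket/input/")
)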
To parallelize model training developed using standard libraries like
Keras, use Horovod from Uber.
https://horovod.readthedocs.io/en/stable/spark_include.html
Sent: Thursday, July 16, 2020 at 6:54 PM
From: "Davide Curcio"
To: "user@spark.apache.org"
Subject: "Pyspark.zip does not exist" using Spark in cluster mode with Yarn
I'm trying to run a Spark script in cluster mode using Yarn, but I've always
obtained this error. I read in other similar questions that the cause can be:
- "local" hard-coded as the master, but I don't have that;
- a HADOOP_CONF_DIR environment variable that's wrong inside spark-env.sh, but it
Ok, thanks.
You can buy it here:
https://www.amazon.com/s?k=hands+on+machine+learning+with+scikit-learn+and+tensorflow+2=2U0P9XVIJ790T=Hands+on+machine+%2Caps%2C246=nb_sb_ss_i_1_17
This book is like an accompaniment to the Andrew Ng course on Coursera.
It uses the exact same mathematical notations,
It is still copyrighted material, no matter its state of editing. Yes,
you should not be sharing this on the internet.
On Tue, Jul 14, 2020 at 9:46 AM Anwar AliKhan wrote:
> Please note it is freely available because it is an early unedited raw
> edition.
> It is not 100% complete; it is not
when running Spark on a Hadoop YARN cluster. Is this
correct? Does the Spark history server have the same user functions as
the Spark UI?
But how could this be possible (the possibility of using the Spark UI) if
the Spark master server isn't active when all the job scheduling and
resource
Dear Spark User,
I am trying to parallelize a CNN (convolutional neural network) model
using Spark. I have developed the model using Python and the Keras library.
The model works fine on a single machine, but when we try it on multiple
machines, the execution time remains the same as sequential.
Could you please tell me whether there is any built-in library for this?
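A minimal sketch of the Horovod route suggested above, using
horovod.spark.run (assumes horovod[spark] plus TensorFlow/Keras are installed
on all nodes; the training body is elided and the learning rate is a
placeholder):

import horovod.spark

def train(lr):
    # Executed once per Spark task; each copy handles one shard of the data.
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... build the Keras model, scale lr by hvd.size(), fit this rank's shard ...
    return hvd.rank()

# Launches 4 cooperating training processes on the cluster.
results = horovod.spark.run(train, args=(0.001,), num_proc=4)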
Hi Teja,
To access Hive 3 using Apache Spark 2.x.x you need to use this connector
from Cloudera:
https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/integrating-hive/content/hive_hivewarehouseconnector_for_handling_apache_spark_data.html
It has many limitations. You can just write to Hive
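On the PySpark side, usage of that connector looks roughly like this (a
sketch from memory of the HDP docs; assumes the HWC jar and the matching
pyspark_llap zip are on the classpath / --py-files, and the table name is a
placeholder):

from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession  # ships with the HWC package

spark = SparkSession.builder.getOrCreate()
hive = HiveWarehouseSession.session(spark).build()
df = hive.executeQuery("SELECT * FROM mydb.mytable")  # reads via LLAP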
Spark 2.4 works with Hadoop 3 (optionally) and Hive 1. I doubt it will work
connecting to Hadoop 3 / Hive 3; it's possible in a few cases.
It's also possible some vendor distributions support this combination.
We use Spark 2.4.0 to connect to a Hadoop 2.7 cluster and query from Hive
Metastore version 2.3. But the cluster managing team has decided to upgrade
to Hadoop 3.x and Hive 3.x. We could not migrate to Spark 3 yet, which is
compatible with Hadoop 3 and Hive 3, as we could not test if anything
> Dataset productUpdates = watermarkedDS
>     .groupByKey(
>         (MapFunction<..., String>) event -> ...,
>         ...)
>     .flatMapGroupsWithState(
>         ...(appConfig, accumulators),
>         Encoders.bean(ModelStateInfo.class),
>         Encoders.bean(ModelUpdate.class),
>         ...
Yes, that's exactly how I am creating them.
Question... Are you using 'Stateful Structured Streaming', in which you pass
'updateAcrossEvents' and update the Accumulator inside 'updateAcrossEvents'?
We're experiencing this only under 'Stateful Structured Streaming'.
I am getting the values printed in my driver log as well as sent to
Grafana. Not sure where and when I saw 0 before. My deploy mode
Create accumulators like this:
AccumulatorV2 accumulator = sparkContext.longAccumulator(name);
... about the Application Specific Accumulators. The other standard counters
such as 'event.progress.inputRowsPerSecond' are getting populated correctly!
Hello,
Even for me it comes as 0 when I print in OnQueryProgress. I use
LongAccumulator as well. Yes, it prints on my local but not on the cluster.
But one consolation is that when I send metrics to Grafana, the values are
coming there.
No, this is not working even if I use LongAccumulator.
There is a restriction in the AccumulatorV2 API [1]: the OUT type should be
atomic or thread safe. I'm wondering if the implementation for
`java.util.Map[T, Long]` can meet it or not. Is there any chance to replace
CollectionLongAccumulator with CollectionAccumulator [2] or LongAccumulator
[3] and test if the StreamingListener and other codes are able to work?
--
Cheers,
-z
[1] http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.util.AccumulatorV2
[2] http://spark.apache.org/docs/late
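For reference, a tiny PySpark sketch of the accumulator contract under
discussion (a plain batch job; task-side updates are merged back to the
driver, and the merged value is only readable on the driver):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
acc = spark.sparkContext.accumulator(0)  # driver holds the merged value

def bump(row):
    acc.add(1)  # runs on executors; merged into the driver copy per task

spark.range(1000).foreach(bump)
print(acc.value)  # 1000 on the driver; acc.value is not readable in tasks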