Sara, Apache Spark is open source under Apache License 2.0
(https://github.com/apache/spark/blob/master/LICENSE). It is not under
export control of any country! Please feel free to use, reproduce, and
distribute it, as long as your use complies with the license.
Having said that, some c
On Wed, Jul 12, 2023, 6:00 PM Artemis User wrote:
The error screenshot doesn't tell much. Maybe your job wasn't
submitted properly. Make sure your IP/port numbers were defined
correctly. Take a look at the Spark server UI to see what errors occur.
On 7/12/23
Looks like the Maven build did find javac but just can't run it. So it's
not a path problem but a compatibility problem. Are you doing this on a
Mac with M1/M2? I don't think that Zulu JDK supports Apple silicon.
Your best option would be to use homebrew to install the dev tools
(including Op
Not sure where you got the property name "spark.memory.offHeap.use". The
correct one should be "spark.memory.offHeap.enabled". See
https://spark.apache.org/docs/latest/configuration.html#spark-properties
for details.
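For example, a minimal sketch (the 2g size is a placeholder; note that
spark.memory.offHeap.size must be set to a positive value whenever off-heap is enabled):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("offheap-example")
  .config("spark.memory.offHeap.enabled", "true")  // correct property name
  .config("spark.memory.offHeap.size", "2g")       // required when enabled
  .getOrCreate()
```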
On 1/30/23 10:12 AM, Jain, Sanchi wrote:
I am not sure if this is the inte
Try this one: "select country, city, max(population) from your_table
group by country"
Please note this returns a table of three columns instead of two. This
is a standard SQL query, and it is supported by Spark as well.
On 12/20/22 3:35 PM, Oliver Ruebenacker wrote:
Hello,
Let's say th
Your DDL statement doesn't look right. You may want to check the Spark
SQL reference online for how to create a table in Hive format
(https://spark.apache.org/docs/latest/sql-ref-syntax-ddl-create-table-hiveformat.html).
You should be able to populate the table directly using CREATE by
providing
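A hedged sketch of the kind of statement the linked reference describes (the table
and column names are made up, and `spark` is assumed to be an active SparkSession):
```
// Create a Hive-format table and populate it in one CREATE ... AS SELECT statement
spark.sql("""
  CREATE TABLE student_copy
  STORED AS PARQUET
  AS SELECT id, name FROM student_staging
""")
```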
If you didn't have performance issues before with the history server, it
may not be a threading or RAM problem. You may want to check on the
disk space availability for the event logs...
On 12/8/22 8:00 PM, Nikhil Goyal wrote:
Hi folks,
We are experiencing slowness in Spark history server, he
What if you just do a join with the first condition (equal chromosome)
and append a select with the rest of the conditions after the join? This
will allow you to test your query step by step, maybe with a visual
inspection to figure out what the problem is. It may be a data quality
problem as well.
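A minimal sketch of that approach (the DataFrame and column names are
placeholders, not taken from the original query):
```
// Step 1: join on the equality condition only
val joined = left.join(right, left("chromosome") === right("chromosome"))
joined.show(20)  // visually inspect the intermediate result

// Step 2: apply the remaining conditions as a filter on the joined result
val result = joined
  .filter(left("start") <= right("end") && right("start") <= left("end"))
result.show(20)
```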
What problems did you encounter? Most likely your problem may be
related to saving the model object in different partitions. If that's the
case, just apply the dataframe's coalesce(1) method before saving the
model to a shared disk drive...
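A hedged sketch of that suggestion (the DataFrame name and output path are placeholders):
```
// Collapse the result to a single partition so only one output file is written
modelDF.coalesce(1)
  .write
  .mode("overwrite")
  .parquet("/mnt/shared/models/my_model")
```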
On 11/16/22 1:51 AM, Vajiha Begum S A wrote:
Hi,
Thi
1 GPU
per executor.
So, the question is how do I limit the stage resources to 20 GPUs total?
Thanks again,
Shay
--------
*From:* Artemis User
*Sent:* Thursday, November 3, 2022 5:23 PM
*To:* user@spark.apache.org
*Subject:* [EXT
----
*From:* Artemis User
*Sent:* Thursday, November 3, 2022 1:16 AM
*To:* user@spark.apache.org
*Subject:* [EXTERNAL] Re: Stage level scheduling - lower the number of
executors when using GPUs
Are you using Rapids for GPU support in Spark? A couple of options you
may want to try (a config sketch follows the list):
1. In addition to turning on dynamic allocation, you may also need to
turn on the external shuffle service.
2. Sounds like you are using Kubernetes. In that case, you may also
need to turn on shuffle track
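A hedged sketch of the settings mentioned above, applied at session creation
(which of the two shuffle options you need depends on the cluster manager; values are placeholders):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .config("spark.dynamicAllocation.enabled", "true")
  .config("spark.shuffle.service.enabled", "true")                    // external shuffle service
  .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")  // Kubernetes-friendly alternative
  .getOrCreate()
```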
The master UI doesn't return many details; it's not designed for this
purpose. You need to use the application-level/driver UI instead (on
port 4040/4041...). Please see the online monitoring and
instrumentation doc for details
(https://spark.apache.org/docs/latest/monitoring.html#rest-api).
On 10/26/22 3:20 PM, Holden Karau wrote:
So Spark can dynamically scale on YARN, but standalone mode becomes a
bit complicated — where do you envision Spark gets the extra resources
from?
On Wed, Oct 26, 2022 at 12:18 PM Artemis User
wrote:
Has anyone tried to make a Spark cluster dynamically scalable, i.e.,
adding a new worker node automatically to the cluster when no more
executors are available for a newly submitted job? We need to keep the
whole cluster on-prem and really lightweight, so standalone mode is
preferred and no k8s
Are these Cloudera-specific acronyms? Not sure how Cloudera configures
Spark differently, but obviously the number of nodes is too small,
considering each app only uses a small number of cores and RAM. So you
may consider increasing the number of nodes. When all these apps jam on
a few nodes,
anyone to connect using
pyspark. The port 9083 is open to anyone, without any authentication.
The only way pyspark is able to connect to Hive is through 9083
and not through port 1.
On Friday, October 21, 2022 at 04:06:38 AM GMT+8, Artemis User
wrote:
By default, Spark uses Apache Derby (running in embedded mode with store
content defined in local files) for hosting the Hive metastore. You can
externalize the metastore to a JDBC-compliant database (e.g.,
PostgreSQL) and use the authentication provided by the
database. The JDBC con
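A hedged sketch of the externalized-metastore setup (host, database, and credentials
are placeholders; these javax.jdo.* settings normally live in hive-site.xml, and the
spark.hadoop. prefix is one way to pass them through to the Hadoop/Hive configuration):
```
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .enableHiveSupport()
  .config("spark.hadoop.javax.jdo.option.ConnectionURL",
          "jdbc:postgresql://db-host:5432/metastore")
  .config("spark.hadoop.javax.jdo.option.ConnectionDriverName", "org.postgresql.Driver")
  .config("spark.hadoop.javax.jdo.option.ConnectionUserName", "hive")
  .config("spark.hadoop.javax.jdo.option.ConnectionPassword", "*****")
  .getOrCreate()
```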
Spark doesn't offer a native graph database like Neo4j does since GraphX
is still using the RDD tabular data structure. Spark doesn't have a GQL
or Cypher query engine either, but uses Google's Pregel API for graph
processing. Don't see any prospect that Spark is going to implement any
types
If you have the hardware resources, it isn't difficult to set up Spark
in a Kubernetes cluster. The online doc describes everything you would
need (https://spark.apache.org/docs/latest/running-on-kubernetes.html).
You're right, both AWS EMR and Google's environment aren't flexible and
not che
Do you have to use a SQL window function for this? If I understand this
correctly, you could just keep track of the last record of each "thing",
then calculate the new sum by adding the current value of "thing" to the
sum of the last record when a new record is generated. Looks like your
problem will
Reads by default can't be parallelized in a Spark job, and doing your own
multi-threaded programming in a Spark program isn't a good idea. Adding
fast disk I/O and increasing RAM may speed things up, but won't help with
parallelization. You may have to be more creative here. One option
would be,
The reduce phase is always more resource-intensive than the map phase.
A couple of suggestions you may want to consider (a config sketch follows):
1. Setting the number of partitions to 18K may be way too high (the
default number is only 200). You may want to just use the default
and the scheduler will automaticall
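A minimal sketch of suggestion 1 (the exact value depends on data size and cluster resources):
```
// Dial the shuffle partition count back toward the default instead of 18K
spark.conf.set("spark.sql.shuffle.partitions", "200")
```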
The off-heap memory isn't subject to GC. So the obvious reason is
that you have too many states to maintain in your streaming app, the
GC couldn't keep up, and the job ended up running out of resources and dying. Are you
using continuous processing or microbatch in structured streaming? You
may want to lo
Not sure what you mean by offerts/offsets. I assume you were using
file-based instead of Kafka-based data sources. Is the incoming
data generated in mini-batch files or in a single large file? Have you
had this type of problem before?
On 7/21/22 1:02 PM, KhajaAsmath Mohammed wrote:
Hi,
WAITFOR is part of Transact-SQL and is Microsoft SQL Server
specific; it's not supported by Spark SQL. If you want to impose a delay in
a Spark program, you may want to use the thread sleep function in Java
or Scala. Hope this helps...
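A minimal sketch of that suggestion in Scala (the duration is a placeholder):
```
// Pause the current thread for 5 seconds before continuing
Thread.sleep(5000)  // milliseconds
```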
On 5/19/22 1:45 PM, K. N. Ramachandran wrote:
Hi Sean,
What scanner did you use? Looks like all CVEs you listed for
jackson-databind-xxx.jar are for older versions (2.9.10.x). A quick
search on NVD revealed that there is only one CVE (CVE-2020-36518) that
affects your Spark versions. This CVE (not on your scanned CVE list) is
on jackson-databind
Your test result just gave the verdict, so #2 is the answer: Spark
ignores those non-numeric rows completely when aggregating the average.
On 5/1/22 8:20 PM, wilson wrote:
I did a small test as follows.
scala> df.printSchema()
root
|-- fruit: string (nullable = true)
|-- number: string (null
Most likely your JSON files are not formatted correctly. Please see the
Spark doc on the specific formatting requirements for JSON data.
https://spark.apache.org/docs/latest/sql-data-sources-json.html.
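A hedged sketch (the path is a placeholder): Spark expects one JSON object per
line by default, so a pretty-printed or multi-record file needs the multiLine option:
```
val df = spark.read
  .option("multiLine", "true")
  .json("/path/to/data.json")
```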
On 4/26/22 10:43 AM, Sid wrote:
Hello,
Can somebody help me with the below problem?
https://st
We have a single file directory that's being used by both the file
generator/publisher and the Spark job consumer. When using microbatch
files in structured streaming, we encountered the following problems:
1. We would like to have a Spark streaming job consume only data files
after a prede
example in Spark.
https://spark.apache.org/docs/latest/mllib-clustering.html#streaming-k-means
On Tue, Mar 15, 2022, 3:46 PM Artemis User wrote:
Has anyone done any experiments of training an ML model using stream
data? especially for unsupervised models? Any suggestions/references
are highly appreciated
I guess it really depends on your configuration. The Hive metastore
provides just the metadata/schema for your database, not the actual
data storage. Hive runs on top of Hadoop. If you configure your
Spark to run on the same Hadoop cluster using Yarn, your SQL dataframe
in Spark
Has anyone done any experiments of training an ML model using stream
data? especially for unsupervised models? Any suggestions/references
are highly appreciated...
On Thu, Mar 10, 2022, 12:05 PM Rafał Wojdyła
wrote:
Because I can't (and should not) know ahead of time which jobs
will be executed, that's the job of the orchestration layer
(and can be dynamic). I know I can specify multiple packages.
Also not worried about memory.
On Thu, 10 Mar 2022 at 13:54, Artemis User
wrote:
If changing packages or jars isn'
It must be some misconfiguration in your environment. Do you perhaps
have a hardwired $SPARK_HOME env variable in your shell? An easy test
would be to place the spark-avro jar file you downloaded in the jars
directory of Spark and run spark-shell again without the packages
option. This will
the "hard-reset" workaround,
copy-pasting from the issue:
```
s: SparkSession = ...
# Hard reset:
s.stop()
s._sc._gateway.shutdown()
s._sc._gateway.proc.stdin.close()
SparkContext._gateway = None
SparkContext._jvm = None
```
Cheers - Rafal
On 2022/03/09 15:39:58 Artemis User wrote:
I am not sure what column/properties you are referring to. But the
event log in Spark deals with application-level "events", not JVM-level
metrics. To retrieve the JVM metrics, you need to use the REST API
provided in Spark. Please see
https://spark.apache.org/docs/latest/monitoring.html for
This is indeed a JVM issue, not a Spark issue. You may want to ask
yourself why it is necessary to change the jar packages during runtime.
Changing packages doesn't mean the classes get reloaded. There is no way to
reload the same class unless you customize the classloader of Spark. I
also don't
To be specific:
1. Check the log files on both the master and the worker and see if there are any errors.
2. If you are not running your browser on the same machine as the
Spark cluster, please use the host's external IP instead of the
localhost IP when launching the worker
Hope this helps...
-- ND
On 3/9/22
We got a Spark program that iterates through a while loop on the same
input DataFrame and produces different results per iteration. I see
through Spark UI that the workload is concentrated on a single core of
the same worker. Is there any way to distribute the workload to
different cores/worker
/22 9:37 AM, Michael Williams (SSI) wrote:
Thank you.
*From:* Artemis User [mailto:arte...@dtechspace.com]
*Sent:* Monday, February 21, 2022 8:23 AM
*To:* Michael Williams (SSI)
*Subject:* Re: Logging to determine why driver fails
Spark uses log4j for logging. There is a log4j properties template file
in the conf directory. Just remove the "template" extension and change
the content of log4j.properties to meet your needs. More info on log4j
can be found at logging.apache.org...
On 2/21/22 9:15 AM, Michael Williams (SS
Could someone recommend a Scala/Spark kernel for Jupyter/JupyterHub that
supports the latest Spark version? Thanks!
Please try these two corrections:
1. The --packages isn't the right command line argument for
spark-submit. Please use --conf spark.jars.packages=your-package to
specify Maven packages or define your configuration parameters in
the spark-defaults.conf file
2. Please check the version nu
There was a discussion on this issue a couple of weeks ago. Basically, if
you look at the CVE definition for Log4j, the vulnerability only affects
certain versions of log4j 2.x, not 1.x. Since Spark doesn't use any of
the affected log4j versions, this shouldn't be a concern.
https://lists.apach
provider API is still needed? Are there any
use cases for using the provider API instead of the dataframe
reader/writer when dealing with JDBC? Thanks!
On 1/6/22 9:09 AM, Sean Owen wrote:
There are 8 concrete implementations of it? OracleConnectionProvider, etc
On Wed, Jan 5, 2022 at 9:26 PM Artemis
Could someone provide some insight/examples on the usage of this API?
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/jdbc/JdbcConnectionProvider.html
Why is it needed since this is an abstract class and there isn't any
concrete implementation of it? Thanks a lot in advanc
Did you install and configure the proper Spark kernel (SparkMagic) on
your Jupyter Lab or Hub? See
https://github.com/jupyter/jupyter/wiki/Jupyter-kernels for more info...
On 1/5/22 4:01 AM, 流年以东” wrote:
In the process of using pyspark, there is no spark context when opening
jupyter and inp
y?
You can compute it directly, pretty easily, in any event, either by
just writing up a few lines of code or using the .mllib model inside
the .ml model object anyway.
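A hedged sketch of the "few lines of code" option (the fitted model, the input
DataFrame, and the default "features"/"prediction" column names are all assumptions):
```
import org.apache.spark.ml.linalg.{Vector, Vectors}

// Within-cluster sum of squares: squared distance from each point to its assigned center
val centers = model.clusterCenters
val wcss = model.transform(data)          // adds the "prediction" column
  .select("features", "prediction")
  .rdd
  .map { row =>
    val point  = row.getAs[Vector]("features")
    val center = centers(row.getAs[Int]("prediction"))
    Vectors.sqdist(point, center)
  }
  .sum()
```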
On Mon, Nov 29, 2021 at 2:50 PM Artemis User
wrote:
The RDD-based org.apache.spark.mllib.clustering.KMeansModel class
defines a method called computeCost that is used to calculate the WCSS
error of K-Means clusters
(https://spark.apache.org/docs/latest/api/scala/org/apache/spark/mllib/clustering/KMeansModel.html).
Is there an equivalent method o
Spark is good with SQL-type structured data, not image data, unless
your algorithms don't require dealing with image data directly. I guess
your best option would be to go with TensorFlow since it has image
classification models built in and can integrate with NVIDIA GPUs out of
the box. Th
Unfortunately the answer you got from the forum is true. The current
Spark-rapids package doesn't support RDD. Please see
https://nvidia.github.io/spark-rapids/docs/FAQ.html#what-parts-of-apache-spark-are-accelerated
I guess to be able to use spark-rapids, one option you have would be to
con
On Tue, 24 Aug 2021 at 23:37, Artemis User <arte...@dtechspace.com> wrote:
Is th
Frame API too.
No, jobs can't communicate with each other.
On Tue, Aug 24, 2021 at 9:51 PM Artemis User <arte...@dtechspace.com> wrote:
Thanks Daniel. I guess you were suggesting using DStream/RDD.
Would it be possible to use structured streaming/DataFrames f
wrote:
Yeah. Build up the streams as a collection and map that query to the
start() invocation and map those results to awaitTermination() or
whatever other blocking mechanism you’d like to use.
On Tue, Aug 24, 2021 at 4:37 PM Artemis User <arte...@dtechspace.com> wrote:
I
Is there a way to run multiple streams in a single Spark job using
Structured Streaming? If not, is there an easy way to perform inter-job
communications (e.g. referencing a dataframe among concurrent jobs) in
Spark? Thanks a lot in advance!
-- ND
---
Looks like your problem is related to not setting up a hive-site.xml file
properly. The standard Spark distribution doesn't include a hive-site.xml
template file in the conf directory. You will have to create one by
yourself. Please refer to the Spark user doc and Hive metastore config
guide for detai
Looks like PySpark can't initiate a JVM in the backend. How did you set
up Java and Spark on your machine? Some suggestions that may help solve
your issue:
1. Use OpenJDK instead of Apple JDK since Spark was developed using
OpenJDK, not Apple's. You can use homebrew to install OpenJDK (I
Without seeing the code and the whole stack trace, just a wild guess: did
you set the config param for enabling Arrow
(spark.sql.execution.arrow.pyspark.enabled)? If it's not in your code, you
would have to set it in spark-defaults.conf. Please note that the
parameter spark.sql.execution.arrow.e
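A minimal sketch of setting that flag at runtime (it can equally go into spark-defaults.conf):
```
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
```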
Apparently you were not using the right formatting string. For
sub-second formatting, use capital S instead of lowercase s. See
Spark's doc at
https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html. Hope
this helps...
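For example, a minimal sketch (the DataFrame and column names are placeholders):
```
import org.apache.spark.sql.functions.{col, date_format}

// Capital S is the fractional-second field; lowercase s is whole seconds
val formatted = df.withColumn(
  "ts_str",
  date_format(col("event_time"), "yyyy-MM-dd HH:mm:ss.SSS"))
```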
-- ND
On 8/4/21 4:42 PM, Tzahi File wrote:
Hi All,
I'm us
I am not sure why you need to create an RDD first. You can create a
data frame directly from a csv file, for instance:
spark.read.format("csv").option("header","true").schema(yourSchema).load(ftpUrl)
-- ND
On 8/5/21 3:14 AM, igyu wrote:
val ftpUrl ="ftp://test:test@ip:21/upload/test/_temporary/
Assuming you are running Linux, an easy option would be just to use the
Linux tail command to extract the last line (or last couple of lines) of
a file and save them to a different file/directory, before feeding it to
Spark. It shouldn't be hard to write a shell script that executes tail
on al
ell who are paid and supported by companies
towards whom you are being so unkind
Regards,
Gourav Sengupta
On Fri, Jul 30, 2021 at 4:02 PM Artemis User <arte...@dtechspace.com> wrote:
Thanks Gourav for the info. Actually I am looking for concrete
experiences and detailed b
community, but surely Ray also has to win as well and nothing better
than to ride on the success of SPARK. But I may be wrong, and SPARK
community may still be developing those integrations.
Regards,
Gourav Sengupta
On Fri, Jul 30, 2021 at 2:46 AM Artemis User
Has anyone had any experience with running Spark-Rapids on a GPU-powered
cluster (https://github.com/NVIDIA/spark-rapids)? I am very interested
in knowing:
1. What is the hardware/software platform and the type of Spark cluster
you are using to run Spark-Rapids?
2. How easy was the installa
PySpark still uses Spark dataframe underneath (it wraps java code). Use
PySpark when you have to deal with big data ETL and analytics so you can
leverage the distributed architecture in Spark. If your job is simple, the
dataset is relatively small, and doesn't require distributed processing,
use Pa
Can you please post the error log/exception messages? There is not
enough info to help diagnose what the real problem is.
On 7/29/21 8:55 AM, Big data developer need help relat to spark gateway
roles in 2.0 wrote:
Hi Team ,
We are facing issue in production where we are getting frequent
As Mich mentioned, there is no need to use the jdbc API; using the DataFrameWriter's
saveAsTable method is the way to go. The JDBC driver is for a JDBC client
(a Java client, for instance) to access the Hive tables in Spark via the
Thrift server interface.
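A hedged sketch of that approach (the DataFrame and table names are placeholders):
```
// Write the result DataFrame straight into a Hive table via DataFrameWriter
resultDF.write
  .mode("overwrite")
  .saveAsTable("ml_results")
```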
-- ND
On 7/19/21 2:42 AM, Badrinath Patchikolla wrot
We are trying to switch from Postgres to Spark's built-in Hive with
Thrift server as the data sink to persist the ML result data, with the
hope that Hive would improve the ML pipeline performance. However, it
turned out that it took significantly longer for Hive to persist
dataframes (via t
Looks like you didn't set up your environment properly. I assume you
are running this from a standalone python program instead of from the
pyspark shell. I would first run your code from the pyspark shell, then
follow the spark python installation guide to set up your python
environment prope
Thanks Johnny for sharing your experience. Have you tried to use the S3A
committer? Looks like this one was introduced in the latest Hadoop for
solving problems with other committers.
https://hadoop.apache.org/docs/r3.1.1/hadoop-aws/tools/hadoop-aws/committers.html
- ND
On 6/22/21 6:41 PM, Johnn
We have a feature engineering transformer defined as a custom class with
UDF as follows:
class FeatureModder extends Transformer with DefaultParamsWritable with
DefaultParamsReadable[FeatureModder] {
val uid: String = "FeatureModder"+randomUUID
final val inputCol: Param[String] = new
rewriting it to ensure that it isn't used
in the function.
On Tue, Feb 2, 2021 at 2:32 PM Artemis User <arte...@dtechspace.com> wrote:
We tried to standardize the SQL data source management using the Avro
schema, but encountered some serialization exceptions when trying to use
the data. The interesting part is that we didn't have any problems in
reading the Avro schema JSON file and converting the Avro schema into a
SQL Struc
We are trying to create a customized transformer for a ML pipeline and
also want to persist the trained pipeline and retrieve it for
production. To enable persistence, we will have to implement read/write
functions. However, this is not feasible in Scala since the read/write
methods are priva
First some background:
* We want to use the k-means model for anomaly detection against a
multi-dimensional dataset. The current k-means implementation in
Spark is designed for clustering purposes, not exactly for anomaly
detection. Once a model is trained and pipeline is instantiated,
Could you please clarify what you mean by 1)? The driver is only
responsible for submitting the Spark job, not performing it.
-- ND
On 1/9/21 9:35 AM, András Kolbert wrote:
Hi,
I would like to get your advice on my use case.
I have a few spark streaming applications where I need to keep
updating a da
Hmm, looks like Spark 2.3+ does support stream-to-stream join. But the
online doc doesn't provide any examples. If anyone could provide some
concrete reference, I'd really appreciate it. Thanks! -- ND
On 12/22/20 9:57 AM, Artemis User wrote:
Is there any way to integrate/fuse multiple streaming sources into a
single stream process? In other words, the current structured streaming
API dictates a single streaming source and sink. We'd like to have a
stream process that interfaces with multiple stream sources, performs a
join and di
Wheel is used for package management and setting up your virtual
environment, not as a library package. To run spark-submit in a
virtual env, use the --py-files option instead. Usage:
--py-files PY_FILES Comma-separated list of .zip, .egg, or .py
files to place on the PYTHONPAT
E/lib.
Artemis User <arte...@dtechspace.com> wrote on Friday, December 11, 2020 at 5:21 AM:
What happened was that you made the mysql jar file only available to the
spark driver, not the executors. Use the --jars parameter instead of
driver-class-path to specify your third-party jar files, or copy the
third-party jar files to the jars directory for Spark in your HDFS, and
specify the
We have a Spark job that produces a result data frame, say DF-1 at the
end of the pipeline (i.e. Proc-1). From DF-1, we need to create two or
more dataframes, say DF-2 and DF-3, via additional SQL or ML processes,
i.e. Proc-2 and Proc-3. Ideally, we would like to perform Proc-2 and
Proc-3 in
On Wed, 2 Dec 2020 at 23:11, Artemis User <arte...@dtechspace.com>
Apparently this is an OS dynamic lib link error. Make sure you have the
LD_LIBRARY_PATH (in Linux) or PATH (Windows) set up properly for the
right .so or .dll file...
On 12/2/20 5:31 PM, Mich Talebzadeh wrote:
Hi,
I have a simple code that tries to create Hive derby database as follows:
from
of
these files
in executors can be accessed via
SparkFiles.get(fileName).
-- ND
On 11/25/20 9:51 PM, Artemis User wrote:
This is a typical file sharing problem in Spark. Just setting up HDFS
won't solve the problem unless you make your local machine part of
the cluster. The Spark server doesn't share files with your local machine
without mounting drives to each other. The best/easiest way to share
the data betw
I guess I misread your message. The archive directory should contain
only jar files, not tar.gz files...
On 11/14/20 10:11 AM, Artemis User wrote:
Assuming you are using Hadoop for your YARN cluster, you can specify
the Spark parameters spark.yarn.archive or spark.yarn.jars to contain
the jar directory or jar files so that Hadoop can find them by default.
See Spark online doc for details
(http://spark.apache.org/docs/latest/running-on-
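A hedged sketch of the corresponding spark-defaults.conf entries (the HDFS paths
are placeholders; use either the archive or the jars setting, not both):
```
spark.yarn.archive   hdfs:///spark/jars/spark-libs.zip
# or, alternatively:
# spark.yarn.jars    hdfs:///spark/jars/*.jar
```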
Spark distributes loads to executors, and the executors are usually
pre-configured with the number of cores. You may want to check with
your Spark admin on how many executors (or slaves) your Spark cluster is
configured with and how many cores are pre-configured for executors.
The debugging too
The best option certainly would be to recompile the Spark Connector for
MS SQL server using the Spark 3.0.1/Scala 2.12 dependencies, and just
fix the compiler errors as you go. The code is open source on github
(https://github.com/microsoft/sql-spark-connector). Looks like this
connector is us
By default Spark will build with Hive 2.3.7, according to the Spark
build doc. If you want to replace it with a different hive jar, you
need to change the Maven pom.xml file.
-- ND
On 10/22/20 11:35 AM, Ravi Shankar wrote:
Hello all,
I am trying to understand how the Spark SQL integration wi
Is there any way to access the DataFrame content directly/interactively
via some client access APIs? Some background info:
1. We have a Java client application that uses spark launcher to submit
a spark job to a spark master.
2. The default spark launcher API has only a handle API that prov
If it was running fine before and stops working now, one thing I could
think of is that your disk may be full. Checking your disk space and cleaning up
your old log files might help...
On 10/18/20 12:06 PM, rajat kumar wrote:
Hello Everyone,
My spark streaming job is running too slow, it is having bat
and let me know if it helps.
On Fri, Oct 16, 2020 at 10:37 AM Artemis User <arte...@dtechspace.com> wrote:
Thank you all for the responses. Basically we were dealing with
file source (not Kafka, therefore no topics involved) and dumping
csv files (about 1000 lines, 3
your data in one node, and then run ML
transformations in parallel
*From: *Artemis User
*Date: *Friday, October 16, 2020 at 3:52 PM
*To: *"user@spark.apache.org"
*Subject: *RE: [EXTERNAL] How to Scale Streaming Application to
Multiple Workers
mpler
solutions.
*From: *Artemis User
*Date: *Friday, October 16, 2020 at 2:19 PM
*Cc: *user
*Subject: *RE: [EXTERNAL] How to Scale Streaming Application to
Multiple Workers
On Thu, 15 Oct 2020 at 20:02, Artemis User <arte...@dtechspace.com> wrote:
Thanks for the input. What I am interested is how to have
multiple
workers to read and process the small files in pa