Enable SSL debug and analyze the log (not an easy task, but better than
getting stuck)
https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html
Which Java version is used? 8? 11? Could you provide the full stacktrace?
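For example, with spark you can turn it on for both driver and executors
(a sketch using the standard extraJavaOptions keys; pick the debug
categories you need):

    import org.apache.spark.SparkConf

    // "-Djavax.net.debug=ssl:handshake" limits the output to handshake
    // events; plain "ssl" is far more verbose.
    val conf = new SparkConf()
      .set("spark.driver.extraJavaOptions", "-Djavax.net.debug=ssl:handshake")
      .set("spark.executor.extraJavaOptions", "-Djavax.net.debug=ssl:handshake")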
On Fri, 4 Oct 2024, 17:34, Stefano Bovina wrote:
> Hi,
>
> …whether this is related or not, but somehow the executor's shutdown
> hook is being called.
> Can you check the driver logs to see if driver's shutdown hook is
> accidentally being called?
>
>
> On Mon, Jul 13, 2015 at 9:23 AM, Luis Ángel Vicente Sánchez <
> langel.gro...@gmail.com> wrote:
I forgot to mention that this is a long-running job, actually a spark
streaming job, and it's using mesos coarse mode. I'm still using the
unreliable kafka receiver.
2015-07-13 17:15 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> I have just upgraded one of…
I have just upgraded one of my spark jobs from spark 1.2.1 to spark 1.4.0
and after deploying it to mesos, it's not working anymore.
The upgrade process was quite easy:
- Create a new docker container for spark 1.4.0.
- Upgrade the spark job to use spark 1.4.0 as a dependency and create a new
fatjar.
> …intermittent problem. You can just add
> an exclude:
>
> mergeStrategy in assembly := {
>   case PathList("org", "apache", "spark", "unused",
>     "UnusedStubClass.class") => MergeStrategy.first
>   case x => (mergeStrategy in assembly).value(x)
> }
I have just upgraded to spark 1.4.0 and it seems that spark-streaming-kafka
has a dependency on org.spark-project.spark unused 1.0.0 but it also embeds
that jar in its artifact, causing a problem while creating a fatjar.
This is the error:
[Step 1/1] (*:assembly) deduplicate: different file contents found in the following: …
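An alternative sketch (an assumption on my side, not something from this
thread) is to exclude the transitive artifact at the dependency level, so
the duplicate never reaches the fatjar:

    // sbt: drop the "unused" stub that spark-streaming-kafka both depends
    // on and embeds in its own jar.
    libraryDependencies += ("org.apache.spark" %% "spark-streaming-kafka" % "1.4.0")
      .exclude("org.spark-project.spark", "unused")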
You have a window operation; I have seen that behaviour before with window
operations in spark streaming. My solution was to move away from window
operations by using probabilistic data structures; it might not be an option
for you.
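To give an idea, the probabilistic approach looks roughly like this (a
sketch assuming algebird, which is where my HLL type comes from; key and
id types simplified to String):

    import com.twitter.algebird.{HLL, HyperLogLogMonoid}
    import org.apache.spark.streaming.dstream.DStream

    val hll = new HyperLogLogMonoid(bits = 12) // ~1.6% standard error

    // Fold user ids into per-key HyperLogLog sketches instead of keeping
    // a sliding window of raw events.
    def uniques(events: DStream[(String, String)]): DStream[(String, HLL)] =
      events
        .map { case (key, userId) => (key, hll.create(userId.getBytes("UTF-8"))) }
        .reduceByKey(hll.plus(_, _))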
2015-04-20 10:29 GMT+01:00 Marius Soutier :
> The processing speed
I have a simple and probably dumb question about foreachRDD.
We are using spark streaming + cassandra to compute concurrent users every
5min. Our batch size is 10secs and our block interval is 2.5secs.
At the end of the workflow we are using foreachRDD to join the data in the RDD
with existing data … (…minTime,
maxTime, rdd)
}
…everything works as expected.
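For context, the batch and block settings above translate to something
like this (a sketch; in spark 1.x blockInterval is given in milliseconds):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("concurrent-users")
      .set("spark.streaming.blockInterval", "2500") // 2.5s blocks => 4 blocks per batch
    val ssc = new StreamingContext(conf, Seconds(10)) // 10s batches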
2015-01-21 14:18 GMT+00:00 Sean Owen :
> It looks like you are trying to use the RDD in a distributed operation,
> which won't work. The context will be null.
> On Jan 21, 2015 1:50 PM, "Luis Ángel Vicente Sánchez" <
…that.
2015-01-21 12:35 GMT+00:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> The following functions,
>
> def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform),
>     HLL))]): Unit = {
>   data.foreachRDD { rdd =>
>     rdd.cache()
>
The following functions,

def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform),
    HLL))]): Unit = {
  data.foreachRDD { rdd =>
    rdd.cache()
    val (minTime, maxTime): (Long, Long) =
      rdd.map {
        case (_, ((TimeSeriesKey(_, time), _), _)) => (time, time)
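The snippet is cut off above; a plausible continuation of the min/max
computation (my guess at the shape, not the original code) would be:

      .reduce { case ((min1, max1), (min2, max2)) =>
        (math.min(min1, min2), math.max(max1, max2))
      }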
It seems to be slightly related to this:
https://issues.apache.org/jira/browse/SPARK-1340
But in this case, it's not the Task that is failing but the entire executor
where the Kafka Receiver resides.
2014-12-16 16:53 GMT+00:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>
Dear spark community,
We were testing a spark failure scenario where the executor that is running
a Kafka Receiver dies.
We are running our streaming jobs on top of mesos and we killed the mesos
slave that was running the executor; a new executor was created on another
mesos slave, but according…
My main complaint about the WAL mechanism in the new reliable kafka receiver
is that you have to enable checkpointing and, for some reason, even if
spark.cleaner.ttl is set to a reasonable value, only the metadata is
cleaned periodically. In my tests, using a folder in my filesystem as the
checkpoint…
Regards,
Luis
2014-11-21 15:17 GMT+00:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> I have seen the same behaviour while testing the latest spark 1.2.0
> snapshot.
>
> I'm trying the ReliableKafkaReceiver and it works quite well but the
> checkpoints folder is always increasing in size…
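For reference, the setup under test looks roughly like this (a sketch;
paths and durations are placeholders):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("reliable-kafka-test")
      .set("spark.cleaner.ttl", "3600") // seconds in spark 1.x
    val ssc = new StreamingContext(conf, Seconds(10))
    // Checkpointing is mandatory for the reliable receiver's WAL; the
    // receivedData folder under this directory is what keeps growing.
    ssc.checkpoint("/tmp/spark-checkpoints")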
Is there any news about this issue? I have checked Maven Central again and
the artefacts are still not there.
Regards,
Luis
2014-11-27 10:42 GMT+00:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> I have just read on the website that spark 1.1.1 has been released but
I have just read on the website that spark 1.1.1 has been released but when
I upgraded my project to use 1.1.1, I discovered that the artefacts are not
in Maven Central yet.
[info] Resolving org.apache.spark#spark-streaming-kafka_2.10;1.1.1 ...
>
> [warn] module not found: org.apache.spark#spark-streaming-kafka_2.10;1.1.1
I have seen the same behaviour while testing the latest spark 1.2.0
snapshot.
I'm trying the ReliableKafkaReceiver and it works quite well but the
checkpoints folder is always increasing in size. The receivedMetaData
folder remains almost constant in size but the receivedData folder is
always increasing in size…
https://issues.apache.org/jira/browse/SPARK-1825
I've had the following problems making Windows + PySpark + YARN work properly:
1. net.ScriptBasedMapping: Exception running
/etc/hadoop/conf.cloudera.yarn/topology.py
FIX? Comment out the "net.topology.script.file.name" property configuration in
the file…
2. Send a message to each actor with configuration details to create a
SparkContext/StreamingContext.
3. Send a message to each actor to start the job and streaming context (a
rough sketch follows).
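A rough sketch of what I mean (message and actor names are made up, using
plain akka actors):

    import akka.actor.{Actor, ActorSystem, Props}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    case class Configure(conf: SparkConf)
    case object Start

    class JobActor extends Actor {
      private var ssc: Option[StreamingContext] = None
      def receive: Receive = {
        case Configure(conf) =>
          // Step 2: each actor builds its own StreamingContext.
          ssc = Some(new StreamingContext(conf, Seconds(10)))
        case Start =>
          // Step 3: wire up the job and start the context.
          ssc.foreach(_.start())
      }
    }

    val system = ActorSystem("jobs")
    val job = system.actorOf(Props[JobActor], "job-1")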
2014-09-16 13:29 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> When I said scheduler I mea
When I said scheduler I meant executor backend.
2014-09-16 13:26 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> It seems that, as I have a single scala application, the scheduler is the
> same and there is a collision between executors of both spark contexts. Is…
It seems that, as I have a single scala application, the scheduler is the
same and there is a collision between executors of both spark contexts. Is
there a way to change how the executor ID is generated (maybe a UUID
instead of a sequential number)?
2014-09-16 13:07 GMT+01:00 Luis Ángel
I have a standalone spark cluster and from within the same scala
application I'm creating two different spark contexts to run two different
spark streaming jobs, as SparkConf is different for each of them.
I'm getting this error that... I don't really understand:
14/09/16 11:51:35 ERROR OneForOneStrategy…
> …to control the input data rate so it does not inject
> so fast, you can try "spark.streaming.receiver.maxRate" to control the
> injection rate.
>
> Thanks,
> Jerry
>
> *From:* Luis Ángel Vicente Sánchez [mailto:langel.gro...@gmail.com]
> *Sent:* Wednesday, S
The executors of my spark streaming application are being killed due to
memory issues. The memory consumption is quite high on startup because it is
the first run and there are quite a few events on the kafka queues that are
consumed at a rate of 100K events per sec.
I wonder if it's recommended to u…
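(For reference, Jerry's "spark.streaming.receiver.maxRate" suggestion
above would be set like this; the value is only an example:)

    import org.apache.spark.SparkConf

    // Cap each receiver so the kafka backlog cannot overwhelm the
    // executors on the first run.
    val conf = new SparkConf()
      .set("spark.streaming.receiver.maxRate", "10000") // records/sec per receiver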
If you take into account what streaming means in spark, your goal doesn't
really make sense; you have to assume that your streams are infinite and
you will have to process them until the end of days. Operations on a
DStream define what you want to do with each element of each RDD, but spark
stre…
> …and it didn't work for that reason.
> Surely there has to be a way to do this, as I imagine this is a commonly
> desired goal in streaming applications?
>
>
> On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez <
> langel.gro...@gmail.com> wrote:
>
>
I'm joining several kafka dstreams using the join operation but you have
the limitation that the duration of the batch has to be the same, i.e. a 1-second
window for all dstreams... so it would not work for you.
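For reference, joining two dstreams looks like this (names assumed); both
streams necessarily share the batch duration of the StreamingContext that
created them, which is exactly the limitation above:

    import org.apache.spark.streaming.dstream.DStream

    def joinStreams(
        a: DStream[(String, Long)],
        b: DStream[(String, Long)]): DStream[(String, (Long, Long))] =
      a.join(b) // pair-wise join, computed once per micro-batch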
2014-07-16 18:08 GMT+01:00 Walrus theCat :
> Hi,
>
> My application has multiple dstreams o
Yes, I'm using it to count concurrent users from a kafka stream of events
without problems. I'm currently testing it in local mode but any
serialization problem would have already appeared, so I don't expect any
serialization issues when I deploy to my cluster.
2014-07-09 15:39 GMT+01:00 R
Is MyType serializable? Everything inside the foreachRDD closure has to be
serializable.
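For illustration (MyType's real definition isn't in this thread):

    // Case classes are Serializable out of the box; a plain class has to
    // opt in explicitly before spark can ship it inside a closure.
    class MyType(val userId: String, val ts: Long) extends Serializable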
2014-07-09 14:24 GMT+01:00 RodrigoB :
> Hi all,
>
> I am currently trying to save to Cassandra after some Spark Streaming
> computation.
>
> I call a myDStream.foreachRDD so that I can collect each RDD in th
I have a basic spark streaming job that is watching a folder, processing
any new file and updating a column family in cassandra using the new
cassandra-spark-driver.
I think there is a problem with StreamingContext.textFileStream... if
I start my job in local mode with no files in the folder…
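A minimal reproduction of what I'm describing (paths assumed):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("folder-watcher").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10))
    // textFileStream only reports files created in the directory after
    // the context starts; pre-existing files are ignored.
    ssc.textFileStream("/data/incoming").print()
    ssc.start()
    ssc.awaitTermination()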
Maybe reducing the batch duration would help :\
2014-07-01 17:57 GMT+01:00 Chen Song :
> In my use case, if I need to stop spark streaming for a while, data would
> accumulate a lot on kafka topic-partitions. After I restart the spark streaming
> job, the worker's heap will go out of memory on the f…
.setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar"))
Now I'm getting this error on my worker:
14/06/17 17:03:40 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
2014-06-
Now I need to find why that state changed message is sent... I will
continue updating this thread until I find the problem :D
2014-06-16 18:25 GMT+01:00 Luis Ángel Vicente Sánchez <
langel.gro...@gmail.com>:
> I'm playing with a modified version of the TwitterPopularTags example and
Did you manage to make it work? I'm facing similar problems and this is a
serious blocker issue. spark-submit seems kind of broken to me if you can't
use it for spark-streaming.
Regards,
Luis
2014-06-11 1:48 GMT+01:00 lannyripple :
> I am using Spark 1.0.0 compiled with Hadoop 1.2.1.
>
> I have a t
I'm playing with a modified version of the TwitterPopularTags example and
when I try to submit the job to my cluster, workers keep dying with this
message:
14/06/16 17:11:16 INFO DriverRunner: Launch Command: "java" "-cp"
"/opt/spark-1.0.0-bin-hadoop1/work/driver-20140616171115-0014/spark-test-0
Type aliases aren't safe, as you could use any string as a name or id.
On 20 Apr 2014 14:18, "Surendranauth Hiraman"
wrote:
> If the purpose is only aliasing, rather than adding additional methods and
> avoiding runtime allocation, what about type aliases?
>
> type ID = String
> type Name = String
>