Re: spark TLS connection to hive metastore

2024-10-04 Thread Ángel
Enable SSL debug and analyze the log (not an easy task, but better than getting stuck): https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/ReadDebug.html Which Java version are you using? 8? 11? Could you provide the full stack trace? On Fri, 4 Oct 2024, 17:34, Stefano Bovina wrote: > Hi, >
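
As a rough illustration (not code from the thread), JSSE debug output can be switched on through the driver's extra Java options; the property names below are standard Spark/JSSE settings, but the values and where you pass them depend on your deployment:

    // Sketch: trace the TLS handshake with the Hive metastore in the driver log.
    // In client mode this is normally passed via spark-submit --conf or
    // spark-defaults.conf, since the driver JVM is already running by the time
    // application code executes; shown here in SparkConf form for clarity.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("hive-metastore-tls-debug")
      // javax.net.debug=ssl:handshake prints certificate and handshake details
      .set("spark.driver.extraJavaOptions", "-Djavax.net.debug=ssl:handshake")
      // the same option can be added for executors if the metastore is reached from tasks
      // .set("spark.executor.extraJavaOptions", "-Djavax.net.debug=ssl:handshake")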

Re: Problems after upgrading to spark 1.4.0

2015-07-14 Thread Luis Ángel Vicente Sánchez
s is related or not, but somehow the executor's shutdown > hook is being called. > Can you check the driver logs to see if driver's shutdown hook is > accidentally being called? > > > On Mon, Jul 13, 2015 at 9:23 AM, Luis Ángel Vicente Sánchez < > langel.gro...@gmai

Re: Problems after upgrading to spark 1.4.0

2015-07-13 Thread Luis Ángel Vicente Sánchez
I forgot to mention that this is a long running job, actually a spark streaming job, and it's using mesos coarse mode. I'm still using the unreliable kafka receiver. 2015-07-13 17:15 GMT+01:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > I have just upgraded one of

Problems after upgrading to spark 1.4.0

2015-07-13 Thread Luis Ángel Vicente Sánchez
I have just upgraded one of my spark jobs from spark 1.2.1 to spark 1.4.0, and after deploying it to mesos it's not working anymore. The upgrade process was quite easy: - Create a new docker container for spark 1.4.0. - Upgrade the spark job to use spark 1.4.0 as a dependency and create a new fatjar.

Re: Duplicated UnusedStubClass in assembly

2015-07-13 Thread Luis Ángel Vicente Sánchez
intermittent problem. You can just add > an exclude: > > mergeStrategy in assembly := { > > case PathList("org", "apache", "spark", "unused", > "UnusedStubClass.class") => MergeStrategy.first > > case x => (mer
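
For reference, one way the quoted exclude might look as a complete setting, in the old "mergeStrategy in assembly" style the message uses; the fallback via MergeStrategy.defaultMergeStrategy is an assumption on my part, and key names differ across sbt-assembly versions:

    // build.sbt fragment (sketch, not verbatim from the thread).
    // With older sbt-assembly releases, build.sbt also needs: import AssemblyKeys._
    mergeStrategy in assembly := {
      // keep the first copy of Spark's UnusedStubClass; both copies are identical stubs
      case PathList("org", "apache", "spark", "unused", "UnusedStubClass.class") =>
        MergeStrategy.first
      // everything else falls through to sbt-assembly's default behaviour
      case x =>
        MergeStrategy.defaultMergeStrategy(x)
    }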

Duplicated UnusedStubClass in assembly

2015-07-13 Thread Luis Ángel Vicente Sánchez
I have just upgraded to spark 1.4.0 and it seems that spark-streaming-kafka has a dependency on org.spark-project.spark:unused:1.0.0 but it also embeds that jar in its artifact, causing a problem while creating a fatjar. This is the error: [Step 1/1] (*:assembly) deduplicate: different file conte

Re: Streaming problems running 24x7

2015-04-20 Thread Luis Ángel Vicente Sánchez
You have a window operation; I have seen that behaviour before with window operations in spark streaming. My solution was to move away from window operations using probabilistic data structures; it might not be an option for you. 2015-04-20 10:29 GMT+01:00 Marius Soutier : > The processing speed
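
As an illustration of that approach (not code from the thread), a distinct-user count can be kept as a mergeable HyperLogLog sketch instead of a sliding window; this assumes Twitter's Algebird library and its HyperLogLogMonoid.create/plus API, and the stream shape is invented:

    // Sketch: approximate distinct users per game without window operations,
    // by turning each user id into a HyperLogLog and merging the sketches.
    import com.twitter.algebird.HyperLogLogMonoid
    import org.apache.spark.streaming.dstream.DStream

    val hll = new HyperLogLogMonoid(12) // 2^12 registers, roughly 1-2% error

    def approxUsersPerGame(events: DStream[(String, String)]): DStream[(String, Double)] =
      events
        .mapValues(userId => hll.create(userId.getBytes("UTF-8"))) // one-element sketch
        .reduceByKey(hll.plus(_, _))                               // merge sketches per key
        .mapValues(_.estimatedSize)                                // approximate distinct count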

foreachRDD execution

2015-03-25 Thread Luis Ángel Vicente Sánchez
I have a simple and probably dumb question about foreachRDD. We are using spark streaming + cassandra to compute concurrent users every 5 min. Our batch size is 10 secs and our block interval is 2.5 secs. In the end, we are using foreachRDD to join the data in the RDD with existing data
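
For context, a minimal sketch of the foreachRDD write-out pattern being asked about, assuming the DataStax spark-cassandra-connector; the keyspace, table and row type below are invented:

    // Sketch: every batch interval, persist the batch's aggregates to Cassandra.
    // saveToCassandra comes from the connector's implicit RDD enrichments.
    import com.datastax.spark.connector._
    import org.apache.spark.streaming.dstream.DStream

    case class ConcurrentUsers(game: String, bucket: Long, users: Long)

    def sink(counts: DStream[ConcurrentUsers]): Unit =
      counts.foreachRDD { rdd =>
        rdd.saveToCassandra("analytics", "concurrent_users") // hypothetical keyspace/table
      }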

Re: NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
(minTime, maxTime, rdd) } everything works as expected. 2015-01-21 14:18 GMT+00:00 Sean Owen : > It looks like you are trying to use the RDD in a distributed operation, > which won't work. The context will be null. > On Jan 21, 2015 1:50 PM, "Luis Ángel Vicente Sánchez" <

Re: NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
that. 2015-01-21 12:35 GMT+00:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > The following functions, > > def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform), > HLL))]): Unit = { > data.foreachRDD { rdd => > rdd.cache() >

NullPointer when access rdd.sparkContext (Spark 1.1.1)

2015-01-21 Thread Luis Ángel Vicente Sánchez
The following functions, def sink(data: DStream[((GameID, Category), ((TimeSeriesKey, Platform), HLL))]): Unit = { data.foreachRDD { rdd => rdd.cache() val (minTime, maxTime): (Long, Long) = rdd.map { case (_, ((TimeSeriesKey(_, time), _), _)) => (time, time)
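
Reading the fragments above together with Sean Owen's reply, the working shape (compute the time range with a driver-side reduce, then use the RDD from driver code only) is roughly the following sketch; the types are simplified and the field names are guesses:

    // Sketch: derive (minTime, maxTime) with a driver-side reduce over the RDD.
    // Referencing the RDD (or rdd.sparkContext) from inside another distributed
    // operation is what produced the NullPointerException in this thread.
    import org.apache.spark.streaming.dstream.DStream

    case class TimeSeriesKey(platform: String, time: Long)

    def sink(data: DStream[(String, (TimeSeriesKey, Long))]): Unit =
      data.foreachRDD { rdd =>
        rdd.cache()
        if (rdd.take(1).nonEmpty) {
          val (minTime, maxTime) = rdd
            .map { case (_, (TimeSeriesKey(_, time), _)) => (time, time) }
            .reduce { case ((min1, max1), (min2, max2)) =>
              (math.min(min1, min2), math.max(max1, max2))
            }
          // minTime, maxTime and rdd can now be used safely from driver-side code
        }
        rdd.unpersist()
      }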

Re: Kafka Receiver not recreated after executor died

2014-12-16 Thread Luis Ángel Vicente Sánchez
It seems to be slightly related to this: https://issues.apache.org/jira/browse/SPARK-1340 But in this case, it's not the Task that is failing but the entire executor where the Kafka Receiver resides. 2014-12-16 16:53 GMT+00:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>

Kafka Receiver not recreated after executor died

2014-12-16 Thread Luis Ángel Vicente Sánchez
Dear spark community, We were testing a spark failure scenario where the executor that is running a Kafka Receiver dies. We are running our streaming jobs on top of mesos and we killed the mesos slave that was running the executor; a new executor was created on another mesos-slave but according

Re: Low Level Kafka Consumer for Spark

2014-12-03 Thread Luis Ángel Vicente Sánchez
My main complaint about the WAL mechanism in the new reliable kafka receiver is that you have to enable checkpointing, and for some reason, even if spark.cleaner.ttl is set to a reasonable value, only the metadata is cleaned periodically. In my tests, using a folder in my filesystem as the checkpoint
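
For context, a minimal sketch of how the reliable receiver path discussed here is switched on (configuration key as introduced in Spark 1.2; the path and batch interval are placeholders):

    // Sketch: enable the write-ahead log and a checkpoint directory for a
    // streaming job. The WAL stores received blocks under the checkpoint
    // directory, which is the folder whose growth is discussed in this thread.
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("reliable-kafka")
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")

    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("/tmp/checkpoints") // placeholder path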

Re: RDD data checkpoint cleaning

2014-11-28 Thread Luis Ángel Vicente Sánchez
Regards, Luis 2014-11-21 15:17 GMT+00:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > I have seen the same behaviour while testing the latest spark 1.2.0 > snapshot. > > I'm trying the ReliableKafkaReceiver and it works quite well but the > checkpoints fold

Re: Spark 1.1.1 released but not available on maven repositories

2014-11-28 Thread Luis Ángel Vicente Sánchez
Is there any news about this issue? I have checked Maven Central again and the artefacts are still not there. Regards, Luis 2014-11-27 10:42 GMT+00:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > I have just read on the website that spark 1.1.1 has been released but

Spark 1.1.1 released but not available on maven repositories

2014-11-27 Thread Luis Ángel Vicente Sánchez
I have just read on the website that spark 1.1.1 has been released but when I upgraded my project to use 1.1.1 I discovered that the artefacts are not on maven yet. [info] Resolving org.apache.spark#spark-streaming-kafka_2.10;1.1.1 ... > > [warn] module not found: org.apache.spark#spark-streaming-

Re: RDD data checkpoint cleaning

2014-11-21 Thread Luis Ángel Vicente Sánchez
I have seen the same behaviour while testing the latest spark 1.2.0 snapshot. I'm trying the ReliableKafkaReceiver and it works quite well but the checkpoints folder is always increasing in size. The receivedMetaData folder remains almost constant in size but the receivedData folder is always incr

Windows+Pyspark+YARN

2014-11-20 Thread Ángel Álvarez Pascua
https://issues.apache.org/jira/browse/SPARK-1825 I've had the following problems making Windows+Pyspark+YARN work properly: 1. net.ScriptBasedMapping: Exception running /etc/hadoop/conf.cloudera.yarn/topology.py FIX? Comment out the "net.topology.script.file.name" property configuration in the fil

Re: Spark Streaming: CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID

2014-09-16 Thread Luis Ángel Vicente Sánchez
2. send a message to each actor with configuration details to create a SparkContext/StreamingContext. 3. send a message to each actor to start the job and streaming context. 2014-09-16 13:29 GMT+01:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > When I said scheduler I mea

Re: Spark Streaming: CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID

2014-09-16 Thread Luis Ángel Vicente Sánchez
When I said scheduler I meant executor backend. 2014-09-16 13:26 GMT+01:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > It seems that, as I have a single scala application, the scheduler is the > same and there is a collision between executors of both spark context. Is &

Re: Spark Streaming: CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID

2014-09-16 Thread Luis Ángel Vicente Sánchez
It seems that, as I have a single scala application, the scheduler is the same and there is a collision between executors of both spark contexts. Is there a way to change how the executor ID is generated (maybe a UUID instead of a sequential number?) 2014-09-16 13:07 GMT+01:00 Luis Ángel

Spark Streaming: CoarseGrainedExecutorBackend: Slave registration failed: Duplicate executor ID

2014-09-16 Thread Luis Ángel Vicente Sánchez
I have a standalone spark cluster and from within the same scala application I'm creating 2 different spark contexts to run two different spark streaming jobs, as SparkConf is different for each of them. I'm getting this error that... I don't really understand: 14/09/16 11:51:35 ERROR OneForOneStra

Re: spark.cleaner.ttl and spark.streaming.unpersist

2014-09-10 Thread Luis Ángel Vicente Sánchez
control the input data rate to not inject > so fast, you can try "spark.streaming.receiver.maxRate" to control the > inject rate. > > > > Thanks > > Jerry > > > > *From:* Luis Ángel Vicente Sánchez [mailto:langel.gro...@gmail.com] > *Sent:* Wednesday, S

spark.cleaner.ttl and spark.streaming.unpersist

2014-09-09 Thread Luis Ángel Vicente Sánchez
The executors of my spark streaming application are being killed due to memory issues. The memory consumption is quite high on startup because it is the first run and there are quite a few events on the kafka queues that are consumed at a rate of 100K events per sec. I wonder if it's recommended to u
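
As a rough sketch of the settings under discussion (values are placeholders; both keys existed in Spark 1.x, and spark.cleaner.ttl was removed in later releases):

    // Sketch: control automatic unpersisting of the RDDs generated by the
    // streaming job, and set a TTL after which old metadata/RDDs may be cleaned.
    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("streaming-memory-tuning")
      .set("spark.streaming.unpersist", "true") // unpersist generated RDDs automatically
      .set("spark.cleaner.ttl", "3600")         // seconds; placeholder value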

Re: Spark streaming: size of DStream

2014-09-09 Thread Luis Ángel Vicente Sánchez
If you take into account what streaming means in spark, your goal doesn't really make sense; you have to assume that your streams are infinite and you will have to process them until the end of days. Operations on a DStream define what you want to do with each element of each RDD, but spark stre

Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Luis Ángel Vicente Sánchez
tion and it didn't work for that reason. > Surely there has to be a way to do this, as I imagine this is a commonly > desired goal in streaming applications? > > > On Wed, Jul 16, 2014 at 10:10 AM, Luis Ángel Vicente Sánchez < > langel.gro...@gmail.com> wrote: > >

Re: using multiple dstreams together (spark streaming)

2014-07-16 Thread Luis Ángel Vicente Sánchez
I'm joining several kafka dstreams using the join operation, but you have the limitation that the duration of the batch has to be the same, i.e. a 1 second window for all dstreams... so it would not work for you. 2014-07-16 18:08 GMT+01:00 Walrus theCat : > Hi, > > My application has multiple dstreams o
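
A small sketch of the join being described: both streams must come from the same StreamingContext and therefore share one batch interval, which is the limitation mentioned above (stream names and key/value types are invented):

    // Sketch: join two keyed DStreams; each batch of the result contains the
    // keys present in both streams during that same batch interval.
    import org.apache.spark.streaming.dstream.DStream

    def joined(clicks: DStream[(String, Long)],
               purchases: DStream[(String, Double)]): DStream[(String, (Long, Double))] =
      clicks.join(purchases)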

Re: Cassandra driver Spark question

2014-07-09 Thread Luis Ángel Vicente Sánchez
Yes, I'm using it to count concurrent users from a kafka stream of events without problems. I'm currently testing it using local mode, but any serialization problem would have already appeared, so I don't expect any serialization issue when I deploy it to my cluster. 2014-07-09 15:39 GMT+01:00 R

Re: Cassandra driver Spark question

2014-07-09 Thread Luis Ángel Vicente Sánchez
Is MyType serializable? Everything inside the foreachRDD closure has to be serializable. 2014-07-09 14:24 GMT+01:00 RodrigoB : > Hi all, > > I am currently trying to save to Cassandra after some Spark Streaming > computation. > > I call a myDStream.foreachRDD so that I can collect each RDD in th
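
To illustrate the point (not code from the thread): anything captured by the function passed to foreachRDD and used inside a distributed operation is shipped to executors, so it must be serializable; non-serializable resources can instead be created inside the closure. MyType is the poster's type; the rest is a made-up example:

    // Sketch: case classes are Serializable by default, so they are safe to use
    // inside foreachRDD; resources that are not serializable (connections,
    // sessions) should be created on the executor, e.g. per partition.
    import org.apache.spark.streaming.dstream.DStream

    case class MyType(id: String, value: Double)

    def save(stream: DStream[MyType]): Unit =
      stream.foreachRDD { rdd =>
        rdd.foreachPartition { records =>
          // create any non-serializable resource here, on the executor,
          // rather than capturing it from the driver; println is a stand-in
          records.foreach(record => println(record))
        }
      }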

Possible bug in Spark Streaming :: TextFileStream

2014-07-07 Thread Luis Ángel Vicente Sánchez
I have a basic spark streaming job that is watching a folder, processing any new file and updating a column family in cassandra using the new cassandra-spark-driver. I think there is a problem with StreamingContext.textFileStream... if I start my job in local mode with no files in the folder

Re: spark streaming rate limiting from kafka

2014-07-01 Thread Luis Ángel Vicente Sánchez
Maybe reducing the batch duration would help :\ 2014-07-01 17:57 GMT+01:00 Chen Song : > In my use case, if I need to stop spark streaming for a while, data would > accumulate a lot on kafka topic-partitions. After I restart spark streaming > job, the worker's heap will go out of memory on the f

Re: spark streaming, kafka, SPARK_CLASSPATH

2014-06-17 Thread Luis Ángel Vicente Sánchez
.setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar")) Now I'm getting this error on my worker: 14/06/17 17:03:40 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 2014-06-
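
For reference, a minimal sketch of the configuration being described (the jar path, master URL and resource values are placeholders); this warning usually means the application registered but no worker could satisfy its CPU/memory request:

    // Sketch: point a standalone-cluster job at its fat jar and bound its
    // resource requests so the workers can actually satisfy them.
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("spark-test")
      .setMaster("spark://master-host:7077")            // placeholder master URL
      .setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar"))
      .set("spark.executor.memory", "1g")               // keep within what workers advertise
      .set("spark.cores.max", "2")

    val sc = new SparkContext(conf)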

Re: Worker dies while submitting a job

2014-06-17 Thread Luis Ángel Vicente Sánchez
.setJars(Seq("/tmp/spark-test-0.1-SNAPSHOT.jar")) Now I'm getting this error on my worker: 14/06/17 17:03:40 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory 2014-06-

Re: Worker dies while submitting a job

2014-06-17 Thread Luis Ángel Vicente Sánchez
Now I need to find why that state changed message is sent... I will continue updating this thread until I find the problem :D 2014-06-16 18:25 GMT+01:00 Luis Ángel Vicente Sánchez < langel.gro...@gmail.com>: > I'm playing with a modified version of the TwitterPopularTags example and

Re: spark streaming, kafka, SPARK_CLASSPATH

2014-06-17 Thread Luis Ángel Vicente Sánchez

Re: spark streaming, kafka, SPARK_CLASSPATH

2014-06-16 Thread Luis Ángel Vicente Sánchez
Did you manage to make it work? I'm facing similar problems and this is a serious blocker issue. spark-submit seems kind of broken to me if you can't use it for spark-streaming. Regards, Luis 2014-06-11 1:48 GMT+01:00 lannyripple : > I am using Spark 1.0.0 compiled with Hadoop 1.2.1. > > I have a t

Worker dies while submitting a job

2014-06-16 Thread Luis Ángel Vicente Sánchez
I'm playing with a modified version of the TwitterPopularTags example and when I tried to submit the job to my cluster, workers keep dying with this message: 14/06/16 17:11:16 INFO DriverRunner: Launch Command: "java" "-cp" "/opt/spark-1.0.0-bin-hadoop1/work/driver-20140616171115-0014/spark-test-0

Re: Anyone using value classes in RDDs?

2014-04-20 Thread Luis Ángel Vicente Sánchez
Type aliases aren't safe, as you could use any string as a name or id. On 20 Apr 2014 14:18, "Surendranauth Hiraman" wrote: > If the purpose is only aliasing, rather than adding additional methods and > avoiding runtime allocation, what about type aliases? > > type ID = String > type Name = String >
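
To make the distinction concrete (a sketch, not from the thread): a type alias is interchangeable with its underlying type, whereas a value class is a distinct type at compile time and, in most positions, avoids a wrapper allocation at runtime. The names below are invented:

    object IdExample {
      // A type alias offers no safety: any String can be used where an ID is expected.
      type ID = String
      type Name = String

      // Value classes are distinct types at compile time; mixing them up is a compile error.
      case class UserId(value: String) extends AnyVal
      case class UserName(value: String) extends AnyVal

      def lookup(id: UserId): Option[UserName] = None // illustration only

      // lookup(UserName("bob"))   // does not compile
      // lookup(UserId("42"))      // ok
    }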