Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
; >> And the alert() function could be anything triggering an email or sending an >> SMS alert. >> >> Thanks >> Best Regards >> >> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia > <mailto:mohitanch...@gmail.com>> wrote: >> Is there a

Re: Spark streaming alerting

2015-03-24 Thread Anwar Rizal
;) > val dist = data.filter(_.contains("ERROR")).foreachRDD(rdd => > alert("Errors :" + rdd.count())) > > And the alert() function could be anything triggering an email or sending > an SMS alert. > > Thanks > Best Regards > > On Sun, Mar 22,

Re: Spark streaming alerting

2015-03-24 Thread Helena Edelson
which you could do: >>> >>> val data = ssc.textFileStream("sigmoid/") >>> val dist = data.filter(_.contains("ERROR")).foreachRDD(rdd => >>> alert("Errors :" + rdd.count())) >>> >>> And the alert() function co

spark streaming driver hang

2015-03-27 Thread Chen Song
I ran a spark streaming job. 100 executors 30G heap per executor 4 cores per executor The version I used is 1.3.0-cdh5.1.0. The job is reading from a directory on HDFS (with files incoming continuously) and does some join on the data. I set batch interval to be 15 minutes and the job worked

Spark Streaming program questions

2015-04-04 Thread nickos168
I have two questions: 1) In a Spark Streaming program, after the various DStream transformations have being setup, the ssc.start() method is called to start the computation. Can the underlying DAG change (ie. add another map or maybe a join) after ssc.start() has been called (and maybe

Re: Pseudo Spark Streaming ?

2015-04-05 Thread Jörn Franke
Hallo, Only because you receive the log files hourly it means that you have to use Spark Streaming. Spark streaming is often used if you receive new events each minute /second potentially at an irregular frequency. Of course your analysis window can be larger. I think your use case justifies

Spark Streaming and SQL

2015-04-08 Thread Vadim Bichutskiy
Hi all, I am using Spark Streaming to monitor an S3 bucket for objects that contain JSON. I want to import that JSON into Spark SQL DataFrame. Here's my current code: *from pyspark import SparkContext, SparkConf* *from pyspark.streaming import StreamingContext* *import json* *from pyspar

spark streaming with kafka

2015-04-15 Thread Shushant Arora
Hi I want to understand the flow of spark streaming with kafka. In spark Streaming is the executor nodes at each run of streaming interval same or At each stream interval cluster manager assigns new executor nodes for processing this batch input. If yes then at each batch interval new executors

Join two Spark Streaming

2014-07-08 Thread Bill Jay
Hi all, I am working on a pipeline that needs to join two Spark streams. The input is a stream of integers. And the output is the number of integer's appearance divided by the total number of unique integers. Suppose the input is: 1 2 3 1 2 2 There are 3 unique integers and 1 appears twice. Ther

Spark-streaming-kafka error

2014-07-08 Thread Bill Jay
Hi all, I used sbt to package a code that uses spark-streaming-kafka. The packaging succeeded. However, when I submitted to yarn, the job ran for 10 seconds and there was an error in the log file as follows: Caused by: java.lang.NoClassDefFoundError: org/apache/spark/streaming/kafka/KafkaUtils

Spark Streaming and Storm

2014-07-08 Thread xichen_tju@126
hi all I am a newbie to Spark Streaming, and used Strom before.Have u test the performance both of them and which one is better? xichen_tju@126

Spark Streaming timing considerations

2014-07-11 Thread Laeeq Ahmed
Hi, In the spark streaming paper, "slack time" has been suggested for delaying the batch creation in case of external timestamps. I don't see any such option in streamingcontext. Is it available in the API? Also going through the previous posts, queueStream has been suggest

Spark Streaming, external windowing?

2014-07-16 Thread Sargun Dhillon
Does anyone here have a way to do Spark Streaming with external timing for windows? Right now, it relies on the wall clock of the driver to determine the amount of time that each batch read lasts. We have a Kafka, and HDFS ingress into our Spark Streaming pipeline where the events are annotated

Re: Spark Streaming timestamps

2014-07-16 Thread Tathagata Das
Answers inline. On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay wrote: > Hi all, > > I am currently using Spark Streaming to conduct a real-time data > analytics. We receive data from Kafka. We want to generate output files > that contain results that are based on the data we receive

Re: Spark Streaming timestamps

2014-07-17 Thread Bill Jay
Hi Tathagata, Thanks for your answer. Please see my further question below: On Wed, Jul 16, 2014 at 6:57 PM, Tathagata Das wrote: > Answers inline. > > > On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay > wrote: > >> Hi all, >> >> I am currently using Spark S

Re: Spark Streaming timestamps

2014-07-17 Thread Tathagata Das
PM, Tathagata Das < > tathagata.das1...@gmail.com> wrote: > >> Answers inline. >> >> >> On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay >> wrote: >> >>> Hi all, >>> >>> I am currently using Spark Streaming to conduct a real-t

Re: Spark Streaming timestamps

2014-07-18 Thread Bill Jay
t; tathagata.das1...@gmail.com> wrote: >> >>> Answers inline. >>> >>> >>> On Wed, Jul 16, 2014 at 5:39 PM, Bill Jay >>> wrote: >>> >>>> Hi all, >>>> >>>> I am currently using Spark Streaming to conduct

Get Spark Streaming timestamp

2014-07-23 Thread Bill Jay
Hi all, I have a question regarding Spark streaming. When we use the saveAsTextFiles function and my batch is 60 seconds, Spark will generate a series of files such as: result-140614896, result-140614802, result-140614808, etc. I think this is the timestamp for the beginning of each

Re: Spark Streaming timestamps

2014-07-29 Thread Laeeq Ahmed
ta, >> >> >> >>Thanks for your answer. Please see my further question below:  >> >> >> >> >>On Wed, Jul 16, 2014 at 6:57 PM, Tathagata Das >>wrote: >> >>Answers inline. >>> >>> >>> >>> >&

Re: spark streaming kafka

2014-08-04 Thread Tathagata Das
WHERE state='Active'") > }) > > result.print() > //eventData.foreachRDD(rdd => registerRDDAsTable(rdd, "data")) > ssc.start() > ssc.awaitTermination() > > > > > > -- > View th

Spark Streaming Workflow Validation

2014-08-07 Thread Dan H.
groupByKey() and I get: <, Value1_summed by Keys> Are there more efficient approaches I should be considering, such as method.chaining or another technique to increase work flow efficiency? Thanks for your feedback in advance. DH -- View this message in context: http://apache-spark-user-lis

Spark Streaming worker underutilized?

2014-08-08 Thread maddenpj
as I start loading data into Kafka but the 3rd node remains free at 3G. Have I misconfigured something? I followed the standalone setup and see 3 workers registered with all cores being reported as used. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark

spark streaming - lamda architecture

2014-08-14 Thread salemi
Hi, How would you implement the batch layer of lamda architecture with spark/spark streaming? Thanks, Ali -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/spark-streaming-lamda-architecture-tp12142.html Sent from the Apache Spark User List mailing list

Spark Streaming Data Sharing

2014-08-18 Thread Levi Bowman
Based on my understanding something like this doesn't seem to be possible out of the box, but I thought I would write it up anyway in case someone has any ideas. We have conceptually one high volume input stream, each streaming job is either interested in a subset of the stream or the entire st

Spark Streaming: DStream - zipWithIndex

2014-08-27 Thread Soumitra Kumar
Hello, If I do: DStream transform { rdd.zipWithIndex.map { Is the index guaranteed to be unique across all RDDs here? } } Thanks, -Soumitra.

Spark Streaming reset state

2014-08-29 Thread Eko Susilo
Hi all, I would like to ask some advice about resetting spark stateful operation. so i tried like this: JavaStreamingContext jssc = new JavaStreamingContext(context, new Duration(5000)); jssc.remember(Duration(5*60*1000)); jssc.checkpoint(ApplicationConstants.HDFS_STREAM_DIRECTORIES); JavaPairRec

Spark Streaming into HBase

2014-09-03 Thread kpeng1
I have been trying to understand how spark streaming and hbase connect, but have not been successful. What I am trying to do is given a spark stream, process that stream and store the results in an hbase table. So far this is what I have: import org.apache.spark.SparkConf import

Stable spark streaming app

2014-09-12 Thread Tim Smith
Hi, Anyone have a stable streaming app running in "production"? Can you share some overview of the app and setup like number of nodes, events per second, broad stream processing workflow, config highlights etc? Thanks, Tim - To

Spark Streaming and ReactiveMongo

2014-09-18 Thread t1ny
Hello all,Spark newbie here.We are trying to use Spark Streaming (unfortunately stuck on version 0.9.1 of Spark) to stream data out of MongoDB.ReactiveMongo (http://reactivemongo.org/) is a scala driver that enables you to stream a MongoDB capped collection (in our case, the Oplog).Given that

Spark streaming twitter exception

2014-09-20 Thread Maisnam Ns
, "org.apache.spark" %% "spark-streaming" % "1.0.1" , "org.apache.spark" %% "spark-streaming-twitter" % "1.0.1" ) The exception , I am getting -> Exception in thread "main" java.lang.NoClassDefFoundError: org/apach

Re: Spark Streaming + Actors

2014-09-26 Thread Madabhattula Rajesh Kumar
Hi Team, Could you please respond on my below request. Regards, Rajesh On Thu, Sep 25, 2014 at 11:38 PM, Madabhattula Rajesh Kumar < mrajaf...@gmail.com> wrote: > Hi Team, > > Can I use Actors in Spark Streaming based on events type? Could you please > review below Test

Re: android + spark streaming?

2014-10-03 Thread ll
any comment/feedback/advice on this is much appreciated! thanks. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/android-spark-streaming-tp15661p15735.html Sent from the Apache Spark User List mailing list archive at Nabble.com

Re: android + spark streaming?

2014-10-06 Thread Akhil Das
comment/feedback/advice on this is much appreciated! thanks. > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/android-spark-streaming-tp15661p15735.html > Sent from the Apache Spark User List maili

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Sean Owen
Here's an example: https://github.com/OryxProject/oryx/blob/master/oryx-lambda/src/main/java/com/cloudera/oryx/lambda/BatchLayer.java#L131 On Mon, Oct 6, 2014 at 7:39 PM, Abraham Jacob wrote: > Hi All, > > Would really appreciate from the community if anyone has implemented the > saveAsNewAPIHad

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-06 Thread Abraham Jacob
tputFormat.class; writableDStream.saveAsNewAPIHadoopFiles(dataDirString + "/oryx", "data", keyWritableClass, messageWritableClass, outputFormatClass, streamingContext.sparkContext().hadoopConfiguration()); I was just having a hard time with the OutputFormatClass parameter. The scal

Re: Spark Streaming saveAsNewAPIHadoopFiles

2014-10-07 Thread Abraham Jacob
Hi All, Continuing on this discussion... Is there a good reason why the def of "saveAsNewAPIHadoopFiles" in org/apache/spark/streaming/api/java/JavaPairDStream.scala is defined like this - def saveAsNewAPIHadoopFiles( prefix: String, suffix: String, keyCl

Spark Streaming Fault Tolerance (?)

2014-10-07 Thread Massimiliano Tomassi
Reading the Spark Streaming Programming Guide I found a couple of interesting points. First of all, while talking about receivers, it says: *"If the number of cores allocated to the application is less than or equal to the number of input DStreams / receivers, then the system will receive

Spark Streaming KafkaUtils Issue

2014-10-10 Thread Abraham Jacob
Hi Folks, I am seeing some strange behavior when using the Spark Kafka connector in Spark streaming. I have a Kafka topic which has 8 partitions. I have a kafka producer that pumps some messages into this topic. On the consumer side I have a spark streaming application that that has 8 executors

Spark Streaming scheduling control

2014-10-19 Thread davidkl
ssage in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-scheduling-control-tp16778.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-

Re: Spark Streaming Applications

2014-10-22 Thread Sameer Farooqui
Hi Saiph, Patrick McFadin and Helena Edelson from DataStax taught a tutorial at NYC Strata last week where they created a prototype Spark Streaming + Kafka application for time series data. You can see the code here: https://github.com/killrweather/killrweather On Tue, Oct 21, 2014 at 4:33 PM

Re: Spark Streaming Applications

2014-10-23 Thread Saiph Kappa
ooqui wrote: > Hi Saiph, > > Patrick McFadin and Helena Edelson from DataStax taught a tutorial at NYC > Strata last week where they created a prototype Spark Streaming + Kafka > application for time series data. > > You can see the code here: > https://github.com/killrweat

Re: Spark Streaming Applications

2014-10-23 Thread Tathagata Das
Do you know if the slides of that tutorial are > available somewhere? > > Thanks! > > On Wed, Oct 22, 2014 at 6:58 PM, Sameer Farooqui > wrote: > >> Hi Saiph, >> >> Patrick McFadin and Helena Edelson from DataStax taught a tutorial at NYC >> Strata

Re: Spark Streaming Applications

2014-10-27 Thread Akhil
You can check this project out <https://github.com/sigmoidanalytics/spork-streaming/> (it is a bit outdated, but works) It is basically the integration of Pig on SparkStreaming. You can write pig scripts and they are underneath executed as spark streaming job. To get you started quickly,

Re: Spark Streaming Applications

2014-10-28 Thread sivarani
Hi tdas, is it possible to run spark 24/7, i am using updateStateByKey and i am streaming 3lac records in 1/2 hr, i am not getting the correct result also i am not not able to run spark streaming for 24/7 after hew hrs i get array out of bound exception even if i am not streaming anything? btw

Spark Streaming from Kafka

2014-10-28 Thread Harold Nguyen
Hi, Just wondering if you've seen the following error when reading from Kafka: ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest at kafka.utils.Log4jController$.(Log4jController.scala:29) at kafka.uti

Spark Streaming with Kinesis

2014-10-29 Thread Harold Nguyen
Hi all, I followed the guide here: http://spark.apache.org/docs/latest/streaming-kinesis-integration.html But got this error: Exception in thread "main" java.lang.NoClassDefFoundError: com/amazonaws/auth/AWSCredentialsProvider Would you happen to know what dependency or jar is needed ? Harold

Re: Spark Streaming getOrCreate

2014-11-04 Thread sivarani
urn Optional.of(newSum); } } }; -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-getOrCreate-tp18060p18139.html Sent from the Apache Spark User List mailing list

Issue in Spark Streaming

2014-11-04 Thread Suman S Patil
I am trying to run the Spark streaming program as given in the Spark streaming Programming guide<https://spark.apache.org/docs/latest/streaming-programming-guide.html>, in the interactive shell. I am getting an error as shown here as an intermediate step. It resumes the run on its ow

Re: Spark Streaming getOrCreate

2014-11-05 Thread Yana
the intial startup >From the error it appears that your application is unable to (re?)connect to the master upon checkpoint restart -- I wonder if the reason it needs to restart is that the master went down... -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com

Help with Spark Streaming

2014-11-15 Thread Bahubali Jain
Hi, Trying to use spark streaming, but I am struggling with word count :( I want consolidate output of the word count (not on a per window basis), so I am using updateStateByKey(), but for some reason this is not working. The function it self is not being invoked(do not see the sysout output on

Spark streaming batch overrun

2014-11-17 Thread facboy
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-batch-overrun-tp19061.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail:

Spark streaming on Yarn

2014-11-17 Thread kpeng1
Hi, I have been using spark streaming in standalone mode and now I want to migrate to spark running on yarn, but I am not sure how you would you would go about designating a specific node in the cluster to act as an avro listener since I am using flume based push approach with spark. -- View

Re: Spark Streaming Metrics

2014-11-21 Thread Gerard Maas
: > As the Spark Streaming tuning guide indicates, the key indicators of a > healthy streaming job are: > - Processing Time > - Total Delay > > The Spark UI page for the Streaming job [1] shows these two indicators but > the metrics source for Spark Streaming (StreamingSource

Re: Spark Streaming Metrics

2014-11-21 Thread andy petrella
ll when jobs are running in production. > > I've created Spark-4537 <https://issues.apache.org/jira/browse/SPARK-4537> > to track this issue. > > -kr, Gerard. > > On Thu, Nov 20, 2014 at 9:25 PM, Gerard Maas > wrote: > >> As the Spark Streaming tuning guide ind

Spark Streaming with Python

2014-11-23 Thread Venkat, Ankam
I am trying to run network_wordcount.py example mentioned at https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/network_wordcount.py on CDH5.2 Quickstart VM. Getting below error. Traceback (most recent call last): File "/usr/lib/spark/examples/lib/network_wordcoun

2 spark streaming questions

2014-11-23 Thread tian zhang
Hi, Dear Spark Streaming Developers and Users, We are prototyping using spark streaming and hit the following 2 issues thatI would like to seek your expertise. 1) We have a spark streaming application in scala, that reads  data from Kafka intoa DStream, does some processing and output a

Spark Streaming in Production

2014-12-11 Thread twizansk
Hi, I'm looking for resources and examples for the deployment of spark streaming in production. Specifically, I would like to know how high availability and fault tolerance of receivers is typically achieved. The workers are managed by the spark framework and are therefore fault tol

Spark streaming - Data Ingestion

2022-08-17 Thread Akash Vellukai
Dear sir I have tried a lot on this could you help me with this? Data ingestion from MySql to Hive with spark- streaming? Could you give me an overview. Thanks and regards Akash P

Re: Error - Spark STREAMING

2022-09-21 Thread Anupam Singh
Which version of spark are you using? On Tue, Sep 20, 2022, 1:57 PM Akash Vellukai wrote: > Hello, > > > py4j.protocol.Py4JJavaError: An error occurred while calling o80.load. : > java.lang.NoClassDefFoundError: > org/apache/spark/sql/internal/connector/SimpleTableProvider > > > May anyone hel

Spark Streaming: NullPointerException when restoring Spark Streaming job from hdfs/s3 checkpoint

2017-05-16 Thread Richard Moorhead
Im having some difficulty reliably restoring a streaming job from a checkpoint. When restoring a streaming job constructed from the following snippet, I receive NullPointerException's when `map` is called on the the restored RDD. lazy val ssc = StreamingContext.getOrCreate(checkpointDir, crea

[Spark Streaming] Starting Spark Streaming application from a specific position in Kinesis stream

2017-02-19 Thread Neil Maheshwari
Hello, I am building a Spark streaming application that ingests data from an Amazon Kinesis stream. My application keeps track of the minimum price over a window for groups of similar tickets. When I deploy the application, I would like it to start processing at the start of the previous

Flume with Spark Streaming Sink

2016-03-20 Thread Daniel Haviv
Hi, I'm trying to use the Spark Sink with Flume but it seems I'm missing some of the dependencies. I'm running the following code: ./bin/spark-shell --master yarn --jars /home/impact/flumeStreaming/spark-streaming-flume_2.10-1.6.1.jar,/home/impact/flumeStreaming/flume-ng-core-

Spark streaming rawSocketStream with protobuf

2016-04-01 Thread lokeshkumar
I am trying the spark streaming and listening to a socket, I am using the rawSocketStream method to create a receiver and a DStream. But when I print the DStream I get the below exception.*Code to create a DStream:*JavaSparkContext jsc = new JavaSparkContext("Master", "app");

Spark Streaming - NotSerializableException: Methods & Closures:

2016-04-04 Thread mpawashe
Hi all, I am using Spark Streaming API (I'm on version 2.10 for spark and streaming), and I am running into a function serialization issue that I do not run into when using Spark in batch (non-streaming) mode. If I wrote code like this: def run(): Unit = { val newStream = stream.

Spark Streaming, Broadcast variables, java.lang.ClassCastException

2016-04-25 Thread mwol
ream.defaultReadObject(ObjectInputStream.java:500) at org.apache.spark.Accumulable$$anonfun$readObject$1.apply$mcV$sp(Accumulators.scala:152) at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1205) ... 30 more -- View this message in context: http://a

Issue with Spark Streaming UI

2016-05-13 Thread Sachin Janani
Hi, I'm trying to run a simple spark streaming application with File Streaming and its working properly but when I try to monitor the number of events in the Streaming Ui it shows that as 0.Is this a issue and are there any plans to fix this. Regards, SJ

Re: Spark Streaming S3 Error

2016-05-21 Thread Ted Yu
ileSystem.java:328) > at > org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2696) > at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733) > at org.apache.

Re: Spark Streaming S3 Error

2016-05-21 Thread Benjamin Kim
che.hadoop.fs.FileSystem.access$200(FileSystem.java:94) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715) > at org.apache.hadoop.fs.FileSystem.get(File

Re: Spark Streaming S3 Error

2016-05-21 Thread Benjamin Kim
leSystem.java:94) > at > org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2733) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2715) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:382) > at org.a

Re: Spark Streaming with Redis

2016-05-24 Thread Sachin Aggarwal
Hi, yahoo benchmark uses redis with spark, have a look at this https://github.com/yahoo/streaming-benchmarks/blob/master/spark-benchmarks/src/main/scala/AdvertisingSpark.scala On Tue, May 24, 2016 at 1:28 PM, Pariksheet Barapatre wrote: > Hello All, > > I am trying to use Redis as a data stor

Re: Spark Streaming with Redis

2016-05-24 Thread Pariksheet Barapatre
Thanks Sachin. Link that you mentioned uses native connection library JedisPool. I am looking if I can use https://github.com/RedisLabs/spark-redis for same functionality. Regards Pari On 24 May 2016 at 13:33, Sachin Aggarwal wrote: > Hi, > > yahoo benchmark uses redis with spark, > > have a l

Re: Spark Streaming with Kafka

2016-05-24 Thread Rasika Pohankar
/. Regards, Rasika. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-with-Kafka-tp21222p27014.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To

Re: Spark Streaming - Kafka - java.nio.BufferUnderflowException

2016-05-25 Thread Cody Koeninger
or while trying to consume message from Kafka > through Spark streaming (Kafka direct API). This used to work OK when using > Spark standalone cluster manager. We're just switching to using Cloudera 5.7 > using Yarn to manage Spark cluster and started to see the below error. > > Few de

Logistic Regression in Spark Streaming

2016-05-27 Thread kundan kumar
Hi , Do we have a streaming version of Logistic Regression in Spark ? I can see its there for the Linear Regression. Has anyone used logistic regression on streaming data, it would be really helpful if you share your insights on how to train the incoming data. In my use case I am trying to use l

Re: Join two Spark Streaming

2016-06-07 Thread vinay453
.nabble.com/Join-two-Spark-Streaming-tp9052p27108.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h

Re: Spark Streaming getting slower

2016-06-09 Thread John Simon
Sorry, forgot to mention that I don't use broadcast variables. That's why I'm puzzled here. -- John Simon On Thu, Jun 9, 2016 at 11:09 AM, John Simon wrote: > Hi, > > I'm running Spark Streaming with Kafka Direct Stream, batch interval > is 10 seconds. > Afte

RE: restarting of spark streaming

2016-06-15 Thread Chen, Yan I
Could anyone answer my question? _ From: Chen, Yan I Sent: 2016, June, 14 1:34 PM To: 'user@spark.apache.org' Subject: restarting of spark streaming Hi, I notice that in the process of restarting, spark streaming will try to recover/repl

[Spark 1.6] Spark Streaming - java.lang.AbstractMethodError

2016-01-07 Thread Walid LEZZAR
Hi, We have been using spark streaming for a little while now. Until now, we were running our spark streaming jobs in spark 1.5.1 and it was working well. Yesterday, we upgraded to spark 1.6.0 without any changes in the code. But our streaming jobs are not working any more. We are getting an

Re: Spark Streaming on mesos

2016-01-18 Thread Iulian Dragoș
cheduler is getting >>>> revamped now. >>>> >>>> Tim >>>> >>>> On Nov 28, 2015, at 7:31 PM, Renjie Liu >>>> wrote: >>>> >>>> Hi, Nagaraj: >>>> Thanks for the response, but this does not solve my

visualize data from spark streaming

2016-01-20 Thread patcharee
Hi, How to visualize realtime data (in graph/chart) from spark streaming? Any tools? Best, Patcharee - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional commands, e-mail: user-h...@spark.apache.org

spark streaming input rate strange

2016-01-22 Thread patcharee
Hi, I have a streaming application with - 1 sec interval - accept data from a simulation through MulticastSocket The simulation sent out data using multiple clients/threads every 1 sec interval. The input rate accepted by the streaming looks strange. - When clients = 10,000 the event rate raise

FAIR scheduler in Spark Streaming

2016-01-26 Thread Sebastian Piu
Hi, I'm trying to get *FAIR *scheduling to work in a spark streaming app (1.6.0). I've found a previous mailing list where it is indicated to do: dstream.foreachRDD { rdd => rdd.sparkContext.setLocalProperty("spark.scheduler.pool", "pool1") // set the pool rdd.c

Spark Streaming from existing RDD

2016-01-29 Thread Sateesh Karuturi
Anyone please help me out how to create a DStream from existing RDD. My code is: JavaSparkContext ctx = new JavaSparkContext(conf);JavaRDD rddd = ctx.parallelize(arraylist); Now i need to use these *rddd* as input to *JavaStreamingContext*.

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
Of cause. If you use a ThreadLocal in a long living thread and forget to remove it, it's definitely a memory leak. On Thu, Jan 28, 2016 at 9:31 PM, N B wrote: > Hello, > > Does anyone know if there are any potential pitfalls associated with using > ThreadLocal variables in

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
Thanks for the response Ryan. So I would say that it is in fact the purpose of a ThreadLocal i.e. to have a copy of the variable as long as the thread lives. I guess my concern is around usage of threadpools and whether Spark streaming will internally create many threads that rotate between tasks

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
Spark Streaming uses threadpools so you need to remove ThreadLocal when it's not used. On Fri, Jan 29, 2016 at 12:55 PM, N B wrote: > Thanks for the response Ryan. So I would say that it is in fact the > purpose of a ThreadLocal i.e. to have a copy of the variable as long as the >

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
SomeClass(); } }; somefunc(p, d.get()); d.remove(); return p; }; ); Will this make sure that all threads inside the worker clean up the ThreadLocal once they are done with processing this task? Thanks NB On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu wrote: > Spark St

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
g this task? > > Thanks > NB > > > On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu < > shixi...@databricks.com> wrote: > >> Spark Streaming uses threadpools so you need to remove ThreadLocal when >> it's not used. >> >> On Fri, Jan 29,

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
> ThreadLocal once they are done with processing this task? >> >> Thanks >> NB >> >> >> On Fri, Jan 29, 2016 at 1:00 PM, Shixiong(Ryan) Zhu < >> shixi...@databricks.com> wrote: >> >>> Spark Streaming uses threadpools so you need

Re: Spark streaming and ThreadLocal

2016-01-29 Thread N B
t;>> somefunc(p, d.get()); >>> d.remove(); >>> return p; >>> }; ); >>> >>> Will this make sure that all threads inside the worker clean up the >>> ThreadLocal once they are done with processing this task? >>> >>> Thank

Re: Spark streaming and ThreadLocal

2016-01-29 Thread Shixiong(Ryan) Zhu
d = new ThreadLocal<>() { >>>> public SomeClass initialValue() { return new SomeClass(); } >>>> }; >>>> somefunc(p, d.get()); >>>> d.remove(); >>>> return p; >>>> }; ); >>>> >>

Spark Streaming application designing question

2016-02-01 Thread Vinti Maheshwari
Hi, I am new in spark. I wanted to do spark streaming setup to retrieve key value pairs of below format files: file: info1 Note: Each info file will have around of 1000 of these records. And our system continuously generating info files. So Through spark streaming i wanted to aggregate result

Re: Spark streaming and ThreadLocal

2016-02-01 Thread N B
t;>>>> dstream.map( p -> { ThreadLocal d = new ThreadLocal<>() { >>>>> public SomeClass initialValue() { return new SomeClass(); } >>>>> }; >>>>> somefunc(p, d.get()); >>>>> d.remove(); >>&

Re: Spark Streaming with Druid?

2016-02-07 Thread Hemant Bhanawat
You may want to have a look at spark druid project already in progress: https://github.com/SparklineData/spark-druid-olap You can also have a look at SnappyData <https://github.com/SnappyDataInc/snappydata>, which is a low latency store tightly integrated with Spark, Spark SQL and Spark Str

Re: Spark Streaming with Druid?

2016-02-08 Thread Umesh Kacha
d-olap > > You can also have a look at SnappyData > <https://github.com/SnappyDataInc/snappydata>, which is a low latency > store tightly integrated with Spark, Spark SQL and Spark Streaming. You can > find the 0.1 Preview release's documentation here. > <htt

Re: Spark Streaming with Druid?

2016-02-08 Thread Hemant Bhanawat
park druid project already in progress: >> https://github.com/SparklineData/spark-druid-olap >> >> You can also have a look at SnappyData >> <https://github.com/SnappyDataInc/snappydata>, which is a low latency >> store tightly integrated with Spark, Spark S

Skip empty batches - spark streaming

2016-02-11 Thread Sebastian Piu
I was wondering if there is there any way to skip batches with zero events when streaming? By skip I mean avoid the empty rdd from being created at all?

Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Hi guys, I'm making some tests with Spark and Kafka using a Python script. I use the second method that doesn't need any receiver (Direct Approach). It should adapt the number of RDDs to the number of partitions in the topic. I'm trying to verify it. What's the easiest way to verify it ? I also

kill Spark Streaming job gracefully

2016-03-11 Thread Shams ul Haque
Hi, I want to kill a Spark Streaming job gracefully, so that whatever Spark has picked from Kafka have processed. My Spark version is: 1.6.0 When i tried killing a Spark Streaming Job from Spark UI dosen't stop app completely. In Spark-UI job is moved to COMPLETED section, but in l

Re: spark streaming 1.3 issues

2015-07-20 Thread Shushant Arora
call for getting offsets of each partition separately or in single call it gets all partitions new offsets ? I mean will reducing no of partitions oin kafka help improving the performance? On Mon, Jul 20, 2015 at 4:52 PM, Shushant Arora wrote: > Hi > > 1.I am using spark streamin

<    1   2   3   4   5   6   7   8   9   10   >