Re: Spark Streaming ElasticSearch

2020-10-05 Thread jainshasha
Hi Siva, To emit data into ES from a Spark structured streaming job you need to use an ElasticSearch jar which has sink support for structured streaming. For this you can use my branch, where we have integrated ES with Spark 3.0, compatible with Scala 2.12: https://github.com/Thales

Re: Spark Streaming ElasticSearch

2020-10-05 Thread Siva Samraj
Hi Jainshasha, I need to read each row from the Dataframe and make some changes to it before inserting it into ES. Thanks Siva On Mon, Oct 5, 2020 at 8:06 PM jainshasha wrote: > Hi Siva > > To emit data into ES using spark structured streaming job you need to used > ElasticSearch jar which has sup

Re: Spark Streaming ElasticSearch

2020-10-06 Thread jainshasha
Hi Siva, In that case you can use structured streaming's foreach / foreachBatch functions, which can help you process each record and write it into some sink -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
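The foreachBatch approach suggested above can be sketched as follows. This is a hypothetical example: the Kafka source, broker, topic, enrichment column, and index name are all placeholders, and the "es" format assumes the elasticsearch-hadoop connector is on the classpath.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder.appName("es-sink").getOrCreate()

// Hypothetical source: a Kafka topic of JSON strings.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
  .option("subscribe", "events")                    // placeholder topic
  .load()
  .selectExpr("CAST(value AS STRING) AS value")

stream.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Modify each row before indexing, e.g. add an ingestion timestamp.
    val enriched = batch.withColumn("ingested_at", current_timestamp())
    enriched.write
      .format("es")        // requires the elasticsearch-hadoop jar
      .save("myindex")     // placeholder index name
  }
  .start()
  .awaitTermination()
```

foreachBatch gives access to the full batch DataFrame writer API, so any per-row transformation can be applied before the write.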

Spark Streaming with Files

2021-04-23 Thread ayan guha
Hi, In one of the Spark Summit demos, it has been alluded that we should think of batch jobs in a streaming pattern, using "run once" on a schedule. I find this idea very interesting and I understand how this can be achieved for sources like Kafka, Kinesis or similar. In fact we have implemented this mode
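The "run once" pattern alluded to above can be sketched like this; all paths and the checkpoint location are illustrative. Each scheduled run picks up only the data that arrived since the last run, tracked by the checkpoint, then stops.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder.appName("run-once").getOrCreate()

// A file source works like Kafka here: only files not yet recorded in the
// checkpoint are processed on each scheduled run.
val input = spark.readStream
  .schema(spark.read.json("/data/landing").schema) // infer schema once from a static read
  .json("/data/landing")                           // placeholder input path

input.writeStream
  .trigger(Trigger.Once())                         // process what's new, then stop
  .option("checkpointLocation", "/chk/run-once")   // placeholder checkpoint path
  .format("parquet")
  .start("/data/output")                           // placeholder output path
  .awaitTermination()
```

This turns a streaming job into an incremental batch job: the cron scheduler controls cost, while the checkpoint provides exactly-once bookkeeping.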

spark streaming to jdbc

2021-09-03 Thread igyu
val lines = spark.readStream
  .format("socket")
  // .schema(StructType(schemas))
  .option("host", "10.3.87.23")
  .option("port", )
  .load()
  .selectExpr("CAST(value AS STRING)").as[(String)]

DF = lines.map(x => {
  val obj = JSON.parseObject(x)
  val ls = new util.ArrayList()
  (obj.g
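Since structured streaming has no built-in JDBC sink, a common workaround for the question in this thread is to write each micro-batch with the batch JDBC writer inside foreachBatch. A sketch under stated assumptions: `df` stands for the transformed streaming DataFrame, and the URL, table name, and credentials are placeholders.

```scala
import java.util.Properties
import org.apache.spark.sql.DataFrame

val props = new Properties()
props.setProperty("user", "user")       // placeholder credentials
props.setProperty("password", "secret") // placeholder credentials

df.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Reuses the standard batch JDBC writer for each micro-batch.
    batch.write
      .mode("append")
      .jdbc("jdbc:mysql://host:3306/db", "target_table", props) // placeholders
  }
  .option("checkpointLocation", "/chk/jdbc-sink") // placeholder path
  .start()
```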

Kafka to spark streaming

2022-01-28 Thread Amit Sharma
Hello everyone, we have a Spark streaming application. We send requests to the stream through an Akka actor using a Kafka topic. We wait for the response, as it is real time. Just want a suggestion: is there any better option, like Livy, where we can send and receive requests to Spark streaming? Thanks Amit

Foreachpartition in spark streaming

2017-03-19 Thread Diwakar Dhanuskodi
Just wanted to clarify!!! Is foreachPartition in Spark an output operation? Which one is better to use: mapPartitions or foreachPartition? Regards Diwakar
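For reference, foreachPartition is an action (and thus an output operation when called inside foreachRDD), while mapPartitions is a lazy transformation that returns a new RDD. A minimal sketch of the usual division of labour; `openConnection` is a hypothetical helper:

```scala
// mapPartitions: a transformation - returns a new RDD, evaluated lazily.
val lengths = rdd.mapPartitions(iter => iter.map(_.length))

// foreachPartition: an action - runs side effects, returns Unit.
// Typical use: open one connection per partition instead of per record.
rdd.foreachPartition { iter =>
  val conn = openConnection() // hypothetical helper, e.g. a JDBC connection
  iter.foreach(record => conn.write(record))
  conn.close()
}
```

So the choice depends on intent: use mapPartitions when you need a result to process further, foreachPartition when the partition-level work is the end of the pipeline.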

Spark streaming - TIBCO EMS

2017-05-15 Thread Pradeep
What is the best way to connect to TIBCO EMS using Spark streaming? Do we need to write custom receivers, or do any libraries already exist? Thanks, Pradeep - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Spark Streaming 2.1 recovery

2017-05-16 Thread Dominik Safaric
Hi, currently I am exploring Spark’s fault tolerance capabilities in terms of fault recovery. Namely I run a Spark 2.1 standalone cluster on a master and four worker nodes. The application pulls data using the Kafka direct stream API from a Kafka topic over a (sliding) window of time, and write

Spark Streaming Job Stuck

2017-06-05 Thread Jain, Nishit
I have a very simple Spark streaming job running locally in standalone mode. There is a custom receiver which reads from a database and passes the data to the main job, which prints the total. Not an actual use case, but I am playing around to learn. The problem is that the job gets stuck forever; the logic is very

Spark Streaming Design Suggestion

2017-06-13 Thread Shashi Vishwakarma
Hi, I have to design a Spark streaming application with the below use case. I am looking for the best possible approach for this. I have an application which pushes data into 1000+ different topics, each with a different purpose. Spark streaming will receive data from each topic and after processing it will

Spark streaming data loss

2017-06-19 Thread vasanth kumar
Hi, I have spark kafka streaming job running in Yarn cluster mode with spark.task.maxFailures=4 (default) spark.yarn.max.executor.failures=8 number of executor=1 spark.streaming.stopGracefullyOnShutdown=false checkpointing enabled - When there is RuntimeException in a batch in executor then same

Spark Streaming job statistics

2017-08-08 Thread KhajaAsmath Mohammed
Hi, I am running a Spark streaming job which receives data from Azure IoT Hub. I am not sure if the connection was successful and whether it is receiving any data. Does the input column show how much data it has read if the connection was successful?

Chaining Spark Streaming Jobs

2017-08-21 Thread Sunita Arvind
Hello Spark Experts, I have a design question w.r.t Spark Streaming. I have a streaming job that consumes protocol buffer encoded real time logs from a Kafka cluster on premise. My spark application runs on EMR (aws) and persists data onto s3. Before I persist, I need to strip header and convert

Spark streaming for CEP

2017-10-18 Thread anna stax
Hello all, Has anyone used Spark streaming for CEP (Complex Event Processing)? Are there any CEP libraries that work well with Spark? I have a use case for CEP and am trying to see if Spark streaming is a good fit. Currently we have a data pipeline using Kafka, Spark streaming and Cassandra for data

Spark Streaming Kerberos Issue

2017-11-22 Thread KhajaAsmath Mohammed
Hi, I have written a Spark streaming job and it runs successfully for more than 36 hours. After around 36 hours the job fails with a Kerberos issue. Any solution on how to resolve it? org.apache.spark.SparkException: Task failed while writing rows. at org.apache.spark.sql.hi

NLTK with Spark Streaming

2017-11-25 Thread ashish rawat
Hi, Has someone tried running NLTK (python) with Spark Streaming (scala)? I was wondering if this is a good idea and what are the right Spark operators to do this? The reason we want to try this combination is that we don't want to run our transformations in python (pyspark), but afte

Spark Streaming with Confluent

2017-12-13 Thread Arkadiusz Bicz
Hi, I am trying to test Spark streaming 2.2.0 with Confluent 3.3.0. I have got a lot of errors during compilation. This is my sbt: lazy val sparkstreaming = (project in file(".")) .settings( name := "sparkstreaming", organization := "org.arek", version :=

Spark Streaming Cluster queries

2018-01-27 Thread puneetloya
Hi All, A cluster of one spark driver and multiple executors(5) is setup with redis for spark processed data storage and s3 is used for checkpointing. I have a couple of queries about this setup. 1) How to analyze what part of code executes on Spark Driver and what part of code executes on the ex

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
Hi, The 20 seconds corresponds to when the window state should be cleared. For a late message to be dropped, it should come in after you receive a message with event time >= window end time + 20 seconds. I wrote a post on this recently: http://vishnuviswanath.com/spark_structured_streaming.html#water
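The behaviour described above can be sketched as follows; the stream, column names, and window duration are illustrative:

```scala
import org.apache.spark.sql.functions.{col, window}

// `events` is a streaming Dataset with an event-time column `timestamp`.
val counts = events
  .withWatermark("timestamp", "20 seconds") // state kept 20s past the max event time seen
  .groupBy(window(col("timestamp"), "1 minute"), col("key"))
  .count()

// A late row is only guaranteed to be dropped once some row arrives with
// event time >= window end + 20 seconds, which moves the watermark past
// that window and lets its state be cleared.
```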

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
Yes, that is correct. On Tue, Feb 6, 2018 at 4:56 PM, Jiewen Shao wrote: > Vishnu, thanks for the reply > so "event time" and "window end time" have nothing to do with current > system timestamp, watermark moves with the higher value of "timestamp" > field of the input and never moves down, is t

Re: Spark Streaming withWatermark

2018-02-06 Thread Jiewen Shao
Ok, Thanks for confirmation. So based on my code, I have messages with following timestamps (converted to more readable format) in the following order: 2018-02-06 12:00:00 2018-02-06 12:00:01 2018-02-06 12:00:02 2018-02-06 12:00:03 2018-02-06 11:59:00 <-- this message should not be counted, righ

Re: Spark Streaming withWatermark

2018-02-06 Thread Vishnu Viswanath
Could it be that these messages were processed in the same micro batch? In that case, watermark will be updated only after the batch finishes which did not have any effect of the late data in the current batch. On Tue, Feb 6, 2018 at 4:18 PM Jiewen Shao wrote: > Ok, Thanks for confirmation. > >

Re: Spark Streaming withWatermark

2018-02-06 Thread Tathagata Das
That may very well be possible. The watermark delay guarantees that any record newer than or equal to watermark (that is, max event time seen - 20 seconds), will be considered and never be ignored. It does not guarantee the other way, that is, it does NOT guarantee that records older than the wate

Testing spark streaming action

2018-04-10 Thread Guillermo Ortiz
I have a unit test in Spark Streaming which has an input parameter: DStream[String]. Inside the code I want to update a LongAccumulator. When I execute the test I get a NullPointerException because the accumulator doesn't exist. Is there any way to test this? My accumulator is updated in diff
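One pattern that avoids a NullPointerException from an unregistered accumulator (similar to the lazily-registered singleton shown in the Spark streaming programming guide) is to fetch the accumulator through the live SparkContext inside the DStream operation; the names here are illustrative:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.util.LongAccumulator

object ProcessedCounter {
  @volatile private var instance: LongAccumulator = _

  // Register the accumulator once, on first use, against the running SparkContext.
  def get(sc: SparkContext): LongAccumulator = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.longAccumulator("processed")
    }
    instance
  }
}

// Inside the streaming job, resolve it via the RDD's own context, so the
// test driver and the job share the same registered instance:
dstream.foreachRDD { rdd =>
  val acc = ProcessedCounter.get(rdd.sparkContext)
  rdd.foreach(_ => acc.add(1))
}
```

In the unit test, call `ProcessedCounter.get(sc)` on the test's SparkContext before starting the StreamingContext, then assert on its value after the batches run.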

[Spark Streaming] Measure latency

2018-06-26 Thread Daniele Foroni
Hi all, I am using spark streaming and I need to evaluate the latency of the standard aggregations (avg, min, max, …) provided by the spark APIs. Any way to do it in the code? Thanks in advance, --- Daniele

Question of spark streaming

2018-07-27 Thread utkarsh rathor
I am following the book *Spark the Definitive Guide* The following code is *executed locally using spark-shell* Procedure: Started the spark-shell without any other options val static = spark.read.json("/part-00079-tid-730451297822678341-1dda7027-2071-4d73-a0e2-7fb6a91e1d1f-0-c000.json") val da

Re: jdbc spark streaming

2018-12-28 Thread Thakrar, Jayesh
Yes, you can certainly use spark streaming, but reading from the original source table may still be time consuming and resource intensive. Having some context on the RDBMS platform, data size/volumes involved and the tolerable lag (between changes being created and it being processed by Spark

Re: jdbc spark streaming

2018-12-28 Thread Nicolas Paris
now: CDC, batch, Spark streaming and also Apache Livy. CDC (such as Debezium) looks interesting. It can be combined with triggers to populate some table to be then fetched with Spark streaming, Kafka and so on. However this approach is quite complex and adds some processing/storage on the RDBMS side

Spark Streaming concurrent calls

2019-08-13 Thread Amit Sharma
I am using Kafka Spark streaming. My UI application sends requests to the streaming job through Kafka. The problem is that streaming handles one request at a time, so if multiple users send requests at the same time they have to wait till the earlier requests are done. Is there any way it can handle multiple requests? Thank

Re: spark streaming exception

2019-10-17 Thread Amit Sharma
Please update me if any one knows about it. Thanks Amit On Thu, Oct 10, 2019 at 3:49 PM Amit Sharma wrote: > Hi , we have spark streaming job to which we send a request through our UI > using kafka. It process and returned the response. We are getting below > error and this staremi

Re: spark streaming exception

2019-11-10 Thread Akshay Bhardwaj
2019 at 3:49 PM Amit Sharma wrote: > >> Hi , we have spark streaming job to which we send a request through our >> UI using kafka. It process and returned the response. We are getting below >> error and this stareming is not processing any request. >> >> Li

Spark Streaming with mapGroupsWithState

2020-03-02 Thread Something Something
I am writing a Stateful Streaming application in which I am using mapGroupsWithState to create aggregates for Groups but I need to create *Groups based on more than one column in the Input Row*. All the examples in the 'Spark: The Definitive Guide' use only one column such as 'User' or 'Device'. I
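Grouping on more than one column for mapGroupsWithState just means keying by a tuple (or a case class) in groupByKey. A sketch under stated assumptions: the Event row type, Totals state type, and field names are invented for illustration, and `spark.implicits._` is assumed to be in scope for the encoders.

```scala
import org.apache.spark.sql.streaming.GroupState

case class Event(user: String, device: String, bytes: Long)
case class Totals(bytes: Long)

// Key the groups by (user, device) instead of a single column.
val aggregated = events                       // events: Dataset[Event]
  .groupByKey(e => (e.user, e.device))
  .mapGroupsWithState[Totals, ((String, String), Long)] {
    (key: (String, String), rows: Iterator[Event], state: GroupState[Totals]) =>
      val prev  = state.getOption.map(_.bytes).getOrElse(0L)
      val total = prev + rows.map(_.bytes).sum
      state.update(Totals(total))       // carry the running total across batches
      (key, total)
  }
```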

spark streaming and computation

2015-05-10 Thread skippi
Assuming a web server access log shall be analyzed and the target of the computation shall be CSV files per time period, e.g. one per day containing the minute statistics and one per month containing the hour statistics. Incoming statistics are computed as discretized streams using the Spark streaming context

Re: spark streaming doubt

2015-05-19 Thread Akhil Das
It will be a single job running at a time by default (you can also configure spark.streaming.concurrentJobs to run jobs in parallel, which is not recommended in production). Now, with your batch duration being 1 sec and processing time being 2 minutes, if you are using a receiver based streaming

Re: spark streaming doubt

2015-05-19 Thread Akhil Das
spark.streaming.concurrentJobs takes an integer value, not a boolean. If you set it to 2 then 2 jobs will run in parallel. The default value is 1 and the next job will start once the current one completes. > Actually, in the current implementation of Spark Streaming and under > default configu
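Setting it looks like this (a sketch; whether 2 is safe depends on your output semantics, since batches may then complete out of order):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("concurrent-jobs-demo")
  .set("spark.streaming.concurrentJobs", "2") // integer, not boolean; default is 1

val ssc = new StreamingContext(conf, Seconds(1))
```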

Re: spark streaming doubt

2015-05-19 Thread Shushant Arora
So for Kafka+Spark streaming, receiver based streaming uses the high-level API and non receiver based streaming uses the low-level API. 1. In high level receiver based streaming, does it register consumers at each job start (whenever a new job is launched by the streaming application, say at each second)? 2. No

Re: spark streaming doubt

2015-05-19 Thread Akhil Das
On Tue, May 19, 2015 at 8:10 PM, Shushant Arora wrote: > So for Kafka+spark streaming, Receiver based streaming used highlevel api > and non receiver based streaming used low level api. > > 1.In high level receiver based streaming does it registers consumers at > each job start

Re: spark streaming doubt

2015-05-19 Thread Dibyendu Bhattacharya
> wrote: > >> So for Kafka+spark streaming, Receiver based streaming used highlevel api >> and non receiver based streaming used low level api. >> >> 1.In high level receiver based streaming does it registers consumers at >> each job start(whenever a new job is launch

Re: spark streaming doubt

2015-05-19 Thread Shushant Arora
age/dibbhatt/kafka-spark-consumer > > > Regards, > Dibyendu > > On Tue, May 19, 2015 at 9:00 PM, Akhil Das > wrote: > >> >> On Tue, May 19, 2015 at 8:10 PM, Shushant Arora < >> shushantaror...@gmail.com> wrote: >> >>> So for Kafka+spark

Spark Streaming to Kafka

2015-05-19 Thread twinkle sachdeva
Hi, As Spark streaming is being nicely integrated with consuming messages from Kafka, I thought of asking the forum: is there any implementation available for pushing data to Kafka from Spark streaming too? Any link(s) will be helpful. Thanks and Regards, Twinkle

Re: spark streaming doubt

2015-05-20 Thread Akhil Das
Kafka >> Low Level Consumer API. >> >> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer >> >> >> Regards, >> Dibyendu >> >> On Tue, May 19, 2015 at 9:00 PM, Akhil Das >> wrote: >> >>> >>> On Tue, May 19, 2015

Re: spark streaming doubt

2015-05-20 Thread Shushant Arora
>>> Just to add, there is a Receiver based Kafka consumer which uses Kafka >>> Low Level Consumer API. >>> >>> http://spark-packages.org/package/dibbhatt/kafka-spark-consumer >>> >>> >>> Regards, >>> Dibyendu >>>

Re: spark streaming doubt

2015-05-20 Thread Akhil Das
19, 2015 at 9:29 PM, Dibyendu Bhattacharya < >>> dibyendu.bhattach...@gmail.com> wrote: >>> >>>> Just to add, there is a Receiver based Kafka consumer which uses Kafka >>>> Low Level Consumer API. >>>> >>>> http://spark-package

Spark Streaming and Drools

2015-05-22 Thread Antonio Giambanco
Hi All, I'm deploying an architecture that uses Flume for sending log information into a sink. Spark streaming reads from this sink (pull strategy) and processes all this information; during this process I would like to do some event processing. . . For example: a log appender writes information

Re: SPARK STREAMING PROBLEM

2015-05-28 Thread Sourav Chandra
You must start the StreamingContext by calling ssc.start() On Thu, May 28, 2015 at 6:57 PM, Animesh Baranawal < animeshbarana...@gmail.com> wrote: > Hi, > > I am trying to extract the filenames from which a Dstream is generated by > parsing the toDebugString method on RDD > I am implementing the

Fwd: SPARK STREAMING PROBLEM

2015-05-28 Thread Animesh Baranawal
I also started the streaming context by running ssc.start(), but still, apart from logs, nothing of g gets printed. -- Forwarded message -- From: Animesh Baranawal Date: Thu, May 28, 2015 at 6:57 PM Subject: SPARK STREAMING PROBLEM To: user@spark.apache.org Hi, I am trying to

Spark streaming with kafka

2015-05-28 Thread boci
Hi guys, I am using Spark streaming with Kafka... On my local machine (started as a Java application without using spark-submit) it works: it connects to Kafka and does the job (*). I tried to put it into a Spark docker container (Hadoop 2.6, Spark 1.3.1, tried spark-submit with local[5] and yarn-client too) bu

Re: SPARK STREAMING PROBLEM

2015-05-28 Thread Sourav Chandra
om logs nothing of g gets printed. > > -- Forwarded message -- > From: Animesh Baranawal > Date: Thu, May 28, 2015 at 6:57 PM > Subject: SPARK STREAMING PROBLEM > To: user@spark.apache.org > > > Hi, > > I am trying to extract the filenames from which a Dst

Re: spark streaming - checkpoint

2015-06-27 Thread Tathagata Das
Do you have SPARK_CLASSPATH set in both cases, before and after checkpoint? If yes, then you should not be using SPARK_CLASSPATH; it has been deprecated since Spark 1.0 because of its ambiguity. Also, where do you have spark.executor.extraClassPath set? I don't see it in the spark-submit command. On

Re: spark streaming - checkpoint

2015-06-28 Thread ram kumar
SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* in spark-env.sh I think i am facing the same issue https://issues.apache.org/jira/browse/SPARK-6203 On Mon, Jun 29, 2015 at 11:38 AM, ram kumar wrote: > I am using Spark 1.2.0.2.2.0.0-82 (git revision de12451) built for Hadoop

Re: spark streaming - checkpoint

2015-06-29 Thread ram kumar
on using yarn-cluster, it works good On Mon, Jun 29, 2015 at 12:07 PM, ram kumar wrote: > SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/* > in spark-env.sh > > I think i am facing the same issue > https://issues.apache.org/jira/browse/SPARK-6203 > > > > On Mon, Jun 29, 2015 a

Re: spark streaming performance

2015-07-09 Thread Tathagata Das
second? On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert wrote: > > > Hi, > > > > I’ve developed a POC Spark Streaming application. > > But it seems to perform better on my development machine than on our > cluster. > > I submit it to yarn on our cloudera clust

spark streaming kafka compatibility

2015-07-09 Thread Shushant Arora
Does Spark streaming 1.3 require Kafka version 0.8.1.1, and is it not compatible with Kafka 0.8.2? As per the Maven dependency of Spark streaming 1.3 with Kafka: org.apache.spark : spark-streaming_2.10 : 1.3.0 (provided, excluding spark-core_2.10 from org.apache.spark), and org.apache.kafka : kafka_2.10 : 0.8.1.1

Re: spark streaming performance

2015-07-09 Thread Tathagata Das
I am not sure why you are getting node_local and not process_local. Also there is probably not a good documentation other than that configuration page - http://spark.apache.org/docs/latest/configuration.html (search for locality) On Thu, Jul 9, 2015 at 5:51 AM, Michel Hubert wrote: > > > > > > >

Spark Streaming Python APIs?

2014-12-14 Thread Xiaoyong Zhu
Hi spark experts Are there any Python APIs for Spark Streaming? I didn't find the Python APIs in Spark Streaming programming guide.. http://spark.apache.org/docs/latest/streaming-programming-guide.html Xiaoyong

spark streaming python + kafka

2014-12-19 Thread Oleg Ruchovets
Hi, I've just seen that Spark streaming supports Python from version 1.2. Question: does Spark streaming (Python version) support Kafka integration? Thanks Oleg.

Spark Streaming Threading Model

2014-12-19 Thread Asim Jalis
Q: In Spark Streaming if your DStream transformation and output action take longer than the batch duration will the system process the next batch in another thread? Or will it just wait until the first batch’s RDD is processed? In other words does it build up a queue of buffered RDDs awaiting

Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi, After facing issues with the performance of some of our Spark Streaming jobs, we invested quite some effort figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune

Re: Kafka + Spark streaming

2014-12-30 Thread Tathagata Das
gards, > Sam > > > > -- > View this message in context: > http://apache-spark-user-list.1001560.n3.nabble.com/Kafka-Spark-streaming-tp20914.html > Sent from the Apache Spark User List mailing list archive at Nabble.com. > > -

Re: Kafka + Spark streaming

2014-12-31 Thread Samya Maiti
al will go in the same > > block. > > > > 2. If a worker goes down which runs the Receiver for Kafka, Will the > > receiver be restarted on some other worker? > > > > Regards, > > Sam > > > > > > > > -- > > View this message

Spark Streaming with Kafka

2015-01-18 Thread Rasika Pohankar
I am using Spark Streaming to process data received through Kafka. The Spark version is 1.2.0. I have written the code in Java and am compiling it using sbt. The program runs, receives data from Kafka and processes it as well. But it stops receiving data suddenly after some time (it has run for

spark streaming kinesis issue

2015-01-20 Thread Hafiz Mujadid
Hi experts! I am using Spark streaming with Kinesis and getting this exception while running the program: java.lang.LinkageError: loader (instance of org/apache/spark/executor/ChildExecutorURLClassLoader$userClassLoader$): attempted duplicate class definition for name: "com/amazonaws/ser

spark streaming with checkpoint

2015-01-20 Thread balu.naren
I am a beginner to Spark streaming, so I have a basic doubt regarding checkpoints. My use case is to calculate the number of unique users per day. I am using reduce by key and window for this, where my window duration is 24 hours and slide duration is 5 mins. I am updating the processed records to mongodb

Spark streaming on ec2

2014-02-27 Thread Aureliano Buendia
Hi, Does the ec2 support for spark 0.9 also include spark streaming? If not, is there an equivalent?

Spark Streaming Maven Build

2014-03-04 Thread Bin Wang
Hi there, I tried the Kafka WordCount example; it works perfectly and the code is pretty straightforward to understand. Can anyone show me how to start my own Maven project with the KafkaWordCount example with minimum effort? 1. How should the pom file look (including jar-plugin? ass

Spark streaming kafka _output_

2014-03-21 Thread Benjamin Black
Howdy, folks! Anybody out there having a working kafka _output_ for Spark streaming? Perhaps one that doesn't involve instantiating a new producer for every batch? Thanks! b

Spark Streaming - Shared hashmaps

2014-03-26 Thread Bryan Bryan
Hi there, I have read about the two fundamental shared features in Spark (broadcast variables and accumulators), but here is what I need. I'm using Spark streaming in order to get requests from Kafka; these requests may launch long-running tasks, and I need to control them: 1) Keep them

Java Spark Streaming - SparkFlumeEvent

2014-04-28 Thread Kulkarni, Vikram
Hi Spark-users, Within my Spark Streaming program, I am able to ingest data sent by my Flume Avro Client. I configured a 'spooling directory source' to write data to a Flume Avro Sink (the Spark Streaming Driver program in this case). The default deserializer i.e. LINE is used to

Re: spark streaming question

2014-05-04 Thread Chris Fregly
great questions, weide. in addition, i'd also like to hear more about how to horizontally scale a spark-streaming cluster. i've gone through the samples (standalone mode) and read the documentation, but it's still not clear to me how to scale this puppy out under high load. i as

spark streaming kafka output

2014-05-04 Thread Weide Zhang
Hi, Is there any code to implement a Kafka output for Spark streaming? My use case is that all the output needs to be dumped back to the Kafka cluster again after the data is processed. What would be the guideline to implement such a function? I heard foreachRDD will create one instance of the producer per batch? If
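A common sketch for a Kafka output that avoids building a producer per batch is a lazily-created producer per executor JVM, used inside foreachPartition. The broker address and topic below are placeholders, and the stream is assumed to carry strings.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ProducerHolder {
  // One producer per executor JVM, created lazily on first use and reused
  // across batches; Kafka producers are thread-safe and expensive to create.
  lazy val producer: KafkaProducer[String, String] = {
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092") // placeholder broker
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    new KafkaProducer[String, String](props)
  }
}

dstream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    records.foreach { r =>
      ProducerHolder.producer.send(new ProducerRecord("output-topic", r)) // placeholder topic
    }
  }
}
```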

Spark Streaming and JMS

2014-05-05 Thread Patrick McGloin
Hi all, Is there a "best practice" for subscribing to JMS with Spark Streaming? I have searched but not found anything conclusive. In the absence of a standard practice the solution I was thinking of was to use Akka + Camel (akka.camel.Consumer) to create a subscription for a Spark

Re: spark streaming question

2014-05-05 Thread Tathagata Das
One main reason why Spark Streaming can achieve higher throughput than Storm is because Spark Streaming operates in coarser-grained batches - second-scale massive batches - which reduce per-tuple overheads in shuffles and other kinds of data movements, etc. Note that it is also true that

NotSerializableException in Spark Streaming

2014-05-14 Thread Diana Carroll
Hey all, trying to set up a pretty simple streaming app and getting some weird behavior. First, a non-streaming job that works fine: I'm trying to pull out lines of a log file that match a regex, for which I've set up a function: def getRequestDoc(s: String): String = { "KBDOC-[0-9]*".r.find

Spark Streaming NeteorkReceiver problems

2014-06-05 Thread zzzzzqf12345
, mem 6G. Network: 1 Gbps. Any suggestions will be appreciated. Thanks, QingFeng -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-NeteorkReceiver-problems-tp7109.html

Spark with Spark Streaming

2014-06-07 Thread b0c1
Hi! Is there any way to use Spark with Spark streaming together to create a real-time architecture? How can I merge the Spark and Spark streaming results at real time (and drop the streaming result if the Spark result is generated)? Thanks -- View this message in context: http://apache-spark-user-list

Spark-Streaming window processing

2014-06-09 Thread Yingjun Wu
Dear all, I just ran the window processing job using Spark streaming, and I have two questions. First, how can I measure the latency of Spark streaming? Are there any APIs that I can call directly? Second, is it true that the latency of Spark streaming grows linearly with the window size? It seems

Shark over Spark-Streaming

2014-06-10 Thread praveshjain1991
-user-list.1001560.n3.nabble.com/Shark-over-Spark-Streaming-tp7307.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Problem in Spark Streaming

2014-06-10 Thread nilmish
I am running a Spark streaming job to count the top 10 hashtags over the last 5 mins window, querying every 1 sec. It takes approx <1.4 sec (end-to-end delay) to answer most of the queries, but there are a few instances in between when it takes considerably more time (like around 15 sec) due

Re: Spark Streaming socketTextStream

2014-06-10 Thread Akhil Das
You can use the master's IP address (Or whichever machine you chose to run the nc command) instead of localhost.

Re: Spark Streaming socketTextStream

2014-06-10 Thread fredwolfinger
email. From: "Akhil Das-2 [via Apache Spark User List]" Date: Tuesday, June 10, 2014 10:16 AM To: Fred Wolfinger Subject: Re: Spark Streaming socketTextStream You can use the master's IP address (Or whichever machine you chose to run the nc command) instead of localhost.

spark streaming, kafka, SPARK_CLASSPATH

2014-06-10 Thread lannyripple
I am using Spark 1.0.0 compiled with Hadoop 1.2.1. I have a toy spark-streaming-kafka program. It reads from a Kafka queue and does

  stream
    .map { case (k, v) => (v, 1) }
    .reduceByKey(_ + _)
    .print()

using a 1 second interval on the stream. The docs say to make Spark

Fwd: spark streaming questions

2014-06-16 Thread Chen Song
Hey I am new to spark streaming and apologize if these questions have been asked. * In StreamingContext, reduceByKey() seems to only work on the RDDs of the current batch interval, not including RDDs of previous batches. Is my understanding correct? * If the above statement is correct, what

Re: spark streaming questions

2014-06-17 Thread Anwar Rizal
On Tue, Jun 17, 2014 at 5:39 PM, Chen Song wrote: > Hey > > I am new to spark streaming and apologize if these questions have been > asked. > > * In StreamingContext, reduceByKey() seems to only work on the RDDs of the > current batch interval, not including RDDs of pr

broadcast in spark streaming

2014-06-20 Thread Hahn Jiang
I want to use broadcast in Spark streaming, but I found there is no such function. How can I use a global variable in Spark streaming? Thanks
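Broadcast variables are created on the SparkContext underlying the StreamingContext and can then be read inside DStream operations. A minimal sketch, with an invented lookup table and assuming `ssc` is an existing StreamingContext:

```scala
// Create the broadcast once on the driver, via the underlying SparkContext.
val lookup = ssc.sparkContext.broadcast(Map("a" -> 1, "b" -> 2)) // illustrative data

// Read it on the executors inside a DStream operation; the value is shipped
// to each executor once rather than with every batch's closure.
val filtered = dstream.transform { rdd =>
  rdd.filter(x => lookup.value.contains(x))
}
```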

Re: spark streaming questions

2014-06-25 Thread Chen Song
Thanks Anwar. On Tue, Jun 17, 2014 at 11:54 AM, Anwar Rizal wrote: > > On Tue, Jun 17, 2014 at 5:39 PM, Chen Song wrote: > >> Hey >> >> I am new to spark streaming and apologize if these questions have been >> asked. >> >> * In StreamingContext, re

semi join spark streaming

2014-06-25 Thread Chen Song
Is there an easy way to do a semi join in Spark streaming? Here is my problem briefly: I have a DStream that will generate a set of values. I would like to check existence in this set in other DStreams. Is there an easy and standard way to model this problem? If not, can I write spark streaming

Spark Streaming RDD transformation

2014-06-26 Thread Bill Jay
Hi all, I am currently working on a project that requires transforming each RDD in a DStream to a Map. Basically, when we get a list of data in each batch, we would like to update the global map. I would like to return the map as a single RDD. I am currently trying to use the function *transform*.

Spark Streaming with HBase

2014-06-29 Thread N . Venkata Naga Ravi
Hi, Is there any example provided for Spark Streaming with Input provided from HBase table content. Thanks, Ravi

spark streaming counter metrics

2014-06-30 Thread Chen Song
I am new to Spark streaming and am wondering if Spark streaming tracks counters (e.g., how many rows in each consumer, how many rows routed to an individual reduce task, etc.) in any form, so I can get an idea of how the data is skewed? I checked the Spark job page but don't seem to find any. -- Chen Song

Spark Streaming testing strategies

2015-03-01 Thread Marcin Kuthan
I have started using Spark and Spark Streaming and I'm wondering how do you test your applications? Especially Spark Streaming application with window based transformations. After some digging I found ManualClock class to take full control over stream processing. Unfortunately the class i

Spark Streaming Switchover Time

2015-03-03 Thread Nastooh Avessta (navesta)
Hi On a standalone, Spark 1.0.0, with 1 master and 2 workers, operating in client mode, running a udp streaming application, I am noting around 2 second elapse time on switchover, upon shutting down the streaming worker, where streaming window length is 1 sec. I am wondering what parameters are

SQL with Spark Streaming

2015-03-10 Thread Mohit Anchlia
Does Spark Streaming also supports SQLs? Something like how Esper does CEP.

Writing Spark Streaming Programs

2015-03-19 Thread James King
Hello All, I'm using Spark for streaming but I'm unclear on which implementation language to use: Java, Scala or Python. I don't know anything about Python, am familiar with Scala and have been doing Java for a long time. I think the above shouldn't influence my decision on which language to use be

Visualizing Spark Streaming data

2015-03-20 Thread Harut
e, etc. Thanks in advance. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-Spark-Streaming-data-tp22160.html

Re: Spark streaming alerting

2015-03-22 Thread Akhil Das
And the alert() function could be anything triggering an email or sending an SMS alert. Thanks Best Regards On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia wrote: > Is there a module in spark streaming that lets you listen to > the alerts/conditions as they happen in the streaming mo

Re: Spark streaming alerting

2015-03-23 Thread Jeffrey Jedele
s("ERROR")).foreachRDD(rdd => > alert("Errors :" + rdd.count())) > > And the alert() function could be anything triggering an email or sending > an SMS alert. > > Thanks > Best Regards > > On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia > wrote: > >&

Re: Spark streaming alerting

2015-03-23 Thread Khanderao Kand Gmail
= data.filter(_.contains("ERROR")).foreachRDD(rdd => > alert("Errors :" + rdd.count())) > > And the alert() function could be anything triggering an email or sending an > SMS alert. > > Thanks > Best Regards > >> On Sun, Mar 22, 2015 at

Re: Spark streaming alerting

2015-03-23 Thread Mohit Anchlia
h which you could do: > > val data = ssc.textFileStream("sigmoid/") > val dist = data.filter(_.contains("ERROR")).foreachRDD(rdd => > alert("Errors :" + rdd.count())) > > And the alert() function could be anything triggering an email or

Re: Spark streaming alerting

2015-03-23 Thread Tathagata Das
mean you can't send it directly from spark workers? Here's a >> simple approach which you could do: >> >> val data = ssc.textFileStream("sigmoid/") >> val dist = data.filter(_.contains("ERROR")).foreachRDD(rdd => >> alert("
