Re: Question about Spark Streaming checkpoint interval

2015-12-18 Thread Shixiong Zhu
You are right. "checkpointInterval" is only for data checkpointing. "metadata checkpoint" is done for each batch. Feel free to send a PR to add the missing doc. Best Regards, Shixiong Zhu 2015-12-18 8:26 GMT-08:00 Lan Jiang : > Need some clarification about the documentation. According to Spark

Question about Spark Streaming checkpoint interval

2015-12-18 Thread Lan Jiang
Need some clarification about the documentation. According to the Spark doc, "the default interval is a multiple of the batch interval that is at least 10 seconds. It can be set by using dstream.checkpoint(checkpointInterval). Typically, a checkpoint interval of 5 - 10 sliding intervals of a DStream
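
A minimal sketch of setting that data checkpoint interval, assuming a 2-second batch interval and a stateful updateStateByKey stream (the socket source and checkpoint directory are placeholders): checkpointing the DStream every 10 seconds keeps the interval at 5 sliding intervals, while the metadata checkpoint still happens every batch.

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("CheckpointIntervalSketch")
    val ssc = new StreamingContext(conf, Seconds(2))      // 2-second batch interval
    ssc.checkpoint("hdfs:///tmp/streaming-checkpoints")   // checkpoint directory (placeholder)

    val words = ssc.socketTextStream("localhost", 9999)
      .flatMap(_.split(" "))
      .map((_, 1))

    // Stateful DStream that requires data checkpointing.
    val counts = words.updateStateByKey[Int] { (values, state) =>
      Some(values.sum + state.getOrElse(0))
    }

    // Data checkpoint every 10 seconds, i.e. every 5 sliding (batch) intervals.
    counts.checkpoint(Seconds(10))
    counts.print()

    ssc.start()
    ssc.awaitTermination()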

Re: question about spark streaming

2015-08-10 Thread Dean Wampler
polyglotprogramming.com On Mon, Aug 10, 2015 at 5:24 AM, sequoiadb wrote: > hi guys, > > i have a question about spark streaming. > There’s an application keep sending transaction records into spark stream > with about 50k tps > The record represents a sales information in

question about spark streaming

2015-08-10 Thread sequoiadb
Hi guys, I have a question about Spark Streaming. There's an application that keeps sending transaction records into a Spark stream at about 50k TPS. Each record represents sales information, including customer id / product id / time / price columns. The application is required to monitor the change
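
The question is truncated here, but as a hedged sketch of one way such a stream could be aggregated, assuming comma-separated records with the customer id / product id / time / price columns above and per-product revenue as the quantity being monitored (the source, field order, and window sizes are illustrative assumptions):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Hypothetical record layout: customerId,productId,timestamp,price
    case class Sale(customerId: String, productId: String, time: Long, price: Double)

    val conf = new SparkConf().setAppName("SalesMonitorSketch")
    val ssc = new StreamingContext(conf, Seconds(1))

    // A socket source stands in for whatever actually feeds the ~50k TPS stream.
    val sales = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(c, p, t, pr) = line.split(",")
      Sale(c, p, t.toLong, pr.toDouble)
    }

    // Per-product revenue over the last 60 seconds, recomputed every 10 seconds.
    val revenue = sales
      .map(s => (s.productId, s.price))
      .reduceByKeyAndWindow(_ + _, Seconds(60), Seconds(10))
    revenue.print()

    ssc.start()
    ssc.awaitTermination()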

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
Yes, auto-restart is enabled in my low-level consumer when some unhandled exception comes up. If you look at KafkaConsumer.java, for some cases (like broker failure, Kafka leader changes, etc.) it can even refresh the Consumer (the Coordinator which talks to a Leader), which will recove

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
I have checked Dibyendu's code; it looks like his implementation has an auto-restart mechanism: src/main/java/consumer/kafka

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Akhil Das
From what I have seen, once I kill my receiver on one machine, it will automatically spawn another receiver on another machine or on the same machine. Thanks Best Regards On Mon, Mar 16, 2015 at 1:08 PM, Jun Yang wrote: > Dibyendu, > > Thanks for the reply. > > I am reading your project homepage now. > > O

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Akhil Das
There should be something in the worker/driver logs regarding the failure. For receiver failures, you can try the low-level Kafka consumer as Dibyendu suggested. You need to have a high-availability setup with monitoring enabled (Nagios etc. config

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Dibyendu, Thanks for the reply. I am reading your project homepage now. One quick question I care about is: if the receivers fail for some reason (for example, killed forcibly by someone else), is there any mechanism for the receiver to fail over automatically? On Mon, Mar 16, 2015 at 3:25 P

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Akhil, I have checked the logs. There isn't any clue as to why the 5 receivers failed. That's why I assume receiver failures will be a common issue, and that we need to figure out a way to detect this kind of failure and fail over. Thanks On Mon, Mar 16, 2015 at 3:1
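
For the "detect this kind of failure" part, one option not mentioned in the thread is Spark's own StreamingListener events; a minimal sketch (class name and the println-based alerting are illustrative, wire them up to Nagios or whatever alerting you use):

    import org.apache.spark.streaming.scheduler.{StreamingListener,
      StreamingListenerReceiverError, StreamingListenerReceiverStopped}

    // Reacts to receiver error/stop events reported by the driver.
    class ReceiverFailureMonitor extends StreamingListener {
      override def onReceiverError(event: StreamingListenerReceiverError): Unit = {
        println(s"Receiver error on stream ${event.receiverInfo.streamId}: " +
          event.receiverInfo.lastErrorMessage)
      }
      override def onReceiverStopped(event: StreamingListenerReceiverStopped): Unit = {
        println(s"Receiver stopped on stream ${event.receiverInfo.streamId}")
      }
    }

    // Register on the StreamingContext before ssc.start():
    // ssc.addStreamingListener(new ReceiverFailureMonitor())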

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Dibyendu Bhattacharya
Which version of Spark are you running? You can try this low-level consumer: http://spark-packages.org/package/dibbhatt/kafka-spark-consumer It is designed to recover from various failures and has a very good fault-recovery mechanism built in. It is being used by many users and at present we

Re: Question about Spark Streaming Receiver Failure

2015-03-16 Thread Akhil Das
You need to figure out why the receivers failed in the first place. Look in your worker logs and see what really happened. When you run a streaming job continuously for a longer period there will usually be a lot of logs (you can enable log rotation, etc.), and if you are doing a groupBy, join, etc. type o

Question about Spark Streaming Receiver Failure

2015-03-16 Thread Jun Yang
Guys, We have a project which builds upon Spark Streaming. We use Kafka as the input stream and create 5 receivers. After this application had run for around 90 hours, all 5 receivers failed for some unknown reason. In my understanding, it is not guaranteed that a Spark Streaming receiver will d
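
For reference, a minimal sketch of the receiver-based setup being described, assuming the high-level KafkaUtils.createStream API (ZooKeeper quorum, group, and topic names are placeholders); five receivers are created and unioned into one DStream:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("KafkaReceiversSketch")
    val ssc = new StreamingContext(conf, Seconds(2))

    val zkQuorum = "zk-host:2181"        // placeholder ZooKeeper quorum
    val group    = "my-consumer-group"   // placeholder consumer group
    val topics   = Map("transactions" -> 1)

    // Five parallel receivers, unioned into a single DStream.
    val streams = (1 to 5).map(_ => KafkaUtils.createStream(ssc, zkQuorum, group, topics))
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()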

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Hi Arush, With your code, I still didn't see the output "Received X flume events".. bit1...@163.com From: bit1...@163.com Date: 2015-02-17 14:08 To: Arush Kharbanda CC: user Subject: Re: Re: Question about spark streaming+Flume Ok, you are missing a letter in foreachRDD..

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Ok, you are missing a letter in foreachRDD.. let me proceed.. bit1...@163.com From: Arush Kharbanda Date: 2015-02-17 14:31 To: bit1...@163.com CC: user Subject: Re: Question about spark streaming+Flume Hi Can you try this val lines = FlumeUtils.createStream(ssc,"localhost

Re: Re: Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
14:31 To: bit1...@163.com CC: user Subject: Re: Question about spark streaming+Flume Hi Can you try this val lines = FlumeUtils.createStream(ssc,"localhost",) // Print out the count of events received from this server in each batch lines.count().map(cnt => "Received

Re: Question about spark streaming+Flume

2015-02-16 Thread Arush Kharbanda
Hi Can you try this val lines = FlumeUtils.createStream(ssc,"localhost",) // Print out the count of events received from this server in each batch lines.count().map(cnt => "Received " + cnt + " flume events. at " + System.currentTimeMillis() ) lines.forechRDD(_.foreach(println)) Than

Question about spark streaming+Flume

2015-02-16 Thread bit1...@163.com
Hi, I am trying the Spark Streaming + Flume example: 1. Code object SparkFlumeNGExample { def main(args : Array[String]) { val conf = new SparkConf().setAppName("SparkFlumeNGExample") val ssc = new StreamingContext(conf, Seconds(10)) val lines = FlumeUtils.createStream(ssc,"localhost"
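
Putting the pieces of this thread together, a hedged sketch of the complete example (the Flume port 9999 is a placeholder, and foreachRDD is the corrected spelling discussed above):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.flume.FlumeUtils

    object SparkFlumeNGExample {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("SparkFlumeNGExample")
        val ssc = new StreamingContext(conf, Seconds(10))

        // Receiver-based Flume stream; the port must match the Flume Avro sink.
        val lines = FlumeUtils.createStream(ssc, "localhost", 9999)

        // Print the count of events received from this server in each batch.
        lines.count()
          .map(cnt => "Received " + cnt + " flume events at " + System.currentTimeMillis())
          .foreachRDD(_.foreach(println))

        ssc.start()
        ssc.awaitTermination()
      }
    }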