Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-20 Thread Cody Koeninger
You want spark.streaming.kafka.maxRatePerPartition for the direct stream.

On Sat, Mar 18, 2017 at 3:37 PM, Mal Edwin wrote:
> Hi,
> You can enable backpressure to handle this.
>
> spark.streaming.backpressure.enabled
> spark.streaming.receiver.maxRate
>
> Thanks,
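A rough illustration of why a per-partition rate cap helps with an initial backlog: with the direct stream, spark.streaming.kafka.maxRatePerPartition bounds the records read per second from each Kafka partition, so one batch holds at most rate x partitions x batch interval records. The Python sketch below only does that arithmetic. The rate, partition count, batch interval, and backlog size are made-up values, not figures from this thread.

```python
import math


def max_records_per_batch(max_rate_per_partition, num_partitions, batch_interval_s):
    """Upper bound on the records in one batch under the per-partition rate cap."""
    return max_rate_per_partition * num_partitions * batch_interval_s


def batches_to_drain(backlog, max_rate_per_partition, num_partitions, batch_interval_s):
    """Batches needed to read a fixed backlog, ignoring newly arriving messages."""
    per_batch = max_records_per_batch(
        max_rate_per_partition, num_partitions, batch_interval_s
    )
    return math.ceil(backlog / per_batch)


# Hypothetical values, chosen only for illustration.
conf = {"spark.streaming.kafka.maxRatePerPartition": 1000}  # records/sec/partition
partitions = 10
batch_interval_s = 30
backlog = 5_000_000

rate = conf["spark.streaming.kafka.maxRatePerPartition"]
print(max_records_per_batch(rate, partitions, batch_interval_s))      # 300000
print(batches_to_drain(backlog, rate, partitions, batch_interval_s))  # 17
```

With a cap like this, the backlog is spread over many bounded batches instead of landing in one very large first batch.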

Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-18 Thread Mal Edwin
Hi,
You can enable backpressure to handle this.

spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate

Thanks,
Edwin

On Mar 18, 2017, 12:53 AM -0400, sagarcasual . , wrote:
> Hi, we have spark 1.6.1 streaming from Kafka (0.10.1) topic using direct

Spark Streaming from Kafka, deal with initial heavy load.

2017-03-17 Thread sagarcasual .
Hi, we have Spark 1.6.1 streaming from a Kafka (0.10.1) topic using the direct approach. The streaming part works fine, but when we initially start the job we have to deal with a really huge Kafka message backlog (millions of messages); that first batch runs for over 40 hours, and after 12 hours or so

Re: Spark streaming from Kafka best fit

2016-03-07 Thread pratik khadloya
Would using mapPartitions instead of map help here?

~Pratik

On Tue, Mar 1, 2016 at 10:07 AM Cody Koeninger wrote:
> You don't need an equal number of executor cores to partitions. An
> executor can and will work on multiple partitions within a batch, one after
> the other.

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
You don't need an equal number of executor cores to partitions. An executor can and will work on multiple partitions within a batch, one after the other. The real issue is whether you are able to keep your processing time under your batch time, so that delay doesn't increase. On Tue, Mar 1,
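The point about keeping processing time under batch time can be sketched with a toy model. The Python snippet below assumes batches are processed strictly one at a time, in order, with a constant processing time; that is a simplification for illustration, not a description of Spark's scheduler. The function name and the numbers are hypothetical.

```python
def scheduling_delays(batch_interval_s, processing_time_s, num_batches):
    """Return how long each batch waits before its processing starts.

    Batch i becomes ready at i * batch_interval_s (relative to the first
    batch) and can only start once the previous batch has finished.
    """
    delays = []
    free_at = 0  # time at which the previous batch finishes
    for i in range(num_batches):
        ready_at = i * batch_interval_s
        start = max(ready_at, free_at)
        delays.append(start - ready_at)
        free_at = start + processing_time_s
    return delays


# Processing time below the batch interval: no delay builds up.
print(scheduling_delays(10, 8, 5))   # [0, 0, 0, 0, 0]
# Processing time above the batch interval: delay grows with every batch.
print(scheduling_delays(10, 12, 5))  # [0, 2, 4, 6, 8]
```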

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Jatin Kumar
Thanks Cody! I understand what you said, and if I am correct it will be using 224 executor cores just for fetching + stage-1 processing of 224 partitions. I will obviously need more cores for processing further stages and fetching the next batch. I will start with a higher number of executor cores and

Re: Spark streaming from Kafka best fit

2016-03-01 Thread Cody Koeninger
> "How do I keep a balance of executors which receive data from Kafka and which process data"

I think you're misunderstanding how the direct stream works. The executor which receives data is also the executor which processes data; there aren't separate receivers. If it's a single stage worth of

Re: How to monitor Spark Streaming from Kafka?

2015-06-02 Thread Ruslan Dautkhanov
/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach (Receiverless Approach, #2

How to monitor Spark Streaming from Kafka?

2015-06-01 Thread dgoldenberg
Hi, What are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Cody Koeninger
. TD

On Mon, Jun 1, 2015 at 2:23 PM, dgoldenberg dgoldenberg...@gmail.com wrote:
> Hi, What are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Tathagata Das
are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach (Receiverless

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Otis Gospodnetic
/22/monitoring-stream-processing-tools-cassandra-kafka-and-spark/

Otis

On Mon, Jun 1, 2015 at 5:23 PM, dgoldenberg dgoldenberg...@gmail.com wrote:
> Hi, What are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http

Re: How to monitor Spark Streaming from Kafka?

2015-06-01 Thread Dmitry Goldenberg
are some of the good/adopted approaches to monitoring Spark Streaming from Kafka? I see that there are things like http://quantifind.github.io/KafkaOffsetMonitor, for example. Do they all assume that Receiver-based streaming is used? Then Note that one disadvantage of this approach

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Tathagata Das
as not to overwhelm the Spark consumers? What would be some of the ways to throttle the streamed messages so that the consumers don't run out of memory? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Dmitry Goldenberg
the Spark consumers? What would be some of the ways to throttle the streamed messages so that the consumers don't run out of memory? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver

Re: Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread Ted Yu
-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe, e-mail: user-unsubscr...@spark.apache.org For additional

Spark Streaming from Kafka - no receivers and spark.streaming.receiver.maxRate?

2015-05-27 Thread dgoldenberg
? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-from-Kafka-no-receivers-and-spark-streaming-receiver-maxRate-tp23061.html Sent from the Apache Spark User List mailing list archive at Nabble.com

spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark
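One small piece of this question, the year/month/day/hour/ directory layout, can be sketched independently of Spark or Kafka. The Python snippet below only derives the target directory from an event timestamp; the base directory /data/events and the function name are hypothetical, and how the records actually get written to HDFS is left open.

```python
from datetime import datetime, timezone


def hourly_path(base_dir, event_time):
    """Build a base_dir/year/month/day/hour/ directory path for a timestamp."""
    return base_dir.rstrip("/") + "/" + event_time.strftime("%Y/%m/%d/%H") + "/"


# Hypothetical base directory and timestamp, for illustration only.
ts = datetime(2015, 2, 6, 9, 30, tzinfo=timezone.utc)
print(hourly_path("/data/events", ts))  # /data/events/2015/02/06/09/
```

Zero-padded month, day, and hour keep the directories sorted lexicographically in time order.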

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Andrew Psaltis
Mohit,

> I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format.

Are you wanting to process it and then put it into HDFS or just put the raw data into HDFS? If the latter then why not just use Camus (https://github.com/linkedin/camus), it will

Re: spark streaming from kafka real time + batch processing in java

2015-02-06 Thread Charles Feduke
Good questions, some of which I'd like to know the answer to. Is it okay to update a NoSQL DB with aggregated counts per batch interval or is it generally stored in hdfs? This depends on how you are going to use the aggregate data. 1. Is there a lot of data? If so, and you are going to use

spark streaming from kafka real time + batch processing in java

2015-02-05 Thread Mohit Durgapal
I want to write a spark streaming consumer for kafka in java. I want to process the data in real-time as well as store the data in hdfs in year/month/day/hour/ format. I am not sure how to achieve this. Should I write separate kafka consumers, one for writing data to HDFS and one for spark

Spark Streaming from Kafka

2014-10-29 Thread Harold Nguyen
Hi,

Just wondering if you've seen the following error when reading from Kafka:

ERROR ReceiverTracker: Deregistered receiver for stream 0: Error starting receiver 0 - java.lang.NoClassDefFoundError: scala/reflect/ClassManifest
  at kafka.utils.Log4jController$.init(Log4jController.scala:29)
  at

Re: Spark Streaming from Kafka

2014-10-29 Thread Akhil Das
Looks like the kafka jar that you are using isn't compatible with your scala version.

Thanks
Best Regards

On Wed, Oct 29, 2014 at 11:50 AM, Harold Nguyen har...@nexgate.com wrote:
> Hi, Just wondering if you've seen the following error when reading from Kafka: ERROR ReceiverTracker:

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
Thanks! How do I find out which Kafka jar to use for scala 2.10.4?

On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das ak...@sigmoidanalytics.com wrote:
> Looks like the kafka jar that you are using isn't compatible with your scala version. Thanks Best Regards On Wed, Oct 29,

Re: Spark Streaming from Kafka

2014-10-29 Thread harold
I am using kafka_2.10-1.1.0.jar on spark 1.1.0

On Wed, Oct 29, 2014 at 12:31 AM, null har...@nexgate.com wrote:
> Thanks! How do I find out which Kafka jar to use for scala 2.10.4?
>
> On Wed, Oct 29, 2014 at 12:26 AM, Akhil Das ak...@sigmoidanalytics.com
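On the question of matching a jar to a Scala version: libraries cross-built for Scala conventionally carry the Scala binary version as a suffix in the artifact name (the _2.10 in the jar name above). The Python sketch below is a hypothetical helper that compares that suffix against a full Scala version. It is a naming check only, it cannot prove that two jars are actually binary compatible, and the second jar name is invented for the example.

```python
import re


def scala_suffix(jar_name):
    """Extract the Scala binary version from a name like 'artifact_2.10-1.1.0.jar'."""
    match = re.search(r"_(\d+\.\d+)(?:\.\d+)?-", jar_name)
    return match.group(1) if match else None


def binary_version(scala_version):
    """Reduce a full Scala version such as '2.10.4' to its binary version '2.10'."""
    return ".".join(scala_version.split(".")[:2])


def suffix_matches(jar_name, scala_version):
    """True when the jar's Scala suffix equals the given Scala binary version."""
    return scala_suffix(jar_name) == binary_version(scala_version)


print(suffix_matches("kafka_2.10-1.1.0.jar", "2.10.4"))        # True
print(suffix_matches("example-lib_2.11-0.1.0.jar", "2.10.4"))  # False
```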