Spark Streaming - Increasing number of executors slows down processing rate

2017-06-19 Thread Mal Edwin
Hi All,
I am struggling with an odd issue and would like your help in addressing it.
Environment:
AWS cluster (40 Spark nodes and a 4-node Kafka cluster)
Spark Kafka Streaming submitted in YARN cluster mode
Kafka - single topic, 400 partitions
Spark 2.1 on Cloudera
Kafka 10.0 on Cloudera
We have zero

Re: Spark Streaming from Kafka, deal with initial heavy load.

2017-03-18 Thread Mal Edwin
Hi,
You can enable backpressure to handle this:
spark.streaming.backpressure.enabled
spark.streaming.receiver.maxRate
Thanks, Edwin
On Mar 18, 2017, 12:53 AM -0400, sagarcasual . wrote:
> Hi, we have spark 1.6.1 streaming from Kafka (0.10.1) topic using direct
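
As a rough illustration of the reply above, the two properties it names can be passed to spark-submit as --conf flags. The sketch below is a minimal, hedged example: the numeric rate values and the application name your_app.py are placeholders assumed for illustration, and to_submit_args is a hypothetical helper, not part of Spark. Because the quoted question mentions the direct Kafka approach, the sketch also includes spark.streaming.kafka.maxRatePerPartition, the per-partition rate limit that Spark documents for the direct API; that addition is not from the original reply.

```python
# Minimal sketch: render Spark Streaming rate-control settings as
# spark-submit arguments. All numeric values are illustrative assumptions.

streaming_conf = {
    # Named in the reply: let Spark adapt the ingestion rate to processing speed.
    "spark.streaming.backpressure.enabled": "true",
    # Named in the reply: upper bound (records/sec) for receiver-based streams.
    "spark.streaming.receiver.maxRate": "1000",
    # Not in the reply: per-partition bound (records/sec) for the direct Kafka API.
    "spark.streaming.kafka.maxRatePerPartition": "1000",
}


def to_submit_args(conf):
    """Hypothetical helper: turn a dict of Spark properties into --conf flags."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # your_app.py is a placeholder application name.
    print(" ".join(["spark-submit"] + to_submit_args(streaming_conf) + ["your_app.py"]))
```

The same keys could instead be placed in spark-defaults.conf or set on the application's SparkConf; the flag form is used here only because it keeps the example self-contained.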

RE: RE: Fast write datastore...

2017-03-16 Thread Mal Edwin
Hi All, I believe what we are looking for here is a serving layer where user queries can be executed on a subset of processed data. In this scenario, we are using Impala, as it provides layered caching: in our use case it caches some set in memory and then some in HDFS and the full