6 Kafka partitions will result in 6 Spark partitions in each batch's RDD, not 6 Spark RDDs. Whether you will have a backlog isn't just a matter of having 1 executor per partition. If a single executor can process all of the partitions fast enough to complete each batch within the batch interval, you won't have a backlog.
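
To make that reasoning concrete, here is a rough back-of-the-envelope sketch in Python. It is not Spark code, and every name and number in it is hypothetical. It assumes one task per Kafka partition per batch, that all tasks take about the same time, and that scheduling overhead is negligible; real jobs will deviate from this.

```python
import math


def batch_processing_time(num_partitions, total_cores, seconds_per_task):
    """Estimate how long one batch takes (simplified model).

    Tasks run in "waves": with fewer cores than partitions, some
    tasks must wait for a core to free up.
    """
    waves = math.ceil(num_partitions / total_cores)
    return waves * seconds_per_task


def will_backlog(num_partitions, total_cores, seconds_per_task, batch_interval):
    """A backlog builds when a batch takes longer than the batch interval."""
    elapsed = batch_processing_time(num_partitions, total_cores, seconds_per_task)
    return elapsed > batch_interval


# Hypothetical numbers: 6 partitions, 1 second per task, 5 second batches.
print(will_backlog(6, total_cores=1, seconds_per_task=1.0, batch_interval=5.0))  # True
print(will_backlog(6, total_cores=2, seconds_per_task=1.0, batch_interval=5.0))  # False
print(will_backlog(6, total_cores=6, seconds_per_task=1.0, batch_interval=5.0))  # False
```

Under these assumed numbers, a single executor with two cores already keeps up, so six executors are not required just because there are six partitions.
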
On Thu, Jan 21, 2016 at 5:35 AM, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>
> I'm using Spark Streaming and Kafka with the Direct Approach. I have created a
> topic with 6 partitions, so when I execute Spark there are six RDDs. I
> understand that ideally it should have six executors, each processing
> one RDD. To do that, when I execute spark-submit (I use YARN) I specify the
> number of executors as six.
> If I don't specify anything, it creates just one executor. Looking for
> information, I have read:
>
> "The --num-executors command-line flag or spark.executor.instances
> configuration property control the number of executors requested. Starting in
> CDH 5.4/Spark 1.3, you will be able to avoid setting this property by turning
> on dynamic allocation
> <https://spark.apache.org/docs/latest/job-scheduling.html#dynamic-resource-allocation>
> with the spark.dynamicAllocation.enabled property. Dynamic allocation enables a
> Spark application to request executors when there is a backlog of pending
> tasks and free up executors when idle."
>
> I have this parameter enabled. I understand that if I don't set the
> --num-executors parameter it must create six executors, or am I wrong?
>