Dang! Messed it up again!

JIRA: https://issues.apache.org/jira/browse/SPARK-1341
Github PR: https://github.com/apache/spark/pull/945/files
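
For anyone following along, here is a rough sketch of how the two suggestions from the quoted thread (several Kafka receivers for parallelism, plus a cap on the receiving rate) might fit together. Treat it as an illustration under assumptions, not a tested program: the property name spark.streaming.receiver.maxRate is my understanding of what the change above exposes in 1.1, and the ZooKeeper address, group id, topic name, receiver count, and rate value are all hypothetical placeholders. It also assumes the spark-streaming-kafka artifact is on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaParallelReceiveSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("KafkaParallelReceiveSketch")
      // Assumed property name: caps the records per second accepted by
      // EACH receiver. The value 1000 is an arbitrary example.
      .set("spark.streaming.receiver.maxRate", "1000")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Hypothetical connection settings; replace with real ones.
    val zkQuorum = "zk-host:2181"
    val groupId = "example-group"
    // Topic name -> number of consumer threads inside one receiver.
    val topics = Map("example-topic" -> 1)

    // One input DStream per receiver. Each receiver occupies a core,
    // so the application needs more cores than receivers.
    val numReceivers = 4
    val streams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
    }

    // Merge the per-receiver streams into a single DStream.
    val unified = ssc.union(streams)

    // Drop the Kafka message key and print the record count per batch.
    unified.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Since the limit applies per receiver, the ceiling for the whole application in this sketch would be roughly numReceivers times the configured rate.
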

On Fri, Jul 18, 2014 at 11:35 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Oops, wrong link!
> JIRA: https://github.com/apache/spark/pull/945/files
> Github PR: https://github.com/apache/spark/pull/945/files
>
>
> On Fri, Jul 18, 2014 at 7:19 AM, Chen Song <chen.song...@gmail.com> wrote:
>
>> Thanks Tathagata,
>>
>> It would be awesome if Spark Streaming could support limiting the
>> receiving rate in general. I tried to explore the link you provided but
>> could not find any specific JIRA related to this. Do you have the JIRA
>> number for this?
>>
>>
>> On Thu, Jul 17, 2014 at 9:21 PM, Tathagata Das
>> <tathagata.das1...@gmail.com> wrote:
>>
>>> You can create multiple Kafka streams to partition your topics across
>>> them, which will run multiple receivers on multiple executors. This is
>>> covered in the Spark Streaming guide.
>>> <http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving>
>>>
>>> And for the purpose of this thread, to answer the original question, we
>>> now have the ability
>>> <https://issues.apache.org/jira/browse/SPARK-1854?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20priority%20DESC>
>>> to limit the receiving rate. It's in the master branch, and will be
>>> available in Spark 1.1. It basically sets a limit at the receiver level
>>> (so it applies to all sources) on the maximum number of records per
>>> second that will be received by the receiver.
>>>
>>> TD
>>>
>>>
>>> On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <t...@preferred.jp>
>>> wrote:
>>>
>>>> Bill,
>>>>
>>>> are you saying that, after repartition(400), you have 400 partitions on
>>>> one host and the other hosts receive none of the data?
>>>>
>>>> Tobias
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeter...@gmail.com>
>>>> wrote:
>>>>
>>>>> I also have an issue consuming from Kafka. When I consume from Kafka,
>>>>> there is always a single executor working on this job. Even if I use
>>>>> repartition, it seems that there is still a single executor. Does
>>>>> anyone have an idea how to add parallelism to this job?
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Luis and Tobias.
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>>>>> topic-partition (via offset, for example)? Setting this to a small
>>>>>>>> number would force the DStream to read less data initially.
>>>>>>>>
>>>>>>>
>>>>>>> Please see the post at
>>>>>>>
>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>>>>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>>>>
>>>>>>> Tobias
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chen Song
>>>>>>
>>
>> --
>> Chen Song
>>