Re: spark streaming rate limiting from kafka

Tathagata Das Fri, 18 Jul 2014 11:47:31 -0700

Dang! Messed it up again!

JIRA: https://issues.apache.org/jira/browse/SPARK-1341
Github PR: https://github.com/apache/spark/pull/945/files



On Fri, Jul 18, 2014 at 11:35 AM, Tathagata Das <tathagata.das1...@gmail.com> wrote:

> Oops, wrong link!
> JIRA: https://github.com/apache/spark/pull/945/files
> Github PR: https://github.com/apache/spark/pull/945/files
>
>
> On Fri, Jul 18, 2014 at 7:19 AM, Chen Song <chen.song...@gmail.com> wrote:
>
>> Thanks Tathagata,
>>
>> It would be awesome if Spark Streaming could support limiting the receiving
>> rate in general. I tried to explore the link you provided but could not find
>> any specific JIRA related to this. Do you have the JIRA number for it?
>>
>>
>>
>> On Thu, Jul 17, 2014 at 9:21 PM, Tathagata Das <
>> tathagata.das1...@gmail.com> wrote:
>>
>>> You can create multiple Kafka streams and partition your topics across
>>> them, which will run multiple receivers on multiple executors. This is
>>> covered in the Spark Streaming guide.
>>> <http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving>
>>>
>>> And for the purpose of this thread, to answer the original question, we now
>>> have the ability
>>> <https://issues.apache.org/jira/browse/SPARK-1854?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20priority%20DESC>
>>> to limit the receiving rate. It's in the master branch, and will be
>>> available in Spark 1.1. It basically sets a limit at the receiver level
>>> (so it applies to all sources) on the maximum number of records per second
>>> that will be received by the receiver.
>>>
>>> TD
>>>
>>>
>>> On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <t...@preferred.jp>
>>> wrote:
>>>
>>>> Bill,
>>>>
>>>> are you saying that, after repartition(400), you have 400 partitions on one
>>>> host and the other hosts receive none of the data?
>>>>
>>>> Tobias
>>>>
>>>>
>>>> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeter...@gmail.com>
>>>> wrote:
>>>>
>>>>> I also have an issue consuming from Kafka. When I consume from Kafka,
>>>>> there is always a single executor working on the job. Even if I use
>>>>> repartition, it seems that there is still a single executor. Does anyone
>>>>> have an idea how to add parallelism to this job?
>>>>>
>>>>>
>>>>>
>>>>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Thanks Luis and Tobias.
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>>>>> topic-partition (via offset, for example)? Setting this to a small
>>>>>>>> number would force the DStream to read less data initially.
>>>>>>>>
>>>>>>>
>>>>>>> Please see the post at
>>>>>>>
>>>>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>>>>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>>>>
>>>>>>> Tobias
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Chen Song
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>> --
>> Chen Song
>>
>>
>
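
To make the two suggestions in TD's reply above a bit more concrete, here is a back-of-the-envelope sketch in plain Python (no Spark needed). It is not Spark's API: `split_topics` and `max_records_per_batch` are hypothetical helpers, and the topic names and numbers are made up. The first helper deals topics round-robin across a chosen number of Kafka input streams (one receiver each); the second estimates the upper bound on records per batch when every receiver is capped at some number of records per second. As far as I know, the cap shipped in Spark 1.1 as the `spark.streaming.receiver.maxRate` property, but please check the configuration docs for your version.

```python
from typing import Dict, List


def split_topics(topics: List[str], num_streams: int) -> Dict[int, List[str]]:
    """Deal topics round-robin into num_streams groups (one group per input stream)."""
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    groups: Dict[int, List[str]] = {i: [] for i in range(num_streams)}
    for index, topic in enumerate(topics):
        groups[index % num_streams].append(topic)
    return groups


def max_records_per_batch(num_receivers: int, max_rate: int, batch_seconds: int) -> int:
    """Upper bound on records in one batch when each receiver is capped at max_rate records/second."""
    return num_receivers * max_rate * batch_seconds


if __name__ == "__main__":
    # Hypothetical topics, split across two input streams (two receivers).
    print(split_topics(["clicks", "views", "orders", "errors", "logins"], 2))
    # Two receivers, each capped at 1000 records/second, with 10-second batches.
    print(max_records_per_batch(num_receivers=2, max_rate=1000, batch_seconds=10))
```

Because the cap applies per receiver, adding receivers raises the overall ceiling roughly linearly, so the number of streams and the rate limit are worth choosing together.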
