Thanks Tathagata,

It would be awesome if Spark Streaming could support limiting the receiving
rate in general. I explored the link you provided but could not find a
specific JIRA related to this. Do you have the JIRA number?



On Thu, Jul 17, 2014 at 9:21 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> You can create multiple Kafka streams to partition your topics across them,
> which will run multiple receivers on multiple executors. This is covered in
> the Spark Streaming guide.
> <http://spark.apache.org/docs/latest/streaming-programming-guide.html#level-of-parallelism-in-data-receiving>
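
A minimal sketch of the multiple-receiver approach described above, in Scala
against the Spark Streaming 1.x Kafka receiver API. The ZooKeeper quorum,
consumer group, topic name and receiver count are placeholders:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-receiver-parallelism")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Each createStream call adds one receiver, which runs as a long-lived
    // task on an executor, so four streams can receive on four executors.
    val numReceivers = 4
    val kafkaStreams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "my-group", Map("my-topic" -> 1))
    }

    // Union the per-receiver streams into a single DStream for processing.
    val unifiedStream = ssc.union(kafkaStreams)
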
>
> And for the purpose of this thread, to answer the original question, we now
> have the ability
> <https://issues.apache.org/jira/browse/SPARK-1854?jql=project%20%3D%20SPARK%20AND%20resolution%20%3D%20Unresolved%20AND%20component%20%3D%20Streaming%20ORDER%20BY%20priority%20DESC>
> to limit the receiving rate. It's in the master branch and will be
> available in Spark 1.1. It basically sets a limit at the receiver level
> (so it applies to all sources) on the maximum number of records per second
> that will be received by the receiver.
>
> TD
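
A sketch of how that limit is set, assuming Spark 1.1+ where the
spark.streaming.receiver.maxRate property is honored; the value here is an
arbitrary example:

    import org.apache.spark.SparkConf

    // Cap every receiver in the application at 10,000 records per second.
    val conf = new SparkConf()
      .setAppName("rate-limited-streaming")
      .set("spark.streaming.receiver.maxRate", "10000")
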
>
>
> On Thu, Jul 17, 2014 at 6:15 PM, Tobias Pfeiffer <t...@preferred.jp> wrote:
>
>> Bill,
>>
>> are you saying that after repartition(400) you have 400 partitions on one
>> host and the other hosts receive none of the data?
>>
>> Tobias
>>
>>
>> On Fri, Jul 18, 2014 at 8:11 AM, Bill Jay <bill.jaypeter...@gmail.com>
>> wrote:
>>
>>> I also have an issue consuming from Kafka. When I consume from Kafka,
>>> there is always a single executor working on this job. Even if I use
>>> repartition, it seems that there is still a single executor. Does anyone
>>> have an idea how to add parallelism to this job?
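
For reference, a hedged sketch of the repartition call discussed here;
kafkaStream stands in for whatever DStream the job consumes and the partition
count is arbitrary. Note that repartition only spreads out the processing:
with a single Kafka receiver, the receiving itself still happens on one
executor, which is why the multiple-receiver approach above is the usual fix:

    // Shuffle received records across the cluster before the heavy stages.
    val repartitioned = kafkaStream.repartition(40)
    repartitioned.foreachRDD { rdd =>
      // downstream work now runs as 40 tasks spread over the executors
      rdd.foreach(record => ())
    }
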
>>>
>>>
>>>
>>> On Thu, Jul 17, 2014 at 2:06 PM, Chen Song <chen.song...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Luis and Tobias.
>>>>
>>>>
>>>> On Tue, Jul 1, 2014 at 11:39 PM, Tobias Pfeiffer <t...@preferred.jp>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> On Wed, Jul 2, 2014 at 1:57 AM, Chen Song <chen.song...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> * Is there a way to control how far the Kafka DStream can read on a
>>>>>> topic-partition (via an offset, for example)? By setting this to a small
>>>>>> number, it would force the DStream to read less data initially.
>>>>>>
>>>>>
>>>>> Please see the post at
>>>>>
>>>>> http://mail-archives.apache.org/mod_mbox/incubator-spark-user/201406.mbox/%3ccaph-c_m2ppurjx-n_tehh0bvqe_6la-rvgtrf1k-lwrmme+...@mail.gmail.com%3E
>>>>> Kafka's auto.offset.reset parameter may be what you are looking for.
>>>>>
>>>>> Tobias
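
A sketch of how auto.offset.reset would be passed in, using the
KafkaUtils.createStream overload that takes a Kafka parameter map (ssc is
assumed to be an existing StreamingContext; connection values and the topic
are placeholders). With the old high-level consumer used here, "smallest"
starts from the earliest retained offset and "largest" from the latest:

    import kafka.serializer.StringDecoder
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map(
      "zookeeper.connect" -> "zk-host:2181",
      "group.id"          -> "my-group",
      "auto.offset.reset" -> "smallest"  // or "largest"
    )

    val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER_2)
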
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Chen Song
>>>>
>>>>
>>>
>>
>


-- 
Chen Song
