Alternatively, you may spread your Kafka receivers across multiple machines, as discussed in this blog post: How to spread receivers over worker hosts in Spark streaming

Preview excerpt (tmblr.co): "In Spark Streaming, you can spawn multiple receivers to increase parallelism, e.g., such that each receiver reads from one of the partitions in Kafka. Then you combine the resulting streams and process them by batches. The code is sketched as follows: val ssc = new StreamingConte..."
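The code sketch is cut off in the preview; a minimal reconstruction of the approach it describes, assuming the Spark 1.x receiver-based KafkaUtils.createStream API and placeholder ZooKeeper/topic names:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("kafka-parallel-receivers")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder connection details -- substitute your own.
val zkQuorum = "zkhost:2181"
val group = "my-consumer-group"
val numReceivers = 8 // each stream below gets its own receiver

// Spawn one receiver stream per desired unit of parallelism, then
// union them into a single DStream and process it by batches.
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, zkQuorum, group, Map("mytopic" -> 1))
}
val unified = ssc.union(streams)

unified.count().print()

ssc.start()
ssc.awaitTermination()

Spark schedules each receiver as a long-running task, though (as this thread notes) their placement across hosts is not guaranteed to be spread out.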

 
Du


     On Wednesday, May 13, 2015 9:16 AM, Dibyendu Bhattacharya 
<[email protected]> wrote:
   

Or you can use this Receiver as well: 
http://spark-packages.org/package/dibbhatt/kafka-spark-consumer
You can specify how many receivers you need for your topic, and it will divide the partitions among the receivers and return the joined stream for you.
Say you specify 20 receivers; in that case each receiver handles 4 of the 80 partitions, and you get a consumer parallelism of 20 receivers.
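For reference, a rough sketch of how that package was used, per my reading of its README from that era -- the package path consumer.kafka.ReceiverLauncher, the property keys, and the launch signature are assumptions to verify against the version you actually pull in:

import java.util.Properties
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import consumer.kafka.ReceiverLauncher // assumed package path

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-consumer"), Seconds(10))

// Property keys as recalled from the package README -- verify before use.
val props = new Properties()
props.put("zookeeper.hosts", "zkhost")
props.put("zookeeper.port", "2181")
props.put("kafka.topic", "mytopic")
props.put("kafka.consumer.id", "my-consumer-id")

val numberOfReceivers = 20 // 80 partitions / 20 receivers = 4 partitions each

// The launcher divides the topic's partitions among the receivers and
// returns a single, already-unioned stream.
val unified = ReceiverLauncher.launch(ssc, props, numberOfReceivers, StorageLevel.MEMORY_ONLY)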
Dibyendu
On Wed, May 13, 2015 at 9:28 PM, 李森栋 <[email protected]> wrote:

Thank you very much.


Sent from my Meizu MX4 Pro

-------- Original message --------
From: Cody Koeninger <[email protected]>
Date: Wed, May 13, 23:52
To: hotdog <[email protected]>
Cc: [email protected]
Subject: Re: force the kafka consumer process to different machines

>I assume you're using the receiver-based approach?  Have you tried the
>createDirectStream API?
>
>https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html
>
>If you're sticking with the receiver-based approach, I think your only
>option would be to create more consumer streams and union them.  That
>doesn't give you control over where they're run, but it should increase
>the consumer parallelism.
>
>On Wed, May 13, 2015 at 10:33 AM, hotdog <[email protected]> wrote:
>
>> I'm using Spark Streaming integrated with streaming-kafka.
>>
>> My kafka topic has 80 partitions, while my machines have 40 cores. I found
>> that when the job is running, the kafka consumer processes are deployed to
>> only 2 machines, and the bandwidth on those 2 machines becomes very high.
>>
>> Is there any way to control how the kafka consumers are dispatched?
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/force-the-kafka-consumer-process-to-different-machines-tp22872.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
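For completeness, a minimal sketch of the direct approach Cody suggests above, against the Spark 1.3 API (the broker list and topic name are placeholders):

import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(new SparkConf().setAppName("kafka-direct"), Seconds(10))

// No receivers: each of the topic's 80 Kafka partitions becomes one RDD
// partition, so the read tasks spread across all executors.
val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
val topics = Set("mytopic")

val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, topics)

stream.map(_._2).count().print()

ssc.start()
ssc.awaitTermination()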




  
