We had some discussion about whether we can/should replace re-partitioning
topics with a direct network connection between instances. It's a tricky
problem though, with many strings attached... Thus, it comes with pros and
cons, and it's still unclear what the exact trade-off is.

So it might happen, but it's unclear atm if or when. There is no concrete
roadmap. But as an open-source project, we rely on user feedback. Thus,
this idea just got one more +1 :)


-Matthias

On 11/29/17 8:26 AM, Adrienne Kole wrote:
> Hi,
> 
> Perhaps you misunderstood the focus of the post, or I did not explain it
> properly. I am not claiming that Streams is limited to a single node.
> Although a whole topology instance can be limited to a single node (each
> node runs the whole topology), that is something else.
> Also, I think the "moving 100s of GB of data per day" claim is orthogonal,
> as that volume is not big/fast enough to draw conclusions from.
> 
> The thing is that, for some use cases, the streams-kafka-streams connection
> can be a bottleneck. Yes, if I have 40GB/s or InfiniBand network bandwidth,
> this might not be an issue.
> 
> Consider a simple topology with operators A->B->C (B forces a re-partition).
> The Streams nodes are s1 (A) and s2 (B, C), and Kafka resides on cluster k,
> which might be on a different network switch.
> So, rather than transferring data k->s1->s2, we make a round trip
> k->s1->k->s2. If we know that s1 and s2 are in the same network and data
> transfer between the two is fast, we should not go through another
> intermediate layer.
> 
> 
> Thanks.
> 
> 
> 
> On Wed, Nov 29, 2017 at 4:52 PM, Jan Filipiak <jan.filip...@trivago.com>
> wrote:
> 
>> Hey,
>>
>> you are making some wrong assumptions here.
>> Kafka Streams is in no way single threaded or
>> limited to one physical instance.
>> Having connectivity issues to your brokers is IMO
>> a problem with the deployment and not at all
>> with how kafka streams is designed and works.
>>
>> Kafka Streams moves hundreds of GB per day for us.
>>
>> Hope this helps.
>>
>> Best Jan
>>
>>
>>
>> On 29.11.2017 15:10, Adrienne Kole wrote:
>>
>>> Hi,
>>>
>>> The purpose of this email is to get an overall intuition for the future
>>> plans of the Streams library.
>>>
>>> The main question is: will it be a single-threaded application in the
>>> long run and serve microservice use cases, or are there any plans to
>>> extend it into a multi-node execution framework with less Kafka
>>> dependency?
>>>
>>> Currently, each Streams node 'talks' with the Kafka cluster, and the nodes
>>> can talk with each other only indirectly, again through Kafka. However,
>>> especially if Kafka is not in the same network as the Streams nodes
>>> (actually this can happen if they are in the same network as well), this
>>> will cause high network overhead and inefficiency.
>>>
>>> One solution for this (bypassing network overhead) is to deploy Streams
>>> nodes on the Kafka cluster to ensure data locality. However, this is not
>>> recommended, as the library and Kafka can affect each other's performance,
>>> and Streams does not necessarily have to know the internal data
>>> partitioning of Kafka.
>>>
>>> Another solution would be extending the Streams library to have a common
>>> runtime. IMO, preserving the current selling points of Streams (like
>>> dynamic scale in/out) with this kind of extension could be a very good
>>> improvement.
>>>
>>> So my question is: will Streams, in the long/short run, extend its
>>> use cases to massive and efficient stream processing (and compete with
>>> Spark), or stay and strengthen its current position?
>>>
>>> Cheers,
>>> Adrienne
>>>
>>>
>>
> 
