Hi,

You may have misunderstood the focus of my post, or perhaps I did not explain it properly. I am not claiming that Streams is limited to a single node. A whole topology instance can be confined to a single node (each node runs the entire topology), but that is a separate matter. Also, I think the "moving 100s of GB of data per day" claim is orthogonal, and that volume is not big or fast enough to draw conclusions from.

The thing is that, for some use cases, the Streams-Kafka-Streams connection can be a bottleneck. Yes, if I have 40GB/s or InfiniBand network bandwidth this might not be an issue. Consider a simple topology with operators A->B->C, where B forces a re-partition. The Streams nodes are s1 (running A) and s2 (running B and C), and Kafka resides on cluster k, which might be on a different network switch. So, rather than transferring data k->s1->s2, we make a round trip k->s1->k->s2. If we know that s1 and s2 are in the same network and that data transfer between the two is fast, we should not have to go through another intermediate layer.

Thanks.

On Wed, Nov 29, 2017 at 4:52 PM, Jan Filipiak <jan.filip...@trivago.com> wrote:

> Hey,
>
> you making some wrong assumptions here.
> Kafka Streams is in no way single threaded or
> limited to one physical instance.
> Having connectivity issues to your brokers is IMO
> a problem with the deployment and not at all
> with how kafka streams is designed and works.
>
> Kafka Streams moves hundreds of GB per day for us.
>
> Hope this helps.
>
> Best Jan
>
> On 29.11.2017 15:10, Adrienne Kole wrote:
>
>> Hi,
>>
>> The purpose of this email is to get overall intuition for the future
>> plans of streams library.
>>
>> The main question is that, will it be a single threaded application in
>> the long run and serve microservices use-cases, or are there any plans
>> to extend it to multi-node execution framework with less kafka
>> dependency.
>>
>> Currently, each streams node 'talks' with kafka cluster and they can
>> indirectly talk with each other again through kafka. However,
>> especially if kafka is not in the same network with streams nodes
>> (actually this can happen if they are in the same network as well)
>> this will cause high network overhead and inefficiency.
>>
>> One solution for this (bypassing network overhead) is to deploy streams
>> node on kafka cluster to ensure the data locality. However, this is not
>> recommended as the library and kafka can affect each other's
>> performance and streams does not necessarily have to know the internal
>> data partitioning of kafka.
>>
>> Another solution would be extending streams library to have a common
>> runtime. IMO, preserving the current selling points of streams (like
>> dynamic scale in/out) with this kind of extensions can be very good
>> improvement.
>>
>> So my question is that, will streams in the long/short run, will extend
>> its use-cases to massive and efficient stream processing (and compete
>> with spark) or stay and strengthen its current position?
>>
>> Cheers,
>> Adrienne
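
P.S. To put a rough number on the k->s1->k->s2 round trip from my A->B->C example, here is a minimal back-of-the-envelope sketch. It is a toy model, not a measurement: it assumes the operators do not shrink or grow the data, and it ignores compression, broker replication, changelog topics and the final output of C. The function names and the 100 GB payload are made up for illustration.

```python
# Toy model: count how often the payload crosses the link between the
# Streams nodes (s1, s2) and the remote Kafka cluster (k).
# Assumption: every hop carries the full payload unchanged.

def cross_switch_hops(path, remote="k"):
    """Number of hops in `path` that have the remote cluster as one endpoint."""
    return sum(1 for src, dst in zip(path, path[1:]) if remote in (src, dst))


def cross_switch_bytes(path, payload_bytes, remote="k"):
    """Bytes sent over the Streams<->Kafka link for one pass of the payload."""
    return cross_switch_hops(path, remote) * payload_bytes


# Re-partitioning through Kafka versus a hypothetical direct s1->s2 transfer.
VIA_KAFKA = ["k", "s1", "k", "s2"]
DIRECT = ["k", "s1", "s2"]

if __name__ == "__main__":
    payload = 100 * 10**9  # 100 GB, illustrative only
    for name, path in (("via Kafka", VIA_KAFKA), ("direct", DIRECT)):
        gigabytes = cross_switch_bytes(path, payload) / 10**9
        print(f"{name}: {gigabytes:.0f} GB across the switch")
```

Under these assumptions the re-partition through Kafka sends the data over the inter-switch link three times, while a direct s1->s2 transfer would send it once. Real numbers will of course depend on what A and B do to the data.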