Well depends upon use case. Say the metric you are evaluating is grouped by
a key and you want to parallelize the operation by adding more instances so
certain instance deal with only a particular group it is always better to
have partitioning also done on that key. This way a particular instance
w
No it wont work this way.
Say you have 9 partitions and 3 instances.
1 = {1, 2, 3}
2 = {4, 5, 6}
3 = (7, 8, 9}
And lets say a particular key (k1) is always written to partition 4.
Now say you increase partitions to 12 you may have:
1 = {1, 2, 3, 4}
2 = {5, 6, 7, 8}
3 = (9, 10, 11, 12}
Now it is po
@Sachin
>>The partition key is very important if you need to run multiple instances
of streams application and certain instance processing certain partitions
only.
Again, depending on partition key is optional. It's actually a feature
enabler, so we can use local state stores to improve throughput
@Sachin
>>is not elastic. You need to anticipate before hand on volume of data you
will have. Very difficult to add and reduce topic partitions later on.
Why do you say so Sachin? Kafka Streams will readjust once we add more
partitions to the Kafka topic. And when we add more machines, rebalancing
Kafka streams has a lower learning curve and if your source data is in
kafka topics it is pretty simple to integrate it with.
It can run like a library inside your main programs.
So as compared to spark streams
1. Is much simpler to implement.
2. Is not much heavy on hardware unlike spark.
On th
Hi,
Has anyone had an experience of using Kafka streams versus Spark?
I am not familiar with Kafka streams concept except that it is a set of
libraries.
Any feedback will be appreciated.
Regards,
Mich
LinkedIn *
https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8