> Does the parallelism_hint set when a KafkaSpout is added to a topology,
> need to match the number of partitions in a topic?
No.

On 06/05/2016 11:26 AM, Matthias J. Sax wrote:
> Hi Kanagha,
>
> For reading, KafkaSpout's internally used KafkaConsumer ensures that
> data is received in order per partition. Because the spout might read
> multiple partitions but emits only a single (logical) output stream,
> data from multiple partitions interleaves within this output stream
> (the relative order within each partition is preserved, though). How
> the partitions are distributed depends on the connection pattern
> between your spout and the downstream bolt. (If you use shuffleGrouping,
> the data of a single partition is distributed over all downstream bolt
> instances. Order is still preserved within a partition, but each bolt
> instance gets only some of the data of each partition. After the first
> bolt, Storm no longer guarantees the order, because the data of a single
> partition is spread out over multiple parallel bolt instances in this
> case.)
>
> If you want each partition to be processed by a single bolt, you need to
> extract the partitionId (i.e., add it to the Storm tuple) in the spout and
> use fieldsGrouping on partitionId for downstream bolts. I guess
> KafkaSpout does not support this out of the box. You can either patch
> KafkaSpout itself, or inherit from it to build your own
> "PartitionKafkaSpout" that adds the partitionId to the output tuples.
>
> (Or maybe ask at u...@storm.apache.org ;))
>
> For writing, you are correct. KafkaBolt uses key-based partitioning on
> write, and if you use fieldsGrouping on the key, it should work as intended.
>
>
> -Matthias
>
> On 06/05/2016 07:51 AM, Kanagha wrote:
>> Hi,
>>
>> I'm looking at the documentation for using KafkaSpout/KafkaBolt.
>>
>> https://github.com/apache/storm/tree/master/external/storm-kafka
>>
>> How is ordering guaranteed while reading messages from Kafka using
>> KafkaSpout?
>> Does the parallelism_hint set when a KafkaSpout is added to a topology,
>> need to match the number of partitions in a topic?
>>
>> Similarly while writing back to Kafka, I believe fieldsGrouping can be used
>> so that tuples that have same field value will go to the same task and can
>> be written to the same partition.
>> Would like to get suggestions on this. Thanks!
>>
>> Thanks
>> Kanagha
>>
>
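To make the two points in this thread concrete, here is a small plain-Java sketch. It does not use the Storm API. It only simulates two things, both of which are assumptions for illustration: that partitions are handed to spout tasks round-robin (so the parallelism hint need not equal the partition count), and that fieldsGrouping on partitionId picks a bolt task by hash modulo the number of tasks. The class and method names (PartitionRoutingSketch, partitionsForTask, boltTaskFor) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionRoutingSketch {

    /**
     * Partitions read by one spout task, ASSUMING a round-robin assignment
     * (task i takes partitions i, i + totalTasks, i + 2 * totalTasks, ...).
     */
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            owned.add(p);
        }
        return owned;
    }

    /**
     * Bolt task chosen for a partitionId, ASSUMING the grouping hashes the
     * field value and takes it modulo the number of bolt tasks.
     */
    static int boltTaskFor(int partitionId, int numBoltTasks) {
        return Math.floorMod(Integer.hashCode(partitionId), numBoltTasks);
    }

    public static void main(String[] args) {
        int numPartitions = 4;

        // Fewer spout tasks than partitions: each task reads several partitions.
        // More spout tasks than partitions: the surplus tasks get nothing to read.
        for (int spoutTasks : new int[] {2, 6}) {
            System.out.println("parallelism hint = " + spoutTasks);
            for (int t = 0; t < spoutTasks; t++) {
                System.out.println("  spout task " + t + " reads partitions "
                        + partitionsForTask(t, spoutTasks, numPartitions));
            }
        }

        // Grouping on partitionId: every tuple of a partition lands on the same
        // bolt task, so the per-partition order survives on that task.
        int boltTasks = 3;
        for (int p = 0; p < numPartitions; p++) {
            System.out.println("partition " + p + " -> bolt task " + boltTaskFor(p, boltTasks));
        }
    }
}
```

Under these assumptions, a parallelism hint below the partition count still covers every partition, and a hint above it leaves some spout tasks idle. Several partitions may share one bolt task, but no partition is ever split across tasks, which is what keeps the per-partition order intact.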