Hi,

I would appreciate it if anyone could explain the reason behind the following 
behaviour. 

I’m running a topology on a Storm cluster consisting of a nimbus and two 
worker nodes. The topology comprises a KafkaSpout reading messages from 
a Kafka topic with 8 partitions, and a KafkaBolt writing the same messages 
back to another Kafka topic, also with 8 partitions. Tuples are distributed 
between the spout and the bolt using shuffle grouping. The parallelism hint is 
set to 8 for both the spout and the bolt. The semantics of the topology 
shouldn’t be that important, but I can explain why no additional processing 
takes place if required. When I run the GetOffsetShell class against the Kafka 
cluster to determine the number of messages per partition of the output topic, 
I see the following: 

bytes-1496949913:2:0
bytes-1496949913:5:0
bytes-1496949913:4:0
bytes-1496949913:7:0
bytes-1496949913:1:99999992
bytes-1496949913:3:0
bytes-1496949913:6:0
bytes-1496949913:0:0

As shown above, the second partition (partition 1) of the topic holds all of 
the messages, while the other seven are empty, which is a rather strange 
imbalance. 
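
To make the skew easier to quantify, here is a minimal Python sketch that parses the output above and computes each partition's share of the messages. It assumes every line has the form topic:partition:offset, as in the listing, and that each partition's log starts at offset 0, so the reported offset equals the message count. The function names parse_offsets and partition_shares are my own and not part of Kafka or Storm.

```python
def parse_offsets(output):
    """Parse 'topic:partition:offset' lines into {partition: offset}."""
    offsets = {}
    for line in output.strip().splitlines():
        # Split from the right so a topic name containing ':' would still parse.
        _topic, partition, offset = line.strip().rsplit(":", 2)
        offsets[int(partition)] = int(offset)
    return offsets


def partition_shares(offsets):
    """Return each partition's fraction of the total offset count."""
    total = sum(offsets.values())
    if total == 0:
        return {p: 0.0 for p in offsets}
    return {p: count / total for p, count in offsets.items()}


# The GetOffsetShell output quoted in this message.
SAMPLE = """\
bytes-1496949913:2:0
bytes-1496949913:5:0
bytes-1496949913:4:0
bytes-1496949913:7:0
bytes-1496949913:1:99999992
bytes-1496949913:3:0
bytes-1496949913:6:0
bytes-1496949913:0:0
"""

offsets = parse_offsets(SAMPLE)
shares = partition_shares(offsets)
for p in sorted(offsets):
    print(f"partition {p}: {offsets[p]:>10} messages ({shares[p]:.1%})")
```

With the numbers above, partition 1 accounts for the entire total and every other partition for none of it.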

What could the reason behind this be? 

Thanks in advance,
Dominik
