Hi 

We have found an unpleasant side effect of restarting the parsers: on each 
restart, some events are indexed again. Unfortunately, several dedicated tests 
confirmed this.

A likely cause is the way the Storm topology is stopped: it is killed 
immediately via "killTopologyWithOpts" with the option "set_wait_secs(0)". 
As a result, the topology has no time to commit the current offsets of already 
processed events to Kafka.

After the parser starts again, kafkaSpout resumes from the last committed 
offset and re-reads the events whose offsets were never committed, so some 
events are indexed twice.
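
To make the suspected sequence concrete, here is a toy model in plain Java. It 
does not use Storm or Kafka at all: ToySpout, commitEvery and the "graceful" 
flag are hypothetical names invented for this illustration, and the assumption 
that a non-zero wait would give the real spout a chance to do a final commit is 
exactly the thing that would need to be verified.

```java
public class KillWaitSketch {

    /** Hypothetical stand-in for a spout that commits offsets only periodically. */
    static class ToySpout {
        final int commitEvery;      // commit after this many processed events
        int committedOffset = 0;    // offset the broker would remember
        int processedOffset = 0;    // offset up to which events were indexed

        ToySpout(int commitEvery) {
            this.commitEvery = commitEvery;
        }

        /** Index `count` events, committing every `commitEvery` events. */
        void process(int count) {
            for (int i = 0; i < count; i++) {
                processedOffset++;
                if (processedOffset % commitEvery == 0) {
                    committedOffset = processedOffset;
                }
            }
        }

        /**
         * Stop the spout and return the offset a restart would resume from.
         * graceful == false models a kill with a wait of 0 seconds (no final commit);
         * graceful == true models a kill that leaves time for one last commit.
         */
        int stop(boolean graceful) {
            if (graceful) {
                committedOffset = processedOffset;
            }
            return committedOffset;
        }
    }

    /** Number of already indexed events that would be read again after a restart. */
    static int duplicatesAfterRestart(boolean graceful) {
        ToySpout spout = new ToySpout(10);
        spout.process(25);                       // last periodic commit happens at 20
        int resumeFrom = spout.stop(graceful);
        return spout.processedOffset - resumeFrom;
    }

    public static void main(String[] args) {
        System.out.println("immediate kill: " + duplicatesAfterRestart(false) + " events replayed");
        System.out.println("graceful kill:  " + duplicatesAfterRestart(true) + " events replayed");
    }
}
```

In this model the immediate kill leaves the 5 events processed after the last 
periodic commit to be read again, while the graceful stop leaves none.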

So the question is: is there a more graceful way to stop the parser topology 
that avoids the problem described above? To be clear, we mean a change to the 
source code, not a configuration option or setting.

If such a solution exists and the problem can be fixed, I can create the 
corresponding issue at https://issues.apache.org/jira/browse/METRON
