It seems I misdescribed our situation when I wrote about the option
"set_wait_secs (0)".

We performed all our tests by disabling parsers in the Metron Management UI, so
I suppose they were all stopped with "storm kill <name>" (and not "storm kill
<name> -w 0"). Even so, some messages were reindexed in these tests.

So now the question is: when the parser is stopped with the "storm kill <name>"
command, will the topology finish processing all current events already read by
the kafkaSpout and commit the corresponding offsets to Kafka?
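
To make the mechanism discussed in this thread concrete, here is a toy Java
model (not Metron or storm-kafka-client code) of at-least-once consumption with
periodic offset commits. The commit interval COMMIT_EVERY and the method
duplicatesAfterRestart are hypothetical. The model assumes offsets are committed
only in batches, so an immediate kill loses whatever was processed since the
last commit, while a graceful stop performs one final commit before shutdown.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Metron or storm-kafka-client code) of
 * at-least-once consumption: offsets are committed only every COMMIT_EVERY
 * events, so events processed after the last commit are replayed on restart.
 */
public class OffsetReplayDemo {
    // Hypothetical commit interval; a real spout commits on its own schedule.
    static final int COMMIT_EVERY = 4;

    /** Returns how many events are indexed twice after one stop and restart. */
    static int duplicatesAfterRestart(int totalEvents, int stopAfter, boolean graceful) {
        long committed = 0;                // offset a restarted consumer resumes from
        List<Long> indexed = new ArrayList<>();

        // First run: index events until the parser is stopped.
        for (long offset = 0; offset < stopAfter; offset++) {
            indexed.add(offset);
            if ((offset + 1) % COMMIT_EVERY == 0) {
                committed = offset + 1;    // periodic batch commit
            }
        }
        if (graceful) {
            committed = stopAfter;         // final commit before shutdown
        }

        // Second run: resume from the last committed offset.
        for (long offset = committed; offset < totalEvents; offset++) {
            indexed.add(offset);
        }
        return indexed.size() - totalEvents;
    }

    public static void main(String[] args) {
        System.out.println("immediate kill: " + duplicatesAfterRestart(20, 10, false)); // prints "immediate kill: 2"
        System.out.println("graceful stop:  " + duplicatesAfterRestart(20, 10, true));  // prints "graceful stop:  0"
    }
}
```

With 20 events, a stop after 10 and a commit every 4 events, the last periodic
commit covers offsets 0 to 7, so offsets 8 and 9 are read again and indexed
twice unless a final commit happens before the workers shut down.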

On 2019/12/11 06:39:28, Michael Miklavcic <[email protected]> wrote: 
> It only does that if the arg stopNow is true. It's always false per the
> previous snippets I shared.
> 
> On Tue, Dec 10, 2019, 10:54 PM Vladimir Mikhailov <[email protected]> wrote:
> 
> > Hi Michael
> >
> > I think the problem is not on the REST side but in the "StormCLIWrapper"
> > it uses:
> >
> >
> > https://github.com/apache/metron/blob/88f4d2cefe4bbb389732da3b4f5cbcf02b7b949a/metron-interface/metron-rest/src/main/java/org/apache/metron/rest/service/impl/StormCLIWrapper.java#L145
> >
> > Each of the "StormCLIWrapper" methods (stopParserTopology,
> > stopEnrichmentTopology and stopIndexingTopology) simply stops the
> > corresponding topology with the command "storm kill <name> [-w 0]", which
> > leads to the unpleasant re-indexing consequences described below.
> >
> > Perhaps we should instead send the topology a command to stop, and then
> > wait until it finishes processing the current events and commits their
> > offsets to Kafka?
> >
> >
> > On 2019/12/10 18:18:28, Michael Miklavcic <[email protected]> wrote:
> > > Where are you seeing this? As far as I can tell, the UI and REST endpoints
> > > default to a graceful shutdown.
> > > https://github.com/apache/metron/blob/master/metron-interface/metron-config/src/app/service/storm.service.ts#L154
> > > https://github.com/apache/metron/blob/master/metron-interface/metron-rest/src/main/java/org/apache/metron/rest/controller/StormController.java#L91
> > >
> > >
> > > On Tue, Dec 10, 2019 at 4:11 AM Vladimir Mikhailov <[email protected]> wrote:
> > >
> > > > Hi
> > > >
> > > > We have found an unpleasant consequence of restarting the parsers: each
> > > > time, some of the events are reindexed. Unfortunately, several
> > > > dedicated tests have confirmed this.
> > > >
> > > > Perhaps the reason is the method used to stop the Storm topology
> > > > immediately: "killTopologyWithOpts" with the option "set_wait_secs (0)".
> > > > Because of this, the topology does not have time to commit to Kafka the
> > > > offsets of the events it has already processed.
> > > >
> > > > After the parser restarts, the kafkaSpout reads again the events whose
> > > > offsets were never committed, so some events are indexed twice.
> > > >
> > > > So the question is: is there a more elegant way to stop the parser
> > > > topology that avoids the problems described above? Of course, we are
> > > > talking about changes to the source code, not options or settings.
> > > >
> > > > If such a solution exists and the problem can be fixed, I can create the
> > > > corresponding issue at https://issues.apache.org/jira/browse/METRON
> > > >
> > >
> >
> 
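
For reference, the stop command that this thread attributes to StormCLIWrapper
("storm kill <name>", with "-w 0" appended only when stopNow is true) can be
sketched as follows. This is a hypothetical reconstruction from the discussion,
not the actual Metron source; the class name StormKillCommand, the method
buildKillCommand and the topology name "my_parser" are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (NOT the real StormCLIWrapper source) of how a CLI
 * wrapper might assemble "storm kill <name> [-w 0]" from a stopNow flag.
 */
public class StormKillCommand {

    static List<String> buildKillCommand(String topologyName, boolean stopNow) {
        List<String> command = new ArrayList<>(List.of("storm", "kill", topologyName));
        if (stopNow) {
            // "-w 0" sets Storm's wait between deactivating the spouts and
            // shutting down the workers to zero seconds.
            command.add("-w");
            command.add("0");
        }
        return command;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", buildKillCommand("my_parser", false))); // prints "storm kill my_parser"
        System.out.println(String.join(" ", buildKillCommand("my_parser", true)));  // prints "storm kill my_parser -w 0"
    }
}
```

According to the Storm command-line documentation, without -w Storm first
deactivates the topology's spouts for the duration of the topology's message
timeout and only then shuts down the workers; -w overrides that wait. Whether
the Kafka spout commits its final offsets during that window depends on the
spout version and configuration, which this sketch does not model.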
