Re: Issue: reindexing of some events on parsers restart

Vladimir Mikhailov Tue, 17 Dec 2019 21:51:36 -0800

Yes, we deliberately ran some more tests at 100 EPS, and every time we got a
10-second window of reindexed events.
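
For anyone repeating the test, the duplicates can be counted roughly like this. It is only a minimal sketch, assuming the load generator stamps every event with a unique sequence number in a field called `seq` and that each indexed document carries an epoch-millisecond `timestamp` field. Both field names and the function name `duplicate_window` are hypothetical, so adjust them to your own data. The key has to be something that survives reprocessing, which is why an identifier assigned by the generator is used here.

```python
from collections import defaultdict


def duplicate_window(docs, id_field="seq", ts_field="timestamp"):
    """Return (duplicate_ids, window_seconds) for exported index documents.

    duplicate_ids: sorted ids that occur more than once in `docs`.
    window_seconds: span between the earliest and latest event timestamp
    among the duplicated ids, or 0.0 when nothing is duplicated.
    Field names are assumptions; pass your own if they differ.
    """
    counts = defaultdict(int)
    ts_by_id = {}
    for doc in docs:
        key = doc[id_field]
        counts[key] += 1
        ts_by_id[key] = doc[ts_field]

    dup_ids = sorted(key for key, n in counts.items() if n > 1)
    if not dup_ids:
        return [], 0.0

    stamps = [ts_by_id[key] for key in dup_ids]
    # Timestamps are assumed to be epoch milliseconds.
    return dup_ids, (max(stamps) - min(stamps)) / 1000.0
```

With synthetic data that mimics the 2 EPS test (one event every 500 ms, the last 20 events indexed twice), the function reports 20 duplicated ids spanning 9.5 seconds, which is in line with the window of about 10 seconds we see.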


On 2019/12/17 18:42:17, Michael Miklavcic <[email protected]> wrote: 
> Is it always a 10 second window, or thereabouts?
> 
> On Sun, Dec 15, 2019 at 11:11 PM Vladimir Mikhailov <
> [email protected]> wrote:
> 
> > So, we conducted a number of additional tests that confirmed the problem.
> >
> > The test is very simple:
> >
> > We generated a very stable stream of events (2 EPS) and restarted the
> > parser. Each time, the events within a 10-second window were re-indexed.
> >
> > The parser is a simple JSONMap parser without enrichment or TI rules.
> >
> > SENSOR PARSER CONFIG:
> >
> > {
> >         "parserClassName": "org.apache.metron.parsers.json.JSONMapParser",
> >         "filterClassName": null,
> >         "sensorTopic": "netflow-load-test-json",
> >         "outputTopic": null,
> >         "errorTopic": null,
> >         "writerClassName": null,
> >         "errorWriterClassName": null,
> >         "readMetadata": true,
> >         "mergeMetadata": true,
> >         "numWorkers": 2,
> >         "numAckers": 2,
> >         "spoutParallelism": 2,
> >         "spoutNumTasks": 2,
> >         "parserParallelism": 2,
> >         "parserNumTasks": 2,
> >         "errorWriterParallelism": 1,
> >         "errorWriterNumTasks": 1,
> >         "spoutConfig": {},
> >         "securityProtocol": null,
> >         "stormConfig": {},
> >         "parserConfig": {
> >                 "mapStrategy": "ALLOW",
> >                 "jsonpQuery": "$",
> >                 "wrapInEntityArray": false,
> >                 "overrideOriginalString": true
> >         },
> >         "fieldTransformations": [],
> >         "cacheConfig": {},
> >         "rawMessageStrategy": "DEFAULT",
> >         "rawMessageStrategyConfig": {}
> > }
> >
> > SENSOR ENRICHMENT CONFIG:
> >
> > {
> >         "enrichment": {
> >                 "fieldMap": {},
> >                 "fieldToTypeMap": {},
> >                 "config": {}
> >         },
> >         "threatIntel": {
> >                 "fieldMap": {},
> >                 "fieldToTypeMap": {},
> >                 "config": {},
> >                 "triageConfig": {
> >                         "riskLevelRules": [],
> >                         "aggregator": "MAX",
> >                         "aggregationConfig": {}
> >                 }
> >         },
> >         "configuration": {}
> > }
> >
> > INDEXING CONFIGURATIONS:
> >
> > {
> >         "hdfs": {
> >                 "batchSize": 1000,
> >                 "enabled": true,
> >                 "index": "netflow-load-test-json"
> >         },
> >         "elasticsearch": {
> >                 "batchSize": 1000,
> >                 "enabled": true,
> >                 "batchTimeout": 5,
> >                 "index": "netflow-load-test-json",
> >                 "fieldNameConverter": "NOOP"
> >         },
> >         "solr": {
> >                 "batchSize": 1,
> >                 "enabled": false,
> >                 "index": "netflow-load-test-json"
> >         }
> > }
> >
> >
> > Can anyone repeat this test and check if there are any recurring events in
> > the index?
> >
> >
> > On 2019/12/12 07:22:44, Vladimir Mikhailov <[email protected]>
> > wrote:
> > > Thanks for the clarification!
> > >
> > > > So we need to conduct a few more tests to understand the cause of this
> > > > problem.
> > > > I will write about the results.
> > >
> > > On 2019/12/11 14:01:45, Nick Allen <[email protected]> wrote:
> > > > > > And now the question: does stopping the parser with the "storm kill
> > > > > > <name>" command mean that the topology will complete the processing
> > > > > > of all current events that were read by kafkaSpout and commit the
> > > > > > corresponding offset to kafka?
> > > > >
> > > > > Yes, it will wait as long as the topology's message timeout (by
> > > > > default 30 seconds), which should be plenty of time to commit offsets.
> > > > http://storm.apache.org/releases/current/Command-line-client.html
> > > >
> > > > kill
> > > >
> > > > Syntax: storm kill topology-name [-w wait-time-secs]
> > > >
> > > > > Kills the topology with the name topology-name. Storm will first
> > > > > deactivate the topology's spouts for the duration of the topology's
> > > > > message timeout to allow all messages currently being processed to
> > > > > finish processing. Storm will then shutdown the workers and clean up
> > > > > their state. You can override the length of time Storm waits between
> > > > > deactivation and shutdown with the -w flag.
> > > >
> > > >
> > > > On Wed, Dec 11, 2019 at 5:10 AM Vladimir Mikhailov <
> > > > [email protected]> wrote:
> > > >
> > > > > > It seems that I misdescribed our situation by writing about the
> > > > > > option "set_wait_secs (0)".
> > > > > >
> > > > > > We performed all our tests by disabling parsers in the Metron
> > > > > > Management UI, so I suppose they were all stopped using "storm kill
> > > > > > <name>" (and not "storm kill <name> -w 0"). And in these tests some
> > > > > > messages were reindexed.
> > > > >
> > > > > > And now the question: does stopping the parser with the "storm kill
> > > > > > <name>" command mean that the topology will complete the processing
> > > > > > of all current events that were read by kafkaSpout and commit the
> > > > > > corresponding offset to kafka?
> > > > >
> > > > > > On 2019/12/11 06:39:28, Michael Miklavcic <[email protected]>
> > > > > > wrote:
> > > > > > > It only does that if the arg stopNow is true. It's always false
> > > > > > > per the previous snippets I shared.
> > > > > >
> > > > > > On Tue, Dec 10, 2019, 10:54 PM Vladimir Mikhailov <
> > > > > > [email protected]> wrote:
> > > > > >
> > > > > > > Hi Michael
> > > > > > >
> > > > > > > > I think the problem is not on the REST side, but in the
> > > > > > > > "StormCLIWrapper", which it uses:
> > > > > > > >
> > > > > > > > https://github.com/apache/metron/blob/88f4d2cefe4bbb389732da3b4f5cbcf02b7b949a/metron-interface/metron-rest/src/main/java/org/apache/metron/rest/service/impl/StormCLIWrapper.java#L145
> > > > > > > >
> > > > > > > > Each of the "StormCLIWrapper" methods (stopParserTopology,
> > > > > > > > stopEnrichmentTopology and stopIndexingTopology) simply stops the
> > > > > > > > corresponding topology with the command "storm kill <name> [-w 0]",
> > > > > > > > leading to the unpleasant re-indexing described earlier.
> > > > > > > >
> > > > > > > > Perhaps, instead, we should give the topology a command to stop
> > > > > > > > and wait until it finishes processing current events and commits
> > > > > > > > changes to kafka?
> > > > > > >
> > > > > > >
> > > > > > > > On 2019/12/10 18:18:28, Michael Miklavcic <[email protected]>
> > > > > > > > wrote:
> > > > > > > > > Where are you seeing this? As far as I can tell, the UI and REST
> > > > > > > > > endpoints default to a graceful shutdown.
> > > > > > > > >
> > > > > > > > > https://github.com/apache/metron/blob/master/metron-interface/metron-config/src/app/service/storm.service.ts#L154
> > > > > > > > >
> > > > > > > > > https://github.com/apache/metron/blob/master/metron-interface/metron-rest/src/main/java/org/apache/metron/rest/controller/StormController.java#L91
> > > > > > > >
> > > > > > > >
> > > > > > > > On Tue, Dec 10, 2019 at 4:11 AM Vladimir Mikhailov <
> > > > > > > > [email protected]> wrote:
> > > > > > > >
> > > > > > > > > Hi
> > > > > > > > >
> > > > > > > > > > We found an unpleasant consequence of each parser restart:
> > > > > > > > > > each time, some of the events are indexed again.
> > > > > > > > > > Unfortunately, this was confirmed by several dedicated tests.
> > > > > > > > > >
> > > > > > > > > > Perhaps the reason for this is the method used to immediately
> > > > > > > > > > stop the storm topology using "killTopologyWithOpts" with the
> > > > > > > > > > option "set_wait_secs (0)". Because of this, the topology
> > > > > > > > > > does not have time to commit to kafka the current offsets of
> > > > > > > > > > already processed events.
> > > > > > > > > >
> > > > > > > > > > After the parser starts, kafkaSpout starts reading
> > > > > > > > > > uncommitted events and therefore some events are indexed
> > > > > > > > > > twice.
> > > > > > > > > >
> > > > > > > > > > So the question is: is there a more elegant way to stop the
> > > > > > > > > > parser topology in order to avoid the problems described
> > > > > > > > > > above? Of course, we are talking about changes to the source
> > > > > > > > > > code, not some options or settings.
> > > > > > > > > >
> > > > > > > > > > If such a solution exists and the problem can be fixed, then
> > > > > > > > > > I can create the corresponding issue at
> > > > > > > > > > https://issues.apache.org/jira/browse/METRON
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
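
To compare restarts with different wait times, the kill command quoted earlier in the thread (`storm kill topology-name [-w wait-time-secs]`) can be assembled by a small helper. This is only a sketch: the function name `storm_kill_args` and the topology name in the usage note are hypothetical, and the helper builds the argument list without running anything.

```python
def storm_kill_args(topology_name, wait_secs=None):
    """Build the argument list for `storm kill topology-name [-w wait-time-secs]`.

    When wait_secs is None, the -w flag is omitted so that Storm falls back
    to the topology's message timeout, as described in the quoted docs.
    """
    if not topology_name:
        raise ValueError("topology name is required")
    args = ["storm", "kill", topology_name]
    if wait_secs is not None:
        if wait_secs < 0:
            raise ValueError("wait_secs must be non-negative")
        args += ["-w", str(wait_secs)]
    return args
```

The result can be handed to `subprocess.run(args, check=True)` on a host where the `storm` client is installed, for example `storm_kill_args("my-parser", 30)` for an explicit 30-second wait.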
