This is an automated email from the ASF dual-hosted git repository.

mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git
The following commit(s) were added to refs/heads/master by this push:
     new 54aa46e  METRON-2066 Documentation and logging corrections (mmiklavc) closes apache/metron#1378
54aa46e is described below

commit 54aa46ee44a329504559f417790324c175f5af6a
Author: mmiklavc <michael.miklav...@gmail.com>
AuthorDate: Wed Apr 10 13:04:03 2019 -0600

    METRON-2066 Documentation and logging corrections (mmiklavc) closes apache/metron#1378
---
 metron-platform/Performance-tuning-guide.md        |  2 +-
 metron-platform/README.md                          |  2 +-
 metron-platform/metron-common/README.md            | 18 +++++++++-
 metron-platform/metron-parsing/README.md           | 35 ++++++++++++++-----
 .../java/org/apache/metron/parsers/GrokParser.java | 39 +++++++++++-----------
 5 files changed, 64 insertions(+), 32 deletions(-)

diff --git a/metron-platform/Performance-tuning-guide.md b/metron-platform/Performance-tuning-guide.md
index bd5c126..fe1b01b 100644
--- a/metron-platform/Performance-tuning-guide.md
+++ b/metron-platform/Performance-tuning-guide.md
@@ -412,7 +412,7 @@ And we ran our bro parser topology with the following options. We did not need t
 though you could certainly do so if necessary. Notice that we only needed 1 worker.
 
 ```
-/usr/metron/0.7.1/bin/start_parser_topology.sh \
+$METRON_HOME/bin/start_parser_topology.sh \
 -e ~metron/.storm/storm-bro.config \
 -esc ~/.storm/spout-bro.config \
 -k $BROKERLIST \
diff --git a/metron-platform/README.md b/metron-platform/README.md
index feb30e5..e5a7e6a 100644
--- a/metron-platform/README.md
+++ b/metron-platform/README.md
@@ -27,4 +27,4 @@ Extensible set of Storm topologies and topology attributes for streaming, enrich
 
 # Documentation
 
-Please see documentation within each individual module for description and usage instructions. Sample topologies are provided under Metron_Topologies to get you started with the framework. We pre-assume knowledge of Hadoop, Storm, Kafka, and HBase.
+Please see documentation within each individual module for description and usage instructions. Sample topologies are provided under Metron_Topologies to get you started with the framework. We pre-assume knowledge of Hadoop, Storm, Kafka, Zookeeper, and HBase.
diff --git a/metron-platform/metron-common/README.md b/metron-platform/metron-common/README.md
index 20f0eef..cbea9dd 100644
--- a/metron-platform/metron-common/README.md
+++ b/metron-platform/metron-common/README.md
@@ -18,6 +18,7 @@ limitations under the License.
 # Contents
 
 * [Stellar Language](#stellar-language)
+* [High Level Architecture](#high-level-architecture)
 * [Global Configuration](#global-configuration)
 * [Validation Framework](#validation-framework)
 * [Management Utility](#management-utility)
@@ -109,6 +110,20 @@ If a field is managed via ambari, you should change the field via ambari.
 Otherwise, upon service restarts, you may find your update overwritten.
 
+# High Level Architecture
+
+As already pointed out in the main project README, Apache Metron is a Kappa architecture (see [Navigating the Architecture](../../#navigating-the-architecture)) primarily backed by Storm and Kafka. We additionally leverage:
+* Zookeeper for dynamic configuration updates to running Storm topologies. This enables us to push updates to our Storm topologies without restarting them.
+* HBase primarily for enrichments. But we also use it to store user state for our UI's.
+* HDFS for long term storage. Our parsed and enriched messages land here, along with any reported exceptions or errors encountered along the way.
+* Solr and Elasticsearch (plus Kibana) for real-time access. We provide out of the box compatibility with both Solr and Elasticsearch, and custom dashboards for data exploration in Kibana.
+* Zeppelin for providing dashboards to do custom analytics.
+
+Getting data "into" Metron is accomplished by setting up a Kafka topic for parsers to read from. There are a variety of options, including, but not limited to:
+* [Bro Kafka plugin](https://github.com/apache/metron-bro-plugin-kafka)
+* [Fastcapa](../../metron-sensors/fastcapa)
+* [NiFi](https://nifi.apache.org)
+
 # Validation Framework
 
 Inside of the global configuration, there is a validation framework in
@@ -336,7 +351,8 @@ Errors generated in Metron topologies are transformed into JSON format and follo
   "error_hash": "f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
   "error_type": "parser_error",
   "message": "Unable to parse Message: {\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
-  "timestamp": 1488809630698
+  "timestamp": 1488809630698,
+  "guid": "bf9fb8d1-2507-4a41-a5b2-42f75f6ddc63"
 }
 ```
diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index b8f44cb..e5368fe 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -15,8 +15,22 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
+
 # Parsers
 
+## Contents
+
+* [Introduction](#introduction)
+* [Parser Error Routing](#parser-error-routing)
+* [Filtering](#filtering)
+* [Parser Architecture](#parser-architecture)
+* [Message Format](#message-format)
+* [Global Configuration](#global-configuration)
+* [Parser Configuration](#parser-configuration)
+* [Parser Adapters](#parser-adapters)
+* [Kafka Queue](#kafka-queue)
+* [JSON Path](#json-path)
+
 ## Introduction
 
 Parsers are pluggable components which are used to transform raw data
@@ -27,12 +41,12 @@ There are two general types types of parsers:
 * A parser written in Java which conforms to the `MessageParser` interface. This kind of parser is optimized for speed and performance and is built for use with higher velocity topologies. These parsers are not easily modifiable and in order to make changes to them the entire topology need to be recompiled.
 * A general purpose parser. This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry before a permanent Java parser can be written for it. As of the time of this writing, we have:
     * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries of
-        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement. By default attempts to load from HDFS, then falls back to the classpath, and finally throws an exception if unable to load a pattern.
         * `patternLabel` : The pattern label to use from the grok statement
         * `multiLine` : The raw data passed in should be handled as a long with multiple lines, with each line to be parsed separately. This setting's valid values are 'true' or 'false'. The default if unset is 'false'. When set the parser will handle multiple lines with successfully processed lines emitted normally, and lines with errors sent to the error topic.
-        * `timestampField` : The field to use for timestamp
-        * `timeFields` : A list of fields to be treated as time
-        * `dateFormat` : The date format to use to parse the time fields
+        * `timestampField` : The field to use for timestamp. If your data does not have a field exactly named "timestamp" this field is required, otherwise the record will not pass validation. If the timestampField is also included in the list of timeFields, it will first be parsed using the provided dateFormat.
+        * `timeFields` : A list of fields to be treated as time.
+        * `dateFormat` : The date format to use to parse the time fields. Default is "yyyy-MM-dd HH:mm:ss.S z".
         * `timezone` : The timezone to use. `UTC` is default.
     * The Grok parser supports either 1 line to parse per incoming message, or incoming messages with multiple log lines, and will produce a json message per line
     * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig` entries of
@@ -154,10 +168,13 @@ messages or marking messages as invalid.
 
 There are two reasons a message will be marked as invalid:
 * Fail [global validation](../../metron-common#validation-framework)
-* Fail the parser's validate function (generally that means to not have a `timestamp` field or a `original_string` field.
+* Fail the parser's validate function. Generally, that means not having a `timestamp` field or an `original_string` field.
 
-Those messages which are marked as invalid are sent to the error queue
-with an indication that they are invalid in the error message.
+Those messages which are marked as invalid are sent to the error queue with an indication that they
+are invalid in the error message. The messages will contain "error_type":"parser_invalid". Note,
+you will not see additional exceptions in the logs for this type of failure, rather the error messages
+are written directly to the configured error topic. See [Topology Errors](../../metron-common#topology-errors)
+for more.
 
 ### Parser Errors
 
@@ -166,7 +183,7 @@ parse, are sent along to the error queue with a message indicating that
 there was an error in parse along with a stacktrace. This is to distinguish
 from the invalid messages.
 
-## Filtered
+## Filtering
 
 One can also filter a message by specifying a `filterClassName` in the
 parser config. Filtered messages are just dropped rather than passed
@@ -261,7 +278,7 @@ The document is structured in the following way
 }
 ```
 
-* `sensorTopic` : The kafka topic to send the parsed messages to. If the topic is prefixed and suffixed by `/`
+* `sensorTopic` : The kafka topic to that the parser will read messages from. If the topic is prefixed and suffixed by `/`
 then it is assumed to be a regex and will match any topic matching the pattern (e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
 * `readMetadata` : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent). See below for a discussion about metadata.
 * `mergeMetadata` : Boolean indicating whether to merge metadata with the message or not (The default is raw message strategy dependent). See below for a discussion about metadata.
diff --git a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
index f64b4af..616639c 100644
--- a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
+++ b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
@@ -20,19 +20,6 @@ package org.apache.metron.parsers;
 
 import com.google.common.base.Joiner;
 import com.google.common.base.Splitter;
-import oi.thekraken.grok.api.Grok;
-import oi.thekraken.grok.api.Match;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.metron.common.Constants;
-import org.apache.metron.parsers.interfaces.MessageParser;
-import org.apache.metron.parsers.interfaces.MessageParserResult;
-import org.json.simple.JSONObject;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStream;
@@ -50,6 +37,18 @@ import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.TimeZone;
+import oi.thekraken.grok.api.Grok;
+import oi.thekraken.grok.api.Match;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.metron.common.Constants;
+import org.apache.metron.parsers.interfaces.MessageParser;
+import org.apache.metron.parsers.interfaces.MessageParserResult;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class GrokParser implements MessageParser<JSONObject>, Serializable {
 
@@ -96,9 +95,11 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
   public InputStream openInputStream(String streamName) throws IOException {
     FileSystem fs = FileSystem.get(new Configuration());
     Path path = new Path(streamName);
-    if(fs.exists(path)) {
+    if (fs.exists(path)) {
+      LOG.info("Loading {} from HDFS.", streamName);
       return fs.open(path);
     } else {
+      LOG.info("File not found in HDFS, attempting to load {} from classpath using classloader for {}.", streamName, getClass());
       return getClass().getResourceAsStream(streamName);
     }
   }
@@ -108,7 +109,7 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
     grok = new Grok();
     try {
       InputStream commonInputStream = openInputStream(patternsCommonDir);
-      LOG.debug("Grok parser loading common patterns from: {}", patternsCommonDir);
+      LOG.info("Grok parser loading common patterns from: {}", patternsCommonDir);
 
       if (commonInputStream == null) {
         throw new RuntimeException(
@@ -116,7 +117,7 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
       }
 
       grok.addPatternFromReader(new InputStreamReader(commonInputStream));
-      LOG.debug("Loading parser-specific patterns from: {}", grokPath);
+      LOG.info("Loading parser-specific patterns from: {}", grokPath);
 
       InputStream patterInputStream = openInputStream(grokPath);
       if (patterInputStream == null) {
@@ -125,14 +126,12 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
       }
       grok.addPatternFromReader(new InputStreamReader(patterInputStream));
-      if (LOG.isDebugEnabled()) {
-        LOG.debug("Grok parser set the following grok expression: {}", grok.getNamedRegexCollectionById(patternLabel));
-      }
+      LOG.info("Grok parser set the following grok expression for '{}': {}", patternLabel, grok.getPatterns().get(patternLabel));
 
       String grokPattern = "%{" + patternLabel + "}";
       grok.compile(grokPattern);
-      LOG.debug("Compiled grok pattern {}", grokPattern);
+      LOG.info("Compiled grok pattern {}", grokPattern);
 
     } catch (Throwable e) {
       LOG.error(e.getMessage(), e);
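For reference, the Grok `parserConfig` entries documented in the metron-parsing README changes above can be combined in a sensor configuration along these lines. This is a hedged sketch, not part of the commit: the topic name, pattern label, and HDFS path are illustrative placeholders.

```json
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp",
    "timeFields": ["timestamp"],
    "dateFormat": "yyyy-MM-dd HH:mm:ss.S z",
    "timezone": "UTC"
  }
}
```

Per the README text in this commit, `grokPath` would first be resolved against HDFS and then fall back to the classpath, and since `timestampField` appears in `timeFields`, that field would be parsed with the given `dateFormat` before being used as the record timestamp.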
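The default `dateFormat` ("yyyy-MM-dd HH:mm:ss.S z") and `timezone` (UTC) described in the README changes can be exercised with plain `SimpleDateFormat`, which is a reasonable mental model for how a time field becomes an epoch-millisecond timestamp. A small standalone sketch (the input string is illustrative, chosen to match the `ts` value in the error-message example above; this is not code from the commit):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class GrokTimeDemo {
  public static void main(String[] args) throws Exception {
    // Default date format and timezone from the Grok parserConfig docs above.
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S z");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));

    // Parse an illustrative time field into epoch milliseconds.
    Date parsed = sdf.parse("2017-03-06 14:13:47.0 UTC");
    System.out.println(parsed.getTime()); // epoch millis, e.g. 1488809627000
  }
}
```

Fields listed in `timeFields` are converted this way, and the resulting long becomes the record's `timestamp` when `timestampField` names one of them.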