This is an automated email from the ASF dual-hosted git repository.

mmiklavcic pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/metron.git
The following commit(s) were added to refs/heads/master by this push:
     new 54aa46e  METRON-2066 Documentation and logging corrections (mmiklavc) closes apache/metron#1378
54aa46e is described below

commit 54aa46ee44a329504559f417790324c175f5af6a
Author: mmiklavc <michael.miklav...@gmail.com>
AuthorDate: Wed Apr 10 13:04:03 2019 -0600

    METRON-2066 Documentation and logging corrections (mmiklavc) closes apache/metron#1378
---
 metron-platform/Performance-tuning-guide.md        |  2 +-
 metron-platform/README.md                          |  2 +-
 metron-platform/metron-common/README.md            | 18 +++++++++-
 metron-platform/metron-parsing/README.md           | 35 ++++++++++++++-----
 .../java/org/apache/metron/parsers/GrokParser.java | 39 +++++++++++-----------
 5 files changed, 64 insertions(+), 32 deletions(-)

diff --git a/metron-platform/Performance-tuning-guide.md b/metron-platform/Performance-tuning-guide.md
index bd5c126..fe1b01b 100644
--- a/metron-platform/Performance-tuning-guide.md
+++ b/metron-platform/Performance-tuning-guide.md
@@ -412,7 +412,7 @@ And we ran our bro parser topology with the following options. We did not need t
 though you could certainly do so if necessary. Notice that we only needed 1 worker.
 
 ```
-/usr/metron/0.7.1/bin/start_parser_topology.sh \
+$METRON_HOME/bin/start_parser_topology.sh \
 -e ~metron/.storm/storm-bro.config \
 -esc ~/.storm/spout-bro.config \
 -k $BROKERLIST \
diff --git a/metron-platform/README.md b/metron-platform/README.md
index feb30e5..e5a7e6a 100644
--- a/metron-platform/README.md
+++ b/metron-platform/README.md
@@ -27,4 +27,4 @@ Extensible set of Storm topologies and topology attributes for streaming, enrich
 
 # Documentation
 
-Please see documentation within each individual module for description and usage instructions. Sample topologies are provided under Metron_Topologies to get you started with the framework. We pre-assume knowledge of Hadoop, Storm, Kafka, and HBase.
+Please see documentation within each individual module for description and usage instructions. Sample topologies are provided under Metron_Topologies to get you started with the framework. We pre-assume knowledge of Hadoop, Storm, Kafka, Zookeeper, and HBase.
diff --git a/metron-platform/metron-common/README.md b/metron-platform/metron-common/README.md
index 20f0eef..cbea9dd 100644
--- a/metron-platform/metron-common/README.md
+++ b/metron-platform/metron-common/README.md
@@ -18,6 +18,7 @@ limitations under the License.
 # Contents
 
 * [Stellar Language](#stellar-language)
+* [High Level Architecture](#high-level-architecture)
 * [Global Configuration](#global-configuration)
 * [Validation Framework](#validation-framework)
 * [Management Utility](#management-utility)
@@ -109,6 +110,20 @@ If a field is managed via ambari, you should change the field via ambari.
 Otherwise, upon service restarts, you may find your update overwritten.
 
+# High Level Architecture
+
+As already pointed out in the main project README, Apache Metron is a Kappa architecture (see [Navigating the Architecture](../../#navigating-the-architecture)) primarily backed by Storm and Kafka. We additionally leverage:
+* Zookeeper for dynamic configuration updates to running Storm topologies. This enables us to push updates to our Storm topologies without restarting them.
+* HBase primarily for enrichments. But we also use it to store user state for our UI's.
+* HDFS for long term storage. Our parsed and enriched messages land here, along with any reported exceptions or errors encountered along the way.
+* Solr and Elasticsearch (plus Kibana) for real-time access. We provide out of the box compatibility with both Solr and Elasticsearch, and custom dashboards for data exploration in Kibana.
+* Zeppelin for providing dashboards to do custom analytics.
+
+Getting data "into" Metron is accomplished by setting up a Kafka topic for parsers to read from. There are a variety of options, including, but not limited to:
+* [Bro Kafka plugin](https://github.com/apache/metron-bro-plugin-kafka)
+* [Fastcapa](../../metron-sensors/fastcapa)
+* [NiFi](https://nifi.apache.org)
+
 # Validation Framework
 
 Inside of the global configuration, there is a validation framework in
@@ -336,7 +351,8 @@ Errors generated in Metron topologies are transformed into JSON format and follo
   "error_hash": "f7baf053f2d3c801a01d196f40f3468e87eea81788b2567423030100865c5061",
   "error_type": "parser_error",
   "message": "Unable to parse Message: {\"http\": {\"ts\":1488809627.000000.31915,\"uid\":\"C9JpSd2vFAWo3mXKz1\", ...",
-  "timestamp": 1488809630698
+  "timestamp": 1488809630698,
+  "guid": "bf9fb8d1-2507-4a41-a5b2-42f75f6ddc63"
 }
 ```
diff --git a/metron-platform/metron-parsing/README.md b/metron-platform/metron-parsing/README.md
index b8f44cb..e5368fe 100644
--- a/metron-platform/metron-parsing/README.md
+++ b/metron-platform/metron-parsing/README.md
@@ -15,8 +15,22 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
 -->
+
 # Parsers
 
+## Contents
+
+* [Introduction](#introduction)
+* [Parser Error Routing](#parser-error-routing)
+* [Filtering](#filtering)
+* [Parser Architecture](#parser-architecture)
+* [Message Format](#message-format)
+* [Global Configuration](#global-configuration)
+* [Parser Configuration](#parser-configuration)
+* [Parser Adapters](#parser-adapters)
+* [Kafka Queue](#kafka-queue)
+* [JSON Path](#json-path)
+
 ## Introduction
 
 Parsers are pluggable components which are used to transform raw data
@@ -27,12 +41,12 @@ There are two general types types of parsers:
 * A parser written in Java which conforms to the `MessageParser` interface. This kind of parser is optimized for speed and performance and is built for use with higher velocity topologies. These parsers are not easily modifiable and in order to make changes to them the entire topology need to be recompiled.
 * A general purpose parser. This type of parser is primarily designed for lower-velocity topologies or for quickly standing up a parser for a new telemetry before a permanent Java parser can be written for it. As of the time of this writing, we have:
     * Grok parser: `org.apache.metron.parsers.GrokParser` with possible `parserConfig` entries of
-        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement
+        * `grokPath` : The path in HDFS (or in the Jar) to the grok statement. By default attempts to load from HDFS, then falls back to the classpath, and finally throws an exception if unable to load a pattern.
         * `patternLabel` : The pattern label to use from the grok statement
         * `multiLine` : The raw data passed in should be handled as a long with multiple lines, with each line to be parsed separately. This setting's valid values are 'true' or 'false'. The default if unset is 'false'. When set the parser will handle multiple lines with successfully processed lines emitted normally, and lines with errors sent to the error topic.
-        * `timestampField` : The field to use for timestamp
-        * `timeFields` : A list of fields to be treated as time
-        * `dateFormat` : The date format to use to parse the time fields
+        * `timestampField` : The field to use for timestamp. If your data does not have a field exactly named "timestamp" this field is required, otherwise the record will not pass validation. If the timestampField is also included in the list of timeFields, it will first be parsed using the provided dateFormat.
+        * `timeFields` : A list of fields to be treated as time.
+        * `dateFormat` : The date format to use to parse the time fields. Default is "yyyy-MM-dd HH:mm:ss.S z".
         * `timezone` : The timezone to use. `UTC` is default.
     * The Grok parser supports either 1 line to parse per incoming message, or incoming messages with multiple log lines, and will produce a json message per line
     * CSV Parser: `org.apache.metron.parsers.csv.CSVParser` with possible `parserConfig` entries of
@@ -154,10 +168,13 @@ messages or marking messages as invalid.
 
 There are two reasons a message will be marked as invalid:
 * Fail [global validation](../../metron-common#validation-framework)
-* Fail the parser's validate function (generally that means to not have a `timestamp` field or a `original_string` field.
+* Fail the parser's validate function. Generally, that means not having a `timestamp` field or an `original_string` field.
 
-Those messages which are marked as invalid are sent to the error queue
-with an indication that they are invalid in the error message.
+Those messages which are marked as invalid are sent to the error queue with an indication that they
+are invalid in the error message. The messages will contain "error_type":"parser_invalid". Note,
+you will not see additional exceptions in the logs for this type of failure, rather the error messages
+are written directly to the configured error topic. See [Topology Errors](../../metron-common#topology-errors)
+for more.
 
 ### Parser Errors
 
@@ -166,7 +183,7 @@ parse, are sent along to the error queue with a message indicating that
 there was an error in parse along with a stacktrace. This is to distinguish
 from the invalid messages.
 
-## Filtered
+## Filtering
 
 One can also filter a message by specifying a `filterClassName` in the
 parser config. Filtered messages are just dropped rather than passed
@@ -261,7 +278,7 @@ The document is structured in the following way
 }
 ```
 
-* `sensorTopic` : The kafka topic to send the parsed messages to. If the topic is prefixed and suffixed by `/`
+* `sensorTopic` : The kafka topic to that the parser will read messages from. If the topic is prefixed and suffixed by `/`
 then it is assumed to be a regex and will match any topic matching the pattern (e.g. `/bro.*/` would match `bro_cust0`, `bro_cust1` and `bro_cust2`)
 * `readMetadata` : Boolean indicating whether to read metadata or not (The default is raw message strategy dependent). See below for a discussion about metadata.
 * `mergeMetadata` : Boolean indicating whether to merge metadata with the message or not (The default is raw message strategy dependent). See below for a discussion about metadata.
diff --git a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
index f64b4af..616639c 100644
--- a/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
+++ b/metron-platform/metron-parsing/metron-parsers-common/src/main/java/org/apache/metron/parsers/GrokParser.java
@@ -20,19 +20,6 @@ package org.apache.metron.parsers;
 
 import com.google.common.base.Joiner;
 import com.google.common.base.Splitter;
-import oi.thekraken.grok.api.Grok;
-import oi.thekraken.grok.api.Match;
-import org.apache.commons.lang3.StringUtils;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.metron.common.Constants;
-import org.apache.metron.parsers.interfaces.MessageParser;
-import org.apache.metron.parsers.interfaces.MessageParserResult;
-import org.json.simple.JSONObject;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import java.io.BufferedReader;
 import java.io.IOException;
 import java.io.InputStream;
@@ -50,6 +37,18 @@ import java.util.List;
 import java.util.Map;
 import java.util.Optional;
 import java.util.TimeZone;
+import oi.thekraken.grok.api.Grok;
+import oi.thekraken.grok.api.Match;
+import org.apache.commons.lang3.StringUtils;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.metron.common.Constants;
+import org.apache.metron.parsers.interfaces.MessageParser;
+import org.apache.metron.parsers.interfaces.MessageParserResult;
+import org.json.simple.JSONObject;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 public class GrokParser implements MessageParser<JSONObject>, Serializable {
 
@@ -96,9 +95,11 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
   public InputStream openInputStream(String streamName) throws IOException {
     FileSystem fs = FileSystem.get(new Configuration());
     Path path = new Path(streamName);
-    if(fs.exists(path)) {
+    if (fs.exists(path)) {
+      LOG.info("Loading {} from HDFS.", streamName);
       return fs.open(path);
     } else {
+      LOG.info("File not found in HDFS, attempting to load {} from classpath using classloader for {}.", streamName, getClass());
       return getClass().getResourceAsStream(streamName);
     }
   }
@@ -108,7 +109,7 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
     grok = new Grok();
     try {
       InputStream commonInputStream = openInputStream(patternsCommonDir);
-      LOG.debug("Grok parser loading common patterns from: {}", patternsCommonDir);
+      LOG.info("Grok parser loading common patterns from: {}", patternsCommonDir);
 
       if (commonInputStream == null) {
         throw new RuntimeException(
@@ -116,7 +117,7 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
       }
 
       grok.addPatternFromReader(new InputStreamReader(commonInputStream));
-      LOG.debug("Loading parser-specific patterns from: {}", grokPath);
+      LOG.info("Loading parser-specific patterns from: {}", grokPath);
 
       InputStream patterInputStream = openInputStream(grokPath);
       if (patterInputStream == null) {
@@ -125,14 +126,12 @@ public class GrokParser implements MessageParser<JSONObject>, Serializable {
       }
       grok.addPatternFromReader(new InputStreamReader(patterInputStream));
-      if (LOG.isDebugEnabled()) {
-        LOG.debug("Grok parser set the following grok expression: {}", grok.getNamedRegexCollectionById(patternLabel));
-      }
+      LOG.info("Grok parser set the following grok expression for '{}': {}", patternLabel, grok.getPatterns().get(patternLabel));
 
       String grokPattern = "%{" + patternLabel + "}";
       grok.compile(grokPattern);
-      LOG.debug("Compiled grok pattern {}", grokPattern);
+      LOG.info("Compiled grok pattern {}", grokPattern);
 
     } catch (Throwable e) {
       LOG.error(e.getMessage(), e);
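For reference, the Grok `parserConfig` entries documented in the metron-parsing README changes above can be combined in a sensor configuration along these lines. This is a hedged sketch, not part of the commit: the topic name, pattern label, and HDFS path are illustrative placeholders.

```json
{
  "parserClassName": "org.apache.metron.parsers.GrokParser",
  "sensorTopic": "squid",
  "parserConfig": {
    "grokPath": "/patterns/squid",
    "patternLabel": "SQUID_DELIMITED",
    "timestampField": "timestamp",
    "timeFields": ["timestamp"],
    "dateFormat": "yyyy-MM-dd HH:mm:ss.S z",
    "timezone": "UTC"
  }
}
```

Per the README text in this commit, `grokPath` would first be resolved against HDFS and then fall back to the classpath, and since `timestampField` appears in `timeFields`, that field would be parsed with the given `dateFormat` before being used as the record timestamp.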
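The default `dateFormat` ("yyyy-MM-dd HH:mm:ss.S z") and `timezone` (UTC) described in the README changes can be exercised with plain `SimpleDateFormat`, which is a reasonable mental model for how a time field becomes an epoch-millisecond timestamp. A small standalone sketch (the input string is illustrative, chosen to match the `ts` value in the error-message example above; this is not code from the commit):

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class GrokTimeDemo {
  public static void main(String[] args) throws Exception {
    // Default date format and timezone from the Grok parserConfig docs above.
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S z");
    sdf.setTimeZone(TimeZone.getTimeZone("UTC"));

    // Parse an illustrative time field into epoch milliseconds.
    Date parsed = sdf.parse("2017-03-06 14:13:47.0 UTC");
    System.out.println(parsed.getTime()); // epoch millis, e.g. 1488809627000
  }
}
```

Fields listed in `timeFields` are converted this way, and the resulting long becomes the record's `timestamp` when `timestampField` names one of them.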