Repository: metron
Updated Branches:
  refs/heads/master 577ff80e3 -> 5c0ac32d1
METRON-1050 Improve Docs of `profiler.period.duration` (nickwallen) closes apache/metron#656

Project: http://git-wip-us.apache.org/repos/asf/metron/repo
Commit: http://git-wip-us.apache.org/repos/asf/metron/commit/5c0ac32d
Tree: http://git-wip-us.apache.org/repos/asf/metron/tree/5c0ac32d
Diff: http://git-wip-us.apache.org/repos/asf/metron/diff/5c0ac32d

Branch: refs/heads/master
Commit: 5c0ac32d19e3805ec4b7ac587ed196e0431f8b35
Parents: 577ff80
Author: nickwallen <n...@nickallen.org>
Authored: Tue Jul 25 17:56:36 2017 -0400
Committer: nickallen <nickal...@apache.org>
Committed: Tue Jul 25 17:56:36 2017 -0400

----------------------------------------------------------------------
 .../metron-profiler-client/README.md            |   1 -
 metron-analytics/metron-profiler/README.md      | 138 +++++++++++++++----
 .../src/main/config/profiler.properties         |   4 +-
 metron-deployment/README.md                     |   5 +-
 site-book/bin/fix-md-dialect.py                 |  11 +-
 5 files changed, 120 insertions(+), 39 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/metron/blob/5c0ac32d/metron-analytics/metron-profiler-client/README.md
----------------------------------------------------------------------
diff --git a/metron-analytics/metron-profiler-client/README.md b/metron-analytics/metron-profiler-client/README.md
index 220dd5d..dcf30f6 100644
--- a/metron-analytics/metron-profiler-client/README.md
+++ b/metron-analytics/metron-profiler-client/README.md
@@ -60,7 +60,6 @@ want to change the global Client configuration so as not to disrupt the work of
 | profiler.client.salt.divisor | The salt divisor used to store profile data. | Optional | 1000 |
 | hbase.provider.impl | The name of the HBaseTableProvider implementation class. | Optional | |
 
-
 ### Profile Selectors
 
 You will notice that the third argument for `PROFILE_GET` is a list of `ProfilePeriod` objects. This list is expected to


http://git-wip-us.apache.org/repos/asf/metron/blob/5c0ac32d/metron-analytics/metron-profiler/README.md
----------------------------------------------------------------------
diff --git a/metron-analytics/metron-profiler/README.md b/metron-analytics/metron-profiler/README.md
index 7a68969..66c5557 100644
--- a/metron-analytics/metron-profiler/README.md
+++ b/metron-analytics/metron-profiler/README.md
@@ -17,7 +17,7 @@ Any field contained within a message can be used to generate a profile. A profi
 
 Follow these instructions to install the Profiler. This assumes that core Metron has already been installed and validated.
 
-1. Build the Metron RPMs by [following these instructions](../../metron-deployment#rpm).
+1. Build the Metron RPMs (see Building the [RPMs](../../metron-deployment#rpms)).
 
     You may have already built the Metron RPMs when core Metron was installed.
 
@@ -58,7 +58,7 @@ Follow these instructions to install the Profiler. This assumes that core Metro
     /usr/metron/0.4.1/lib/metron-profiler-0.4.0-uber.jar
     ```
 
-1. Create a table within HBase that will store the profile data. The table name and column family must match the [Profiler's configuration](#configuring-the-profiler). By default, the table is named `profiler` with a column family `P`.
+1. Create a table within HBase that will store the profile data. By default, the table is named `profiler` with a column family `P`. The table name and column family must match the Profiler's configuration (see [Configuring the Profiler](#configuring-the-profiler)).
 
     ```
     $ /usr/hdp/current/hbase-client/bin/hbase shell
@@ -83,9 +83,9 @@ At this point the Profiler is running and consuming telemetry messages. We have
 
 ## Getting Started
 
-This section will describe the steps required to get your first "Hello, World!"" profile running. This assumes that you have successfully [installed the Profiler](#installation) and have it running.
+This section will describe the steps required to get your first "Hello, World!"" profile running. This assumes that you have a successful Profiler [Installation](#installation) and have it running.
 
-1. Create the profile definition in a file located at `$METRON_HOME/config/zookeeper/profiler.json`.
+1. Create the profile definition in a file located at `$METRON_HOME/config/zookeeper/profiler.json`. This file will likely not exist, if you have never created Profiles before.
 
     The following example will create a profile that simply counts the number of messages per `ip_src_addr`.
     ```
@@ -129,7 +129,7 @@ This section will describe the steps required to get your first "Hello, World!""
     }
     ```
 
-1. Ensure that test messages are being sent to the Profiler's input topic in Kafka. The Profiler will consume messages from the `inputTopic` defined in the [Profiler's configuration](#configuring-the-profiler). By default this is the `indexing` topic.
+1. Ensure that test messages are being sent to the Profiler's input topic in Kafka. The Profiler will consume messages from the input topic defined in the Profiler's configuration (see [Configuring the Profiler](#configuring-the-profiler)). By default this is the `indexing` topic.
 
 1. Check the HBase table to validate that the Profiler is writing the profile. Remember that the Profiler is flushing the profile every 15 minutes. You will need to wait at least this long to start seeing profile data in HBase.
     ```
@@ -137,15 +137,19 @@ This section will describe the steps required to get your first "Hello, World!""
     hbase(main):001:0> count 'profiler'
     ```
 
-1. Use the Profiler Client to read the profile data. The below example `PROFILE_GET` command will read data written by the sample profile given above, if 10.0.0.1 is one of the input values for `ip_src_addr`.
-More information on configuring and using the client can be found [here](../metron-profiler-client).
-It is assumed that the `PROFILE_GET` client is correctly configured before using it.
+1. Use the [Profiler Client](../metron-profiler-client) to read the profile data. The following `PROFILE_GET` command will read the data written by the `hello-world` profile. This assumes that `10.0.0.1` is one of the values for `ip_src_addr` contained within the telemetry consumed by the Profiler.
+
     ```
     $ bin/stellar -z node1:2181
     [Stellar]>>> PROFILE_GET( "hello-world", "10.0.0.1", PROFILE_FIXED(30, "MINUTES"))
     [451, 448]
     ```
 
+    This result indicates that over the past 30 minutes, the Profiler stored two values related to the source IP address "10.0.0.1". In the first 15 minute period, the IP `10.0.0.1` was seen in 451 telemetry messages. In the second 15 minute period, the same IP was seen in 448 telemetry messages.
+
+    It is assumed that the `PROFILE_GET` client is correctly configured to match the Profile configuration before using it to read that Profile. More information on configuring and using the Profiler client can be found [here](../metron-profiler-client).
+
+
 ## Creating Profiles
 
 The Profiler specification requires a JSON-formatted set of elements, many of which can contain Stellar code. The specification contains the following elements. (For the impatient, skip ahead to the [Examples](#examples).)
@@ -275,27 +279,105 @@ The Profiler runs as an independent Storm topology. The configuration for the P
 
 The values can be changed on disk and then the Profiler topology must be restarted.
 
-| Setting | Description
-|--- |---
-| profiler.workers | The number of worker processes to create for the topology.
-| profiler.executors | The number of executors to spawn per component.
-| profiler.input.topic | The name of the Kafka topic from which to consume data.
-| profiler.output.topic | The name of the Kafka topic to which profile data is written. Only used with profiles that use the [`triage` result field](#result).
-| profiler.period.duration | The duration of each profile period. This value should be defined along with `profiler.period.duration.units`.
-| profiler.period.duration.units | The units used to specify the `profiler.period.duration`.
-| profiler.ttl | If a message has not been applied to a Profile in this period of time, the Profile will be forgotten and its resources will be cleaned up. This value should be defined along with `profiler.ttl.units`.
-| profiler.ttl.units | The units used to specify the `profiler.ttl`.
-| profiler.hbase.salt.divisor | A salt is prepended to the row key to help prevent hotspotting. This constant is used to generate the salt. Ideally, this constant should be roughly equal to the number of nodes in the Hbase cluster.
-| profiler.hbase.table | The name of the HBase table that profiles are written to.
-| profiler.hbase.column.family | The column family used to store profiles.
-| profiler.hbase.batch | The number of puts that are written in a single batch.
-| profiler.hbase.flush.interval.seconds | The maximum number of seconds between batch writes to HBase.
-
-After altering the configuration, start the Profiler.
+| Setting | Description
+|--- |---
+| [`profiler.input.topic`](#profilerinputtopic) | The name of the Kafka topic from which to consume data.
+| [`profiler.output.topic`](#profileroutputtopic) | The name of the Kafka topic to which profile data is written. Only used with profiles that define the [`triage` result field](#result).
+| [`profiler.period.duration`](#profilerperiodduration) | The duration of each profile period.
+| [`profiler.period.duration.units`](#profilerperioddurationunits) | The units used to specify the [`profiler.period.duration`](#profilerperiodduration).
+| [`profiler.workers`](#profilerworkers) | The number of worker processes for the topology.
+| [`profiler.executors`](#profilerexecutors) | The number of executors to spawn per component.
+| [`profiler.ttl`](#profilerttl) | If a message has not been applied to a Profile in this period of time, the Profile will be forgotten and its resources will be cleaned up.
+| [`profiler.ttl.units`](#profilerttlunits) | The units used to specify the `profiler.ttl`.
+| [`profiler.hbase.salt.divisor`](#profilerhbasesaltdivisor) | A salt is prepended to the row key to help prevent hotspotting.
+| [`profiler.hbase.table`](#profilerhbasetable) | The name of the HBase table that profiles are written to.
+| [`profiler.hbase.column.family`](#profilerhbasecolumnfamily) | The column family used to store profiles.
+| [`profiler.hbase.batch`](#profilerhbasebatch) | The number of puts that are written to HBase in a single batch.
+| [`profiler.hbase.flush.interval.seconds`](#profilerhbaseflushintervalseconds) | The maximum number of seconds between batch writes to HBase.
 
-```
-$ $METRON_HOME/start_profiler_topology.sh
-```
+### `profiler.input.topic`
+
+*Default*: indexing
+
+The name of the Kafka topic from which to consume data. By default, the Profiler consumes data from the `indexing` topic so that it has access to fully enriched telemetry.
+
+### `profiler.output.topic`
+
+*Default*: enrichments
+
+The name of the Kafka topic to which profile data is written. This property is only applicable to profiles that define the [`result` `triage` field](#result). This allows Profile data to be selectively triaged like any other source of telemetry in Metron.
+
+### `profiler.period.duration`
+
+*Default*: 15
+
+The duration of each profile period. This value should be defined along with [`profiler.period.duration.units`](#profilerperioddurationunits).
+
+*Important*: To read a profile using the [Profiler Client](metron-analytics/metron-profiler-client), the Profiler Client's `profiler.client.period.duration` property must match this value. Otherwise, the Profiler Client will be unable to read the profile data.
+
+### `profiler.period.duration.units`
+
+*Default*: MINUTES
+
+The units used to specify the `profiler.period.duration`. This value should be defined along with [`profiler.period.duration`](#profilerperiodduration).
+
+*Important*: To read a profile using the Profiler Client, the Profiler Client's `profiler.client.period.duration.units` property must match this value. Otherwise, the [Profiler Client](metron-analytics/metron-profiler-client) will be unable to read the profile data.
+
+### `profiler.workers`
+
+*Default*: 1
+
+The number of worker processes to create for the Profiler topology. This property is useful for performance tuning the Profiler.
+
+### `profiler.executors`
+
+*Default*: 0
+
+The number of executors to spawn per component for the Profiler topology. This property is useful for performance tuning the Profiler.
+
+### `profiler.ttl`
+
+*Default*: 30
+
+ If a message has not been applied to a Profile in this period of time, the Profile will be terminated and its resources will be cleaned up. This value should be defined along with [`profiler.ttl.units`](#profilerttlunits).
+
+ This time-to-live does not affect the persisted Profile data in HBase. It only affects the state stored in memory during the execution of the latest profile period. This state will be deleted if the time-to-live is exceeded.
+
+### `profiler.ttl.units`
+
+*Default*: MINUTES
+
+The units used to specify the [`profiler.ttl`](#profilerttl).
+
+### `profiler.hbase.salt.divisor`
+
+*Default*: 1000
+
+A salt is prepended to the row key to help prevent hotspotting. This constant is used to generate the salt. This constant should be roughly equal to the number of nodes in the Hbase cluster to ensure even distribution of data.
+
+### `profiler.hbase.table`
+
+*Default*: profiler
+
+The name of the HBase table that profile data is written to. The Profiler expects that the table exists and is writable. It will not create the table.
+
+### `profiler.hbase.column.family`
+
+*Default*: P
+
+The column family used to store profile data in HBase.
+
+### `profiler.hbase.batch`
+
+*Default*: 10
+
+The number of puts that are written to HBase in a single batch.
+
+### `profiler.hbase.flush.interval.seconds`
+
+*Default*: 30
+
+The maximum number of seconds between batch writes to HBase.
 
 ## Examples
 


http://git-wip-us.apache.org/repos/asf/metron/blob/5c0ac32d/metron-analytics/metron-profiler/src/main/config/profiler.properties
----------------------------------------------------------------------
diff --git a/metron-analytics/metron-profiler/src/main/config/profiler.properties b/metron-analytics/metron-profiler/src/main/config/profiler.properties
index f020b30..873c837 100644
--- a/metron-analytics/metron-profiler/src/main/config/profiler.properties
+++ b/metron-analytics/metron-profiler/src/main/config/profiler.properties
@@ -24,12 +24,12 @@ topology.worker.childopts=
 
 ##### Profiler #####
 
-profiler.workers=1
-profiler.executors=0
 profiler.input.topic=indexing
 profiler.output.topic=enrichments
 profiler.period.duration=15
 profiler.period.duration.units=MINUTES
+profiler.workers=1
+profiler.executors=0
 profiler.ttl=30
 profiler.ttl.units=MINUTES
 profiler.hbase.salt.divisor=1000


http://git-wip-us.apache.org/repos/asf/metron/blob/5c0ac32d/metron-deployment/README.md
----------------------------------------------------------------------
diff --git a/metron-deployment/README.md b/metron-deployment/README.md
index dd3f510..9470fb5 100644
--- a/metron-deployment/README.md
+++ b/metron-deployment/README.md
@@ -61,7 +61,7 @@ This will set up
 ### Prerequisites
 - A cluster managed by Ambari 2.4.2+
 - Metron RPMs available on the cluster in the /localrepo directory. See [RPM](#rpm) for further information.
-- [Node.js](https://nodejs.org/en/download/package-manager/) repository installed on the Management UI host
+- [Node.js](https://nodejs.org/en/download/package-manager/) repository installed on the Management UI host
 
 ### Building Management Pack
 From `metron-deployment` run
@@ -104,7 +104,7 @@ There are a set of limitations that should be addressed based to improve the cur
 - Several configuration parameters used when installing the Metron service could (and should) be grabbed from Ambari. Install will require them to be manually entered.
 - Need to handle upgrading Metron
 
-## RPM
+## RPMs
 RPMs can be built to install the components in metron-platform. These RPMs are built in a Docker container and placed into `target`.
 
 Components in the RPMs:
@@ -178,4 +178,3 @@ Using the MPack is preferred, but instructions for Kerberizing manually can be f
 
 ## TODO
 - Support Ubuntu deployments
-


http://git-wip-us.apache.org/repos/asf/metron/blob/5c0ac32d/site-book/bin/fix-md-dialect.py
----------------------------------------------------------------------
diff --git a/site-book/bin/fix-md-dialect.py b/site-book/bin/fix-md-dialect.py
index 5e6db3e..02be2fb 100755
--- a/site-book/bin/fix-md-dialect.py
+++ b/site-book/bin/fix-md-dialect.py
@@ -59,7 +59,8 @@ import inspect
 import re
 
 # These are the characters excluded by Markdown from use in auto-generated anchor text for Headings.
-EXCLUDED_CHARS_REGEX = r'[^\w\-]'  # all non-alphanumerics except "-" and "_". Whitespace are previously converted.
+EXCLUDED_CHARS_REGEX_GHM = r'[^\w\-]'  # all non-alphanumerics except "-" and "_". Whitespace are previously converted.
+EXCLUDED_CHARS_REGEX_DOX = r'[^\w\.\-]'  # all non-alphanumerics except "-", "_", and ".". Whitespace are previously converted.
 
 def report_error(s) :
     print >>sys.stderr, "ERROR: " + s
@@ -242,12 +243,12 @@ def rewrite_relative_links() :
             trace('labeltext = "' + labeltext + '"')
             scratch = labeltext.lower()  # Github-MD forces all anchors to lowercase
             scratch = re.sub(r'[\s]', "-", scratch)  # convert whitespace to "-"
-            scratch = re.sub(EXCLUDED_CHARS_REGEX, "", scratch)  # strip non-alphanumerics
+            scratch = re.sub(EXCLUDED_CHARS_REGEX_GHM, "", scratch)  # strip non-alphanumerics
             if (scratch == named_anchor) :
                 trace("Found a rewritable case")
                 scratch = labeltext  # Doxia-markdown doesn't change case
                 scratch = re.sub(r'[\s]', "_", scratch)  # convert whitespace to "_"
-                scratch = re.sub(EXCLUDED_CHARS_REGEX, "", scratch)  # strip non-alphanumerics
+                scratch = re.sub(EXCLUDED_CHARS_REGEX_DOX, "", scratch)  # strip non-alphanumerics except "."
                 href = re.sub("#" + named_anchor, "#" + scratch, href)
                 trace("After anchor rewrite, href is: " + href)
 
@@ -372,9 +373,9 @@ for FILENAME in sys.argv[1:] :
             active_type = "none"
             indent_stack.init_indent()
         if re.search(r'^#[^#]', inputline) :
-            # First-level headers ("H1") need explicit anchor inserted. This fixes problem #6.
+            # First-level headers ("H1") need explicit anchor inserted (Doxia style). This fixes problem #6.
             anchor_name = re.sub(r' ', "_", inputline[1:].strip())
-            anchor_name = re.sub(EXCLUDED_CHARS_REGEX, "", anchor_name)
+            anchor_name = re.sub(EXCLUDED_CHARS_REGEX_DOX, "", anchor_name)
             anchor_text = '<a name="' + anchor_name + '"></a>'
             if H1_COUNT == 0 :
                 # Treat the first header differently - put the header after instead of before
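
The effect of splitting `EXCLUDED_CHARS_REGEX` into a GitHub variant and a Doxia variant can be illustrated with a minimal sketch. Only the two regexes and the substitution steps are taken from the diff above; the helper names `github_anchor` and `doxia_anchor` are hypothetical and do not appear in `fix-md-dialect.py`. It assumes the label passed in is the raw heading text, backticks included.

```python
import re

# Regexes copied from the fix-md-dialect.py diff above.
EXCLUDED_CHARS_REGEX_GHM = r'[^\w\-]'    # GitHub style: keep only word chars and "-"
EXCLUDED_CHARS_REGEX_DOX = r'[^\w\.\-]'  # Doxia style: additionally keep "."


def github_anchor(label):
    """Hypothetical helper: lowercase, whitespace to "-", strip excluded chars."""
    scratch = label.lower()
    scratch = re.sub(r'[\s]', "-", scratch)
    return re.sub(EXCLUDED_CHARS_REGEX_GHM, "", scratch)


def doxia_anchor(label):
    """Hypothetical helper: keep case, whitespace to "_", strip excluded chars."""
    scratch = re.sub(r'[\s]', "_", label)
    return re.sub(EXCLUDED_CHARS_REGEX_DOX, "", scratch)


if __name__ == "__main__":
    for label in ("`profiler.period.duration`", "Configuring the Profiler"):
        print("%-28s github=#%s  doxia=#%s"
              % (label, github_anchor(label), doxia_anchor(label)))
```

Under these assumptions, a heading such as `profiler.period.duration` yields the GitHub-style anchor `profilerperiodduration`, which is the form used by the new README links, while the Doxia-style anchor keeps the dots. That would explain why the rewrite step needs a separate regex for each dialect.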