Reorganize website content, link hyper-links correctly, fix image links

Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/1bf8bf5a
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/1bf8bf5a
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/1bf8bf5a

Branch: refs/heads/master
Commit: 1bf8bf5a632cac7548223ab9d990ce6e70c1c2f0
Parents: 334d24e
Author: Jagadish <jvenkatra...@linkedin.com>
Authored: Mon Oct 1 15:44:22 2018 -0700
Committer: Jagadish <jvenkatra...@linkedin.com>
Committed: Mon Oct 1 15:45:42 2018 -0700

----------------------------------------------------------------------
 .../learn/documentation/container/jconsole.png  | Bin 145220 -> 0 bytes
 .../learn/documentation/operations/jconsole.png | Bin 0 -> 145220 bytes
 .../learn/documentation/operations/visualvm.png | Bin 0 -> 198050 bytes
 .../versioned/api/high-level-api.md             |  24 +
 .../versioned/api/low-level-api.md              |  52 ++
 .../documentation/versioned/api/samza-sql.md    |  52 ++
 .../architecture/architecture-overview.md       |  23 +
 .../versioned/architecture/kinesis.md           |  23 +
 .../documentation/versioned/aws/kinesis.md      | 124 ----
 .../versioned/connectors/eventhubs.md           |  24 +
 .../documentation/versioned/connectors/hdfs.md  |  24 +
 .../versioned/connectors/kinesis.md             | 124 ++++
 .../versioned/connectors/overview.md            |  24 +
 .../versioned/container/monitoring.md           | 612 -------------------
 .../versioned/core-concepts/core-concepts.md    |  23 +
 .../versioned/deployment/standalone.md          | 217 +++++++
 .../documentation/versioned/deployment/yarn.md  |  27 +
 docs/learn/documentation/versioned/index.html   |  35 +-
 .../versioned/operations/monitoring.md          | 612 +++++++++++++++++++
 19 files changed, 1264 insertions(+), 756 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/img/versioned/learn/documentation/container/jconsole.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/container/jconsole.png 
b/docs/img/versioned/learn/documentation/container/jconsole.png
deleted file mode 100644
index 6058b16..0000000
Binary files a/docs/img/versioned/learn/documentation/container/jconsole.png 
and /dev/null differ

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/img/versioned/learn/documentation/operations/jconsole.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/operations/jconsole.png 
b/docs/img/versioned/learn/documentation/operations/jconsole.png
new file mode 100644
index 0000000..6058b16
Binary files /dev/null and 
b/docs/img/versioned/learn/documentation/operations/jconsole.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/img/versioned/learn/documentation/operations/visualvm.png
----------------------------------------------------------------------
diff --git a/docs/img/versioned/learn/documentation/operations/visualvm.png 
b/docs/img/versioned/learn/documentation/operations/visualvm.png
new file mode 100644
index 0000000..4399d7f
Binary files /dev/null and 
b/docs/img/versioned/learn/documentation/operations/visualvm.png differ

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/api/high-level-api.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/api/high-level-api.md 
b/docs/learn/documentation/versioned/api/high-level-api.md
new file mode 100644
index 0000000..2a54215
--- /dev/null
+++ b/docs/learn/documentation/versioned/api/high-level-api.md
@@ -0,0 +1,24 @@
+---
+layout: page
+title: Streams DSL
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+
+# High level API section 1
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/api/low-level-api.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/api/low-level-api.md 
b/docs/learn/documentation/versioned/api/low-level-api.md
new file mode 100644
index 0000000..c162ca2
--- /dev/null
+++ b/docs/learn/documentation/versioned/api/low-level-api.md
@@ -0,0 +1,52 @@
+---
+layout: page
+title: Low level API
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+
+# Section 1
+
+# Sample Applications
+
+
+# Section 2
+
+# Section 3
+
+
+# Section 4
+
+The table below summarizes table metrics:
+
+
+| Metrics | Class | Description |
+|---------|-------|-------------|
+|`get-ns`|`ReadableTable`|Average latency of `get/getAsync()` operations|
+|`getAll-ns`|`ReadableTable`|Average latency of `getAll/getAllAsync()` operations|
+|`num-gets`|`ReadableTable`|Count of `get/getAsync()` operations|
+|`num-getAlls`|`ReadableTable`|Count of `getAll/getAllAsync()` operations|
+
+
+### Section 5 example
+
+It is up to the developer whether to implement both `TableReadFunction` and 
+`TableWriteFunction` in one class or two separate classes. Defining them in 
+separate classes can be cleaner if their implementations are elaborate and 
+extended, whereas keeping them in a single class may be more practical if 
+they share a considerable amount of code or are relatively short.
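+
+As an illustration, below is a minimal sketch of the single-class approach. It assumes the remote-table API shape (`TableReadFunction`/`TableWriteFunction` exposing `getAsync`/`putAsync`/`deleteAsync` that return `CompletableFuture`s); depending on the Samza version, further methods (e.g., retry-related ones) may also be required, and the in-memory map here is purely illustrative:
+
+{% highlight java %}
+import java.util.Map;
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.ConcurrentHashMap;
+
+import org.apache.samza.table.remote.TableReadFunction;
+import org.apache.samza.table.remote.TableWriteFunction;
+
+// A single class implementing both functions; a real implementation would
+// talk to a remote store instead of the in-memory map used here.
+class InMemoryTableFunctions implements TableReadFunction<String, String>,
+    TableWriteFunction<String, String> {
+
+  private final Map<String, String> store = new ConcurrentHashMap<>();
+
+  @Override
+  public CompletableFuture<String> getAsync(String key) {
+    return CompletableFuture.completedFuture(store.get(key));
+  }
+
+  @Override
+  public CompletableFuture<Void> putAsync(String key, String value) {
+    store.put(key, value);
+    return CompletableFuture.completedFuture(null);
+  }
+
+  @Override
+  public CompletableFuture<Void> deleteAsync(String key) {
+    store.remove(key);
+    return CompletableFuture.completedFuture(null);
+  }
+}
+{% endhighlight %}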

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/api/samza-sql.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/api/samza-sql.md 
b/docs/learn/documentation/versioned/api/samza-sql.md
new file mode 100644
index 0000000..bad7545
--- /dev/null
+++ b/docs/learn/documentation/versioned/api/samza-sql.md
@@ -0,0 +1,52 @@
+---
+layout: page
+title: Samza SQL
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+
+# Section 1
+
+# Sample Applications
+
+
+# Section 2
+
+# Section 3
+
+
+# Section 4
+
+The table below summarizes table metrics:
+
+
+| Metrics | Class | Description |
+|---------|-------|-------------|
+|`get-ns`|`ReadableTable`|Average latency of `get/getAsync()` operations|
+|`getAll-ns`|`ReadableTable`|Average latency of `getAll/getAllAsync()` operations|
+|`num-gets`|`ReadableTable`|Count of `get/getAsync()` operations|
+|`num-getAlls`|`ReadableTable`|Count of `getAll/getAllAsync()` operations|
+
+
+### Section 5 example
+
+It is up to the developer whether to implement both `TableReadFunction` and 
+`TableWriteFunction` in one class or two separate classes. Defining them in 
+separate classes can be cleaner if their implementations are elaborate and 
+extended, whereas keeping them in a single class may be more practical if 
+they share a considerable amount of code or are relatively short.

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/architecture/architecture-overview.md
----------------------------------------------------------------------
diff --git 
a/docs/learn/documentation/versioned/architecture/architecture-overview.md 
b/docs/learn/documentation/versioned/architecture/architecture-overview.md
new file mode 100644
index 0000000..6c1fbb1
--- /dev/null
+++ b/docs/learn/documentation/versioned/architecture/architecture-overview.md
@@ -0,0 +1,23 @@
+---
+layout: page
+title: Architecture page
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+## Samza architecture page
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/architecture/kinesis.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/architecture/kinesis.md 
b/docs/learn/documentation/versioned/architecture/kinesis.md
new file mode 100644
index 0000000..6c1fbb1
--- /dev/null
+++ b/docs/learn/documentation/versioned/architecture/kinesis.md
@@ -0,0 +1,23 @@
+---
+layout: page
+title: Architecture page
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+## Samza architecture page
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/aws/kinesis.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/aws/kinesis.md 
b/docs/learn/documentation/versioned/aws/kinesis.md
deleted file mode 100644
index a866484..0000000
--- a/docs/learn/documentation/versioned/aws/kinesis.md
+++ /dev/null
@@ -1,124 +0,0 @@
----
-layout: page
-title: Kinesis Connector
----
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-## Overview
-
-The Samza Kinesis connector provides access to [Amazon Kinesis Data 
Streams](https://aws.amazon.com/kinesis/data-streams),
-Amazon’s data streaming service. A Kinesis Data Stream is similar to a Kafka 
topic and can have multiple partitions.
-Each message consumed from a Kinesis Data Stream is an instance of 
[Record](http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record).
-Samza’s 
[KinesisSystemConsumer](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisSystemConsumer.java)
-wraps the Record into a 
[KinesisIncomingMessageEnvelope](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java).
-
-## Consuming from Kinesis
-
-### Basic Configuration
-
-You can configure your Samza jobs to process data from Kinesis Streams. To configure a Samza job to consume from Kinesis
-streams, add the following configuration:
-
-{% highlight jproperties %}
-// define a kinesis system factory with your identifier, e.g., kinesis-system
-systems.kinesis-system.samza.factory=org.apache.samza.system.kinesis.KinesisSystemFactory
-
-// kinesis system consumer works with only AllSspToSingleTaskGrouperFactory
-job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory
-
-// define your streams
-task.inputs=kinesis-system.input0
-
-// define required properties for your streams
-systems.kinesis-system.streams.input0.aws.region=YOUR-STREAM-REGION
-systems.kinesis-system.streams.input0.aws.accessKey=YOUR-ACCESS-KEY
-sensitive.systems.kinesis-system.streams.input0.aws.secretKey=YOUR-SECRET-KEY
-{% endhighlight %}
-
-The following fields, required to access the Kinesis data stream, must be provided:<br>
-**YOUR-STREAM-REGION**, **YOUR-ACCESS-KEY**, **YOUR-SECRET-KEY**.
-
-
-### Advanced Configuration
-
-#### AWS Client configs
-You can configure any [AWS client 
config](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html)
-with the prefix **systems.system-name.aws.clientConfig.***
-
-{% highlight jproperties %}
-systems.system-name.aws.clientConfig.CONFIG-PARAM=CONFIG-VALUE
-{% endhighlight %}
-
-As an example, to set a *proxy host* and *proxy port* for the AWS Client:
-
-{% highlight jproperties %}
-systems.system-name.aws.clientConfig.ProxyHost=my-proxy-host.com
-systems.system-name.aws.clientConfig.ProxyPort=my-proxy-port
-{% endhighlight %}
-
-#### Kinesis Client Library Configs
-Samza Kinesis Connector uses [Kinesis Client 
Library](https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html#kinesis-record-processor-overview-kcl)
-(KCL) to access the Kinesis data streams. You can set any [Kinesis Client Lib 
Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
-for a stream by configuring it under 
**systems.system-name.streams.stream-name.aws.kcl.***
-
-{% highlight jproperties %}
-systems.system-name.streams.stream-name.aws.kcl.CONFIG-PARAM=CONFIG-VALUE
-{% endhighlight %}
-
-Obtain the config param from the public functions in [Kinesis Client Lib 
Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
-by removing the *"with"* prefix. For example: config param corresponding to 
**withTableName()** is **TableName**.
-
-### Resetting Offsets
-
-The source of truth for checkpointing while using Kinesis Connector is not the 
Samza checkpoint topic but Kinesis itself.
-The Kinesis Client Library (KCL) [uses 
DynamoDB](https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html)
-to store it’s checkpoints. By default, Kinesis Connector reads from the 
latest offset in the stream.
-
-To reset the checkpoints and consume from earliest/latest offset of a Kinesis 
data stream, please change the KCL TableName
-and set the appropriate starting position for the stream as shown below.
-
-{% highlight jproperties %}
-// change the TableName to a unique name to reset checkpoint.
-systems.kinesis-system.streams.input0.aws.kcl.TableName=my-app-table-name
-// set the starting position to either TRIM_HORIZON (oldest) or LATEST (latest)
-systems.kinesis-system.streams.input0.aws.kcl.InitialPositionInStream=my-start-position
-{% endhighlight %}
-
-To manipulate checkpoints to start from a particular position in the Kinesis stream, in lieu of the Samza CheckpointTool,
-log in to the AWS Console and change the offsets in the DynamoDB table 
with the table name that you have specified
-in the config above. By default, the table name has the following format:
-"\<job name\>-\<job id\>-\<kinesis stream\>".
-
-### Known Limitations
-
-The following limitations apply to Samza jobs consuming from Kinesis streams 
using the Samza consumer:
-
-- Stateful processing (e.g., windows or joins) is not supported on Kinesis 
streams. However, you can accomplish this by
-chaining two Samza jobs where the first job reads from Kinesis and sends to 
Kafka while the second job processes the
-data from Kafka.
-- Kinesis streams cannot be configured as 
[bootstrap](https://samza.apache.org/learn/documentation/latest/container/streams.html)
-or 
[broadcast](https://samza.apache.org/learn/documentation/latest/container/samza-container.html)
 streams.
-- Kinesis streams must be used ONLY with the 
[AllSspToSingleTaskGrouperFactory](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java)
-as the Kinesis consumer does the partition management by itself. No other 
grouper is supported.
-- A Samza job that consumes from Kinesis cannot consume from any other input 
source. However, you can send your results
-to any destination (e.g., Kafka, EventHubs), and have another Samza job consume 
them.
-
-## Producing to Kinesis
-
-The KinesisSystemProducer for Samza is not yet implemented.
-

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/connectors/eventhubs.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/connectors/eventhubs.md 
b/docs/learn/documentation/versioned/connectors/eventhubs.md
new file mode 100644
index 0000000..b99b46d
--- /dev/null
+++ b/docs/learn/documentation/versioned/connectors/eventhubs.md
@@ -0,0 +1,24 @@
+---
+layout: page
+title: Eventhubs Connector
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Section 1
+# Section 2
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/connectors/hdfs.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/connectors/hdfs.md 
b/docs/learn/documentation/versioned/connectors/hdfs.md
new file mode 100644
index 0000000..a78c4aa
--- /dev/null
+++ b/docs/learn/documentation/versioned/connectors/hdfs.md
@@ -0,0 +1,24 @@
+---
+layout: page
+title: HDFS Connector
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Section 1
+# Section 2
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/connectors/kinesis.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/connectors/kinesis.md 
b/docs/learn/documentation/versioned/connectors/kinesis.md
new file mode 100644
index 0000000..a866484
--- /dev/null
+++ b/docs/learn/documentation/versioned/connectors/kinesis.md
@@ -0,0 +1,124 @@
+---
+layout: page
+title: Kinesis Connector
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+## Overview
+
+The Samza Kinesis connector provides access to [Amazon Kinesis Data 
Streams](https://aws.amazon.com/kinesis/data-streams),
+Amazon’s data streaming service. A Kinesis Data Stream is similar to a Kafka 
topic and can have multiple partitions.
+Each message consumed from a Kinesis Data Stream is an instance of 
[Record](http://docs.aws.amazon.com/goto/WebAPI/kinesis-2013-12-02/Record).
+Samza’s 
[KinesisSystemConsumer](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisSystemConsumer.java)
+wraps the Record into a 
[KinesisIncomingMessageEnvelope](https://github.com/apache/samza/blob/master/samza-aws/src/main/java/org/apache/samza/system/kinesis/consumer/KinesisIncomingMessageEnvelope.java).
+
+## Consuming from Kinesis
+
+### Basic Configuration
+
+You can configure your Samza jobs to process data from Kinesis Streams. To configure a Samza job to consume from Kinesis
+streams, add the following configuration:
+
+{% highlight jproperties %}
+// define a kinesis system factory with your identifier, e.g., kinesis-system
+systems.kinesis-system.samza.factory=org.apache.samza.system.kinesis.KinesisSystemFactory
+
+// kinesis system consumer works with only AllSspToSingleTaskGrouperFactory
+job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.AllSspToSingleTaskGrouperFactory
+
+// define your streams
+task.inputs=kinesis-system.input0
+
+// define required properties for your streams
+systems.kinesis-system.streams.input0.aws.region=YOUR-STREAM-REGION
+systems.kinesis-system.streams.input0.aws.accessKey=YOUR-ACCESS-KEY
+sensitive.systems.kinesis-system.streams.input0.aws.secretKey=YOUR-SECRET-KEY
+{% endhighlight %}
+
+The following fields, required to access the Kinesis data stream, must be provided:<br>
+**YOUR-STREAM-REGION**, **YOUR-ACCESS-KEY**, **YOUR-SECRET-KEY**.
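+
+Once configured, messages from the Kinesis stream arrive at your task like any other Samza input. The snippet below is only a sketch (the task name is made up, and it assumes the consumer delivers the Kinesis partition key as the envelope key and the record payload as raw bytes):
+
+{% highlight java %}
+import org.apache.samza.system.IncomingMessageEnvelope;
+import org.apache.samza.task.MessageCollector;
+import org.apache.samza.task.StreamTask;
+import org.apache.samza.task.TaskCoordinator;
+
+public class KinesisEventTask implements StreamTask {
+  @Override
+  public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
+      TaskCoordinator coordinator) {
+    String partitionKey = (String) envelope.getKey(); // Kinesis partition key
+    byte[] payload = (byte[]) envelope.getMessage();  // Kinesis Record data
+    // Application-specific handling goes here.
+  }
+}
+{% endhighlight %}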
+
+
+### Advanced Configuration
+
+#### AWS Client configs
+You can configure any [AWS client 
config](http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html)
+with the prefix **systems.system-name.aws.clientConfig.***
+
+{% highlight jproperties %}
+systems.system-name.aws.clientConfig.CONFIG-PARAM=CONFIG-VALUE
+{% endhighlight %}
+
+As an example, to set a *proxy host* and *proxy port* for the AWS Client:
+
+{% highlight jproperties %}
+systems.system-name.aws.clientConfig.ProxyHost=my-proxy-host.com
+systems.system-name.aws.clientConfig.ProxyPort=my-proxy-port
+{% endhighlight %}
+
+#### Kinesis Client Library Configs
+Samza Kinesis Connector uses [Kinesis Client 
Library](https://docs.aws.amazon.com/streams/latest/dev/developing-consumers-with-kcl.html#kinesis-record-processor-overview-kcl)
+(KCL) to access the Kinesis data streams. You can set any [Kinesis Client Lib 
Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
+for a stream by configuring it under 
**systems.system-name.streams.stream-name.aws.kcl.***
+
+{% highlight jproperties %}
+systems.system-name.streams.stream-name.aws.kcl.CONFIG-PARAM=CONFIG-VALUE
+{% endhighlight %}
+
+Obtain the config param from the public functions in [Kinesis Client Lib 
Configuration](https://github.com/awslabs/amazon-kinesis-client/blob/master/amazon-kinesis-client-multilang/src/main/java/software/amazon/kinesis/coordinator/KinesisClientLibConfiguration.java)
+by removing the *"with"* prefix. For example: config param corresponding to 
**withTableName()** is **TableName**.
+
+### Resetting Offsets
+
+The source of truth for checkpointing while using Kinesis Connector is not the 
Samza checkpoint topic but Kinesis itself.
+The Kinesis Client Library (KCL) [uses 
DynamoDB](https://docs.aws.amazon.com/streams/latest/dev/kinesis-record-processor-ddb.html)
+to store it’s checkpoints. By default, Kinesis Connector reads from the 
latest offset in the stream.
+
+To reset the checkpoints and consume from earliest/latest offset of a Kinesis 
data stream, please change the KCL TableName
+and set the appropriate starting position for the stream as shown below.
+
+{% highlight jproperties %}
+// change the TableName to a unique name to reset checkpoint.
+systems.kinesis-system.streams.input0.aws.kcl.TableName=my-app-table-name
+// set the starting position to either TRIM_HORIZON (oldest) or LATEST (latest)
+systems.kinesis-system.streams.input0.aws.kcl.InitialPositionInStream=my-start-position
+{% endhighlight %}
+
+To manipulate checkpoints to start from a particular position in the Kinesis stream, in lieu of the Samza CheckpointTool,
+log in to the AWS Console and change the offsets in the DynamoDB table 
with the table name that you have specified
+in the config above. By default, the table name has the following format:
+"\<job name\>-\<job id\>-\<kinesis stream\>".
+
+### Known Limitations
+
+The following limitations apply to Samza jobs consuming from Kinesis streams 
using the Samza consumer:
+
+- Stateful processing (e.g., windows or joins) is not supported on Kinesis 
streams. However, you can accomplish this by
+chaining two Samza jobs where the first job reads from Kinesis and sends to 
Kafka while the second job processes the
+data from Kafka.
+- Kinesis streams cannot be configured as 
[bootstrap](https://samza.apache.org/learn/documentation/latest/container/streams.html)
+or 
[broadcast](https://samza.apache.org/learn/documentation/latest/container/samza-container.html)
 streams.
+- Kinesis streams must be used ONLY with the 
[AllSspToSingleTaskGrouperFactory](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/container/grouper/stream/AllSspToSingleTaskGrouperFactory.java)
+as the Kinesis consumer does the partition management by itself. No other 
grouper is supported.
+- A Samza job that consumes from Kinesis cannot consume from any other input 
source. However, you can send your results
+to any destination (e.g., Kafka, EventHubs), and have another Samza job consume 
them.
+
+## Producing to Kinesis
+
+The KinesisSystemProducer for Samza is not yet implemented.
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/connectors/overview.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/connectors/overview.md 
b/docs/learn/documentation/versioned/connectors/overview.md
new file mode 100644
index 0000000..579c494
--- /dev/null
+++ b/docs/learn/documentation/versioned/connectors/overview.md
@@ -0,0 +1,24 @@
+---
+layout: page
+title: Connectors overview
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+# Section 1
+# Section 2
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/container/monitoring.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/container/monitoring.md 
b/docs/learn/documentation/versioned/container/monitoring.md
deleted file mode 100644
index af6ec77..0000000
--- a/docs/learn/documentation/versioned/container/monitoring.md
+++ /dev/null
@@ -1,612 +0,0 @@
----
-layout: page
-title: Monitoring
----
-<!--
-   Licensed to the Apache Software Foundation (ASF) under one or more
-   contributor license agreements.  See the NOTICE file distributed with
-   this work for additional information regarding copyright ownership.
-   The ASF licenses this file to You under the Apache License, Version 2.0
-   (the "License"); you may not use this file except in compliance with
-   the License.  You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
--->
-
-# Monitoring Samza Applications
-
-This section provides details on monitoring Samza jobs; it is not to be confused with _Samza Monitors_ (components of the Samza-REST service that provide cluster-wide monitoring capabilities).
-
-
-
-As with any other production software, it is critical to monitor the health of your Samza jobs. Samza relies on metrics for monitoring and includes an extensible metrics library. While a few standard metrics are provided out-of-the-box, it is easy to define metrics specific to your application.
-
-
-* [A. Metrics Reporters](#a-metrics-reporters)
-  + [A.1 Reporting Metrics to JMX (JMX Reporter)](#jmxreporter)
-    - [Enabling the JMX Reporter](#enablejmxreporter)
-    - [Using the JMX Reporter](#usejmxreporter)
-  + [A.2 Reporting Metrics to Kafka (MetricsSnapshot Reporter)](#snapshotreporter)
-    - [Enabling the MetricsSnapshot Reporter](#enablesnapshotreporter)
-  + [A.3 Creating a Custom MetricsReporter](#customreporter)
-* [B. Metric Types in Samza](#metrictypes)
-* [C. Adding User-Defined Metrics](#userdefinedmetrics)
-  + [Low-level API](#lowlevelapi)
-  + [High-Level API](#highlevelapi)
-* [D. Key Internal Samza Metrics](#keyinternalsamzametrics)
-  + [D.1 Vital Metrics](#vitalmetrics)
-  + [D.2 Store Metrics](#storemetrics)
-  + [D.3 Operator Metrics](#operatormetrics)
-* [E. Metrics Reference Sheet](#metricssheet)
-
-## A. Metrics Reporters
-
-Samza's metrics library encapsulates the metrics collection and sampling logic. Metrics Reporters in Samza are responsible for emitting metrics to external services, which may archive, process, or visualize the metrics' values, or trigger alerts based on them.
-
-Samza includes default implementations for two such Metrics Reporters:
-
-1. A _JMXReporter_ (detailed [below](#jmxreporter)), which lets standard JMX clients probe containers for metrics encoded as JMX MBeans. Visualization tools such as [Grafana](https://grafana.com/dashboards/3457) can also be used to visualize this JMX data.
-
-2. A _MetricsSnapshot_ reporter (detailed [below](#snapshotreporter)), which periodically publishes all metrics to Kafka. A downstream Samza job could then consume and publish these metrics to other metrics management systems such as [Prometheus](https://prometheus.io/) and [Graphite](https://graphiteapp.org/).
-
-Note that Samza allows multiple Metrics Reporters to be used simultaneously.
-
-
-### <a name="jmxreporter"></a> A.1 Reporting Metrics to JMX (JMX Reporter)
-
-This reporter encodes all of Samza's internal and user-defined metrics as JMX MBeans and hosts a JMX MBean server. Standard JMX clients (such as JConsole and VisualVM) can thus probe Samza's containers and the YARN ApplicationMaster to retrieve metric values. JMX also provides additional profiling capabilities (e.g., for CPU and memory utilization), which this reporter enables as well.
-
-#### <a name="enablejmxreporter"></a> Enabling the JMX Reporter
-JMXReporter can be enabled by adding the following configuration.
-
-```
-#Define a Samza metrics reporter called "jxm", which publishes to JMX
-metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
-
-# Use the jmx reporter (if using multiple reporters, separate them with commas)
-metrics.reporters=jmx
-
-```
-
-#### <a name="usejmxreporter"></a> Using the JMX Reporter
-
-To connect to the JMX MBean server, first obtain the JMX Server URL and port, 
published in the container logs:
-
-
-```
-
-2018-08-14 11:30:49.888 [main] JmxServer [INFO] Started JmxServer registry 
port=54661 server port=54662 
url=service:jmx:rmi://localhost:54662/jndi/rmi://localhost:54661/jmxrmi
-
-```
-
-
-If using the **JConsole** JMX client, launch it with the service URL as:
-
-```
-jconsole service:jmx:rmi://localhost:54662/jndi/rmi://localhost:54661/jmxrmi
-```
-
-<img src="/img/versioned/learn/documentation/container/jconsole.png" 
alt="JConsole" class="diagram-large">
-
- 
-
-If using the VisualVM JMX client, run:
-
-```
-jvisualvm
-```
-
-After **VisualVM** starts, click the "Add JMX Connection" button and paste in your JMX server URL (obtained from the logs).
-Install the VisualVM-MBeans plugin (Tools -> Plugins) to view the metrics MBeans.
-
-<img src="/img/versioned/learn/documentation/container/visualvm.png" 
alt="VisualVM" class="diagram-small">
-
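-Metrics can also be fetched programmatically with the standard JDK JMX API. A minimal sketch (the service URL below is an example value; substitute the one printed in your container logs):
-
-```
-import java.util.Set;
-import javax.management.MBeanServerConnection;
-import javax.management.ObjectName;
-import javax.management.remote.JMXConnector;
-import javax.management.remote.JMXConnectorFactory;
-import javax.management.remote.JMXServiceURL;
-
-public class JmxMetricsProbe {
-  public static void main(String[] args) throws Exception {
-    String url = "service:jmx:rmi://localhost:54662/jndi/rmi://localhost:54661/jmxrmi";
-    JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url));
-    try {
-      MBeanServerConnection conn = connector.getMBeanServerConnection();
-      // List every MBean exposed by the container, Samza metrics included.
-      Set<ObjectName> names = conn.queryNames(null, null);
-      for (ObjectName name : names) {
-        System.out.println(name);
-      }
-    } finally {
-      connector.close();
-    }
-  }
-}
-```
-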
- 
-###  <a name="snapshotreporter"></a> A.2 Reporting Metrics to Kafka 
(MetricsSnapshot Reporter)
-
-This reporter publishes metrics to Kafka.
-
-#### <a name="enablesnapshotreporter"></a> Enabling the MetricsSnapshot 
Reporter
-To enable this reporter, simply append the following to your job's configuration.
-
-```
-#Define a metrics reporter called "snapshot"
-metrics.reporters=snapshot
-metrics.reporter.snapshot.class=org.apache.samza.metrics.reporter.MetricsSnapshotReporterFactory
-```
-
-
-Specify the Kafka topic to which the reporter should publish:
-
-```
-metrics.reporter.snapshot.stream=kafka.metrics
-```
-
-
-Specify the serializer to be used for the metrics data:
-
-```
-serializers.registry.metrics.class=org.apache.samza.serializers.MetricsSnapshotSerdeFactory
-systems.kafka.streams.metrics.samza.msg.serde=metrics
-```
-With this configuration, all containers (including the YARN ApplicationMaster) will publish their JSON-encoded metrics 
-to a Kafka topic called "metrics" every 60 seconds.
-The following is an example of such a metrics message:
-
-```
-{
-  "header": {
-    "container-name": "samza-container-0",
-    "exec-env-container-id": "YARN-generated containerID",
-    "host": "samza-grid-1234.example.com",
-    "job-id": "1",
-    "job-name": "my-samza-job",
-    "reset-time": 1401729000347,
-    "samza-version": "0.0.1",
-    "source": "TaskName-Partition1",
-    "time": 1401729420566,
-    "version": "0.0.1"
-  },
-  "metrics": {
-    "org.apache.samza.container.TaskInstanceMetrics": {
-      "commit-calls": 1,
-      "window-calls": 0,
-      "process-calls": 14,
-      "messages-actually-processed": 14,
-      "send-calls": 0,
-      "flush-calls": 1,
-      "pending-messages": 0,
-      "messages-in-flight": 0,
-      "async-callback-complete-calls": 14,
-      "wikipedia-#en.wikipedia-0-offset": 8979
-    }
-  }
-}
-```
-
-
-Each message contains a header which includes information about the job, time, 
and container from which the metrics were obtained. 
-The remainder of the message contains the metric values, grouped by their 
types, such as TaskInstanceMetrics, SamzaContainerMetrics,  
KeyValueStoreMetrics, JVMMetrics, etc. Detailed descriptions of the various 
metric categories and metrics are available [here](#metricssheet).
-
-It is possible to configure the MetricsSnapshot reporter to use a different serializer with this configuration:
-
-```
-serializers.registry.metrics.class=<classpath-to-my-custom-serializer-factory>
-```
-
-
-
-To configure the reporter to publish with a different frequency (default 60 seconds), add the following to your job's configuration:
-
-```
-metrics.reporter.snapshot.interval=<publish frequency in seconds>
-```
-
-Similarly, to limit the set of metrics emitted, you can use the regex-based blacklist supported by this reporter. For example, to limit it to publishing only SamzaContainerMetrics, use:
-
-```
-metrics.reporter.snapshot.blacklist=^(?!.\*?(?:SamzaContainerMetrics)).\*$
-```
-
-
-### <a name="customreporter"></a> A.3 Creating a Custom MetricsReporter
-
-Creating a custom MetricsReporter entails implementing the MetricsReporter 
interface. The lifecycle of Metrics Reporters is managed by Samza and is 
aligned with the Samza container lifecycle. Metrics Reporters can poll metric 
values and can receive callbacks when new metrics are added at runtime, e.g., 
user-defined metrics. Metrics Reporters are responsible for maintaining 
executor pools, IO connections, and any in-memory state that they require in 
order to export metrics to the desired external system, and managing the 
lifecycles of such components.
-
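-The sketch below shows the shape of such a reporter (the interface methods follow MetricsReporter/MetricsReporterFactory; the class names and logging behavior are illustrative only, and the factory signature may differ slightly across Samza versions):
-
-```
-import java.util.Map;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.Executors;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.TimeUnit;
-
-import org.apache.samza.config.Config;
-import org.apache.samza.metrics.MetricsReporter;
-import org.apache.samza.metrics.MetricsReporterFactory;
-import org.apache.samza.metrics.ReadableMetricsRegistry;
-
-class LoggingMetricsReporter implements MetricsReporter {
-  private final Map<String, ReadableMetricsRegistry> registries = new ConcurrentHashMap<>();
-  private ScheduledExecutorService executor;
-
-  @Override
-  public void start() {
-    executor = Executors.newSingleThreadScheduledExecutor();
-    executor.scheduleAtFixedRate(this::flush, 60, 60, TimeUnit.SECONDS);
-  }
-
-  @Override
-  public void register(String source, ReadableMetricsRegistry registry) {
-    // Invoked once per metrics source, e.g., each task instance and the container.
-    registries.put(source, registry);
-  }
-
-  @Override
-  public void stop() {
-    executor.shutdown();
-  }
-
-  private void flush() {
-    registries.forEach((source, registry) ->
-        registry.getGroups().forEach(group ->
-            registry.getGroup(group).forEach((name, metric) ->
-                System.out.println(source + " " + group + "." + name + " = " + metric))));
-  }
-}
-
-class LoggingMetricsReporterFactory implements MetricsReporterFactory {
-  @Override
-  public MetricsReporter getMetricsReporter(String name, String containerName, Config config) {
-    return new LoggingMetricsReporter();
-  }
-}
-```
-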
-After implementation, a custom reporter can be enabled by appending the 
following to the Samza job&#39;s configuration:
-
-```
-#Define a metrics reporter with a desired name
-metrics.reporter.<my-custom-reporter-name>.class=<classpath-of-my-custom-reporter-factory>
-
-
-#Enable its use for metrics reporting
-metrics.reporters=<my-custom-reporter-name>
-```
-
-
-
-## <a name="metrictypes"></a> B. Metric Types in Samza 
-
-Metrics in Samza are divided into three types -- _Gauges_, _Counters_, and 
_Timers_.
-
-_Gauges_ are useful when measuring the magnitude of a certain system property, 
e.g., the current queue length, or a buffer size.
-
-_Counters_ are useful in measuring metrics that are cumulative values, e.g., 
the number of messages processed since container startup. Certain counters are 
also useful when visualized with their rate-of-change, e.g., the rate of 
message processing.
-
-_Timers_ are useful for storing and reporting a sliding-window of timing 
values. Samza also supports a ListGauge type metric, which can be used to store 
and report a list of any primitive-type such as strings.
-
-## <a name="userdefinedmetrics"></a> C. Adding User-Defined Metrics
-
-
-To add a new metric, you can simply use the _MetricsRegistry_ from the TaskContext provided 
-to the init() method to register new metrics. The code snippets below show examples of registering and updating a user-defined
-Counter metric. Timers and gauges can be used similarly from within your task class, as shown after the counter example below.
-
-### <a name="lowlevelapi"></a> Low-level API
-
-Simply have your task implement the InitableTask interface and access the 
MetricsRegistry from the TaskContext.
-
-```
-public class MyJavaStreamTask implements StreamTask, InitableTask {
-
-  private Counter messageCount;
-  public void init(Config config, TaskContext context) {
-    this.messageCount = 
context.getMetricsRegistry().newCounter(getClass().getName(), "message-count");
-
-  }
-
-  public void process(IncomingMessageEnvelope envelope, MessageCollector 
collector, TaskCoordinator coordinator) {
-    messageCount.inc();
-  }
-
-}
-```
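-
-Timers and gauges are registered the same way; a short sketch (the group and metric names here are arbitrary):
-
-```
-// in init():
-Timer processLatency = context.getMetricsRegistry().newTimer(getClass().getName(), "process-latency-ns");
-Gauge<Integer> queueSize = context.getMetricsRegistry().newGauge(getClass().getName(), "queue-size", 0);
-
-// later, in your processing code:
-long startNs = System.nanoTime();
-// ... do the work being measured ...
-processLatency.update(System.nanoTime() - startNs);
-queueSize.set(42); // record the current magnitude of the observed property
-```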
-
-### <a name="highlevelapi"></a> High-Level API
-
-In the high-level API, you can define a ContextManager and access the 
MetricsRegistry from the TaskContext, using which you can add and update your 
metrics.
-
-```
-public class MyJavaStreamApp implements StreamApplication {
-
-  private Counter messageCount = null;
-
-  @Override
-  public void init(StreamGraph graph, Config config) {
-    graph.withContextManager(new DemoContextManager());
-    MessageStream<IndexedRecord> viewEvent = ...;
-    viewEvent
-        .map(this::countMessage)
-        ...;
-  }
-
-  private IndexedRecord countMessage(IndexedRecord value) {
-    messageCount.inc();
-    return value;
-  }
-
-  public final class DemoContextManager implements ContextManager {
-
-    @Override
-    public void init(Config config, TaskContext context) {
-      messageCount = context.getMetricsRegistry()
-          .newCounter(getClass().getName(), "message-count");
-    }
-
-    @Override
-    public void close() { }
-  }
-}
-```
-
-## <a name="keyinternalsamzametrics"></a> D. Key Internal Samza Metrics
-
-Samza's internal metrics allow for detailed monitoring of a Samza job and all its components. Detailed descriptions 
-of all internal metrics are listed in a reference sheet [here](#e-metrics-reference-sheet). 
-However, a small subset of internal metrics facilitates easy high-level monitoring of a job.
-
-These key metrics can be grouped into three categories: _Vital metrics_, _Store metrics_, and _Operator metrics_. 
-We explain each of these categories in detail below.
-
-### <a name="vitalmetrics"></a> D.1. Vital Metrics
-
-These metrics indicate the vital signs of a Samza job's health. Note that these metrics are categorized into different groups based on the Samza component that emits them (e.g., SamzaContainerMetrics, TaskInstanceMetrics, ApplicationMaster metrics, etc.).
-
-| **Metric Name** | **Group** | **Meaning** |
-| --- | --- | --- |
-| **Availability -- Are there any resource failures impacting my job?** |
-| job-healthy | ContainerProcessManagerMetrics | A binary value, where 1 
indicates that all the required containers configured for a job are running, 0 
otherwise. |
-| failed-containers | ContainerProcessManagerMetrics | Number of containers that have failed in the job's lifetime |
-| **Input Processing Lag -- Is my job lagging ?** |
-| \<Topic\>-\<Partition\>-messages-behind-high-watermark | KafkaSystemConsumerMetrics | Number of input messages waiting to be processed on an input topic-partition |
-| consumptionLagMs | EventHubSystemConsumer | Time difference between the 
processing and enqueuing (into EventHub)  of input events |
-| millisBehindLatest | KinesisSystemConsumerMetrics | Current processing lag 
measured from the tip of the stream, expressed in milliseconds. |
-| **Output/Produce Errors -- Is my job failing to produce output?** |
-| producer-send-failed | KafkaSystemProducerMetrics | Number of send requests 
to Kafka (e.g., output topics) that failed due to unrecoverable errors |
-| flush-failed | HdfsSystemProducerMetrics | Number of failed flushes to HDFS |
-| **Processing Time -- Is my job spending too much time processing inputs?** |
-| process-ns | SamzaContainerMetrics | Amount of time the job is spending in 
processing each input |
-| commit-ns | SamzaContainerMetrics | Amount of time the job is spending in checkpointing inputs (and flushing producers, checkpointing KV stores, flushing side input stores). The frequency of this function is configured using _task.commit.ms_ |
-| window-ns | SamzaContainerMetrics | In case of WindowableTasks being used, 
amount of time the job is spending in its window() operations |
-
-### <a name="storemetrics"></a>  D.2. Store Metrics
-
-Stateful Samza jobs typically use RocksDB-backed KV stores for storing state. Therefore, timing metrics associated with 
-KV stores can be useful for monitoring processing lag. These are some key metrics for KV stores; 
-the metrics reference sheet [here](#e-metrics-reference-sheet) details all metrics for KV stores.
-
-
-
-| **Metric name** | **Group** | **Meaning** |
-| --- | --- | --- |
-| get-ns, put-ns, delete-ns, all-ns | KeyValueStorageEngineMetrics | Time 
spent performing respective KV store operations |
-
-
-
-### <a name="operatormetrics"></a>  D.3. Operator Metrics
-
-If your Samza job uses Samza's Fluent API or Samza SQL, Samza creates a DAG (directed acyclic graph) of 
-_operators_ to form the required data processing pipeline. In such cases, operator metrics allow fine-grained 
-monitoring of these operators. Key operator metrics are listed below, while a detailed list is present 
-in the metrics reference sheet.
-
-| **Metric name** | **Group** | **Meaning** |
-| --- | --- | --- |
-| \<Operator-ID\>-handle-message-ns | WindowOperatorImpl, PartialJoinOperatorImpl, StreamOperatorImpl, StreamTableJoinOperatorImpl, etc. | Time spent handling a given input message by the operator |
-
-
-
-## <a name="metricssheet"></a>  E. Metrics Reference Sheet
-Suffixes "-ms" and "-ns" in metric names indicate milliseconds and nanoseconds respectively. All "average time" metrics are calculated over a sliding time window of 300 seconds.
-
-All \<system\>, \<stream\>, \<partition\>, \<store-name\>, and \<topic\> placeholders are populated with the corresponding actual values at runtime.
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **ContainerProcessManagerMetrics** | running-containers | Total number of 
running containers. |
-| | needed-containers | Number of containers needed for the job to be declared 
healthy. |
-| | completed-containers | Number of containers that have completed their 
execution and exited. |
-| | failed-containers | Number of containers that have failed in the job's lifetime. |
-| | released-containers | Number of containers released due to overallocation 
by the YARN-ResourceManager. |
-| | container-count | Number of containers configured for the job. |
-| | redundant-notifications | Number of redundant onResourceCompleted callbacks received from the RM after container shutdown. |
-| | job-healthy | A binary value, where 1 indicates that all the required 
containers configured for a job are running, 0 otherwise. |
-| | preferred-host-requests | Number of container resource-requests for a 
preferred host received by the cluster manager. |
-| | any-host-requests | Number of container resource-requests for _any_ host received by the cluster manager. |
-| | expired-preferred-host-requests | Number of expired resource-requests for a preferred host received by the cluster manager. |
-| | expired-any-host-requests | Number of expired resource-requests for any host received by the cluster manager. |
-| | host-affinity-match-pct | Percentage of non-expired preferred host 
requests. This measures the % of resource-requests for which host-affinity 
provided the preferred host. |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SamzaContainerMetrics (Timer metrics)** | choose-ns | Average time spent 
by a task instance for choosing the input to process; this includes time spent 
waiting for input, selecting one in case of multiple inputs, and deserializing 
input. |
-| | window-ns | In case of WindowableTasks being used, average time a task 
instance is spending in its window() operations. |
-| | timer-ns | Average time spent in the timer-callback when a timer 
registered with TaskContext fires. |
-| | process-ns | Average time the job is spending in processing each input. |
-| | commit-ns | Average time the job is spending in checkpointing inputs (and 
flushing producers, checkpointing KV stores, flushing side input stores). The 
frequency of this function is configured using _task.commit.ms._ |
-| | block-ns | Average time the run loop is blocked because all task instances 
are busy processing input; could indicate lag accumulating. |
-| | container-startup-time | Time spent in starting the container. This 
includes time to start the JMX server, starting metrics reporters, starting 
system producers, consumers, system admins, offset manager, locality manager, 
disk space manager, security manager, statistics manager, and initializing all 
task instances. |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SamzaContainerMetrics (Counters and Gauges)** | commit-calls | Number of 
commits. Each commit includes input checkpointing, flushing producers, 
checkpointing KV stores, flushing side input stores, etc. |
-| | window-calls | In case of WindowableTask, this measures the number of 
window invocations. |
-| | timer-calls | Number of timer callbacks. |
-| | process-calls | Number of process method invocations. |
-| | process-envelopes | Number of input message envelopes processed. |
-| | process-null-envelopes | Number of times no input message envelope was available for the run loop to process. |
-| | event-loop-utilization | The duty-cycle of the event loop. That is, the 
fraction of time of each event loop iteration that is spent in process(), 
window(), and commit. |
-| | disk-usage-bytes | Total disk space size used by key-value stores (in 
bytes). |
-| | disk-quota-bytes | Disk memory usage quota for key-value stores (in 
bytes). |
-| | executor-work-factor | The work factor of the run loop. A work factor of 1 
indicates full throughput, while a work factor of less than 1 will introduce 
delays into the execution to approximate the requested work factor. The work 
factor is set by the disk space monitor in accordance with the disk quota 
policy. Given the latest percentage of available disk quota, this policy 
returns the work factor that should be applied. |
-| | physical-memory-mb | The physical memory used by the Samza container 
process (native + on heap) (in MBs). |
-| | <TaskName\>-<StoreName\>-restore-time | Time taken to restore task stores 
(per task store). |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **Job-Coordinator Metrics (Gauge)** | \<system\>-\<stream\>-partitionCount | 
The current number of partitions detected by the Stream Partition Count 
Monitor. This can be enabled by configuring 
_job.coordinator.monitor-partition-change_ to true. |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **TaskInstance Metrics (Counters and Gauges)** | 
\<system\>-\<stream\>-\<partition\>-offset | The offset of the last processed 
message on the given system-stream-partition input. |
-|   | commit-calls | Number of commit calls for the task. Each commit call 
involves checkpointing inputs (and flushing producers, checkpointing KV stores, 
flushing side input stores). |
-|   | window-calls | In case of WindowableTask, the number of window() invocations on the task. |
-|   | process-calls | Number of process method calls. |
-|   | send-calls | Number of send method calls (representing number of 
messages that were sent to the underlying SystemProducers) |
-|   | flush-calls | Number of times the underlying system producers were 
flushed. |
-|   | messages-actually-processed | Number of messages processed by the task. |
-|   | pending-messages | Number of pending messages in the pending envelope queue. |
-|   | messages-in-flight | Number of input messages currently being processed. 
This is impacted by the task.max.concurrency configuration. |
-|   | async-callback-complete-calls | Number of processAsync invocations that 
have completed (applicable to AsyncStreamTasks). |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| OffsetManagerMetrics (Gauge) | 
\<system\>-\<stream\>-\<partition\>-checkpointed-offset | Latest checkpointed 
offsets for each input system-stream-partition. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **JvmMetrics (Timers)** | gc-time-millis | Total time spent in GC. |
-|   | <gc-name\>-time-millis | Total time spent in garbage collection (for 
each garbage collector) (in milliseconds) |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **JvmMetrics (Counters and Gauges)** | gc-count | Number of GC invocations. |
-|   | mem-heap-committed-mb | Size of committed heap memory (in MB). Memory is committed to the JVM heap lazily, as it is required, so committed memory is a measure of how much memory the JVM heap is really consuming. See [JVM heap management](https://pubs.vmware.com/vfabric52/index.jsp?topic=/com.vmware.vfabric.em4j.1.2/em4j/conf-heap-management.html). |
-|   | mem-heap-used-mb | Size of used heap memory (in MB). From the JVM's perspective, Used memory is (Working set + Garbage) and Free memory is (Current heap size – Used memory). |
-|   | mem-heap-max-mb | Size of maximum heap memory (in MB). This is defined by the -Xmx option. |
-|   | mem-nonheap-committed-mb | Size of non-heap memory committed in MBs. |
-|   | mem-nonheap-used-mb | Size of non-heap memory used in MBs. |
-|   | mem-nonheap-max-mb | Size of maximum non-heap memory (in MB). This can be changed using the -XX:MaxPermSize VM option. |
-|   | threads-new | Number of threads in the NEW state (not yet started) at that instant. |
-|   | threads-runnable | Number of running threads at that instant. |
-|   | threads-timed-waiting | Current number of threads in the TIMED\_WAITING state, described as: "A thread that is waiting for another thread to perform an action for up to a specified waiting time is in this state." |
-|   | threads-waiting | Current number of waiting threads. |
-|   | threads-blocked | Current number of blocked threads. |
-|   | threads-terminated | Current number of terminated threads. |
-|   | \<gc-name\>-gc-count | Number of garbage collection calls (for each 
garbage collector). |
-| **(Emitted only if the OS supports it)** | process-cpu-usage | The "recent CPU usage" for the Java Virtual Machine process. |
-| **(Emitted only if the OS supports it)** | system-cpu-usage | The "recent CPU usage" for the whole system. |
-| **(Emitted only if the OS supports it)** | open-file-descriptor-count | Count of open file descriptors. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SystemConsumersMetrics (Counters and Gauges)** <br/> These metrics are emitted while multiplexing and coordinating between per-system consumers and the message chooser during polling. | chose-null | Number of times the message chooser returned a null message envelope. This is typically indicative of low input traffic on one or more input partitions. |
-|   | chose-object | Number of times the message chooser returned a non-null 
message envelope. |
-|   | deserialization-error | Number of times an incoming message was not 
deserialized successfully. |
-|   | ssps-needed-by-chooser | Number of system-stream-partitions for which no buffered message exists, and which therefore need to be polled (to obtain a message). |
-|   | poll-timeout | The timeout for polling at that instant. |
-|   | unprocessed-messages | Number of unprocessed messages buffered in 
SystemConsumers. |
-|   | \<system\>-polls | Number of times the given system was polled. |
-|   | \<system\>-ssp-fetches-per-poll | Number of partitions of the given 
system polled at that instant. |
-|   | \<system\>-messages-per-poll | Number of messages returned by each poll of the SystemConsumer for the underlying system. |
-|   | \<system\>-\<stream\>-\<partition\>-messages-chosen | Number of messages that were chosen by the MessageChooser for a particular system-stream-partition. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SystemConsumersMetrics (Timers)** | poll-ns | Average time spent polling 
all underlying systems for new messages (in nanoseconds). |
-|   | deserialization-ns | Average time spent deserializing incoming messages 
(in nanoseconds). |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KafkaSystemConsumersMetrics (Counters and Gauges)** | \<system\>-\<topic\>-\<partition\>-offset-change | The next offset to be read for this topic and partition. |
-|   | \<system\>-\<topic\>-\<partition\>-bytes-read | Total size of all 
messages read for a topic partition (payload + key size). |
-|   | \<system\>-\<topic\>-\<partition\>-messages-read | Number of messages 
read for a topic partition. |
-|   | \<system\>-\<topic\>-\<partition\>-high-watermark | Offset of the last committed message in Kafka's topic partition. |
-|   | \<system\>-\<topic\>-\<partition\>-messages-behind-high-watermark | 
Number of input messages waiting to be processed on an input topic-partition. 
That is, the difference between high watermark and next offset. |
-|   | \<system\>-<host\>-<port\>-reconnects | Number of reconnects to a broker 
on a particular host and port. |
-|   | \<system\>-<host\>-<port\>-bytes-read | Total size of all messages read 
from a broker on a particular host and port. |
-|   | \<system\>-<host\>-<port\>-messages-read | Number of times the consumer 
used a broker on a particular host and port to get new messages. |
-|   | \<system\>-<host\>-<port\>-skipped-fetch-requests | Number of times the 
fetchMessage method is called but no topic/partitions needed new messages. |
-|   | \<system\>-<host\>-<port\>-topic-partitions | Number of the broker's topic partitions being consumed. |
-|   | poll-count | Number of polls the KafkaSystemConsumer performed to get 
new messages. |
-|   | no-more-messages-SystemStreamPartition [\<system\>, \<stream\>, \<partition\>] | Indicates whether the Kafka consumer is at the head of the given partition. |
-|   | blocking-poll-count-SystemStreamPartition [\<system\>, \<stream\>, 
\<partition\>] | Number of times a blocking poll is executed (polling until we 
get at least one message, or until we catch up to the head of the stream) (per 
partition). |
-|   | blocking-poll-timeout-count-SystemStreamPartition [\<system\>, 
\<stream\>, \<partition\>] | Number of times a blocking poll has timed out 
(polling until we get at least one message within a timeout period) (per 
partition). |
-|   | buffered-message-count-SystemStreamPartition [\<system\>, \<stream\>, 
\<partition\>] | Current number of messages in queue (per partition). |
-|   | buffered-message-size-SystemStreamPartition [\<system\>, \<stream\>, \<partition\>] | Current size of messages in queue (if systems.\<system\>.samza.fetch.threshold.bytes is defined) (per partition). |
-
-
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SystemProducersMetrics (Counters and Gauges)** <br/>These metrics are aggregated across producers. | sends | Number of send method calls, representing the total number of messages sent. |
-|   | flushes | Number of flush method calls for all registered producers. |
-|   | <source\>-sends | Number of messages sent for a particular source (task instance). |
-|   | <source\>-flushes | Number of flushes for a particular source (task instance). |
-|   | serialization error | Number of errors that occurred while serializing envelopes before sending. |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KafkaSystemProducersMetrics (Counters)** | \<system\>-producer-sends | 
Number of send invocations to the KafkaSystemProducer. |
-|   | \<system\>-producer-send-success | Number of send requests that were 
successfully completed by the KafkaSystemProducer. |
-|   | \<system\>-producer-send-failed | Number of send requests to Kafka (e.g., output topics) that failed due to unrecoverable errors. |
-|   | \<system\>-flushes | Number of calls made to flush in the 
KafkaSystemProducer. |
-|   | \<system\>-flush-failed | Number of times flush operation failed. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KafkaSystemProducersMetrics (Timers)** | \<system\>-flush-ns | Average time the flush call takes to complete (in nanoseconds). |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KeyValueStorageEngineMetrics (Counters)** <br/> These metrics provide 
insight into the type and number of KV Store operations taking place | 
<store-name\>-puts | Total number of put operations on the given KV store. |
-|   | <store-name\>-put-alls | Total number of putAll operations on the given KV store. |
-|   | <store-name\>-gets | Total number of get operations on the given KV store. |
-|   | <store-name\>-get-alls | Total number of getAll operations on the given KV store. |
-|   | <store-name\>-alls | Total number of accesses to the iterator on the given KV store. |
-|   | <store-name\>-ranges | Total number of accesses to a sorted-range iterator on the given KV store. |
-|   | <store-name\>-deletes | Total number of delete operations on the given KV store. |
-|   | <store-name\>-delete-alls | Total number of deleteAll operations on the given KV store. |
-|   | <store-name\>-flushes | Total number of flush operations on the given KV store. |
-|   | <store-name\>-restored-messages | Number of entries in the KV store 
restored from the changelog for that store. |
-|   | <store-name\>-restored-bytes | Size in bytes of entries in the KV store 
restored from the changelog for that store. |
-|   | <store-name\>-snapshots | Total number of snapshot operations on the 
given KV store. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KeyValueStorageEngineMetrics (Timers)** <br/> These metrics provide insight into the latencies of KV Store operations. | <store-name\>-get-ns | Average duration of the get operation on the given KV Store. |
-|   | <store-name\>-get-all-ns | Average duration of the getAll operation on 
the given KV Store. |
-|   | <store-name\>-put-ns | Average duration of the put operation on the 
given KV Store. |
-|   | <store-name\>-put-all-ns | Average duration of the putAll operation on 
the given KV Store. |
-|   | <store-name\>-delete-ns | Average duration of the delete operation on 
the given KV Store. |
-|   | <store-name\>-delete-all-ns | Average duration of the deleteAll 
operation on the given KV Store. |
-|   | <store-name\>-flush-ns | Average duration of the flush operation on the 
given KV Store. |
-|   | <store-name\>-all-ns | Average duration of obtaining an iterator (using 
the all operation) on the given KV Store. |
-|   | <store-name\>-range-ns | Average duration of obtaining a sorted-range iterator (using the range operation) on the given KV Store. |
-|   | <store-name\>-snapshot-ns | Average duration of the snapshot operation 
on the given KV Store. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **KeyValueStoreMetrics (Counters)** <br/> These metrics are measured at the App-facing layer for different KV Stores, e.g., RocksDBStore, InMemoryKVStore. | <store-name\>-gets, <store-name\>-getAlls, <store-name\>-puts, <store-name\>-putAlls, <store-name\>-deletes, <store-name\>-deleteAlls, <store-name\>-alls, <store-name\>-ranges, <store-name\>-flushes | Total number of the specified operation on the given KV Store. (These metrics are equivalent to the respective ones under KeyValueStorageEngineMetrics.) |
-|   | bytes-read | Total number of bytes read (when serving reads -- gets, 
getAlls, and iterations). |
-|   | bytes-written | Total number of bytes written (when serving writes -- 
puts, putAlls). |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **SerializedKeyValueStoreMetrics (Counters)** <br/> These metrics are measured at the serialization layer. | <store-name\>-gets, <store-name\>-getAlls, <store-name\>-puts, <store-name\>-putAlls, <store-name\>-deletes, <store-name\>-deleteAlls, <store-name\>-alls, <store-name\>-ranges, <store-name\>-flushes | Total number of the specified operation on the given KV Store. (These metrics are equivalent to the respective ones under KeyValueStorageEngineMetrics.) |
-|   | bytes-deserialized | Total number of bytes deserialized (when serving 
reads -- gets, getAlls, and iterations). |
-|   | bytes-serialized | Total number of bytes serialized (when serving reads 
and writes -- gets, getAlls, puts, putAlls). In addition to writes, 
serialization is also done during reads to serialize key to bytes for lookup in 
the underlying store. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **LoggedStoreMetrics (Counters)** <br/> These metrics are measured at the changelog-backup layer for KV stores. | <store-name\>-gets, <store-name\>-puts, <store-name\>-alls, <store-name\>-deletes, <store-name\>-flushes, <store-name\>-ranges | Total number of the specified operation on the given KV Store. |
-
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **CachedStoreMetrics (Counters and Gauges)** <br/> These metrics are measured at the caching layer for RocksDB-backed KV stores. | <store-name\>-gets, <store-name\>-puts, <store-name\>-alls, <store-name\>-deletes, <store-name\>-flushes, <store-name\>-ranges | Total number of the specified operation on the given KV Store. |
-|   | cache-hits | Total number of get and getAll operations that hit cached 
entries. |
-|   | put-all-dirty-entries-batch-size | Total number of dirty KV-entries written back to the underlying store. |
-|   | dirty-count | Number of entries in the cache marked dirty at that 
instant. |
-|   | cache-count | Number of entries in the cache at that instant. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **RoundRobinChooserMetrics (Counters)** | buffered-messages | Size of the 
queue with potential messages to process. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **BatchingChooserMetrics (Counters and Gauges)** | batch-resets | Number of batch resets due to exceeding the maximum batch size limit. |
-|   | batched-envelopes | Number of envelopes in the batch at the current 
instant. |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **BootstrappingChooserMetrics (Gauges)** | lagging-batch-streams | Number of 
bootstrapping streams that are lagging. |
-|   | \<system\>-\<stream\>-lagging-partitions | Number of lagging partitions 
in the stream (for each stream marked as bootstrapping stream). |
-
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **HdfsSystemProducerMetrics (Counters)** | system-producer-sends | Total 
number of attempts to write to HDFS. |
-|   | system-send-success | Total number of successful writes to HDFS. |
-|   | system-send-failed | Total number of failures while sending envelopes to 
HDFS. |
-|   | system-flushes | Total number of attempts to flush data to HDFS. |
-|   | system-flush-success | Total number of successful flushes of all written data to HDFS. |
-|   | system-flush-failed | Total number of failures while flushing data to 
HDFS. |
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **HdfsSystemProducerMetrics (Timers)** | system-send-ms | Average time spent 
for writing messages to HDFS (in milliseconds). |
-|   | system-flush-ms | Average time spent for flushing messages to HDFS (in 
milliseconds). |
-
-
-| **Group** | **Metric name** | **Meaning** |
-| --- | --- | --- |
-| **ElasticsearchSystemProducerMetrics (Counters)** | system-bulk-send-success | Total number of successful bulk requests. |
-|   | system-docs-inserted | Total number of documents created. |
-|   | system-docs-updated | Total number of documents updated. |
-|   | system-version-conflicts | Number of requests that failed due to conflicts with the current state of the document. |

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/core-concepts/core-concepts.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/core-concepts/core-concepts.md 
b/docs/learn/documentation/versioned/core-concepts/core-concepts.md
new file mode 100644
index 0000000..449b338
--- /dev/null
+++ b/docs/learn/documentation/versioned/core-concepts/core-concepts.md
@@ -0,0 +1,23 @@
+---
+layout: page
+title: Core concepts
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+## Core concepts page
+

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/deployment/standalone.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/deployment/standalone.md 
b/docs/learn/documentation/versioned/deployment/standalone.md
new file mode 100644
index 0000000..c7425f6
--- /dev/null
+++ b/docs/learn/documentation/versioned/deployment/standalone.md
@@ -0,0 +1,217 @@
+---
+layout: page
+title: Run as an embedded library
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+- [Introduction](#introduction)
+- [User guide](#user-guide)
+     - [Setup dependencies](#setup-dependencies)
+     - [Configuration](#configuration)
+     - [Code sample](#code-sample)
+- [Quick start guide](#quick-start-guide)
+  - [Setup zookeeper](#setup-zookeeper)
+  - [Setup kafka](#setup-kafka)
+  - [Build binaries](#build-binaries)
+  - [Deploy binaries](#deploy-binaries)
+  - [Inspect results](#inspect-results)
+- [Coordinator internals](#coordinator-internals)
+
+# Introduction
+
+With Samza 0.13.0, the deployment model of Samza jobs has been simplified and decoupled from YARN. The _standalone_ model provides the stream processing capabilities of Samza packaged as a library with pluggable coordination. This library model offers an easier integration path and promotes a flexible deployment model for an application. Using the standalone mode, you can leverage Samza processors directly in your application and deploy Samza applications to self-managed clusters.
+
+A standalone application is typically composed of multiple _stream processors_. A _stream processor_ encapsulates a user-defined processing function and is responsible for processing a subset of input topic partitions. Each stream processor of a standalone application is uniquely identified by a _processorId_.
+
+Samza provides a pluggable job _coordinator_ layer to perform leader election and assign work to the stream processors. Standalone supports ZooKeeper coordination out of the box and uses it for distributed coordination between the stream processors of a standalone application. A processor becomes part of a standalone application by setting its app.name (e.g., app.name=group\_1) and joining the group.
+
+In Samza standalone, the input topic partitions are distributed among the available processors dynamically at runtime. In each standalone application, one stream processor is initially chosen as the leader to mediate the assignment of input topic partitions to the stream processors. If the number of available processors changes (for example, if a processor is shut down or added), the leader processor regenerates the partition assignments and redistributes them to all the processors.
+
+On a processor group change, the act of re-assigning input topic partitions to the remaining live processors in the group is known as rebalancing the group. If the leader processor of a standalone application fails, another stream processor of the application is chosen as the leader.
+
+## User guide
+
+Samza standalone is designed to give you more control over the deployment of your application, so it is your responsibility to configure and deploy the processors. With ZooKeeper coordination, you also have to configure the URL of a ZooKeeper instance.
+
+A stream processor is identified by a unique processorId, which is generated by the pluggable ProcessorIdGenerator abstraction. The processorId of a stream processor is used with the coordination service. Samza provides a UUID-based ProcessorIdGenerator out of the box.
+
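+If you need a different ID scheme (for example, host-derived IDs for easier log tracing), the generator is pluggable. Below is a minimal sketch; it assumes the ProcessorIdGenerator interface exposes a single generateProcessorId(Config) method (check the interface in your Samza version), and the host-based scheme here is purely illustrative:
+
+```java
+import java.net.InetAddress;
+import java.net.UnknownHostException;
+import java.util.UUID;
+
+import org.apache.samza.config.Config;
+import org.apache.samza.runtime.ProcessorIdGenerator;
+
+// Illustrative generator that derives the processorId from the host name,
+// falling back to a random UUID. A sketch only -- not shipped with Samza.
+public class HostAwareProcessorIdGenerator implements ProcessorIdGenerator {
+  @Override
+  public String generateProcessorId(Config config) {
+    try {
+      // Prefix with the host name so IDs are easy to trace in logs.
+      return InetAddress.getLocalHost().getHostName() + "-" + UUID.randomUUID();
+    } catch (UnknownHostException e) {
+      return UUID.randomUUID().toString();
+    }
+  }
+}
+```
+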
+The diagram below shows an input topic with three partitions and a standalone application with three processors consuming messages from it.
+
+<img 
src="/img/versioned/learn/documentation/standalone/standalone-application.jpg" 
alt="Standalone application" height="550px" width="700px" align="middle">
+
+When a group is first initialized, each stream processor typically starts processing messages from either the earliest or the latest offset of its input topic partitions. The messages in each partition are sequentially delivered to the user-defined processing function. As the stream processor makes progress, it commits the offsets of the messages it has successfully processed. For example, in the figure above, the stream processor's position is at offset 7 and its last committed offset is at offset 3.
+
+When an input partition is reassigned to another processor in the group, the initial position is set to the last committed offset. If processor-1 in the example above suddenly crashed, the live processor taking over the partition would begin consumption from offset 3, the last committed offset. It would reprocess the messages between offset 3 and the crashed processor's position (offset 7), but it would not have to reprocess the messages before offset 3.
+
+### Setup dependencies
+
+Add the following samza-standalone maven dependencies to your project.
+
+```xml
+<dependency>
+    <groupId>org.apache.samza</groupId>
+    <artifactId>samza-kafka_2.11</artifactId>
+    <version>1.0</version>
+</dependency>
+<dependency>
+    <groupId>org.apache.samza</groupId>
+    <artifactId>samza-core_2.11</artifactId>
+    <version>1.0</version>
+</dependency>
+<dependency>
+    <groupId>org.apache.samza</groupId>
+    <artifactId>samza-api</artifactId>
+    <version>1.0</version>
+</dependency>
+```
+
+### Configuration
+
+A samza standalone application requires you to define the following mandatory 
configurations:
+
+```bash
+job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
+# For a local ZooKeeper instance, use job.coordinator.zk.connect=localhost:2181
+job.coordinator.zk.connect=your_zk_connection
+task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
+```
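+
+Processors join a standalone application by sharing the same app.name, as described above. A minimal sketch follows (the values here are placeholders to replace with your own):
+
+```bash
+# All processors that share this app.name form one group.
+app.name=sample-test
+app.id=1
+```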
+
+You also have to configure the stream processor with the Kafka brokers, as in the following sample (this assumes the broker is running on localhost):
+
+```bash 
+systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
+systems.kafka.samza.msg.serde=json
+systems.kafka.consumer.zookeeper.connect=localhost:2181
+systems.kafka.producer.bootstrap.servers=localhost:9092 
+```
+
+### Code sample
+
+Here's a sample standalone application with app.name set to sample-test. Running this class launches a stream processor.
+
+```java
+public class PageViewEventExample implements StreamApplication {
+
+  public static void main(String[] args) {
+    // Parse the command-line options and load the job configuration.
+    CommandLine cmdLine = new CommandLine();
+    OptionSet options = cmdLine.parser().parse(args);
+    Config config = cmdLine.loadConfig(options);
+
+    ApplicationRunner runner = ApplicationRunners.getApplicationRunner(new PageViewEventExample(), config);
+    runner.run();
+    runner.waitForFinish();
+  }
+
+  @Override
+  public void describe(StreamAppDescriptor appDesc) {
+    // Read PageViewEvents from the input stream and forward them unchanged to the output stream.
+    MessageStream<PageViewEvent> pageViewEvents =
+        appDesc.getInputStream("inputStream", new JsonSerdeV2<>(PageViewEvent.class));
+    OutputStream<PageViewEvent> outputStream =
+        appDesc.getOutputStream("outputStream", new JsonSerdeV2<>(PageViewEvent.class));
+    pageViewEvents.sendTo(outputStream);
+  }
+}
+```
+
+## Quick start guide
+
+The [Hello-samza](https://github.com/apache/samza-hello-samza/) project contains sample Samza standalone applications. Here is a step-by-step guide to install, build, and run a standalone application's binaries using a local ZooKeeper cluster for coordination. Check out the hello-samza project by running the following commands:
+
+```bash
+git clone https://git.apache.org/samza-hello-samza.git hello-samza
+cd hello-samza 
+```
+
+### Setup Zookeeper
+
+Run the following commands to install and start a local ZooKeeper cluster.
+
+```bash
+./bin/grid install zookeeper
+./bin/grid start zookeeper
+```
+
+### Setup Kafka
+
+Run the following commands to install and start a local Kafka cluster.
+
+```bash
+./bin/grid install kafka
+./bin/grid start kafka
+```
+
+### Build binaries
+
+Before you can run the standalone application, you need to build a package for it using the following commands.
+
+```bash
+mvn clean package
+mkdir -p deploy/samza
+tar -xvf ./target/hello-samza-0.15.0-SNAPSHOT-dist.tar.gz -C deploy/samza 
+```
+
+### Deploy binaries
+
+To run the sample standalone application [WikipediaZkLocalApplication](https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/wikipedia/application/WikipediaZkLocalApplication.java), run the following commands:
+
+```bash
+./bin/deploy.sh
+./deploy/samza/bin/run-class.sh 
samza.examples.wikipedia.application.WikipediaZkLocalApplication  
--config-factory=org.apache.samza.config.factories.PropertiesConfigFactory 
--config-path=file://$PWD/deploy/samza/config/wikipedia-application-local-runner.properties
+```
+
+### Inspect results
+
+The standalone application reads messages from the wikipedia-edits topic and, every ten seconds, calculates counts for all edits made during that window. It outputs these counts to the local wikipedia-stats Kafka topic. To inspect the events in the output topic, run the following command.
+
+```bash
+./deploy/kafka/bin/kafka-console-consumer.sh  --zookeeper localhost:2181 
--topic wikipedia-stats
+```
+
+Events produced to the output topic from the standalone application launched 
above will be of the following form:
+
+```
+{"is-talk":2,"bytes-added":5276,"edits":13,"unique-titles":13}
+{"is-bot-edit":1,"is-talk":3,"bytes-added":4211,"edits":30,"unique-titles":30,"is-unpatrolled":1,"is-new":2,"is-minor":7}
+```
+
+# Coordinator internals
+
+A Samza application is composed of multiple stream processors. A processor becomes part of a standalone application by setting its app.name (e.g., app.name=group\_1) and joining the group. In Samza standalone, the input topic partitions are distributed among the available processors dynamically at runtime. If the number of available processors changes (for example, if some processors are shut down or added), the partition assignments are regenerated and redistributed to all the processors. One processor is elected as the leader; it generates the partition assignments and distributes them to the other processors in the group.
+
+To mediate the partition assignments between processors, Samza standalone relies upon a coordination service. The main responsibilities of the coordination service are the following:
+
+**Leader election** - Elects a single processor to generate the partition assignments and distribute them to the other processors in the group.
+
+**Distributed barrier** - A coordination primitive used by the processors to reach consensus (agree) on a partition assignment.
+
+By default, embedded Samza uses ZooKeeper for coordination between the processors of an application and for storing the partition assignment state. The coordination sequence for a standalone application is listed below:
+
+1. Each processor (participant) registers with the coordination service (e.g., ZooKeeper) using its participant ID.
+
+2. One of the participants will be elected as the leader.
+
+3. The leader will monitor the list of all the active participants.
+
+4. Whenever the list of participants in a group changes, the leader generates a new partition assignment for the current participants and persists it to a common storage.
+
+5. Participants are notified that the new partition assignment is available. Notification is done through the coordination service (e.g., ZooKeeper).
+
+6. The participants will stop processing, pick up the new partition 
assignment, and then resume processing.
+
+To ensure that no partition is processed by two different processors, processing is paused and all the processors synchronize on a distributed barrier. Once all the processors are paused, the new partition assignments are applied, after which processing resumes.
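+
+To make this sequence concrete, here is a compressed sketch of the leader's rebalance logic. The CoordinationService interface below is a hypothetical stand-in for the primitives ZooKeeper provides (ephemeral registration, watches, and shared storage), not Samza's actual internal API:
+
+```java
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+// Hypothetical coordination primitives; in ZooKeeper terms these map onto
+// ephemeral nodes (registration), watches (notification), and shared znodes
+// (persisted assignments).
+interface CoordinationService {
+  boolean isLeader(String participantId);                        // step 2
+  List<String> liveParticipants();                               // step 3
+  void persistAssignment(Map<String, List<Integer>> assignment); // step 4
+  void notifyParticipants();                                     // step 5
+  void awaitBarrier();                                           // step 6
+}
+
+class LeaderRebalanceSketch {
+  /** Invoked on every processor whenever the set of live participants changes. */
+  void onGroupChange(CoordinationService coord, String myId, List<Integer> partitions) {
+    if (!coord.isLeader(myId)) {
+      return; // only the leader regenerates assignments
+    }
+    // Round-robin the input partitions over the live processors.
+    List<String> live = coord.liveParticipants();
+    Map<String, List<Integer>> assignment = new HashMap<>();
+    for (int i = 0; i < partitions.size(); i++) {
+      assignment.computeIfAbsent(live.get(i % live.size()), k -> new ArrayList<>()).add(partitions.get(i));
+    }
+    coord.persistAssignment(assignment); // step 4: persist to common storage
+    coord.notifyParticipants();          // step 5: participants pause processing
+    coord.awaitBarrier();                // step 6: wait until all are paused
+    // Processing resumes once every participant applies the new assignment.
+  }
+}
+```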
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/deployment/yarn.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/deployment/yarn.md 
b/docs/learn/documentation/versioned/deployment/yarn.md
new file mode 100644
index 0000000..06f0446
--- /dev/null
+++ b/docs/learn/documentation/versioned/deployment/yarn.md
@@ -0,0 +1,27 @@
+---
+layout: page
+title: Run on YARN
+---
+<!--
+   Licensed to the Apache Software Foundation (ASF) under one or more
+   contributor license agreements.  See the NOTICE file distributed with
+   this work for additional information regarding copyright ownership.
+   The ASF licenses this file to You under the Apache License, Version 2.0
+   (the "License"); you may not use this file except in compliance with
+   the License.  You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
+-->
+
+
+# YARN section 1
+# YARN section 2
+# YARN section 3
+# YARN section 4
+# YARN section 5

http://git-wip-us.apache.org/repos/asf/samza/blob/1bf8bf5a/docs/learn/documentation/versioned/index.html
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/index.html 
b/docs/learn/documentation/versioned/index.html
index 49592f6..80035bb 100644
--- a/docs/learn/documentation/versioned/index.html
+++ b/docs/learn/documentation/versioned/index.html
@@ -19,20 +19,18 @@ title: Documentation
    limitations under the License.
 -->
 
-<h4><a href="comparisons/introduction.html">Core concepts</a></h4>
-<hr/>
-
-<h4>Architecture</h4>
+<h4><a href="core-concepts/core-concepts.html">CORE CONCEPTS</a></h4>
+<h4><a href="architecture/architecture-overview.html">ARCHITECTURE</a></h4>
 
 
 <h4>API</h4>
 
 <ul class="documentation-list">
-  <li><a href="comparisons/introduction.html">Low-level API</a></li>
-  <li><a href="comparisons/mupd8.html">Streams DSL</a></li>
+  <li><a href="api/low-level-api.html">Low-level API</a></li>
+  <li><a href="api/high-level-api.html">Streams DSL</a></li>
   <li><a href="api/table-api.html">Table API</a></li>
-  <li><a href="comparisons/storm.html">Samza SQL</a></li>
-  <li><a href="comparisons/spark-streaming.html">Apache BEAM</a></li>
+  <li><a href="api/samza-sql.html">Samza SQL</a></li>
+  <li><a href="https://beam.apache.org/documentation/runners/samza/">Apache BEAM</a></li>
 <!-- TODO comparisons pages
   <li><a href="comparisons/aurora.html">Aurora</a></li>
   <li><a href="comparisons/jms.html">JMS</a></li>
@@ -43,28 +41,25 @@ title: Documentation
 <h4>Deployment</h4>
 
 <ul class="documentation-list">
-  <li><a href="api/overview.html">Deployment overview</a></li>
-  <li><a href="deployment/deployment-model.html">Deployment model</a></li>
-  <li><a href="api/overview.html">Run on YARN</a></li>
-  <li><a href="standalone/standalone.html">Run as an embedded library</a></li>
+  <li><a href="deployment/deployment-model.html">Deployment options</a></li>
+  <li><a href="deployment/yarn.html">Run on YARN</a></li>
+  <li><a href="deployment/standalone.html">Run as an embedded library</a></li>
 </ul>
 
 <h4>Connectors</h4>
 
 <ul class="documentation-list">
-  <li><a href="jobs/job-runner.html">Connectors overview</a></li>
-  <li><a href="jobs/configuration.html">Apache Kafka</a></li>
-  <li><a href="jobs/packaging.html">Apache Hadoop</a></li>
-  <li><a href="jobs/yarn-jobs.html">Azure EventHubs</a></li>
-  <li><a href="aws/kinesis.html">AWS Kinesis</a></li>
+  <li><a href="connectors/overview.html">Connectors overview</a></li>
+  <li><a href="connectors/kafka.html">Apache Kafka</a></li>
+  <li><a href="connectors/hdfs.html">Apache Hadoop</a></li>
+  <li><a href="connectors/eventhubs.html">Azure EventHubs</a></li>
+  <li><a href="connectors/kinesis.html">AWS Kinesis</a></li>
 </ul>
 
 <h4>Operations</h4>
 
 <ul class="documentation-list">
-  <li><a href="yarn/application-master.html">Debugging</a></li>
-  <li><a href="yarn/isolation.html">Monitoring & metrics</a></li>
-  <li><a href="yarn/isolation.html">Samza REST service</a></li>
+  <li><a href="operations/monitoring.html">Monitoring</a></li>
 </ul>
 
 </div>
