Repository: samza Updated Branches: refs/heads/master 1d6f693c8 -> 4e5907090
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/api/programming-model.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/api/programming-model.md b/docs/learn/documentation/versioned/api/programming-model.md index 1c9bd1c..efdcfa3 100644 --- a/docs/learn/documentation/versioned/api/programming-model.md +++ b/docs/learn/documentation/versioned/api/programming-model.md @@ -18,74 +18,81 @@ title: Programming Model See the License for the specific language governing permissions and limitations under the License. --> -# Introduction -Samza provides different sets of programming APIs to meet requirements from different sets of users. The APIs are listed below: +### Introduction +Samza provides multiple programming APIs to fit your use case: -1. Java programming APIs: Samza provides Java programming APIs for users who are familiar with imperative programming languages. The overall programming model to create a Samza application in Java will be described here. Samza also provides two sets of APIs to describe user processing logic: - 1. [High-level API](high-level-api.md): this API allows users to describe the end-to-end stream processing pipeline in a connected DAG (Directional Acyclic Graph). It also provides a rich set of build-in operators to help users implementing common transformation logic, such as filter, map, join, and window. - 2. [Task API](low-level-api.md): this is low-level Java API which provides "bare-metal" programming interfaces to the users. Task API allows users to explicitly access physical implementation details in the system, such as accessing the physical system stream partition of an incoming message and explicitly controlling the thread pool to execute asynchronous processing method. -2.
[Samza SQL](samza-sql.md): Samza provides SQL for users who are familiar with declarative query languages, which allows the users to focus on data manipulation via SQL predicates and UDFs, not the physical implementation details. -3. Beam API: Samza also provides a [Beam runner](https://beam.apache.org/documentation/runners/capability-matrix/) to run applications written in Beam API. This is considered as an extension to existing operators supported by the high-level API in Samza. +1. Java APIs: Samza provides two Java programming APIs that are ideal for building advanced stream processing applications. + 1. [High Level Streams API](high-level-api.md): Samza's flexible High Level Streams API lets you describe your complex stream processing pipeline in the form of a Directed Acyclic Graph (DAG) of operations on message streams. It provides a rich set of built-in operators that simplify common stream processing operations such as filtering, projection, repartitioning, joins, and windows. + 2. [Low Level Task API](low-level-api.md): Samza's powerful Low Level Task API lets you write your application in terms of processing logic for each incoming message. +2. [Samza SQL](samza-sql.md): Samza SQL provides a declarative query language for describing your stream processing logic. It lets you manipulate streams using SQL predicates and UDFs instead of working with the physical implementation details. +3. Apache Beam API: Samza also provides an [Apache Beam runner](https://beam.apache.org/documentation/runners/capability-matrix/) to run applications written using the Apache Beam API. This is considered an extension to the operators supported by the High Level Streams API in Samza. -The following sections will be focused on Java programming APIs. - -# Key Concepts for a Samza Java Application -To write a Samza Java application, you will typically follow the steps below: -1. Define your input and output streams and tables -2.
Define your main processing logic +### Key Concepts The following sections will talk about key concepts in writing your Samza applications in Java. -## Samza Applications -When writing your stream processing application using Java API in Samza, you implement either a [StreamApplication](javadocs/org/apache/samza/application/StreamApplication.html) or [TaskApplication](javadocs/org/apache/samza/application/TaskApplication.html) and define your processing logic in the describe method. -- For StreamApplication: +#### Samza Applications +A [SamzaApplication](javadocs/org/apache/samza/application/SamzaApplication.html) describes the inputs, outputs, state, configuration and the logic for processing data from one or more streaming sources. + +You can implement a +[StreamApplication](javadocs/org/apache/samza/application/StreamApplication.html) and use the provided [StreamApplicationDescriptor](javadocs/org/apache/samza/application/descriptors/StreamApplicationDescriptor.html) to describe the processing logic using Samza's High Level Streams API in terms of [MessageStream](javadocs/org/apache/samza/operators/MessageStream.html) operators. {% highlight java %} - - public void describe(StreamApplicationDescriptor appDesc) { … } + + public class MyStreamApplication implements StreamApplication { + @Override + public void describe(StreamApplicationDescriptor appDesc) { + // Describe your application here + } + } {% endhighlight %} + +Alternatively, you can implement a [TaskApplication](javadocs/org/apache/samza/application/TaskApplication.html) and use the provided [TaskApplicationDescriptor](javadocs/org/apache/samza/application/descriptors/TaskApplicationDescriptor.html) to describe it using Samza's Low Level Task API in terms of per-message processing logic.
+ + - For TaskApplication: {% highlight java %} - public void describe(TaskApplicationDescriptor appDesc) { … } + public class MyTaskApplication implements TaskApplication { + @Override + public void describe(TaskApplicationDescriptor appDesc) { + // Describe your application here + } + } {% endhighlight %} -## Descriptors for Data Streams and Tables -There are three different types of descriptors in Samza: [InputDescriptor](javadocs/org/apache/samza/system/descriptors/InputDescriptor.html), [OutputDescriptor](javadocs/org/apache/samza/system/descriptors/OutputDescriptor.html), and [TableDescriptor](javadocs/org/apache/samza/table/descriptors/TableDescriptor.html). The InputDescriptor and OutputDescriptor are used to describe the physical sources and destinations of a stream, while a TableDescriptor is used to describe the physical dataset and IO functions for a table. -Usually, you will obtain InputDescriptor and OutputDescriptor from a [SystemDescriptor](javadocs/org/apache/samza/system/descriptors/SystemDescriptor.html), which include all information about producer and consumers to a physical system. The following code snippet illustrate how you will obtain InputDescriptor and OutputDescriptor from a SystemDescriptor. {% highlight java %} - - public class BadPageViewFilter implements StreamApplication { - @Override - public void describe(StreamApplicationDescriptor appDesc) { - KafkaSystemDescriptor kafka = new KafkaSystemDescriptor(); - InputDescriptor<PageView> pageViewInput = kafka.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class)); - OutputDescriptor<DecoratedPageView> pageViewOutput = kafka.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class)); +#### Streams and Table Descriptors +Descriptors let you specify the properties of various aspects of your application from within it.
- // Now, implement your main processing logic - } - } - -{% endhighlight %} +[InputDescriptor](javadocs/org/apache/samza/system/descriptors/InputDescriptor.html)s and [OutputDescriptor](javadocs/org/apache/samza/system/descriptors/OutputDescriptor.html)s can be used for specifying Samza and implementation-specific properties of the streaming inputs and outputs for your application. You can obtain InputDescriptors and OutputDescriptors using a [SystemDescriptor](javadocs/org/apache/samza/system/descriptors/SystemDescriptor.html) for your system. This SystemDescriptor can be used to specify Samza and implementation-specific properties of the producers and consumers for your I/O system. Most Samza system implementations come with their own SystemDescriptors, but if one isn't available, you +can use the [GenericSystemDescriptor](javadocs/org/apache/samza/system/descriptors/GenericSystemDescriptor.html). -You can also add a TableDescriptor to your application. +A [TableDescriptor](javadocs/org/apache/samza/table/descriptors/TableDescriptor.html) can be used for specifying Samza and implementation-specific properties of a [Table](javadocs/org/apache/samza/table/Table.html). You can use a local TableDescriptor (e.g. [RocksDbTableDescriptor](javadocs/org/apache/samza/storage/kv/descriptors/RocksDbTableDescriptor.html)) or a [RemoteTableDescriptor](javadocs/org/apache/samza/table/descriptors/RemoteTableDescriptor.html).
+ + +The following example illustrates how you can use input and output descriptors for a Kafka system, and a table descriptor for a local RocksDB table within your application: {% highlight java %} - - public class BadPageViewFilter implements StreamApplication { + + public class MyStreamApplication implements StreamApplication { @Override - public void describe(StreamApplicationDescriptor appDesc) { - KafkaSystemDescriptor kafka = new KafkaSystemDescriptor(); - InputDescriptor<PageView> pageViewInput = kafka.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class)); - OutputDescriptor<DecoratedPageView> pageViewOutput = kafka.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class)); - TableDescriptor<String, Integer> viewCountTable = new RocksDBTableDescriptor( - "pageViewCountTable", KVSerde.of(new StringSerde(), new IntegerSerde())); - - // Now, implement your main processing logic + public void describe(StreamApplicationDescriptor appDescriptor) { + KafkaSystemDescriptor ksd = new KafkaSystemDescriptor("kafka") + .withConsumerZkConnect(ImmutableList.of("...")) + .withProducerBootstrapServers(ImmutableList.of("...", "...")); + KafkaInputDescriptor<PageView> kid = + ksd.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class)); + KafkaOutputDescriptor<DecoratedPageView> kod = + ksd.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class)); + + RocksDbTableDescriptor<String, Integer> td = + new RocksDbTableDescriptor("viewCounts", KVSerde.of(new StringSerde(), new IntegerSerde())); + + // Implement your processing logic here + } } @@ -93,21 +100,21 @@ You can also add a TableDescriptor to your application. The same code in the above describe method applies to TaskApplication as well.
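The serdes handed to these descriptors (JsonSerdeV2, StringSerde, IntegerSerde, KVSerde in the example above) define how messages are converted to and from bytes on the wire. A minimal plain-Java sketch of that round-trip contract (these are simplified stand-ins, not the real Samza serde classes, which use more compact binary encodings):

```java
import java.nio.charset.StandardCharsets;

public class SerdeSketch {
  // Toy stand-in for the serde contract used by input/output descriptors.
  interface Serde<T> {
    byte[] toBytes(T obj);
    T fromBytes(byte[] bytes);
  }

  static class StringSerde implements Serde<String> {
    public byte[] toBytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }
    public String fromBytes(byte[] b) { return new String(b, StandardCharsets.UTF_8); }
  }

  public static void main(String[] args) {
    Serde<String> serde = new StringSerde();
    // Round trip: what the producer serializes, the consumer deserializes.
    byte[] wire = serde.toBytes("page-views");
    System.out.println(serde.fromBytes(wire)); // page-views
  }
}
```

Whatever the encoding, the invariant is the same: `fromBytes(toBytes(x))` must reproduce `x`.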
-## Stream Processing Logic +#### Stream Processing Logic -Samza provides two sets of APIs to define the main stream processing logic, high-level API and Task API, via StreamApplication and TaskApplication, respectively. +Samza provides two sets of APIs to define the main stream processing logic, High Level Streams API and Low Level Task API, via StreamApplication and TaskApplication, respectively. -High-level API allows you to describe the processing logic in a connected DAG of transformation operators, like the example below: +High Level Streams API allows you to describe the processing logic in a connected DAG of transformation operators, like the example below: {% highlight java %} public class BadPageViewFilter implements StreamApplication { @Override public void describe(StreamApplicationDescriptor appDesc) { - KafkaSystemDescriptor kafka = new KafkaSystemDescriptor(); + KafkaSystemDescriptor kafka = new KafkaSystemDescriptor(); InputDescriptor<PageView> pageViewInput = kafka.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class)); OutputDescriptor<DecoratedPageView> pageViewOutput = kafka.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class)); - TableDescriptor<String, Integer> viewCountTable = new RocksDBTableDescriptor( + RocksDbTableDescriptor<String, Integer> viewCountTable = new RocksDbTableDescriptor( "pageViewCountTable", KVSerde.of(new StringSerde(), new IntegerSerde())); // Now, implement your main processing logic @@ -120,7 +127,7 @@ High-level API allows you to describe the processing logic in a connected DAG of {% endhighlight %} -Task API allows you to describe the processing logic in a customized StreamTaskFactory or AsyncStreamTaskFactory, like the example below: +Low Level Task API allows you to describe the processing logic in a customized StreamTaskFactory or AsyncStreamTaskFactory, like the example below: {% highlight java %} @@ -130,7 +137,7 @@ Task API allows you to describe the processing logic
in a customized StreamTaskF KafkaSystemDescriptor kafka = new KafkaSystemDescriptor(); InputDescriptor<PageView> pageViewInput = kafka.getInputDescriptor("page-views", new JsonSerdeV2<>(PageView.class)); OutputDescriptor<DecoratedPageView> pageViewOutput = kafka.getOutputDescriptor("decorated-page-views", new JsonSerdeV2<>(DecoratedPageView.class)); - TableDescriptor<String, Integer> viewCountTable = new RocksDBTableDescriptor( + RocksDbTableDescriptor<String, Integer> viewCountTable = new RocksDbTableDescriptor( "pageViewCountTable", KVSerde.of(new StringSerde(), new IntegerSerde())); // Now, implement your main processing logic @@ -142,11 +149,10 @@ Task API allows you to describe the processing logic in a customized StreamTaskF {% endhighlight %} -Details for [high-level API](high-level-api.md) and [Task API](low-level-api.md) are explained later. +#### Configuration for a Samza Application -## Configuration for a Samza Application +To deploy a Samza application, you need to specify the implementation class for your application and the ApplicationRunner to launch your application. The following is an incomplete example of minimum required configuration to set up the Samza application and the runner. For additional configuration, see the Configuration Reference. -To deploy a Samza application, you will need to specify the implementation class for your application and the ApplicationRunner to launch your application.
The following is an incomplete example of minimum required configuration to set up the Samza application and the runner: {% highlight jproperties %} # This is the class implementing StreamApplication http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/api/samza-sql.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/api/samza-sql.md b/docs/learn/documentation/versioned/api/samza-sql.md index 13b059f..7412f6c 100644 --- a/docs/learn/documentation/versioned/api/samza-sql.md +++ b/docs/learn/documentation/versioned/api/samza-sql.md @@ -87,7 +87,7 @@ Note: Samza sql console right now doesn't support queries that need state, for # Running Samza SQL on YARN -The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the low level task API, high level API as well as Samza SQL. +The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the Low Level Task API, High Level Streams API as well as Samza SQL. This tutorial demonstrates a simple Samza application that uses SQL to perform stream processing.
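The minimum configuration mentioned above pairs the application class with an ApplicationRunner. As a plain-Java sketch, the check below loads such properties and reports missing keys; the key names `app.class` and `app.runner.class` follow Samza's configuration convention, but treat the exact names and values here as illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class AppConfigCheck {
  // Keys assumed required for launching: the application class and its runner.
  static final String[] REQUIRED = {"app.class", "app.runner.class"};

  // Returns the required keys that are absent from the given properties.
  static List<String> missingKeys(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED) {
      if (!props.containsKey(key)) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws Exception {
    Properties props = new Properties();
    props.load(new StringReader(
        "# This is the class implementing StreamApplication\n"
        + "app.class=com.example.MyStreamApplication\n"
        + "app.runner.class=org.apache.samza.runtime.RemoteApplicationRunner\n"));
    System.out.println(missingKeys(props)); // []
  }
}
```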
# Using Table with Samza Low Level API -The code snippet below illustrates the usage of table in Samza low level API. +The code snippet below illustrates the usage of table in Samza Low Level Task API. {% highlight java %} 1 class SamzaTaskApplication implements TaskApplication { @@ -273,8 +273,7 @@ The table below summarizes table metrics: [`RemoteTable`](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/table/remote/RemoteTableDescriptor.java) provides a unified abstraction for Samza applications to access any remote data -store through stream-table join in high-level API or direct access in low-level -API. Remote Table is a store-agnostic abstraction that can be customized to +store through stream-table join in High Level Streams API or direct access in Low Level Task API. Remote Table is a store-agnostic abstraction that can be customized to access new types of stores by writing pluggable I/O "Read/Write" functions, implementations of [`TableReadFunction`](https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/table/remote/TableReadFunction.java) and @@ -283,7 +282,7 @@ interfaces. Remote Table also provides common functionality, e.g. rate limiting (built-in) and caching (hybrid). The async APIs in Remote Table are recommended over the sync versions for higher -throughput. They can be used with Samza with low-level API to achieve the maximum +throughput. They can be used with Samza's Low Level Task API to achieve the maximum throughput. Remote Tables are represented by class @@ -420,7 +419,7 @@ created during instantiation of Samza container. The life of a table goes through a few phases 1. **Declaration** - at first one declares the table by creating a `TableDescriptor`.
In both - Samza high level and low level API, the `TableDescriptor` is registered with stream + Samza High Level Streams API and Low Level Task API, the `TableDescriptor` is registered with stream graph, internally converted to `TableSpec` and in return a reference to a `Table` object is obtained that can participate in the building of the DAG. 2. **Instantiation** - during planning stage, configuration is @@ -436,7 +435,7 @@ The life of a table goes through a few phases * In Samza high level API, all table instances can be retrieved from `TaskContext` using table-id during initialization of a [`InitableFunction`] (https://github.com/apache/samza/blob/master/samza-api/src/main/java/org/apache/samza/operators/functions/InitableFunction.java). - * In Samza low level API, all table instances can be retrieved from `TaskContext` using + * In Samza Low Level Task API, all table instances can be retrieved from `TaskContext` using table-id during initialization of a [`InitableTask`] (https://github.com/apache/samza/blob/master/samza-api/src/main/java/org/apache/samza/task/InitableTask.java). 4. 
**Cleanup** - http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/connectors/eventhubs.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/connectors/eventhubs.md b/docs/learn/documentation/versioned/connectors/eventhubs.md index 0be288b..9fdc861 100644 --- a/docs/learn/documentation/versioned/connectors/eventhubs.md +++ b/docs/learn/documentation/versioned/connectors/eventhubs.md @@ -126,7 +126,7 @@ In this section, we will walk through a simple pipeline that reads from one Even 4 MessageStream<KV<String, String>> eventhubInput = appDescriptor.getInputStream(inputDescriptor); 5 OutputStream<KV<String, String>> eventhubOutput = appDescriptor.getOutputStream(outputDescriptor); - // Define the execution flow with the high-level API + // Define the execution flow with the High Level Streams API 6 eventhubInput 7 .map((message) -> { 8 System.out.println("Received Key: " + message.getKey()); http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/connectors/kafka.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/connectors/kafka.md b/docs/learn/documentation/versioned/connectors/kafka.md index b71c736..9628130 100644 --- a/docs/learn/documentation/versioned/connectors/kafka.md +++ b/docs/learn/documentation/versioned/connectors/kafka.md @@ -24,9 +24,9 @@ Samza offers built-in integration with Apache Kafka for stream processing. A com The `hello-samza` project includes multiple examples on interacting with Kafka from your Samza jobs. Each example also includes instructions on how to run them and view results. 
-- [High-level API Example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/FilterExample.java) with a corresponding [tutorial](/learn/documentation/{{site.version}}/deployment/yarn.html#starting-your-application-on-yarn) +- [High Level Streams API Example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/FilterExample.java) with a corresponding [tutorial](/learn/documentation/{{site.version}}/deployment/yarn.html#starting-your-application-on-yarn) -- [Low-level API Example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/wikipedia/task/application/WikipediaParserTaskApplication.java) with a corresponding [tutorial](https://github.com/apache/samza-hello-samza#hello-samza) +- [Low Level Task API Example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/wikipedia/task/application/WikipediaParserTaskApplication.java) with a corresponding [tutorial](https://github.com/apache/samza-hello-samza#hello-samza) ### Concepts @@ -105,7 +105,7 @@ The above example configures Samza to ignore checkpointed offsets for `page-view -### Code walkthrough: High-level API +### Code walkthrough: High Level Streams API In this section, we walk through a complete example that reads from a Kafka topic, filters a few messages and writes them to another topic. 
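At its core, the filter in this walkthrough is just a predicate applied to each incoming message. A self-contained sketch of that predicate logic (the `PageView` record and its fields are hypothetical, not the tutorial's actual classes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class BadPageViewFilterSketch {
  // Hypothetical page-view message; field names are illustrative.
  static class PageView {
    final String userId;
    final String page;
    PageView(String userId, String page) { this.userId = userId; this.page = page; }
  }

  // The predicate a filter operator would apply to each incoming message:
  // keep only page views that carry a valid user id.
  static boolean isValid(PageView pv) {
    return pv.userId != null && !pv.userId.isEmpty();
  }

  public static void main(String[] args) {
    List<PageView> input = List.of(new PageView("u1", "/home"), new PageView("", "/ads"));
    List<PageView> kept = input.stream()
        .filter(BadPageViewFilterSketch::isValid)
        .collect(Collectors.toList());
    System.out.println(kept.size()); // 1
  }
}
```

In the actual application, the same predicate would be passed to the High Level Streams API's filter operator on the input stream.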
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/core-concepts/core-concepts.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/core-concepts/core-concepts.md b/docs/learn/documentation/versioned/core-concepts/core-concepts.md index 479ebcb..b69de3d 100644 --- a/docs/learn/documentation/versioned/core-concepts/core-concepts.md +++ b/docs/learn/documentation/versioned/core-concepts/core-concepts.md @@ -55,7 +55,7 @@ A stream can have multiple producers that write data to it and multiple consumer A stream is sharded into multiple partitions for scaling how its data is processed. Each _partition_ is an ordered, replayable sequence of records. When a message is written to a stream, it ends up in one its partitions. Each message in a partition is uniquely identified by an _offset_. -Samza supports for pluggable systems that can implement the stream abstraction. As an example, Kafka implements a stream as a topic while another database might implement a stream as a sequence of updates to its tables. +Samza supports pluggable systems that can implement the stream abstraction. As an example, Kafka implements a stream as a topic while a database might implement a stream as a sequence of updates to its tables. ## Stream Application A _stream application_ processes messages from input streams, transforms them and emits results to an output stream or a database. It is built by chaining multiple operators, each of which take in one or more streams and transform them. @@ -63,19 +63,19 @@ A _stream application_ processes messages from input streams, transforms them an ![diagram-medium](/img/{{site.version}}/learn/documentation/core-concepts/stream-application.png) Samza offers three top-level APIs to help you build your stream applications: <br/> -1. 
The [Samza Streams DSL](/learn/documentation/{{site.version}}/api/high-level-api.html), which offers several built-in operators like map, filter etc. This is the recommended API for most use-cases. <br/> -2. The [low-level API](/learn/documentation/{{site.version}}/api/low-level-api.html), which allows greater flexibility to define your processing-logic and offers for greater control <br/> +1. The [High Level Streams API](/learn/documentation/{{site.version}}/api/high-level-api.html), which offers several built-in operators like map, filter etc. This is the recommended API for most use-cases. <br/> +2. The [Low Level Task API](/learn/documentation/{{site.version}}/api/low-level-api.html), which allows greater flexibility to define your processing-logic and offers greater control <br/> 3. [Samza SQL](/learn/documentation/{{site.version}}/api/samza-sql.html), which offers a declarative SQL interface to create your applications <br/> ## State -Samza supports for both stateless and stateful stream processing. _Stateless processing_, as the name implies, does not retain any state associated with the current message after it has been processed. A good example of this is to filter an incoming stream of user-records by a field (eg:userId) and write the filtered messages to their own stream. +Samza supports both stateless and stateful stream processing. _Stateless processing_, as the name implies, does not retain any state associated with the current message after it has been processed. A good example of this is filtering an incoming stream of user-records by a field (e.g. userId) and writing the filtered messages to their own stream. -In contrast, _stateful processing_ requires you to record some state about a message even after processing it. Consider the example of counting the number of unique users to a website every five minutes. This requires you to record state about each user seen thus far, for deduping later.
Samza offers a fault-tolerant, scalable state-store for this purpose. +In contrast, _stateful processing_ requires you to record some state about a message even after processing it. Consider the example of counting the number of unique users to a website every five minutes. This requires you to store information about each user seen thus far for de-duplication. Samza offers a fault-tolerant, scalable state-store for this purpose. ## Time -Time is a fundamental concept in stream processing, especially how it is modeled and interpreted by the system. Samza supports two notions of dealing with time. By default, all built-in Samza operators use processing time. In processing time, the timestamp of a message is determined by when it is processed by the system. For example, an event generated by a sensor could be processed by Samza several milliseconds later. +Time is a fundamental concept in stream processing, especially in how it is modeled and interpreted by the system. Samza supports two notions of time. By default, all built-in Samza operators use processing time. In processing time, the timestamp of a message is determined by when it is processed by the system. For example, an event generated by a sensor could be processed by Samza several milliseconds later. -On the other hand, in event time, the timestamp of an event is determined by when it actually occurred in the source. For example, a sensor which generates an event could embed the time of occurrence as a part of the event itself. Samza provides event-time based processing by its integration with [Apache BEAM](https://beam.apache.org/documentation/runners/samza/). +On the other hand, in event time, the timestamp of an event is determined by when it actually occurred at the source. For example, a sensor which generates an event could embed the time of occurrence as a part of the event itself. 
Samza provides event-time based processing by its integration with [Apache BEAM](https://beam.apache.org/documentation/runners/samza/). ## Processing guarantee Samza supports at-least once processing. As the name implies, this ensures that each message in the input stream is processed by the system at-least once. This guarantees no data-loss even when there are failures, thereby making Samza a practical choice for building fault-tolerant applications. http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/hadoop/overview.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/hadoop/overview.md b/docs/learn/documentation/versioned/hadoop/overview.md index 0820127..3d8caaa 100644 --- a/docs/learn/documentation/versioned/hadoop/overview.md +++ b/docs/learn/documentation/versioned/hadoop/overview.md @@ -29,7 +29,7 @@ Samza provides a single set of APIs for both batch and stream processing. This u ### Multi-stage Batch Pipeline -Complex data pipelines usually consist multiple stages, with data shuffled (repartitioned) between stages to enable key-based operations such as windowing, aggregation, and join. Samza [high-level API](/startup/preview/index.html) provides an operator named `partitionBy` to create such multi-stage pipelines. Internally, Samza creates a physical stream, called an âintermediate streamâ, based on the system configured as in `job.default.system`. Samza repartitions the output of the previous stage by sending it to the intermediate stream with the appropriate partition count and partition key. It then feeds it to the next stage of the pipeline. The lifecycle of intermediate streams is completely managed by Samza so from the user perspective the data shuffling is automatic. 
+Complex data pipelines usually consist of multiple stages, with data shuffled (repartitioned) between stages to enable key-based operations such as windowing, aggregation, and join. Samza's [High Level Streams API](/startup/preview/index.html) provides an operator named `partitionBy` to create such multi-stage pipelines. Internally, Samza creates a physical stream, called an "intermediate stream", based on the system configured in `job.default.system`. Samza repartitions the output of the previous stage by sending it to the intermediate stream with the appropriate partition count and partition key. It then feeds it to the next stage of the pipeline. The lifecycle of intermediate streams is completely managed by Samza so from the user perspective the data shuffling is automatic. For a single-stage pipeline, dealing with bounded data sets is straightforward: the system consumer "knows" the end of a particular partition, and it will emit an end-of-stream token once a partition is complete. Samza will shut down the container when all its input partitions are complete.
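Conceptually, the repartitioning step routes each message to one partition of the intermediate stream based on its partition key. A schematic sketch of that routing in plain Java (Samza's actual partitioner lives in the system producer and may hash differently):

```java
public class PartitionBySketch {
  // Route a message to one of `partitionCount` partitions of the
  // intermediate stream by hashing its partition key.
  static int partitionFor(String key, int partitionCount) {
    // floorMod keeps the result in [0, partitionCount) even for
    // negative hash codes.
    return Math.floorMod(key.hashCode(), partitionCount);
  }

  public static void main(String[] args) {
    int partitions = 8;
    int first = partitionFor("user-42", partitions);
    int second = partitionFor("user-42", partitions);
    System.out.println(first == second);                  // true: same key, same partition
    System.out.println(first >= 0 && first < partitions); // true: always in range
  }
}
```

The key property this guarantees is that all messages with the same key land in the same partition, which is what makes downstream key-based windowing, aggregation, and joins possible.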
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/index.html ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/index.html b/docs/learn/documentation/versioned/index.html index 893e428..2d5446b 100644 --- a/docs/learn/documentation/versioned/index.html +++ b/docs/learn/documentation/versioned/index.html @@ -26,19 +26,15 @@ title: Documentation <h4>API</h4> <ul class="documentation-list"> - <li><a href="api/low-level-api.html">Low-level API</a></li> - <li><a href="api/high-level-api.html">Streams DSL</a></li> + <li><a href="api/programming-model.html">API overview</a></li> + <li><a href="api/high-level-api.html">High Level Streams API</a></li> + <li><a href="api/low-level-api.html">Low Level Task API</a></li> <li><a href="api/table-api.html">Table API</a></li> <li><a href="api/samza-sql.html">Samza SQL</a></li> <li><a href="https://beam.apache.org/documentation/runners/samza/">Apache BEAM</a></li> -<!-- TODO comparisons pages - <li><a href="comparisons/aurora.html">Aurora</a></li> - <li><a href="comparisons/jms.html">JMS</a></li> - <li><a href="comparisons/s4.html">S4</a></li> ---> </ul> -<h4>Deployment</h4> +<h4>DEPLOYMENT</h4> <ul class="documentation-list"> <li><a href="deployment/deployment-model.html">Deployment options</a></li> @@ -46,7 +42,7 @@ title: Documentation <li><a href="deployment/standalone.html">Run as an embedded library</a></li> </ul> -<h4>Connectors</h4> +<h4>CONNECTORS</h4> <ul class="documentation-list"> <li><a href="connectors/overview.html">Connectors overview</a></li> @@ -56,7 +52,7 @@ title: Documentation <li><a href="connectors/kinesis.html">AWS Kinesis</a></li> </ul> -<h4>Operations</h4> +<h4>OPERATIONS</h4> <ul class="documentation-list"> <li><a href="operations/monitoring.html">Monitoring</a></li> http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/jobs/samza-configurations.md 
---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/jobs/samza-configurations.md b/docs/learn/documentation/versioned/jobs/samza-configurations.md index 5828969..e55446a 100644 --- a/docs/learn/documentation/versioned/jobs/samza-configurations.md +++ b/docs/learn/documentation/versioned/jobs/samza-configurations.md @@ -249,7 +249,7 @@ These properties define Samza's storage mechanism for efficient [stateful stream |Name|Default|Description| |--- |--- |--- | -|stores.**_store-name_**.factory| |You can give a store any **_store-name_** except `default` (`default` is reserved for defining default store parameters), and use that name to get a reference to the store in your Samza application (call [TaskContext.getStore()](../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String)) in your task's [`init()`](../api/javadocs/org/apache/samza/task/InitableTask.html#init) method for the low-level API and in your application function's [`init()`](..api/javadocs/org/apache/samza/operators/functions/InitableFunction.html#init) method for the high level API. The value of this property is the fully-qualified name of a Java class that implements [`StorageEngineFactory`](../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html). Samza currently ships with two storage engine implementations: <br><br>`org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory` <br>An on-disk storage engine with a key-value interface, implemented using [RocksDB](http://rocksdb.org/). It supports fast random-access reads and writes, as well as range queries on keys. RocksDB can be configured with various additional tuning parameters.<br><br> `org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory`<br> In memory implementation of a key-value store. 
Uses `util.concurrent.ConcurrentSkipListMap` to store the keys in order.| +|stores.**_store-name_**.factory| |You can give a store any **_store-name_** except `default` (`default` is reserved for defining default store parameters), and use that name to get a reference to the store in your Samza application (call [TaskContext.getStore()](../api/javadocs/org/apache/samza/task/TaskContext.html#getStore(java.lang.String)) in your task's [`init()`](../api/javadocs/org/apache/samza/task/InitableTask.html#init) method for the Low Level Task API and in your application function's [`init()`](../api/javadocs/org/apache/samza/operators/functions/InitableFunction.html#init) method for the High Level Streams API). The value of this property is the fully-qualified name of a Java class that implements [`StorageEngineFactory`](../api/javadocs/org/apache/samza/storage/StorageEngineFactory.html). Samza currently ships with two storage engine implementations: <br><br>`org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory` <br>An on-disk storage engine with a key-value interface, implemented using [RocksDB](http://rocksdb.org/). It supports fast random-access reads and writes, as well as range queries on keys. RocksDB can be configured with various additional tuning parameters.<br><br> `org.apache.samza.storage.kv.inmemory.InMemoryKeyValueStorageEngineFactory`<br> In memory implementation of a key-value store. Uses `util.concurrent.ConcurrentSkipListMap` to store the keys in order.| |stores.**_store-name_**.key.serde| |If the storage engine expects keys in the store to be simple byte arrays, this [serde](../container/serialization.html) allows the stream task to access the store using another object type as key. The value of this property must be a serde-name that is registered with serializers.registry.*.class.
If this property is not set, keys are passed unmodified to the storage engine (and the changelog stream, if appropriate).| |stores.**_store-name_**.msg.serde| |If the storage engine expects values in the store to be simple byte arrays, this [serde](../container/serialization.html) allows the stream task to access the store using another object type as value. The value of this property must be a serde-name that is registered with serializers.registry.*.class. If this property is not set, values are passed unmodified to the storage engine (and the changelog stream, if appropriate).| |stores.**_store-name_**.changelog| |Samza stores are local to a container. If the container fails, the contents of the store are lost. To prevent loss of data, you need to set this property to configure a changelog stream: Samza then ensures that writes to the store are replicated to this stream, and the store is restored from this stream after a failure. The value of this property is given in the form system-name.stream-name. The "system-name" part is optional. If it is omitted you must specify the system in job.changelog.system config. Any output stream can be used as changelog, but you must ensure that only one job ever writes to a given changelog stream (each instance of a job and each store needs its own changelog stream).| http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/documentation/versioned/operations/monitoring.md ---------------------------------------------------------------------- diff --git a/docs/learn/documentation/versioned/operations/monitoring.md b/docs/learn/documentation/versioned/operations/monitoring.md index d93e497..ec8430c 100644 --- a/docs/learn/documentation/versioned/operations/monitoring.md +++ b/docs/learn/documentation/versioned/operations/monitoring.md @@ -37,8 +37,8 @@ Like any other production software, it is critical to monitor the health of our + [A.3 Creating a Custom MetricsReporter](#customreporter) * [B. 
Metric Types in Samza](#metrictypes) * [C. Adding User-Defined Metrics](#userdefinedmetrics) - + [Low-level API](#lowlevelapi) - + [High-Level API](#highlevelapi) + + [Low Level Task API](#lowlevelapi) + + [High Level Streams API](#highlevelapi) * [D. Key Internal Samza Metrics](#keyinternalsamzametrics) + [D.1 Vital Metrics](#vitalmetrics) + [D.2 Store Metrics](#storemetrics) @@ -232,7 +232,7 @@ To add a new metric, you can simply use the _MetricsRegistry_ in the provided Ta the init() method to register new metrics. The code snippets below show examples of registering and updating a user-defined Counter metric. Timers and gauges can similarly be used from within your task class. -### <a name="lowlevelapi"></a> Low-level API +### <a name="lowlevelapi"></a> Low Level Task API Simply have your task implement the InitableTask interface and access the MetricsRegistry from the TaskContext. @@ -252,9 +252,9 @@ public class MyJavaStreamTask implements StreamTask, InitableTask { } ``` -### <a name="highlevelapi"></a> High-Level API +### <a name="highlevelapi"></a> High Level Streams API -In the high-level API, you can define a ContextManager and access the MetricsRegistry from the TaskContext, using which you can add and update your metrics. +In the High Level Streams API, you can define a ContextManager and access the MetricsRegistry from the TaskContext, using which you can add and update your metrics. 
``` public class MyJavaStreamApp implements StreamApplication { http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/tutorials/versioned/hello-samza-high-level-code.md ---------------------------------------------------------------------- diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-code.md b/docs/learn/tutorials/versioned/hello-samza-high-level-code.md index 1c06116..a881b51 100644 --- a/docs/learn/tutorials/versioned/hello-samza-high-level-code.md +++ b/docs/learn/tutorials/versioned/hello-samza-high-level-code.md @@ -35,7 +35,7 @@ cd hello-samza git checkout latest {% endhighlight %} -This project already contains implementations of the wikipedia application using both the low-level task API and the high-level API. The low-level task implementations are in the `samza.examples.wikipedia.task` package. The high-level application implementation is in the `samza.examples.wikipedia.application` package. +This project already contains implementations of the wikipedia application using both the Low Level Task API and the High Level Streams API. The Low Level Task API implementations are in the `samza.examples.wikipedia.task` package. The High Level Streams API implementation is in the `samza.examples.wikipedia.application` package. This tutorial will provide step by step instructions to recreate the existing wikipedia application. 
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md ---------------------------------------------------------------------- diff --git a/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md index cfe589a..5127fd6 100644 --- a/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md +++ b/docs/learn/tutorials/versioned/hello-samza-high-level-yarn.md @@ -18,9 +18,9 @@ title: Hello Samza High Level API - YARN Deployment See the License for the specific language governing permissions and limitations under the License. --> -The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the low level task API as well as the high level API. +The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the Low Level Task API as well as the High Level Streams API. -This tutorial demonstrates a simple wikipedia application created with the high level API. The [Hello Samza tutorial] (/startup/hello-samza/{{site.version}}/index.html) is the low-level analog to this tutorial. It demonstrates the same logic but is created with the task API. The tutorials are designed to be as similar as possible. The primary differences are that with the high level API we accomplish the equivalent of 3 separate low-level jobs with a single application, we skip the intermediate topics for simplicity, and we can visualize the execution plan after we start the application. +This tutorial demonstrates a simple wikipedia application created with the High Level Streams API. The [Hello Samza tutorial](/startup/hello-samza/{{site.version}}/index.html) is the Low Level Task API analog to this tutorial.
It demonstrates the same logic but is created with the Low Level Task API. The tutorials are designed to be as similar as possible. The primary differences are that with the High Level Streams API we accomplish the equivalent of 3 separate Low Level Task API jobs with a single application, we skip the intermediate topics for simplicity, and we can visualize the execution plan after we start the application. ### Get the Code http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/tutorials/versioned/samza-event-hubs-standalone.md ---------------------------------------------------------------------- diff --git a/docs/learn/tutorials/versioned/samza-event-hubs-standalone.md b/docs/learn/tutorials/versioned/samza-event-hubs-standalone.md index 53dbec3..b9fe533 100644 --- a/docs/learn/tutorials/versioned/samza-event-hubs-standalone.md +++ b/docs/learn/tutorials/versioned/samza-event-hubs-standalone.md @@ -18,7 +18,7 @@ title: Samza Event Hubs Connectors Example See the License for the specific language governing permissions and limitations under the License. --> -The [hello-samza](https://github.com/apache/samza-hello-samza) project has an example that uses the Samza high-level API to consume and produce from [Event Hubs](../../documentation/versioned/connectors/eventhubs.html) using the Zookeeper deployment model. +The [hello-samza](https://github.com/apache/samza-hello-samza) project has an example that uses the Samza High Level Streams API to consume and produce from [Event Hubs](../../documentation/versioned/connectors/eventhubs.html) using the Zookeeper deployment model. 
#### Get the Code http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/learn/tutorials/versioned/samza-sql.md ---------------------------------------------------------------------- diff --git a/docs/learn/tutorials/versioned/samza-sql.md b/docs/learn/tutorials/versioned/samza-sql.md index f64aa06..71cfcc6 100644 --- a/docs/learn/tutorials/versioned/samza-sql.md +++ b/docs/learn/tutorials/versioned/samza-sql.md @@ -68,7 +68,7 @@ Below are some of the sql queries that you can execute using the samza-sql-conso # Running Samza SQL on YARN -The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the low level task API, high level API as well as Samza SQL. +The [hello-samza](https://github.com/apache/samza-hello-samza) project is an example project designed to help you run your first Samza application. It has examples of applications using the Low Level Task API, High Level Streams API as well as Samza SQL. This tutorial demonstrates a simple Samza application that uses SQL to perform stream processing. 
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/docs/startup/code-examples/versioned/index.md ---------------------------------------------------------------------- diff --git a/docs/startup/code-examples/versioned/index.md b/docs/startup/code-examples/versioned/index.md index ba1cc3e..384419d 100644 --- a/docs/startup/code-examples/versioned/index.md +++ b/docs/startup/code-examples/versioned/index.md @@ -36,7 +36,7 @@ These include: - The [Join example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/JoinExample.java]) demonstrates how you can join a Kafka stream of page-views with a stream of ad-clicks -- The [Stream-Table Join example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/RemoteTableJoinExample.java) demonstrates how the Samza Table API. It joins a Kafka stream with a remote dataset accessed through a REST service. +- The [Stream-Table Join example](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/RemoteTableJoinExample.java) demonstrates how to use the Samza Table API. It joins a Kafka stream with a remote dataset accessed through a REST service. - The [SessionWindow](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/SessionWindowExample.java) and [TumblingWindow](https://github.com/apache/samza-hello-samza/blob/latest/src/main/java/samza/examples/cookbook/TumblingWindowExample.java) examples illustrate Samza's rich windowing and triggering capabilities. 
http://git-wip-us.apache.org/repos/asf/samza/blob/4e590709/samza-api/src/main/java/org/apache/samza/task/ClosableTask.java ---------------------------------------------------------------------- diff --git a/samza-api/src/main/java/org/apache/samza/task/ClosableTask.java b/samza-api/src/main/java/org/apache/samza/task/ClosableTask.java index 36003b3..148773f 100644 --- a/samza-api/src/main/java/org/apache/samza/task/ClosableTask.java +++ b/samza-api/src/main/java/org/apache/samza/task/ClosableTask.java @@ -19,12 +19,19 @@ package org.apache.samza.task; +import org.apache.samza.context.ApplicationContainerContext; +import org.apache.samza.context.ApplicationTaskContext; + /** * A ClosableTask augments {@link org.apache.samza.task.StreamTask}, allowing the method implementer to specify * code that will be called when the StreamTask is being shut down by the framework, providing to emit final metrics, * clean or close resources, etc. The close method is not guaranteed to be called in event of crash or hard kill * of the process. + * + * Deprecated: It's recommended to manage the lifecycle of any runtime objects using + * {@link ApplicationContainerContext} and {@link ApplicationTaskContext} instead. */ +@Deprecated public interface ClosableTask { void close() throws Exception; }
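The deprecation note added to `ClosableTask` above points users at `ApplicationContainerContext` and `ApplicationTaskContext` for lifecycle management. A minimal sketch of the replacement pattern, assuming `ApplicationTaskContext` exposes `start()`/`stop()` lifecycle hooks (the `DatabaseClient` resource is a hypothetical illustration, not a Samza type):

```java
import org.apache.samza.context.ApplicationTaskContext;

// Hypothetical sketch: instead of implementing ClosableTask.close(), per-task
// resources are owned by an ApplicationTaskContext whose lifecycle Samza manages.
public class MyTaskResources implements ApplicationTaskContext {
  private DatabaseClient client; // illustrative resource, not a Samza type

  @Override
  public void start() {
    client = DatabaseClient.connect(); // acquired when the task starts
  }

  @Override
  public void stop() {
    client.close(); // released on shutdown, replacing ClosableTask.close()
  }
}
```

As the javadoc notes, neither `close()` nor any lifecycle hook is guaranteed to run on a crash or hard kill, so critical state should still be made durable (e.g. via changelogs) rather than flushed only at shutdown.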