Repository: samza
Updated Branches:
  refs/heads/master 8b8526682 -> 058776d65


Added Samza Configurations to website

vjagadish
Added `CONFIGURATIONS` under `DOCUMENTATION`
Updated `configuration.md` page to work with new configs

Do we have documentation about `SystemDescriptors` anywhere on the website?
I was thinking of adding it to the `configuration.md` page otherwise.

Author: Daniel Chen <dch...@linkedin.com>

Reviewers: Jagadish <jagad...@apache.org>

Closes #723 from dxichen/add-configs-to-website


Project: http://git-wip-us.apache.org/repos/asf/samza/repo
Commit: http://git-wip-us.apache.org/repos/asf/samza/commit/058776d6
Tree: http://git-wip-us.apache.org/repos/asf/samza/tree/058776d6
Diff: http://git-wip-us.apache.org/repos/asf/samza/diff/058776d6

Branch: refs/heads/master
Commit: 058776d65e79a3d4973f748dc5f57ac7ad36d72e
Parents: 8b85266
Author: Daniel Chen <dch...@linkedin.com>
Authored: Wed Oct 17 09:40:39 2018 -0700
Committer: Jagadish <jvenkatra...@linkedin.com>
Committed: Wed Oct 17 09:40:39 2018 -0700

----------------------------------------------------------------------
 docs/learn/documentation/versioned/index.html   |  2 +-
 .../versioned/jobs/configuration.md             | 56 ++++++++++----------
 .../versioned/jobs/samza-configurations.md      |  4 +-
 3 files changed, 30 insertions(+), 32 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/samza/blob/058776d6/docs/learn/documentation/versioned/index.html
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/index.html 
b/docs/learn/documentation/versioned/index.html
index 94f7e18..50bfd2d 100644
--- a/docs/learn/documentation/versioned/index.html
+++ b/docs/learn/documentation/versioned/index.html
@@ -21,7 +21,7 @@ title: Documentation
 
 <h4><a href="core-concepts/core-concepts.html">CORE CONCEPTS</a></h4>
 <h4><a href="architecture/architecture-overview.html">ARCHITECTURE</a></h4>
-
+<h4><a href="jobs/configuration.html">CONFIGURATIONS</a></h4>
 
 <h4>API</h4>
 

http://git-wip-us.apache.org/repos/asf/samza/blob/058776d6/docs/learn/documentation/versioned/jobs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/configuration.md 
b/docs/learn/documentation/versioned/jobs/configuration.md
index 4aac9bf..aafb870 100644
--- a/docs/learn/documentation/versioned/jobs/configuration.md
+++ b/docs/learn/documentation/versioned/jobs/configuration.md
@@ -19,48 +19,46 @@ title: Configuration
    limitations under the License.
 -->
 
-All Samza jobs have a configuration file that defines the job. A very basic 
configuration file looks like this:
+All Samza applications have a [properties 
format](https://en.wikipedia.org/wiki/.properties) file that defines their 
configuration.
+A complete list of configuration keys can be found on the [__Samza 
Configurations Table__](samza-configurations.html) page. 
+ 
+A very basic configuration file looks like this:
 
 {% highlight jproperties %}
-# Job
-job.factory.class=org.apache.samza.job.local.ThreadJobFactory
-job.name=hello-world
-
-# Task
-task.class=samza.task.example.MyJavaStreamerTask
-task.inputs=example-system.example-stream
-
-# Serializers
+# Application Configurations
+job.factory.class=org.apache.samza.job.local.YarnJobFactory
+app.name=hello-world
+job.default.system=example-system
 serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory
 
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
 
-# Systems
+# Systems & Streams Configurations
 
systems.example-system.samza.factory=samza.stream.example.ExampleConsumerFactory
 systems.example-system.samza.key.serde=string
 systems.example-system.samza.msg.serde=json
-{% endhighlight %}
 
-There are four major sections to a configuration file:
+# Checkpointing
+task.checkpoint.factory=org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory
 
-1. The job section defines things like the name of the job, and whether to use 
the YarnJobFactory or ProcessJobFactory/ThreadJobFactory (See the 
job.factory.class property in [Configuration Table](configuration-table.html)).
-2. The task section is where you specify the class name for your 
[StreamTask](../api/overview.html). It's also where you define what the [input 
streams](../container/streams.html) are for your task.
-3. The serializers section defines the classes of the 
[serdes](../container/serialization.html) used for serialization and 
deserialization of specific objects that are received and sent along different 
streams.
-4. The system section defines systems that your StreamTask can read from along 
with the types of serdes used for sending keys and messages from that system. 
Usually, you'll define a Kafka system, if you're reading from Kafka, although 
you can also specify your own self-implemented Samza-compatible systems. See 
the [hello-samza example project](/startup/hello-samza/{{site.version}})'s 
Wikipedia system for a good example of a self-implemented system.
+# State Storage
+stores.example-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
+stores.example-store.key.serde=string
+stores.example-store.value.serde=json
 
-### Required Configuration
-
-Configuration keys that absolutely must be defined for a Samza job are:
+# Metrics
+metrics.reporter.example-reporter.class=org.apache.samza.metrics.reporter.JmxReporterFactory
+metrics.reporters=example-reporter
+{% endhighlight %}
 
-* `job.factory.class`
-* `job.name`
-* `task.class`
-* `task.inputs`
+There are six sections in a configuration file:
 
-### Configuration Keys
+1. The [__Application__](samza-configurations.html#application-configurations) 
section defines things like the name of the job, the job factory (see the 
job.factory.class property in the [Configuration 
Table](samza-configurations.html)), the class name for your 
[StreamTask](../api/overview.html), and the serdes used for serialization and 
deserialization of the objects that are received and sent along different 
streams.
+2. The [__Systems & Streams__](samza-configurations.html#systems-streams) 
section defines systems that your StreamTask can read from along with the types 
of serdes used for sending keys and messages from that system. You may use any 
of the [predefined systems](../connectors/overview.html) that Samza ships with, 
although you can also specify your own self-implemented Samza-compatible 
systems. See the [hello-samza example 
project](/startup/hello-samza/{{site.version}})'s Wikipedia system for a good 
example of a self-implemented system.
+3. The [__Checkpointing__](samza-configurations.html#checkpointing) section 
defines how message processing state is saved, which provides 
fault-tolerant processing of streams (See 
[Checkpointing](../container/checkpointing.html) for more details).
+4. The [__State Storage__](samza-configurations.html#state-storage) section 
defines the [stateful stream processing](../container/state-management.html) 
settings for Samza.
+5. The [__Deployment__](samza-configurations.html#deployment) section defines 
how the Samza application will be deployed (to a cluster manager such as YARN, 
or as a standalone library), as well as the settings for each option. See 
[Deployment Models](/deployment/deployment-model.html) for more details.
+6. The [__Metrics__](samza-configurations.html#metrics) section defines how 
the Samza application metrics will be monitored and collected. (See 
[Monitoring](../operations/monitoring.html))
 
-A complete list of configuration keys can be found on the [Samza 
Configurations](samza-configurations.html) page.  Note
-that configuration keys prefixed with "sensitive." are treated specially, in 
that the values associated with such keys
+Note that configuration keys prefixed with `sensitive.` are treated specially, 
in that the values associated with such keys
 will be masked in logs and Samza's YARN ApplicationMaster UI.  This is to 
prevent accidental disclosure only; no
 encryption is done.
-
-## [Packaging &raquo;](packaging.html)
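The `sensitive.` masking behavior described in this file can be sketched with a minimal example; the key names and values below are hypothetical, not part of the patch:

```jproperties
# Values for keys under the "sensitive." prefix are masked in logs and in
# Samza's YARN ApplicationMaster UI. Masking is display-only; no encryption.
sensitive.db.password=my-secret-password

# An ordinary key, by contrast, appears in plain text in logs and the UI.
job.name=hello-world
```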

http://git-wip-us.apache.org/repos/asf/samza/blob/058776d6/docs/learn/documentation/versioned/jobs/samza-configurations.md
----------------------------------------------------------------------
diff --git a/docs/learn/documentation/versioned/jobs/samza-configurations.md 
b/docs/learn/documentation/versioned/jobs/samza-configurations.md
index ea76210..0928ee2 100644
--- a/docs/learn/documentation/versioned/jobs/samza-configurations.md
+++ b/docs/learn/documentation/versioned/jobs/samza-configurations.md
@@ -57,7 +57,7 @@ These are the basic properties for setting up a Samza 
application.
 |job.host-affinity.enabled|false|This property indicates whether host-affinity 
is enabled or not. Host-affinity refers to the ability of Samza to request and 
allocate a container on the same host every time the job is deployed. When 
host-affinity is enabled, Samza makes a "best-effort" to honor the 
host-affinity constraint. The property 
`cluster-manager.container.request.timeout.ms` determines how long to wait 
before de-prioritizing the host-affinity constraint and assigning the container 
to any available resource.|
 |task.window.ms|-1|If task.class implements 
[WindowableTask](../api/javadocs/org/apache/samza/task/WindowableTask.html), it 
can receive a windowing callback in regular intervals. This property specifies 
the time between window() calls, in milliseconds. If the number is negative 
(the default), window() is never called. A `window()` call will never  occur 
concurrently with the processing of a message. If a message is being processed 
when a window() call is due, the invocation of window happens after processing 
the message. This property is set automatically when using join or window 
operators in a High Level API StreamApplication. Note: task.window.ms should be 
set to be much larger than average process or window call duration to avoid 
starving regular processing.|
 |task.log4j.system| |Specify the system name for the StreamAppender. If this 
property is not specified in the config, an exception will be thrown. (See 
[Stream Log4j Appender](logging.html#stream-log4j-appender)) Example: 
task.log4j.system=kafka|
-|serializers.registry.<br>**_serde-name_**.class| |Use this property to 
register a serializer/deserializer, which defines a way of encoding data as an 
array of bytes (used for messages in streams, and for data in persistent 
storage). You can give a serde any serde-name you want, and reference that name 
in properties like systems.\*.samza.key.serde, systems.\*.samza.msg.serde, 
streams.\*.samza.key.serde, streams.\*.samza.msg.serde, stores.\*.key.serde and 
stores.\*.msg.serde. The value of this property is the fully-qualified name of 
a Java class that implements SerdeFactory. Samza ships with the following serde 
implementations, which can be used with their predefined serde name without 
adding them to the registry 
explicitly:<br><br>`org.apache.samza.serializers.ByteSerdeFactory`<br>A no-op 
serde which passes through the undecoded byte array. Its predefined serde-name 
is 
`byte`.<br><br>`org.apache.samza.serializers.ByteBufferSerdeFactory`<br>Encodes 
`java.nio.ByteBuffer` objects. Its 
 predefined serde-name is 
`bytebuffer`.<br><br>`org.apache.samza.serializers.IntegerSerdeFactory`<br>Encodes
 `java.lang.Integer` objects as binary (4 bytes fixed-length big-endian 
encoding). Its predefined serde-name is 
`integer`.<br><br>`org.apache.samza.serializers.StringSerdeFactory`<br>Encodes 
`java.lang.String` objects as UTF-8. Its predefined serde-name is 
`string`.<br><br>`org.apache.samza.serializers.JsonSerdeFactory`<br>Encodes 
nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This 
Serde enforces a dash-separated property naming convention, while JsonSerdeV2 
doesn't. This serde is primarily meant for Samza's internal usage, and is 
publicly available for backwards compatibility. Its predefined serde-name is 
`json`.<br><br>`org.apache.samza.serializers.JsonSerdeV2Factory`<br>Encodes 
nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This 
Serde uses Jackson's default (camelCase) property naming convention. This serde 
should be pr
 eferred over JsonSerde, especially in High Level API, unless the dasherized 
naming convention is required (e.g., for backwards 
compatibility).<br><br>`org.apache.samza.serializers.LongSerdeFactory`<br>Encodes
 `java.lang.Long` as binary (8 bytes fixed-length big-endian encoding). Its 
predefined serde-name is 
`long`.<br><br>`org.apache.samza.serializers.DoubleSerdeFactory`<br>Encodes 
`java.lang.Double` as binary (8 bytes double-precision float point). Its 
predefined serde-name is 
`double`.<br><br>`org.apache.samza.serializers.UUIDSerdeFactory`<br>Encodes 
`java.util.UUID` 
objects.<br><br>`org.apache.samza.serializers.SerializableSerdeFactory`<br>Encodes
 `java.io.Serializable` objects. Its predefined serde-name is 
`serializable`.<br><br>`org.apache.samza.serializers.MetricsSnapshotSerdeFactory`<br>Encodes
 `org.apache.samza.metrics.reporter.MetricsSnapshot` objects (which are used 
for reporting metrics) as 
JSON.<br><br>`org.apache.samza.serializers.KafkaSerdeFactory`<br>Adapter which 
all
 ows existing `kafka.serializer.Encoder` and `kafka.serializer.Decoder` 
implementations to be used as Samza serdes. Set 
`serializers.registry.serde-name.encoder` and  
`serializers.registry.serde-name.decoder` to the appropriate class names.|
+|serializers.registry.<br>**_serde-name_**.class| |Use this property to 
register a serializer/deserializer, which defines a way of encoding data as an 
array of bytes (used for messages in streams, and for data in persistent 
storage). You can give a serde any serde-name you want, and reference that name 
in properties like systems.\*.samza.key.serde, systems.\*.samza.msg.serde, 
streams.\*.samza.key.serde, streams.\*.samza.msg.serde, stores.\*.key.serde and 
stores.\*.msg.serde. The value of this property is the fully-qualified name of 
a Java class that implements SerdeFactory. Samza ships with the following serde 
implementations:<br><br>`org.apache.samza.serializers.ByteSerdeFactory`<br>A 
no-op serde which passes through the undecoded byte array. 
<br><br>`org.apache.samza.serializers.ByteBufferSerdeFactory`<br>Encodes 
`java.nio.ByteBuffer` objects. 
<br><br>`org.apache.samza.serializers.IntegerSerdeFactory`<br>Encodes 
`java.lang.Integer` objects as binary (4 bytes fixed-length big-endia
 n 
encoding).<br><br>`org.apache.samza.serializers.StringSerdeFactory`<br>Encodes 
`java.lang.String` objects as UTF-8. 
<br><br>`org.apache.samza.serializers.JsonSerdeFactory`<br>Encodes nested 
structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: This Serde 
enforces a dash-separated property naming convention, while JsonSerdeV2 
doesn't. This serde is primarily meant for Samza's internal usage, and is 
publicly available for backwards 
compatibility.<br><br>`org.apache.samza.serializers.JsonSerdeV2Factory`<br>Encodes
 nested structures of `java.util.Map`, `java.util.List` etc. as JSON. Note: 
This Serde uses Jackson's default (camelCase) property naming convention. This 
serde should be preferred over JsonSerde, especially in High Level API, unless 
the dasherized naming convention is required (e.g., for backwards 
compatibility).<br><br>`org.apache.samza.serializers.LongSerdeFactory`<br>Encodes
 `java.lang.Long` as binary (8 bytes fixed-length big-endian 
encoding).<br><br>`org.
 apache.samza.serializers.DoubleSerdeFactory`<br>Encodes `java.lang.Double` as 
binary (8 bytes double-precision floating point). 
<br><br>`org.apache.samza.serializers.UUIDSerdeFactory`<br>Encodes 
`java.util.UUID` 
objects.<br><br>`org.apache.samza.serializers.SerializableSerdeFactory`<br>Encodes
 `java.io.Serializable` 
objects.<br><br>`org.apache.samza.serializers.MetricsSnapshotSerdeFactory`<br>Encodes
 `org.apache.samza.metrics.reporter.MetricsSnapshot` objects (which are used 
for reporting metrics) as 
JSON.<br><br>`org.apache.samza.serializers.KafkaSerdeFactory`<br>Adapter which 
allows existing `kafka.serializer.Encoder` and `kafka.serializer.Decoder` 
implementations to be used as Samza serdes. Set 
`serializers.registry.serde-name.encoder` and  
`serializers.registry.serde-name.decoder` to the appropriate class names.|
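As a hedged sketch of the registration this table row describes, a serde is registered under a chosen name and then referenced from system, stream, or store properties (the system, store, and serde names below are illustrative):

```jproperties
# Register a serde under the name "json" using a factory that ships with Samza.
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeV2Factory

# Reference the registered name wherever a serde is expected.
systems.example-system.samza.msg.serde=json
stores.example-store.value.serde=json
```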
 
 #### <a name="advanced-application-configurations"></a> [1.1 Advanced 
Application Configurations](#advanced-application-configurations)
 
@@ -279,7 +279,7 @@ Samza supports both standalone and clustered 
([YARN](yarn-jobs.html)) [deploymen
 #### <a name="yarn-cluster-deployment"></a>[5.1 YARN Cluster 
Deployment](#yarn-cluster-deployment)
 |Name|Default|Description|
 |--- |--- |--- |
-|yarn.package.path| |Required for YARN jobs: The URL from which the job 
package can be downloaded, for example a http:// or hdfs:// URL. The job 
package is a .tar.gz file with a specific directory structure.|
+|yarn.package.path| |__Required for YARN jobs:__ The URL from which the job 
package can be downloaded, for example a http:// or hdfs:// URL. The job 
package is a .tar.gz file with a specific directory structure.|
 |job.container.count|1|The number of YARN containers to request for running 
your job. This is the main parameter for controlling the scale (allocated 
computing resources) of your job: to increase the parallelism of processing, 
you need to increase the number of containers. The minimum is one container, 
and the maximum number of containers is the number of task instances (usually 
the number of input stream partitions). Task instances are evenly distributed 
across the number of containers that you specify.|
 |cluster-manager.container.memory.mb|1024|How much memory, in megabytes, to 
request from the cluster manager per container of your job. Along with 
cluster-manager.container.cpu.cores, this property determines how many 
containers the cluster manager will run on one machine. If the container 
exceeds this limit, it will be killed, so it is important that the container's 
actual memory use remains below the limit. The amount of memory used is 
normally the JVM heap size (configured with task.opts), plus the size of any 
off-heap memory allocation (for example stores.*.container.cache.size.bytes), 
plus a safety margin to allow for JVM overheads.|
 |cluster-manager.container.cpu.cores|1|The number of CPU cores to request per 
container of your job. Each node in the cluster has a certain number of CPU 
cores available, so this number (along with 
cluster-manager.container.memory.mb) determines how many containers can be run 
on one machine.|
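Combining the YARN properties from the table above, a deployment section might look like the following sketch (the package URL and sizing values are illustrative examples, not recommendations):

```jproperties
# Required for YARN jobs: where the .tar.gz job package can be downloaded from.
yarn.package.path=hdfs://namenode:8020/samza/hello-world-dist.tar.gz

# Scale and per-container resource requests.
job.container.count=4
cluster-manager.container.memory.mb=2048
cluster-manager.container.cpu.cores=2
```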
