This is an automated email from the ASF dual-hosted git repository. sijie pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-pulsar.git
The following commit(s) were added to refs/heads/master by this push: new a0623d5 Pulsar Functions documentation (#1350) a0623d5 is described below commit a0623d57f0ac680c589197d3a973c4760d3493ae Author: Luc Perkins <lucperk...@gmail.com> AuthorDate: Tue Mar 6 22:51:55 2018 -0800 Pulsar Functions documentation (#1350) The first cut of pulsar-functions documentation. --- site/_data/popovers.yaml | 3 + site/_data/pulsar-functions.yaml | 85 ++++++++++++++++++++ site/docs/latest/functions/api.md | 76 ++++++++++++++++++ site/docs/latest/functions/deployment.md | 9 +++ site/docs/latest/functions/guarantees.md | 28 +++++++ site/docs/latest/functions/metrics-and-stats.md | 19 +++++ site/docs/latest/functions/overview.md | 91 ++++++++++++++++++++++ site/docs/latest/functions/quickstart.md | 29 +++++++ .../getting-started/ConceptsAndArchitecture.md | 4 + 9 files changed, 344 insertions(+) diff --git a/site/_data/popovers.yaml b/site/_data/popovers.yaml index 3db11d2..f001970 100644 --- a/site/_data/popovers.yaml +++ b/site/_data/popovers.yaml @@ -88,6 +88,9 @@ pub-sub: pulsar: q: What is Pulsar? def: Pulsar is a distributed messaging system originally created by Yahoo but now under the stewardship of the Apache Software Foundation. +pulsar-functions: + q: What are Pulsar Functions? + def: Pulsar Functions are lightweight functions that can consume messages from Pulsar topics, apply custom processing logic, and, if desired, publish results to topics. retention-policy: q: What is a retention policy? def: Size and/or time limits that you can set on a namespace to configure retention of messages that have already been acknowledged. diff --git a/site/_data/pulsar-functions.yaml b/site/_data/pulsar-functions.yaml new file mode 100644 index 0000000..bea3166 --- /dev/null +++ b/site/_data/pulsar-functions.yaml @@ -0,0 +1,85 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# + +description: | + A tool for deploying and managing Pulsar Functions. +example: | + pulsar-functions localrun \ + --function-config my-function.yaml +commands: +- name: localrun + description: Runs a Pulsar Function +- name: create + description: Creates a new Pulsar Function +- name: delete + description: Deletes an existing Pulsar Function +- name: update + description: Updates an existing Pulsar Function +- name: get + description: Returns information about an existing Pulsar Function +- name: list + description: Lists all currently existing Pulsar Functions + options: + - flags: --namespace + description: The namespace of the Pulsar Functions you'd like to list + - flags: --tenant + description: The tenant of the Pulsar Functions you'd like to list (you must also specify a namespace using the `--namespace` flag) +- name: getstatus + description: Checks on the status of the specified Pulsar Function + options: + - flags: --namespace + description: The name of the Pulsar Function whose status you'd like to check + - flags: --tenant + description: The tenant of the Pulsar Function whose status you'd like to check + - flags: --tenant +- name: querystate + description: Displays the current state of the specified Pulsar Function, by key + options: + - flags: -k, --key + description: The key for the desired value + - flags: --name + description: The name of the Pulsar Function whose current state you'd like to query + - flags: --namespace + description: The namespace of the Pulsar Function whose current state you'd like to query + - flags: -u, --storage-service-url + description: The URL of the storage service + - flags: --tenant + description: The tenant of the Pulsar Function whose current state you'd like to query + - flags: -w, --watch + description: If set, watch for changes in the current state of the specified Pulsar Function (by the key set using `-k`/`--key`) + default: 'false' +options: + - flags: --name + description: The name of the Pulsar Function + - flags: --function-classname + description: The Java class name of the Pulsar Function + - flags: --function-classpath + description: The Java classpath of the Pulsar Function + - flags: --source-topic + description: The topic from which the Pulsar Function consumes its input + - flags: --sink-topic + description: The topic to which the Pulsar Function publishes its output (if any) + - flags: --input-serde-classname + description: Input SerDe + default: org.apache.pulsar.functions.runtime.serde.Utf8StringSerDe + - flags: --output-serde-classname + description: Output SerDe + default: org.apache.pulsar.functions.runtime.serde.Utf8StringSerDe + - flags: --function-config + description: The path for the Pulsar Function's YAML configuration file \ No newline at end of file diff --git a/site/docs/latest/functions/api.md b/site/docs/latest/functions/api.md new file mode 100644 index 0000000..ba71c86 --- /dev/null +++ b/site/docs/latest/functions/api.md @@ -0,0 +1,76 @@ +--- +title: The Pulsar Functions API +--- + +## Java + +Java API example: + +```java +import java.util.Function; +public class ExclamationFunction implements Function<String, String> { + @Override + public String apply(String input) { return String.format("%s!", input); } +} +``` + +### With context + +```java +public interface PulsarFunction<I, O> { + O process(I input, Context context) throws Exception; +} +``` + +Context interface: + +```java +public interface Context { + byte[] getMessageId(); + String getTopicName(); + Collection<String> getSourceTopics(); + String getSinkTopic(); + String getOutputSerdeClassName(); + String getTenant(); + String getNamespace(); + String getFunctionName(); + String getFunctionId(); + String getInstanceId(); + String getFunctionVersion(); + Logger getLogger(); + void incrCounter(String key, long amount); + String getUserConfigValue(String key); + void recordMetric(String metricName, double value); + <O> CompletableFuture<Void> publish(String topicName, O object, String serDeClassName); + <O> CompletableFuture<Void> publish(String topicName, O object); + CompletableFuture<Void> ack(byte[] messageId, String topic); +} +``` + +### SerDe + +> Serde stands for **Ser**ialization and **De**serialization. + +Built-in vs. custom. For custom, you need to implement this interface: + +```java +public interface SerDe<T> { + T deserialize(byte[] input); + byte[] serialize(T input); +} +``` + +The built-in is the `org.apache.pulsar.functions.api.DefaultSerDe` class: + +```java + +``` + +The built-in should work fine for basic Java types. For more advanced types, + + +## Python + +```python +def process(input): +``` \ No newline at end of file diff --git a/site/docs/latest/functions/deployment.md b/site/docs/latest/functions/deployment.md new file mode 100644 index 0000000..ea2d79a --- /dev/null +++ b/site/docs/latest/functions/deployment.md @@ -0,0 +1,9 @@ +--- +title: Deploying Pulsar Functions +--- + +At the moment, Pulsar Functions are deployed + +## State storage + +By default, Pulsar uses [Apache BookKeeper](https://bookkeeper.apache.org). \ No newline at end of file diff --git a/site/docs/latest/functions/guarantees.md b/site/docs/latest/functions/guarantees.md new file mode 100644 index 0000000..3efd549 --- /dev/null +++ b/site/docs/latest/functions/guarantees.md @@ -0,0 +1,28 @@ +--- +title: Processing guarantees +lead: Apply at-most-once, at-least-once, or effectively-once delivery semantics to Pulsar Functions +--- + +Pulsar Functions provides three different messaging semantics that you can apply to any Function: + +* **At-most-once** delivery +* **At-least-once** delivery +* **Effectively-once** delivery + +## How it works + +You can set the processing guarantees for a Pulsar Function when you create the Function. This [`pulsar-function create`](../../reference/CliTools#pulsar-functions-create) command, for example, would apply effectively-once guarantees to the Function: + +```bash +$ bin/pulsar-functions \ + # TODO + --processingGuarantees EFFECTIVELY_ONCE +``` + +The available options are: + +* `ATMOST_ONCE` +* `ATLEAST_ONCE` +* `EFFECTIVELY_ONCE` + +{% include admonition.html type='info' content='By default, Pulsar Functions provide at-most-once delivery guarantees. If you create a function without supplying a value for the `--processingGuarantees`flag, then the Function will provide only at-most-once guarantees.' %} \ No newline at end of file diff --git a/site/docs/latest/functions/metrics-and-stats.md b/site/docs/latest/functions/metrics-and-stats.md new file mode 100644 index 0000000..90f3306 --- /dev/null +++ b/site/docs/latest/functions/metrics-and-stats.md @@ -0,0 +1,19 @@ +--- +title: Metrics and stats for Pulsar Functions +--- + +Pulsar Functions can publish arbitrary metrics to the metrics interface (which can then be queried). + +## Java API + +To publish a metric to the metrics interface: + +```java +void recordMetric(String metricName, double value); +``` + +Here's an example: + +```java +Context.recordMetric("my-custom-metrics", 475); +``` \ No newline at end of file diff --git a/site/docs/latest/functions/overview.md b/site/docs/latest/functions/overview.md new file mode 100644 index 0000000..6dff8d4 --- /dev/null +++ b/site/docs/latest/functions/overview.md @@ -0,0 +1,91 @@ +--- +title: Pulsar Functions overview +lead: A bird's-eye look at Pulsar's lightweight, developer-friendly compute platform +--- + + +**Pulsar Functions** are lightweight compute processes that + +* consume {% popover messages %} from one or more Pulsar {% popover topics %}, +* apply a user-supplied processing logic to each message, +* publish the results of the computation to another topic + +Here's an example Pulsar Function for Java: + +```java +import java.util.Function; + +public class ExclamationFunction implements Function<String, String> { + @Override + public String apply(String input) { return String.format("%s!", input); } +} +``` + +Functions are executed each time a message is published to the input topic. If a function is listening on the topic `tweet-stream`, for example, then the function would be run each time a message. + +> Pulsar features automatic message deduplication + +### Goals + +Core goal: make Pulsar do real heavy lifting without needing to deploy a neighboring system (Storm, Heron, Flink, etc.). Ready-made compute infrastructure at your disposal. + +* Developer productivity (easy troubleshooting and deployment) + * "Serverless" philosophy +* No need for a separate SPE + +### Inspirations + +* AWS Lambda, Google Cloud Functions, etc. +* FaaS +* Serverless/NoOps philosophy + +### Command-line interface + +You can manage Pulsar Functions using the [`pulsar-functions`](../../reference/CliTools#pulsar-functions) CLI tool. Here's an example command that would + +```bash +$ bin/pulsar-functions localrun \ + --inputs persistent://sample/standalone/ns1/test_src \ + --output persistent://sample/standalone/ns1/test_result \ + --jar examples/api-examples.jar \ + --className org.apache.pulsar.functions.api.examples.ExclamationFunction +``` + +### Supported languages + +Pulsar Functions can currently be written in [Java](../../functions/api#java) and [Python](../../functions/api#python). Support for additional languages is coming soon. + +### Runtime + +### Deployment modes + +* Local run +* Cluster run + +### Delivery semantics + +* At most once +* At least once +* Effectively once + +### State storage + +### Metrics + +Here's an example function that publishes a value of 1 to the `my-metric` metric. + +```java +public class MetricsFunction implements PulsarFunction<String, Void> { + @Override + public Void process(String input, Context context) { + context.recordMetric("my-metric", 1); + return null; + } +} +``` + +### Logging + +### Data types + +* Strongly typed diff --git a/site/docs/latest/functions/quickstart.md b/site/docs/latest/functions/quickstart.md new file mode 100644 index 0000000..8c04c6b --- /dev/null +++ b/site/docs/latest/functions/quickstart.md @@ -0,0 +1,29 @@ +--- +title: Getting started with Pulsar Functions +--- + +## The `pulsar-functions` CLI tool + +[`pulsar-functions`](../../reference/CliTools#pulsar-functions) + +```bash +$ alias pulsar-functions='/path/to/pulsar/bin/pulsar-functions' +``` + +## Querying state + +```bash +$ bin/pulsar-functions querystate \ + --tenant sample \ + --namespace my-functions \ + --function-name my-function \ + --key "some-key" +``` + +## Running functions locally + +[`localrun`](../../reference/CliTools#pulsar-functions-localrun) + +```bash +$ bin/pulsar-functions localrun +``` \ No newline at end of file diff --git a/site/docs/latest/getting-started/ConceptsAndArchitecture.md b/site/docs/latest/getting-started/ConceptsAndArchitecture.md index 045af3f..7989344 100644 --- a/site/docs/latest/getting-started/ConceptsAndArchitecture.md +++ b/site/docs/latest/getting-started/ConceptsAndArchitecture.md @@ -288,6 +288,10 @@ With message retention, shown at the top, a <span style="color: #89b557;">retent With message expiry, shown at the bottom, some messages are <span style="color: #bb3b3e;">deleted</span>, even though they <span style="color: #337db6;">haven't been acknowledged</span>, because they've expired according to the <span style="color: #e39441;">TTL applied to the namespace</span> (for example because a TTL of 5 minutes has been applied and the messages haven't been acknowledged but are 10 minutes old). +## Pulsar Functions + +For an in-depth look at Pulsar Functions, see the [Pulsar Functions overview](../../functions/overview). + ## Replication Pulsar enables messages to be produced and consumed in different geo-locations. For instance, your application may be publishing data in one region or market and you would like to process it for consumption in other regions or markets. [Geo-replication](../../admin/GeoReplication) in Pulsar enables you to do that. -- To stop receiving notification emails like this one, please contact si...@apache.org.