This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new d5904c3  Publishing website 2021/01/08 00:03:03 at commit 74ec609
d5904c3 is described below

commit d5904c342a74285e79461ffeb8fd7b4b4ca59302
Author: jenkins <bui...@apache.org>
AuthorDate: Fri Jan 8 00:03:03 2021 +0000

    Publishing website 2021/01/08 00:03:03 at commit 74ec609
---
 .../documentation/runners/direct/index.html        |  64 +----
 .../documentation/runners/flink/index.html         |   2 +-
 .../get-started/beam-overview/index.html           |   3 +-
 .../get-started/downloads/index.html               |   2 +-
 .../get-started/from-spark/index.html              |  90 +++++++
 website/generated-content/get-started/index.html   |   2 +-
 website/generated-content/get-started/index.xml    | 293 +++++++++++++++++++++
 .../get-started/mobile-gaming-example/index.html   |   2 +-
 .../get-started/quickstart-go/index.html           |   2 +-
 .../get-started/quickstart-java/index.html         |   2 +-
 .../get-started/quickstart-py/index.html           |   2 +-
 .../get-started/try-apache-beam/index.html         |   2 +-
 .../get-started/wordcount-example/index.html       |   2 +-
 .../security/cve-2020-1929/index.html              |   2 +-
 website/generated-content/security/index.html      |   2 +-
 website/generated-content/sitemap.xml              |   2 +-
 16 files changed, 403 insertions(+), 71 deletions(-)

diff --git a/website/generated-content/documentation/runners/direct/index.html b/website/generated-content/documentation/runners/direct/index.html
index 2d38163..61ecb97 100644
--- a/website/generated-content/documentation/runners/direct/index.html
+++ b/website/generated-content/documentation/runners/direct/index.html
@@ -1,70 +1,18 @@
 <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Direct Runner</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows,
supporting Enterprise Integration Patterns (EIPs) and Domain Specific [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
<span class=o><</span><span class=n>groupId</span><span class=o>></span><span class=n>org</span><span class=o>.</span><span class=na>apache</span><span class=o>.</span><span class=na>beam</span><span class=o></</span><span class=n>groupId</span><span class=o>></span> <span class=o><</span><span class=n>artifactId</span><span class=o>></span><span class=n>beam</span><span class=o>-</span><span class=n>runners</span><span class=o>-</span><span class=n>direct</span><span class=o>-</span><span class=n>java</span><span class=o></</span><span class=n>artifactId</span><span class=o>></span> <span class=o><</span><span class=n>version</span><span class=o>></span><span class=n>2</span><span class=o>.</span><span class=na>26</span><span class=o>.</span><span class=na>0</span><span class=o></</span><span class=n>version</span><span class=o>></span> <span class=o><</span><span class=n>scope</span><span class=o>></span><span class=n>runtime</span><span class=o></</span><span class=n>scope</span><span class=o>></span> -<span class=o></</span><span class=n>dependency</span><span class=o>></span></code></pre></div></div></p><p><span class=language-py>This section is not applicable to the Beam SDK for Python.</span></p><h2 id=pipeline-options-for-the-direct-runner>Pipeline options for the Direct Runner</h2><p>When executing your pipeline from the command-line, set <code>runner</code> to <code>direct</code> or <code>DirectRunner</code>. The default values for the other pipeline options are generally [...] 
+<span class=o></</span><span class=n>dependency</span><span class=o>></span></code></pre></div></div></p><p><span class=language-py>This section is not applicable to the Beam SDK for Python.</span></p><h2 id=pipeline-options-for-the-direct-runner>Pipeline options for the Direct Runner</h2><p>For general instructions on how to set pipeline options, see the <a href=/documentation/programming-guide/#configuring-pipeline-options>programming guide</a>.</p><p>When executing your pipeline [...] <span class=language-java><a href=https://beam.apache.org/releases/javadoc/2.26.0/index.html?org/apache/beam/runners/direct/DirectOptions.html><code>DirectOptions</code></a></span> <span class=language-py><a href=https://beam.apache.org/releases/pydoc/2.26.0/apache_beam.options.pipeline_options.html#apache_beam.options.pipeline_options.DirectOptions><code>DirectOptions</code></a></span> -interface for defaults and additional pipeline configuration options.</p><h2 id=additional-information-and-caveats>Additional information and caveats</h2><h3 id=memory-considerations>Memory considerations</h3><p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a <span class=language-java><a href=https://beam.a [...] -Python <a href=https://beam.apache.org/contribute/runner-guide/#the-fn-api>FnApiRunner</a> supports multi-threading and multi-processing mode.</p><p>{:.language-py} -<strong>Setting parallelism</strong></p><p>{:.language-py} -Number of threads or subprocesses is defined by setting the <code>direct_num_workers</code> option. -From 2.22.0, <code>direct_num_workers = 0</code> is supported. 
When <code>direct_num_workers</code> is set to 0, it will set the number of threads/subprocess to the number of cores of the machine where the pipeline is running.</p><p>{:.language-py}</p><ul><li>There are several ways to set this option.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>python</span> <span class=n>wordcount</span><span class=o>.</span><span class=n>py</span> [...] -</code></pre></div><p>{:.language-py}</p><ul><li>Setting with <code>PipelineOptions</code>.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>([</span><span class=s1>'--direct_num_workers'</span><span class=p>,</span> <span class=s1>'2'</span><span class=p>])</span> -</code></pre></div><p>{:.language-py}</p><ul><li>Adding to existing <code>PipelineOptions</code>.</li></ul><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>DirectOptions</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>xxx</span><span class=p>)</span> -<span class=n>pipeline_options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>DirectOptions</span><span class=p>)</span><span class=o>.</span><span class=n>direct_num_workers</span> <span class=o>=</span> <span class=mi>2</span> -</code></pre></div><p>{:.language-py} -<strong>Setting running mode</strong></p><p>{:.language-py} -From 2.19, a new option was added to set running mode. 
We can use <code>direct_running_mode</code> option to set the running mode. -<code>direct_running_mode</code> can be one of [<code>'in_memory'</code>, <code>'multi_threading'</code>, <code>'multi_processing'</code>].</p><p>{:.language-py} -<b>in_memory</b>: Runner and workers’ communication happens in memory (not through gRPC). This is a default mode.</p><p>{:.language-py} -<b>multi_threading</b>: Runner and workers communicate through gRPC and each worker runs in a thread.</p><p>{:.language-py} -<b>multi_processing</b>: Runner and workers communicate through gRPC and each worker runs in a subprocess.</p><p>{:.language-py} -Same as other options, <code>direct_running_mode</code> can be passed through CLI or set with <code>PipelineOptions</code>.</p><p>{:.language-py} -For the versions before 2.19.0, the running mode should be set with <code>FnApiRunner()</code>. Please refer following examples.</p><p>{:.language-py}</p><h4 id=running-with-multi-threading-mode>Running with multi-threading mode</h4><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>argparse</span> - -<span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> -<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=kn>from</span> <span class=nn>apache_beam.runners.portability</span> <span class=kn>import</span> <span class=n>fn_api_runner</span> -<span class=kn>from</span> <span class=nn>apache_beam.portability.api</span> <span class=kn>import</span> <span class=n>beam_runner_api_pb2</span> -<span class=kn>from</span> <span class=nn>apache_beam.portability</span> <span class=kn>import</span> <span class=n>python_urns</span> - -<span class=n>parser</span> <span class=o>=</span> <span class=n>argparse</span><span class=o>.</span><span class=n>ArgumentParser</span><span 
class=p>()</span> -<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=o>...</span><span class=p>)</span> -<span class=n>known_args</span><span class=p>,</span> <span class=n>pipeline_args</span> <span class=o>=</span> <span class=n>parser</span><span class=o>.</span><span class=n>parse_known_args</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>pipeline_args</span><span class=p>)</span> - -<span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>,</span> - <span class=n>runner</span><span class=o>=</span><span class=n>fn_api_runner</span><span class=o>.</span><span class=n>FnApiRunner</span><span class=p>(</span> - <span class=n>default_environment</span><span class=o>=</span><span class=n>beam_runner_api_pb2</span><span class=o>.</span><span class=n>Environment</span><span class=p>(</span> - <span class=n>urn</span><span class=o>=</span><span class=n>python_urns</span><span class=o>.</span><span class=n>EMBEDDED_PYTHON_GRPC</span><span class=p>)))</span> -</code></pre></div><p>{:.language-py}</p><h4 id=running-with-multi-processing-mode>Running with multi-processing mode</h4><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>argparse</span> -<span class=kn>import</span> <span class=nn>sys</span> - -<span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> -<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=kn>from</span> 
<span class=nn>apache_beam.runners.portability</span> <span class=kn>import</span> <span class=n>fn_api_runner</span> -<span class=kn>from</span> <span class=nn>apache_beam.portability.api</span> <span class=kn>import</span> <span class=n>beam_runner_api_pb2</span> -<span class=kn>from</span> <span class=nn>apache_beam.portability</span> <span class=kn>import</span> <span class=n>python_urns</span> - -<span class=n>parser</span> <span class=o>=</span> <span class=n>argparse</span><span class=o>.</span><span class=n>ArgumentParser</span><span class=p>()</span> -<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=o>...</span><span class=p>)</span> -<span class=n>known_args</span><span class=p>,</span> <span class=n>pipeline_args</span> <span class=o>=</span> <span class=n>parser</span><span class=o>.</span><span class=n>parse_known_args</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>pipeline_args</span><span class=p>)</span> - -<span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>,</span> - <span class=n>runner</span><span class=o>=</span><span class=n>fn_api_runner</span><span class=o>.</span><span class=n>FnApiRunner</span><span class=p>(</span> - <span class=n>default_environment</span><span class=o>=</span><span class=n>beam_runner_api_pb2</span><span class=o>.</span><span class=n>Environment</span><span class=p>(</span> - <span class=n>urn</span><span class=o>=</span><span class=n>python_urns</span><span class=o>.</span><span class=n>SUBPROCESS_SDK</span><span class=p>,</span> - <span class=n>payload</span><span class=o>=</span><span 
class=sa>b</span><span class=s1>'</span><span class=si>%s</span><span class=s1> -m apache_beam.runners.worker.sdk_worker_main'</span> - <span class=o>%</span> <span class=n>sys</span><span class=o>.</span><span class=n>executable</span><span class=o>.</span><span class=n>encode</span><span class=p>(</span><span class=s1>'ascii'</span><span class=p>))))</span> -</code></pre></div></div></div><footer class=footer><div class=footer__contained><div class=footer__cols><div class=footer__cols__col><div class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg class=footer__logo alt="Apache logo"></div></div><div class="footer__cols__col footer__cols__col--md"><div class=footer__cols__col__title>Start</div><div class=footer__c [...] +interface for defaults and additional pipeline configuration options.</p><h2 id=additional-information-and-caveats>Additional information and caveats</h2><h3 id=memory-considerations>Memory considerations</h3><p>Local execution is limited by the memory available in your local environment. It is highly recommended that you run your pipeline with data sets small enough to fit in local memory. You can create a small in-memory data set using a <span class=language-java><a href=https://beam.a [...] +By default, <code>targetParallelism</code> is the greater of the number of available processors and 3.</p><p class=language-py>Number of threads or subprocesses is defined by setting the <code>direct_num_workers</code> pipeline option. +From 2.22.0, <code>direct_num_workers = 0</code> is supported. 
When <code>direct_num_workers</code> is set to 0, it will set the number of threads/subprocess to the number of cores of the machine where the pipeline is running.</p><p class=language-py><strong>Setting running mode</strong></p><p class=language-py>In Beam 2.19.0 and newer, you can use the <code>direct_running_mode</code> pipeline option to set the running mode. +<code>direct_running_mode</code> can be one of [<code>'in_memory'</code>, <code>'multi_threading'</code>, <code>'multi_processing'</code>].</p><p class=language-py><b>in_memory</b>: Runner and workers’ communication happens in memory (not through gRPC). This is a default mode.</p><p class=language-py><b>multi_threading</b>: Runner and workers communicate through gRPC and each worker runs in a thread.</p><p class=language-py><b>multi_processing</b>: Runner and workers communicate th [...] <a href=http://www.apache.org>The Apache Software Foundation</a> | <a href=/privacy_policy>Privacy Policy</a> | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/documentation/runners/flink/index.html b/website/generated-content/documentation/runners/flink/index.html index 4657c23..a478c40 100644 --- a/website/generated-content/documentation/runners/flink/index.html +++ b/website/generated-content/documentation/runners/flink/index.html @@ -92,7 +92,7 @@ The minor version is the first two numbers in the version string, e.g. in <code> minor version is <code>1.8</code>.</p><p>We try to track the latest version of Apache Flink at the time of the Beam release. A Flink version is supported by Beam for the time it is supported by the Flink community. 
The Flink community supports the last two minor versions. When support for a Flink version is dropped, it may be deprecated and removed also from Beam. -To find out which version of Flink is compatible with Beam please see the table below:</p><table class="table table-bordered"><tr><th>Beam Version</th><th>Flink Version</th><th>Artifact Id</th></tr><tr><td rowspan=3>≥ 2.21.0</td><td>1.10.x</td><td>beam-runners-flink-1.10</td></tr><tr><td>1.9.x</td><td>beam-runners-flink-1.9</td></tr><tr><td>1.8.x</td><td>beam-runners-flink-1.8</td></tr><tr><td rowspan=3>2.17.0 - 2.20.0</td><td>1.9.x</td><td>beam-runners-flink-1.9</td></tr><tr><td>1.8. [...] +To find out which version of Flink is compatible with Beam please see the table below:</p><table class="table table-bordered"><tr><th>Beam Version</th><th>Flink Version</th><th>Artifact Id</th></tr><tr><td rowspan=5>≥ 2.27.0</td><td>1.12.x <sup>*</sup></td><td>beam-runners-flink-1.12</td></tr><tr><td>1.11.x <sup>*</sup></td><td>beam-runners-flink-1.11</td></tr><tr><td>1.10.x</td><td>beam-runners-flink-1.10</td></tr><tr><td>1.9.x</td><td>beam-runners-flink-1.9</td></tr><tr><td>1.8.x</t [...] capabilities of the classic Flink Runner.</p><p>The <a href=https://s.apache.org/apache-beam-portability-support-table>Portable Capability Matrix</a> documents the capabilities of the portable Flink Runner.</p></div></div><footer class=footer><div class=footer__contained><div class=footer__cols><div class=footer__cols__col><div class=footer__cols__col__logo><img src=/images/beam_logo_circle.svg class=footer__logo alt="Beam logo"></div><div class=footer__cols__col__logo><img src=/images/apache_logo_circle.svg class=footer__logo alt="Apache logo"></div></div><div class="footer__cols__col footer__cols__col--md"><div class=footer__cols__col__title> [...] 
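For reference, the Python options described in the direct/index.html hunk above (`direct_num_workers` and `direct_running_mode`) could be assembled roughly as in the sketch below. This is an illustration only and not part of the commit: the helper name `direct_runner_args` and its defaults are hypothetical, and only the option names, the three mode values, and the meaning of `0` workers come from the page text.

```python
# Hypothetical helper (not a Beam API): builds the command-line flags
# for the Python Direct Runner options named in the page text.
RUNNING_MODES = ('in_memory', 'multi_threading', 'multi_processing')


def direct_runner_args(num_workers=1, running_mode='in_memory'):
    """Return a list of flags that could be passed to PipelineOptions."""
    if running_mode not in RUNNING_MODES:
        raise ValueError('unknown direct_running_mode: %r' % (running_mode,))
    if num_workers < 0:
        raise ValueError('direct_num_workers must be >= 0')
    return [
        '--runner=DirectRunner',
        '--direct_num_workers=%d' % num_workers,
        '--direct_running_mode=%s' % running_mode,
    ]


if __name__ == '__main__':
    # Per the page text, 0 means one thread/subprocess per core
    # (supported from Beam 2.22.0).
    print(direct_runner_args(num_workers=0, running_mode='multi_threading'))
    # With apache-beam installed, the list can be handed to PipelineOptions:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   options = PipelineOptions(direct_runner_args(2, 'multi_processing'))
```

Keeping the flag construction separate from Beam itself means the validation can be exercised without the SDK installed; the commented lines show where the result would be used.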
diff --git a/website/generated-content/get-started/beam-overview/index.html b/website/generated-content/get-started/beam-overview/index.html index 3581aad..3807d88 100644 --- a/website/generated-content/get-started/beam-overview/index.html +++ b/website/generated-content/get-started/beam-overview/index.html @@ -1,7 +1,8 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Overview</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+check our <a href=/get-started/from-spark>Getting started from Apache Spark</a> page.</p></blockquote><ol><li><p><a href=/get-started/try-apache-beam>Try Apache Beam</a> in an online interactive environment.</p></li><li><p>Follow the Quickstart for the <a href=/get-started/quickstart-java>Java SDK</a>, the <a href=/get-started/quickstart-py>Python SDK</a>, or the <a href=/get-started/quickstart-go>Go SDK</a>.</p></li><li><p>See the <a href=/get-started/wordcount-example>WordCount Example [...] <a href=http://www.apache.org>The Apache Software Foundation</a> | <a href=/privacy_policy>Privacy Policy</a> | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/get-started/downloads/index.html b/website/generated-content/get-started/downloads/index.html index d9381e6..fe77ecc 100644 --- a/website/generated-content/get-started/downloads/index.html +++ b/website/generated-content/get-started/downloads/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Releases</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific [...] 
<span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] central repository. 
The Java SDK is available on <a href=https://search.maven.org/#search%7Cga%7C1%7Cg%3A%22org.apache.beam%22>Maven Central Repository</a>, and the Python SDK is available on <a href=https://pypi.python.org/pypi/apache-beam>PyPI</a>.</p><p>For example, if you are developing using Maven and want to use the SDK for Java with the <code>DirectRunner</code>, add the following dependencies to your <code>pom.xml</code> file:</p><pre><code><dependency> diff --git a/website/generated-content/get-started/from-spark/index.html b/website/generated-content/get-started/from-spark/index.html new file mode 100644 index 0000000..f52382d --- /dev/null +++ b/website/generated-content/get-started/from-spark/index.html @@ -0,0 +1,90 @@ +<!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Getting started from Apache Spark</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) [...] +<span class=sr-only>Toggle navigation</span> +<span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +learning <em>Apache Beam</em> is familiar. 
+The Beam and Spark APIs are similar, so you already know the basic concepts.</p><p>Spark stores data <em>Spark DataFrames</em> for structured data, +and in <em>Resilient Distributed Datasets</em> (RDD) for unstructured data. +We are using RDDs for this guide.</p><p>A Spark RDD represents a collection of elements, +while in Beam it’s called a <em>Parallel Collection</em> (PCollection). +A PCollection in Beam does <em>not</em> have any ordering guarantees.</p><p>Likewise, a transform in Beam is called a <em>Parallel Transform</em> (PTransform).</p><p>Here are some examples of common operations and their equivalent between PySpark and Beam.</p><h2 id=overview>Overview</h2><p>Here’s a simple example of a PySpark pipeline that takes the numbers from one to four, +multiplies them by two, adds all the values together, and prints the result.</p><div class=language-py><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>pyspark</span> + +<span class=n>sc</span> <span class=o>=</span> <span class=n>pyspark</span><span class=o>.</span><span class=n>SparkContext</span><span class=p>()</span> +<span class=n>result</span> <span class=o>=</span> <span class=p>(</span> + <span class=n>sc</span><span class=o>.</span><span class=n>parallelize</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span class=mi>4</span><span class=p>])</span> + <span class=o>.</span><span class=n>map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>x</span> <span class=o>*</span> <span class=mi>2</span><span class=p>)</span> + <span class=o>.</span><span class=n>reduce</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>,</span> <span class=n>y</span><span class=p>:</span> <span class=n>x</span> <span class=o>+</span> 
<span class=n>y</span><span class=p>)</span> +<span class=p>)</span> +<span class=k>print</span><span class=p>(</span><span class=n>result</span><span class=p>)</span></code></pre></div></div><p>In Beam you pipe your data through the pipeline using the +<em>pipe operator</em> <code>|</code> like <code>data | beam.Map(...)</code> instead of chaining +methods like <code>data.map(...)</code>, but they’re doing the same thing.</p><p>Here’s what an equivalent pipeline looks like in Beam.</p><div class=language-py><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> + +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> + <span class=n>result</span> <span class=o>=</span> <span class=p>(</span> + <span class=n>pipeline</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span class=mi>4</span><span class=p>])</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>x</span> <span class=o>*</span> <span class=mi>2</span><span class=p>)</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span class=nb>sum</span><span class=p>)</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>print</span><span class=p>)</span> + <span 
class=p>)</span></code></pre></div></div><blockquote><p>ℹ️ Note that we called <code>print</code> inside a <code>Map</code> transform. +That’s because we can only access the elements of a PCollection +from within a PTransform.</p></blockquote><p>Another thing to note is that Beam pipelines are constructed lazily. +This means that when you pipe <code>|</code> data you’re only declaring the +transformations and the order you want them to happen, +but the actual computation doesn’t happen. +The pipeline is run after the <code>with beam.Pipeline() as pipeline</code> context has +closed.</p><blockquote><p>ℹ️ When the <code>with beam.Pipeline() as pipeline</code> context closes, +it implicitly calls <code>pipeline.run()</code> which triggers the computation to happen.</p></blockquote><p>The pipeline is then sent to your +<a href=https://beam.apache.org/documentation/runners/capability-matrix/>runner of choice</a> +and it processes the data.</p><blockquote><p>ℹ️ The pipeline can run locally with the <em>DirectRunner</em>, +or in a distributed runner such as Flink, Spark, or Dataflow. +The Spark runner is not related to PySpark.</p></blockquote><p>A label can optionally be added to a transform using the +<em>right shift operator</em> <code>>></code> like <code>data | 'My description' >> beam.Map(...)</code>. 
+The label serves as a comment and makes your pipeline easier to debug.</p><p>This is how the pipeline looks after adding labels.</p><div class=language-py><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> + +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> + <span class=n>result</span> <span class=o>=</span> <span class=p>(</span> + <span class=n>pipeline</span> + <span class=o>|</span> <span class=s1>'Create numbers'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span class=mi>4</span><span class=p>])</span> + <span class=o>|</span> <span class=s1>'Multiply by two'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>x</span> <span class=o>*</span> <span class=mi>2</span><span class=p>)</span> + <span class=o>|</span> <span class=s1>'Sum everything'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span class=nb>sum</span><span class=p>)</span> + <span class=o>|</span> <span class=s1>'Print results'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>print</span><span class=p>)</span> + <span class=p>)</span></code></pre></div></div><h2 id=setup>Setup</h2><p>Here’s a comparison of how to get started in both PySpark and Beam.</p><div 
class=table-wrapper><table><tr><th></th><th>PySpark</th><th>Beam</th></tr><tr><td><b>Install</b></td><td><code>$ pip install pyspark</code></td><td><code>$ pip install apache-beam</code></td></tr><tr><td><b>Imports</b></td><td><code>import pyspark</code></td><td><code>import apache_beam as beam</code></td></tr><tr><td><b>Creating a [...] +<a href=/documentation/transforms/python/overview>Python transform gallery</a>.</p></blockquote><h2 id=using-calculated-values>Using calculated values</h2><p>Since we are working in potentially distributed environments, +we can’t guarantee that the results we’ve calculated are available at any given machine.</p><p>In PySpark, we can get a result from a collection of elements (RDD) by using +<code>data.collect()</code>, or other aggregations such as <code>reduce()</code>, <code>count()</code>, and more.</p><p>Here’s an example to scale numbers into a range between zero and one.</p><div class=language-py><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>pyspark</span> + +<span class=n>sc</span> <span class=o>=</span> <span class=n>pyspark</span><span class=o>.</span><span class=n>SparkContext</span><span class=p>()</span> +<span class=n>values</span> <span class=o>=</span> <span class=n>sc</span><span class=o>.</span><span class=n>parallelize</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span class=mi>4</span><span class=p>])</span> +<span class=n>total</span> <span class=o>=</span> <span class=n>values</span><span class=o>.</span><span class=n>reduce</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>,</span> <span class=n>y</span><span class=p>:</span> <span class=n>x</span> <span class=o>+</span> <span class=n>y</span><span class=p>)</span> + +<span class=c1># We can simply use `total` since it's 
already a Python `int` value from `reduce`.</span> +<span class=n>scaled_values</span> <span class=o>=</span> <span class=n>values</span><span class=o>.</span><span class=n>map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>x</span> <span class=o>/</span> <span class=n>total</span><span class=p>)</span> + +<span class=c1># But to access `scaled_values`, we need to call `collect`.</span> +<span class=k>print</span><span class=p>(</span><span class=n>scaled_values</span><span class=o>.</span><span class=n>collect</span><span class=p>())</span></code></pre></div></div><p>In Beam, every transform produces a PCollection. +We use <a href=/documentation/programming-guide/#side-inputs><em>side inputs</em></a> +to feed a PCollection into a transform and access its values.</p><p>Any transform that accepts a function, like +<a href=/documentation/transforms/python/elementwise/map><code>Map</code></a>, +can take side inputs. +If we only need a single value, we can use +<a href=https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsSingleton><code>beam.pvalue.AsSingleton</code></a> and access it as a Python value. 
+If we need multiple values, we can use +<a href=https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsIter><code>beam.pvalue.AsIter</code></a> +and access them as an <a href=https://docs.python.org/3/glossary.html#term-iterable><code>iterable</code></a>.</p><div class=language-py><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> + +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> + <span class=n>values</span> <span class=o>=</span> <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>,</span> <span class=mi>4</span><span class=p>])</span> + <span class=n>total</span> <span class=o>=</span> <span class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>CombineGlobally</span><span class=p>(</span><span class=nb>sum</span><span class=p>)</span> + + <span class=c1># To access `total`, we need to pass it as a side input.</span> + <span class=n>scaled_values</span> <span class=o>=</span> <span class=n>values</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span> + <span class=k>lambda</span> <span class=n>x</span><span class=p>,</span> <span class=n>total</span><span class=p>:</span> <span class=n>x</span> <span class=o>/</span> <span class=n>total</span><span class=p>,</span> + <span class=n>total</span><span class=o>=</span><span class=n>beam</span><span class=o>.</span><span class=n>pvalue</span><span 
class=o>.</span><span class=n>AsSingleton</span><span class=p>(</span><span class=n>total</span><span class=p>))</span> + + <span class=n>scaled_values</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>print</span><span class=p>)</span></code></pre></div></div><blockquote><p>ℹ️ In Beam we need to pass a side input explicitly, but we get the +benefit that a reduction or aggregation does <em>not</em> have to fit into memory.</p></blockquote><h2 id=next-steps>Next Steps</h2><ul><li>Take a look at all the available transforms in the <a href=/documentation/transforms/python/overview>Python transform gallery</a>.</li><li>Learn how to read from and write to files in the <a href=/documentation/programming-guide/#pipeline-io><em>Pipeline I/O</em> section of the <em>Programming guide</em></a></li><li>Walk through additional WordCount [...] +<a href=http://www.apache.org>The Apache Software Foundation</a> +| <a href=/privacy_policy>Privacy Policy</a> +| <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. 
All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/get-started/index.html b/website/generated-content/get-started/index.html index a8bbed9..73f33a0 100644 --- a/website/generated-content/get-started/index.html +++ b/website/generated-content/get-started/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Use Beam</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific Lang [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] <a href=http://www.apache.org>The Apache Software Foundation</a> | <a href=/privacy_policy>Privacy Policy</a> | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/get-started/index.xml b/website/generated-content/get-started/index.xml index c8195e9..738a2b4 100644 --- a/website/generated-content/get-started/index.xml +++ b/website/generated-content/get-started/index.xml @@ -876,6 +876,10 @@ limitations under the License. 
<p><strong>Note:</strong> You can always execute your pipeline locally for testing and debugging purposes.</p> <h2 id="get-started">Get Started</h2> <p>Get started using Beam for your data processing tasks.</p> +<blockquote> +<p>If you already know <a href="http://spark.apache.org/">Apache Spark</a>, +check our <a href="/get-started/from-spark">Getting started from Apache Spark</a> page.</p> +</blockquote> <ol> <li> <p><a href="/get-started/try-apache-beam">Try Apache Beam</a> in an online interactive environment.</p> @@ -2980,6 +2984,295 @@ using <a href="https://beam.apache.org/releases/pydoc/2.26.0/apache_beam.io.g <li>Dive in to some of our favorite <a href="/documentation/resources/videos-and-podcasts">Videos and Podcasts</a>.</li> <li>Join the Beam <a href="/community/contact-us">users@</a> mailing list.</li> </ul> +<p>Please don&rsquo;t hesitate to <a href="/community/contact-us">reach out</a> if you encounter any issues!</p></description></item><item><title>Get-Started: Getting started from Apache Spark</title><link>/get-started/from-spark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/from-spark/</guid><description> +<!-- +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. 
+--> +<h1 id="getting-started-from-apache-spark">Getting started from Apache Spark</h1> +<script type="text/javascript"> +localStorage.setItem("language", "language-py") +</script> +<p>If you already know <a href="http://spark.apache.org/"><em>Apache Spark</em></a>, +learning <em>Apache Beam</em> should feel familiar. +The Beam and Spark APIs are similar, so you already know the basic concepts.</p> +<p>Spark stores data in <em>Spark DataFrames</em> for structured data, +and in <em>Resilient Distributed Datasets</em> (RDD) for unstructured data. +We are using RDDs for this guide.</p> +<p>A Spark RDD represents a collection of elements, +while in Beam it&rsquo;s called a <em>Parallel Collection</em> (PCollection). +A PCollection in Beam does <em>not</em> have any ordering guarantees.</p> +<p>Likewise, a transform in Beam is called a <em>Parallel Transform</em> (PTransform).</p> +<p>Here are some examples of common operations and their equivalents in PySpark and Beam.</p> +<h2 id="overview">Overview</h2> +<p>Here&rsquo;s a simple example of a PySpark pipeline that takes the numbers from one to four, +multiplies them by two, adds all the values together, and prints the result.</p> +<div class=language-py> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">pyspark</span> +<span class="n">sc</span> <span class="o">=</span> <span class="n">pyspark</span><span class="o">.</span><span class="n">SparkContext</span><span class="p">()</span> +<span class="n">result</span> <span class="o">=</span> <span class="p">(</span> +<span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span> +<span class="o">.</span><span class="n">map</span><span 
class="p">(</span><span 
class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> +<span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> +<span class="p">)</span> +<span class="k">print</span><span class="p">(</span><span class="n">result</span><span class="p">)</span></code></pre></div> +</div> +<p>In Beam you pipe your data through the pipeline using the +<em>pipe operator</em> <code>|</code> like <code>data | beam.Map(...)</code> instead of chaining +methods like <code>data.map(...)</code>, but they&rsquo;re doing the same thing.</p> +<p>Here&rsquo;s what an equivalent pipeline looks like in Beam.</p> +<div class=language-py> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span> +<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span> +<span class="n">result</span> <span class="o">=</span> <span class="p">(</span> +<span class="n">pipeline</span> +<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span> +<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span 
class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> +<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="nb">sum</span><span class="p">)</span> +<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">print</span><span class="p">)</span> +<span class="p">)</span></code></pre></div> +</div> +<blockquote> +<p>ℹ️ Note that we called <code>print</code> inside a <code>Map</code> transform. +That&rsquo;s because we can only access the elements of a PCollection +from within a PTransform.</p> +</blockquote> +<p>Another thing to note is that Beam pipelines are constructed lazily. +This means that when you pipe <code>|</code> data you&rsquo;re only declaring the +transformations and the order you want them to happen, +but the actual computation doesn&rsquo;t happen. +The pipeline is run after the <code>with beam.Pipeline() as pipeline</code> context has +closed.</p> +<blockquote> +<p>ℹ️ When the <code>with beam.Pipeline() as pipeline</code> context closes, +it implicitly calls <code>pipeline.run()</code> which triggers the computation to happen.</p> +</blockquote> +<p>The pipeline is then sent to your +<a href="https://beam.apache.org/documentation/runners/capability-matrix/">runner of choice</a> +and it processes the data.</p> +<blockquote> +<p>ℹ️ The pipeline can run locally with the <em>DirectRunner</em>, +or in a distributed runner such as Flink, Spark, or Dataflow. +The Spark runner is not related to PySpark.</p> +</blockquote> +<p>A label can optionally be added to a transform using the +<em>right shift operator</em> <code>&gt;&gt;</code> like <code>data | 'My description' &gt;&gt; beam.Map(...)</code>. 
+The label serves as a comment and makes your pipeline easier to debug.</p> +<p>This is how the pipeline looks after adding labels.</p> +<div class=language-py> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span> +<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span> +<span class="n">result</span> <span class="o">=</span> <span class="p">(</span> +<span class="n">pipeline</span> +<span class="o">|</span> <span class="s1">&#39;Create numbers&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><sp [...] 
+<span class="o">|</span> <span class="s1">&#39;Multiply by two&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span> +<span class="o">|</span> <span class="s1">&#39;Sum everything&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="nb">sum</span><span class="p">)</span> +<span class="o">|</span> <span class="s1">&#39;Print results&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">print</span><span class="p">)</span> +<span class="p">)</span></code></pre></div> +</div> +<h2 id="setup">Setup</h2> +<p>Here&rsquo;s a comparison of how to get started in both PySpark and Beam.</p> +<div class="table-wrapper"><table> +<tr> +<th></th> +<th>PySpark</th> +<th>Beam</th> +</tr> +<tr> +<td><b>Install</b></td> +<td><code>$ pip install pyspark</code></td> +<td><code>$ pip install apache-beam</code></td> +</tr> +<tr> +<td><b>Imports</b></td> +<td><code>import pyspark</code></td> +<td><code>import apache_beam as beam</code></td> +</tr> +<tr> +<td><b>Creating a<br>local pipeline</b></td> +<td> +<code>with pyspark.SparkContext() as sc:</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;# Your pipeline code here.</code> +</td> +<td> +<code>with beam.Pipeline() as pipeline:</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;# Your pipeline code here.</code> +</td> +</tr> +<tr> +<td><b>Creating values</b></td> +<td><code>values = sc.parallelize([1, 2, 3, 4])</code></td> +<td><code>values = pipeline | beam.Create([1, 2, 3, 4])</code></td> +</tr> +<tr> +<td><b>Creating<br>key-value pairs</b></td> +<td> +<code>pairs = 
sc.parallelize([</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key1', 'value1'),</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key2', 'value2'),</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key3', 'value3'),</code><br> +<code>])</code> +</td> +<td> +<code>pairs = pipeline | beam.Create([</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key1', 'value1'),</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key2', 'value2'),</code><br> +<code>&nbsp;&nbsp;&nbsp;&nbsp;('key3', 'value3'),</code><br> +<code>])</code> +</td> +</tr> +<tr> +<td><b>Running a<br>local pipeline</b></td> +<td><code>$ spark-submit spark_pipeline.py</code></td> +<td><code>$ python beam_pipeline.py</code></td> +</tr> +</table></div> +<h2 id="transforms">Transforms</h2> +<p>Here are the equivalents of some common transforms in both PySpark and Beam.</p> +<div class="table-wrapper"><table> +<thead> +<tr> +<th></th> +<th>PySpark</th> +<th>Beam</th> +</tr> +</thead> +<tbody> +<tr> +<td><a href="/documentation/transforms/python/elementwise/map/"><strong>Map</strong></a></td> +<td><code>values.map(lambda x: x * 2)</code></td> +<td><code>values | beam.Map(lambda x: x * 2)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/elementwise/filter/"><strong>Filter</strong></a></td> +<td><code>values.filter(lambda x: x % 2 == 0)</code></td> +<td><code>values | beam.Filter(lambda x: x % 2 == 0)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/elementwise/flatmap/"><strong>FlatMap</strong></a></td> +<td><code>values.flatMap(lambda x: range(x))</code></td> +<td><code>values | beam.FlatMap(lambda x: range(x))</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/groupbykey/"><strong>Group by key</strong></a></td> +<td><code>pairs.groupByKey()</code></td> +<td><code>pairs | beam.GroupByKey()</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/combineglobally/"><strong>Reduce</strong></a></td> +<td><code>values.reduce(lambda x, y: 
x+y)</code></td> +<td><code>values | beam.CombineGlobally(sum)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/combineperkey/"><strong>Reduce by key</strong></a></td> +<td><code>pairs.reduceByKey(lambda x, y: x+y)</code></td> +<td><code>pairs | beam.CombinePerKey(sum)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/distinct/"><strong>Distinct</strong></a></td> +<td><code>values.distinct()</code></td> +<td><code>values | beam.Distinct()</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/count/"><strong>Count</strong></a></td> +<td><code>values.count()</code></td> +<td><code>values | beam.combiners.Count.Globally()</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/count/"><strong>Count by key</strong></a></td> +<td><code>pairs.countByKey()</code></td> +<td><code>pairs | beam.combiners.Count.PerKey()</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/top/"><strong>Take smallest</strong></a></td> +<td><code>values.takeOrdered(3)</code></td> +<td><code>values | beam.combiners.Top.Smallest(3)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/top/"><strong>Take largest</strong></a></td> +<td><code>values.takeOrdered(3, lambda x: -x)</code></td> +<td><code>values | beam.combiners.Top.Largest(3)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/aggregation/sample/"><strong>Random sample</strong></a></td> +<td><code>values.takeSample(False, 3)</code></td> +<td><code>values | beam.combiners.Sample.FixedSizeGlobally(3)</code></td> +</tr> +<tr> +<td><a href="/documentation/transforms/python/other/flatten/"><strong>Union</strong></a></td> +<td><code>values.union(otherValues)</code></td> +<td><code>(values, otherValues) | beam.Flatten()</code></td> +</tr> +<tr> +<td><a 
href="/documentation/transforms/python/aggregation/cogroupbykey/"><strong>Co-group</strong></a></td> +<td><code>pairs.cogroup(otherPairs)</code></td> +<td><code>{'Xs': pairs, 'Ys': otherPairs} | beam.CoGroupByKey()</code></td> +</tr> +</tbody> +</table></div> +<blockquote> +<p>ℹ️ To learn more about the transforms available in Beam, check the +<a href="/documentation/transforms/python/overview">Python transform gallery</a>.</p> +</blockquote> +<h2 id="using-calculated-values">Using calculated values</h2> +<p>Since we are working in potentially distributed environments, +we can&rsquo;t guarantee that the results we&rsquo;ve calculated are available at any given machine.</p> +<p>In PySpark, we can get a result from a collection of elements (RDD) by using +<code>data.collect()</code>, or other aggregations such as <code>reduce()</code>, <code>count()</code>, and more.</p> +<p>Here&rsquo;s an example to scale numbers into a range between zero and one.</p> +<div class=language-py> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">pyspark</span> +<span class="n">sc</span> <span class="o">=</span> <span class="n">pyspark</span><span class="o">.</span><span class="n">SparkContext</span><span class="p">()</span> +<span class="n">values</span> <span class="o">=</span> <span class="n">sc</span><span class="o">.</span><span class="n">parallelize</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">])</span> +<span class="n">total</span> <span class="o">=</span> <span class="n">values</span><span class="o">.</span><span class="n">reduce</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">y</span><span class="p">:</span> <span class="n">x</span> <span 
class="o">+</span> <span class="n">y</span><span class="p">)</span> +<span class="c1"># We can simply use `total` since it&#39;s already a Python `int` value from `reduce`.</span> +<span class="n">scaled_values</span> <span class="o">=</span> <span class="n">values</span><span class="o">.</span><span class="n">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span> <span class="o">/</span> <span class="n">total</span><span class="p">)</span> +<span class="c1"># But to access `scaled_values`, we need to call `collect`.</span> +<span class="k">print</span><span class="p">(</span><span class="n">scaled_values</span><span class="o">.</span><span class="n">collect</span><span class="p">())</span></code></pre></div> +</div> +<p>In Beam, every transform produces a PCollection. +We use <a href="/documentation/programming-guide/#side-inputs"><em>side inputs</em></a> +to feed a PCollection into a transform and access its values.</p> +<p>Any transform that accepts a function, like +<a href="/documentation/transforms/python/elementwise/map"><code>Map</code></a>, +can take side inputs. +If we only need a single value, we can use +<a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsSingleton"><code>beam.pvalue.AsSingleton</code></a> and access it as a Python value. 
+If we need multiple values, we can use +<a href="https://beam.apache.org/releases/pydoc/current/apache_beam.pvalue.html#apache_beam.pvalue.AsIter"><code>beam.pvalue.AsIter</code></a> +and access them as an <a href="https://docs.python.org/3/glossary.html#term-iterable"><code>iterable</code></a>.</p> +<div class=language-py> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span> +<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span> +<span class="n">values</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span c [...] 
+<span class="n">total</span> <span class="o">=</span> <span class="n">values</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">CombineGlobally</span><span class="p">(</span><span class="nb">sum</span><span class="p">)</span> +<span class="c1"># To access `total`, we need to pass it as a side input.</span> +<span class="n">scaled_values</span> <span class="o">=</span> <span class="n">values</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span> +<span class="k">lambda</span> <span class="n">x</span><span class="p">,</span> <span class="n">total</span><span class="p">:</span> <span class="n">x</span> <span class="o">/</span> <span class="n">total</span><span class="p">,</span> +<span class="n">total</span><span class="o">=</span><span class="n">beam</span><span class="o">.</span><span class="n">pvalue</span><span class="o">.</span><span class="n">AsSingleton</span><span class="p">(</span><span class="n">total</span><span class="p">))</span> +<span class="n">scaled_values</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">print</span><span class="p">)</span></code></pre></div> +</div> +<blockquote> +<p>ℹ️ In Beam we need to pass a side input explicitly, but we get the +benefit that a reduction or aggregation does <em>not</em> have to fit into memory.</p> +</blockquote> +<h2 id="next-steps">Next Steps</h2> +<ul> +<li>Take a look at all the available transforms in the <a href="/documentation/transforms/python/overview">Python transform gallery</a>.</li> +<li>Learn how to read from and write to files in the <a href="/documentation/programming-guide/#pipeline-io"><em>Pipeline I/O</em> section of the <em>Programming guide</em></a></li> +<li>Walk through additional WordCount examples in the <a href="/get-started/wordcount-example">WordCount Example 
Walkthrough</a>.</li> +<li>Take a self-paced tour through our <a href="/documentation/resources/learning-resources">Learning Resources</a>.</li> +<li>Dive in to some of our favorite <a href="/documentation/resources/videos-and-podcasts">Videos and Podcasts</a>.</li> +<li>Join the Beam <a href="/community/contact-us">users@</a> mailing list.</li> +<li>If you&rsquo;re interested in contributing to the Apache Beam codebase, see the <a href="/contribute">Contribution Guide</a>.</li> +</ul> <p>Please don&rsquo;t hesitate to <a href="/community/contact-us">reach out</a> if you encounter any issues!</p></description></item><item><title>Get-Started: Try Apache Beam</title><link>/get-started/try-apache-beam/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>/get-started/try-apache-beam/</guid><description> <!-- Licensed under the Apache License, Version 2.0 (the "License"); diff --git a/website/generated-content/get-started/mobile-gaming-example/index.html b/website/generated-content/get-started/mobile-gaming-example/index.html index 676d8e4..0096a72 100644 --- a/website/generated-content/get-started/mobile-gaming-example/index.html +++ b/website/generated-content/get-started/mobile-gaming-example/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Mobile Gaming Example</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Do [...] 
<span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] (<a href=https://issues.apache.org/jira/browse/BEAM-4293>BEAM-4293</a>).</p></blockquote><p>Every time a user plays an instance of our hypothetical mobile game, they generate a data event. Each data event consists of the following information:</p><ul><li>The unique ID of the user playing the game.</li><li>The team ID for the team to which the user belongs.</li><li>A score value for that particular instance of play.</li><li>A timestamp that records when the particular instance of play hap [...] occurred. The Y-axis represents processing time: the time at which a game event was processed. 
Ideally, events should be processed as they occur, depicted by diff --git a/website/generated-content/get-started/quickstart-go/index.html b/website/generated-content/get-started/quickstart-go/index.html index 24c4e46..9e191c8 100644 --- a/website/generated-content/get-started/quickstart-go/index.html +++ b/website/generated-content/get-started/quickstart-go/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Quickstart for Go</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] <a href=https://github.com/apache/beam/tree/master/sdks/go/examples>examples</a> directory has many examples. All examples can be run by passing the required arguments described in the examples.</p><p>For example, to run <code>wordcount</code>, run:</p><div class=runner-direct><pre><code>$ go install github.com/apache/beam/sdks/go/examples/wordcount diff --git a/website/generated-content/get-started/quickstart-java/index.html b/website/generated-content/get-started/quickstart-java/index.html index 637ebec..1324f64 100644 --- a/website/generated-content/get-started/quickstart-java/index.html +++ b/website/generated-content/get-started/quickstart-java/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Quickstart for Java</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Doma [...] 
<span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
-DarchetypeGroupId=org.apache.beam \ -DarchetypeArtifactId=beam-sdks-java-maven-archetypes-examples \ -DarchetypeVersion=2.26.0 \ diff --git a/website/generated-content/get-started/quickstart-py/index.html b/website/generated-content/get-started/quickstart-py/index.html index 202011e..940e015 100644 --- a/website/generated-content/get-started/quickstart-py/index.html +++ b/website/generated-content/get-started/quickstart-py/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Quickstart for Python</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Do [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] install it. This command might require administrative privileges.</p><div class=shell-unix><pre><code>pip install --upgrade pip</code></pre></div><div class=shell-PowerShell><pre><code>PS> python -m pip install --upgrade pip</code></pre></div><h3 id=install-python-virtual-environment>Install Python virtual environment</h3><p>It is recommended that you install a <a href=https://docs.python-guide.org/en/latest/dev/virtualenvs/>Python virtual environment</a> for initial experiments. If you do not have <code>virtualenv</code> version 13.1.0 or newer, run the following command to install it. This command might require diff --git a/website/generated-content/get-started/try-apache-beam/index.html b/website/generated-content/get-started/try-apache-beam/index.html index e43e84a..f5f6b05 100644 --- a/website/generated-content/get-started/try-apache-beam/index.html +++ b/website/generated-content/get-started/try-apache-beam/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Try Apache Beam</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specif [...] 
<span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
<span class=kn>import</span> <span class=nn>org.apache.beam.sdk.Pipeline</span><span class=o>;</span> <span class=kn>import</span> <span class=nn>org.apache.beam.sdk.io.TextIO</span><span class=o>;</span> diff --git a/website/generated-content/get-started/wordcount-example/index.html b/website/generated-content/get-started/wordcount-example/index.html index 73abcc8..a92833a 100644 --- a/website/generated-content/get-started/wordcount-example/index.html +++ b/website/generated-content/get-started/wordcount-example/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam WordCount Examples</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domai [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] read text, tokenize the text lines into individual words, and perform a frequency count on each of those words. The Beam SDKs contain a series of these four successively more detailed WordCount examples that build on each other. The diff --git a/website/generated-content/security/cve-2020-1929/index.html b/website/generated-content/security/cve-2020-1929/index.html index 9946985..f469e20 100644 --- a/website/generated-content/security/cve-2020-1929/index.html +++ b/website/generated-content/security/cve-2020-1929/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>CVE-2020-1929</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific [...] 
<span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] +<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] <a href=http://www.apache.org>The Apache Software Foundation</a> | <a href=/privacy_policy>Privacy Policy</a> | <a href=/feed.xml>RSS Feed</a><br><br>Apache Beam, Apache, Beam, the Beam logo, and the Apache feather logo are either registered trademarks or trademarks of The Apache Software Foundation. 
All other products or name brands are trademarks of their respective holders, including The Apache Software Foundation.</div></footer></body></html> \ No newline at end of file diff --git a/website/generated-content/security/index.html b/website/generated-content/security/index.html index 45326e9..b34e8ec 100644 --- a/website/generated-content/security/index.html +++ b/website/generated-content/security/index.html @@ -1,7 +1,7 @@ <!doctype html><html lang=en class=no-js><head><meta charset=utf-8><meta http-equiv=x-ua-compatible content="IE=edge"><meta name=viewport content="width=device-width,initial-scale=1"><title>Beam Security</title><meta name=description content="Apache Beam is an open source, unified model and set of language-specific SDKs for defining and executing data processing workflows, and also data ingestion and integration flows, supporting Enterprise Integration Patterns (EIPs) and Domain Specific [...] <span class=sr-only>Toggle navigation</span> <span class=icon-bar></span><span class=icon-bar></span><span class=icon-bar></span></button> -<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] 
+<a href=/ class=navbar-brand><img alt=Brand style=height:25px src=/images/beam_logo_navbar.png></a></div><div class="navbar-mask closed"></div><div id=navbar class="navbar-container closed"><ul class="nav navbar-nav"><li><a href=/get-started/beam-overview/>Get Started</a></li><li><a href=/documentation/>Documentation</a></li><li><a href=/documentation/sdks/java/>Languages</a></li><li><a href=/documentation/runners/capability-matrix/>RUNNERS</a></li><li><a href=/roadmap/>Roadmap</a></li>< [...] Team</a> for reporting vulnerabilities. Note that vulnerabilities should not be publicly disclosed until the project has responded.</p><p>To report a possible security vulnerability, please email diff --git a/website/generated-content/sitemap.xml b/website/generated-content/sitemap.xml index 32b57c8..2c94b99 100644 --- a/website/generated-content/sitemap.xml +++ b/website/generated-content/sitemap.xml @@ -1 +1 @@ -<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/blog/dataframe-api-preview-available/</loc><lastmod>2020-12-17T16:58:23-08:00</lastmod></u [...] \ No newline at end of file +<?xml version="1.0" encoding="utf-8" standalone="yes"?><urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml"><url><loc>/categories/blog/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/blog/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/categories/</loc><lastmod>2020-12-23T09:07:16-08:00</lastmod></url><url><loc>/blog/dataframe-api-preview-available/</loc><lastmod>2020-12-17T16:58:23-08:00</lastmod></u [...] \ No newline at end of file