This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git

The following commit(s) were added to refs/heads/asf-site by this push:
     new 2e735d7  Publishing website 2021/05/25 00:01:42 at commit 53ba247
2e735d7 is described below

commit 2e735d7e6744ea354b918ec339b3e8fc6a41dbd7
Author: jenkins <bui...@apache.org>
AuthorDate: Tue May 25 00:01:43 2021 +0000

    Publishing website 2021/05/25 00:01:42 at commit 53ba247
---
 website/generated-content/documentation/index.xml  | 70 ++++++++++------------
 .../io/built-in/google-bigquery/index.html         | 12 ++--
 .../io/developing-io-python/index.html             | 12 ++--
 .../documentation/patterns/ai-platform/index.html  |  4 +-
 .../documentation/patterns/bigqueryio/index.html   |  4 +-
 .../patterns/file-processing/index.html            |  4 +-
 .../patterns/pipeline-options/index.html           | 14 ++---
 .../documentation/patterns/side-inputs/index.html  |  7 +--
 .../documentation/programming-guide/index.html     | 52 ++++++++--------
 website/generated-content/get-started/index.xml    | 49 ++++++++-------
 .../get-started/wordcount-example/index.html       | 70 ++++++++++++----------
 11 files changed, 149 insertions(+), 149 deletions(-)

diff --git a/website/generated-content/documentation/index.xml b/website/generated-content/documentation/index.xml
index cded920..c6067cd 100644
--- a/website/generated-content/documentation/index.xml
+++ b/website/generated-content/documentation/index.xml
@@ -35,9 +35,9 @@ limitations under the License.
 <span class="n">extract_entity_sentiment</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
 <span class="n">extract_syntax</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
 <span class="p">)</span>
-<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">p</span><span class="p">:</span>
+<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span>
 <span class="n">responses</span> <span class="o">=</span> <span class="p">(</span>
-<span class="n">p</span>
+<span class="n">pipeline</span>
 <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span>
 <span class="s1">&#39;My experience so far has been fantastic! &#39;</span>
 <span class="s1">&#39;I</span><span class="se">\&#39;</span><span class="s1">d really recommend this product.&#39;</span>
@@ -906,8 +906,8 @@ bundle_start = bundle_stop</code></pre>
 <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
 <img src="/images/copy-icon.svg"/>
 </a>
-<pre><code>with beam.Pipeline(options=PipelineOptions()) as p:
-numbers = p | &#39;ProduceNumbers&#39; &gt;&gt; beam.io.Read(CountingSource(count))</code></pre>
+<pre><code>with beam.Pipeline() as pipeline:
+numbers = pipeline | &#39;ProduceNumbers&#39; &gt;&gt; beam.io.Read(CountingSource(count))</code></pre>
 </div>
 </div>
 <p><strong>Note:</strong> When you create a source that end-users are going to use, we
@@ -990,8 +990,8 @@ return pcoll | iobase.Read(_CountingSource(self._count))</code></pre>
 <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
 <img src="/images/copy-icon.svg"/>
 </a>
-<pre><code>with beam.Pipeline(options=PipelineOptions()) as p:
-numbers = p | &#39;ProduceNumbers&#39; &gt;&gt; ReadFromCountingSource(count)</code></pre>
+<pre><code>with beam.Pipeline() as pipeline:
+numbers = pipeline | &#39;ProduceNumbers&#39; &gt;&gt; ReadFromCountingSource(count)</code></pre>
 </div>
 </div>
 <p>For the sink, rename <code>SimpleKVSink</code> to <code>_SimpleKVSink</code>. Then, create the wrapper <code>PTransform</code>, called <code>WriteToKVSink</code>:</p>
@@ -1017,8 +1017,8 @@ _SimpleKVSink(self._simplekv, self._url, self._final_table_name))</code></
 <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
 <img src="/images/copy-icon.svg"/>
 </a>
-<pre><code>with beam.Pipeline(options=PipelineOptions()) as p:
-kvs = p | &#39;CreateKVs&#39; &gt;&gt; beam.core.Create(KVs)
+<pre><code>with beam.Pipeline(options=PipelineOptions()) as pipeline:
+kvs = pipeline | &#39;CreateKVs&#39; &gt;&gt; beam.core.Create(KVs)
 kvs | &#39;WriteToSimpleKV&#39; &gt;&gt; WriteToKVSink(
 simplekv, &#39;http://url_to_simple_kv/&#39;, final_table_name)</code></pre>
 </div>
@@ -3797,8 +3797,7 @@ to the <code>Pipeline</code> object when you create the object.</p>
 <img src="/images/copy-icon.svg"/>
 </a>
 <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span>
-<span class="kn">from</span> <span class="nn">apache_beam.options.pipeline_options</span> <span class="kn">import</span> <span class="n">PipelineOptions</span>
-<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">PipelineOptions</span><span class="p">())</span> <span class="k">as</span> <span class="n">p</span><span class="p">:</span>
+<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span>
 <span class="k">pass</span> <span class="c1"># build your pipeline here</span></code></pre></div>
 </div>
 </div>
@@ -3841,8 +3840,7 @@ as demonstrated in the following example code:</p>
 <img src="/images/copy-icon.svg"/>
 </a>
 <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span>
-<span class="kn">from</span> <span class="nn">apache_beam.options.pipeline_options</span> <span class="kn">import</span> <span class="n">PipelineOptions</span>
-<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">PipelineOptions</span><span class="p">())</span> <span class="k">as</span> <span class="n">p</span><span class="p">:</span>
+<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span>
 <span class="k">pass</span> <span class="c1"># build your pipeline here</span></code></pre></div>
 </div>
 </div>
@@ -3895,8 +3893,8 @@ adding <code>input</code> and <code>output</code> custom options:<
 <span class="k">class</span> <span class="nc">MyOptions</span><span class="p">(</span><span class="n">PipelineOptions</span><span class="p">):</span>
 <span class="nd">@classmethod</span>
 <span class="k">def</span> <span class="nf">_add_argparse_args</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">parser</span><span class="p">):</span>
-<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--input&#39;</span><span class="p">)</span>
-<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--output&#39;</span><span class="p">)</span></code></pre></div>
+<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--input-file&#39;</span><span class="p">)</span>
+<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span><span class="s1">&#39;--output-path&#39;</span><span class="p">)</span></code></pre></div>
 </div>
 </div>
 <div class='language-go snippet'>
@@ -3940,13 +3938,13 @@ a command-line argument, and a default value.</p>
 <span class="nd">@classmethod</span>
 <span class="k">def</span> <span class="nf">_add_argparse_args</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">parser</span><span class="p">):</span>
 <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span>
-<span class="s1">&#39;--input&#39;</span><span class="p">,</span>
-<span class="n">help</span><span class="o">=</span><span class="s1">&#39;Input for the pipeline&#39;</span><span class="p">,</span>
-<span class="n">default</span><span class="o">=</span><span class="s1">&#39;gs://my-bucket/input&#39;</span><span class="p">)</span>
+<span class="s1">&#39;--input-file&#39;</span><span class="p">,</span>
+<span class="n">default</span><span class="o">=</span><span class="s1">&#39;gs://dataflow-samples/shakespeare/kinglear.txt&#39;</span><span class="p">,</span>
+<span class="n">help</span><span class="o">=</span><span class="s1">&#39;The file path for the input text to process.&#39;</span><span class="p">)</span>
 <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span>
-<span class="s1">&#39;--output&#39;</span><span class="p">,</span>
-<span class="n">help</span><span class="o">=</span><span class="s1">&#39;Output for the pipeline&#39;</span><span class="p">,</span>
-<span class="n">default</span><span class="o">=</span><span class="s1">&#39;gs://my-bucket/output&#39;</span><span class="p">)</span></code></pre></div>
+<span class="s1">&#39;--output-path&#39;</span><span class="p">,</span>
+<span class="n">required</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span>
+<span class="n">help</span><span class="o">=</span><span class="s1">&#39;The path prefix for output files.&#39;</span><span class="p">)</span></code></pre></div>
 </div>
 </div>
 <div class='language-go snippet'>
@@ -4033,7 +4031,7 @@ a <code>PCollection</code>:</p>
 <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
 <img src="/images/copy-icon.svg"/>
 </a>
-<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">lines</span> <span class="o">=</span> <span class="n">p</span> <span class="o">|</span> <span class="s1">&#39;ReadMyFile&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span>& [...]
+<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">lines</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;ReadMyFile&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText< [...]
 <span class="s1">&#39;gs://some/inputData.txt&#39;</span><span class="p">)</span></code></pre></div>
 </div>
 </div>
@@ -4086,10 +4084,7 @@ itself.</p>
 <img src="/images/copy-icon.svg"/>
 </a>
 <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">apache_beam</span> <span class="kn">as</span> <span class="nn">beam</span>
-<span class="kn">from</span> <span class="nn">apache_beam.options.pipeline_options</span> <span class="kn">import</span> <span class="n">PipelineOptions</span>
-<span class="c1"># argv = None # if None, uses sys.argv</span>
-<span class="n">pipeline_options</span> <span class="o">=</span> <span class="n">PipelineOptions</span><span class="p">(</span><span class="n">argv</span><span class="p">)</span>
-<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">pipeline_options</span><span class="p">)</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span>
+<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span>
 <span class="n">lines</span> <span class="o">=</span> <span class="p">(</span>
 <span class="n">pipeline</span>
 <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span>
@@ -5738,8 +5733,8 @@ appeared in the original data.</li>
 <img src="/images/copy-icon.svg"/>
 </a>
 <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="c1"># The CountWords Composite Transform inside the WordCount pipeline.</span>
-<span class="k">class</span> <span class="nc">CountWords</span><span class="p">(</span><span class="n">beam</span><span class="o">.</span><span class="n">PTransform</span><span class="p">):</span>
-<span class="k">def</span> <span class="nf">expand</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pcoll</span><span class="p">):</span>
+<span class="nd">@beam.ptransform_fn</span>
+<span class="k">def</span> <span class="nf">CountWords</span><span class="p">(</span><span class="n">pcoll</span><span class="p">):</span>
 <span class="k">return</span> <span class="p">(</span>
 <span class="n">pcoll</span>
 <span class="c1"># Convert lines of text into individual words.</span>
@@ -5909,7 +5904,8 @@ suffix &ldquo;.csv&rdquo; in the given location:</p>
 <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard">
 <img src="/images/copy-icon.svg"/>
 </a>
-<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">lines</span> <span class="o">=</span> <span class="n">p</span> <span class="o">|</span> <span class="s1">&#39;ReadFromText&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span [...]
+<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">lines</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;ReadFromText&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText&l [...]
+<span class="s1">&#39;path/to/input-*.csv&#39;</span><span class="p">)</span></code></pre></div> </div> </div> <p>To read data from disparate sources into a single <code>PCollection</code>, read each one @@ -10278,9 +10274,9 @@ If you also set the <code>withExtendedErrorInfo</code> property , you will </a> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"> <span class="c1"># Create pipeline.</span> <span class="n">schema</span> <span class="o">=</span> <span class="p">({</span><span class="s1">&#39;fields&#39;</span><span class="p">:</span> <span class="p">[{</span><span class="s1">&#39;name&#39;</span><span class="p">:</span> <span class="s1">&#39;a&#39;</span><span class="p">,</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;STR [...] -<span class="n">p</span> <span class="o">=</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> +<span class="n">pipeline</span> <span class="o">=</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="n">errors</span> <span class="o">=</span> <span class="p">(</span> -<span class="n">p</span> <span class="o">|</span> <span class="s1">&#39;Data&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;Data&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">])</span> <span class="o">|</span> <span class="s1">&#39;CreateBrokenData&#39;</span> <span 
class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">src</span><span class="p">:</span> <span class="p">{</span><span class="s1">&#39;a&#39;</span><span class="p">:</span> <span class="n">src</span><span class="p">}</span> <span class="k">if</span> <span class="n">src</span> <span class="o">==</span [...] <span class="o">|</span> <span class="s1">&#39;WriteToBigQuery&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToBigQuery</span><span class="p">(</span> @@ -13653,9 +13649,9 @@ limitations under the License. <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">p</span><span class="p">:</span> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">()</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span> <span class="n">readable_files</span> <span class="o">=</span> <span class="p">(</span> -<span class="n">p</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="n">fileio</span><span class="o">.</span><span class="n">MatchFiles</span><span class="p">(</span><span class="s1">&#39;hdfs://path/to/*.txt&#39;</span><span class="p">)</span> <span class="o">|</span> <span class="n">fileio</span><span class="o">.</span><span class="n">ReadMatches</span><span class="p">()</span> 
<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Reshuffle</span><span class="p">())</span> @@ -15106,7 +15102,7 @@ then extracts the <code>max_temperature</code> column.</p> <img src="/images/copy-icon.svg"/> </a> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">max_temperatures</span> <span class="o">=</span> <span class="p">(</span> -<span class="n">p</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;ReadTable&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromBigQuery</span><span class="p">(</span><span class="n">table</span><span class="o">=</span><span class="n">table_spec</span><span class="p">)</span> <span class="c1"># Each row is a dictionary where the keys are the BigQuery columns</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">elem</span><span class="p">:</span> <span class="n">elem</span><span class="p">[</span><span class="s1">&#39;max_temperature&#39;</span><span class="p">]))</span></code></pre></div> @@ -15159,7 +15155,7 @@ the <code>fromQuery</code> method.</p> <img src="/images/copy-icon.svg"/> </a> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">max_temperatures</span> <span class="o">=</span> <span class="p">(</span> -<span class="n">p</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;QueryTable&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromBigQuery</span><span class="p">(</span> <span class="n">query</span><span class="o">=</span><span class="s1">&#39;SELECT 
max_temperature FROM &#39;</span>\ <span class="s1">&#39;[clouddataflow-readonly:samples.weather_stations]&#39;</span><span class="p">)</span> @@ -15190,7 +15186,7 @@ in the following example:</p> <img src="/images/copy-icon.svg"/> </a> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">max_temperatures</span> <span class="o">=</span> <span class="p">(</span> -<span class="n">p</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;QueryTableStdSQL&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromBigQuery</span><span class="p">(</span> <span class="n">query</span><span class="o">=</span><span class="s1">&#39;SELECT max_temperature FROM &#39;</span>\ <span class="s1">&#39;`clouddataflow-readonly.samples.weather_stations`&#39;</span><span class="p">,</span> @@ -15765,7 +15761,7 @@ table already exists, it will be replaced.</p> <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">quotes</span> <span class="o">=</span> <span class="n">p</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">quotes</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([</span> <span class="p">{</span> <span class="s1">&#39;source&#39;</span><span class="p">:</span> <span class="s1">&#39;Mahatma Gandhi&#39;</span><span class="p">,</span> <span class="s1">&#39;quote&#39;</span><span 
class="p">:</span> <span class="s1">&#39;My life is my message.&#39;</span> <span class="p">},</span> @@ -15948,7 +15944,7 @@ different table for each year.</p> <img src="/images/copy-icon.svg"/> </a> <div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">fictional_characters_view</span> <span class="o">=</span> <span class="n">beam</span><span class="o">.</span><span class="n">pvalue</span><span class="o">.</span><span class="n">AsDict</span><span class="p">(</span> -<span class="n">p</span> <span class="o">|</span> <span class="s1">&#39;CreateCharacters&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([(</span><span class="s1">&#39;Yoda&#39;</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> +<span class="n">pipeline</span> <span class="o">|</span> <span class="s1">&#39;CreateCharacters&#39;</span> <span class="o">&gt;&gt;</span> <span class="n">beam</span><span class="o">.</span><span class="n">Create</span><span class="p">([(</span><span class="s1">&#39;Yoda&#39;</span><span class="p">,</span> <span class="bp">True</span><span class="p">),</span> <span class="p">(</span><span class="s1">&#39;Obi Wan Kenobi&#39;</span><span class="p">,</span> <span class="bp">True</span><span class="p">)]))</span> <span class="k">def</span> <span class="nf">table_fn</span><span class="p">(</span><span class="n">element</span><span class="p">,</span> <span class="n">fictional_characters</span><span class="p">):</span> <span class="k">if</span> <span class="n">element</span> <span class="ow">in</span> <span class="n">fictional_characters</span><span class="p">:</span> diff --git a/website/generated-content/documentation/io/built-in/google-bigquery/index.html b/website/generated-content/documentation/io/built-in/google-bigquery/index.html index 5a82938..cdb1e35 100644 --- 
a/website/generated-content/documentation/io/built-in/google-bigquery/index.html +++ b/website/generated-content/documentation/io/built-in/google-bigquery/index.html @@ -206,7 +206,7 @@ then extracts the <code>max_temperature</code> column.</p><div class="language-j <span class=k>return</span> <span class=n>rows</span><span class=o>;</span> <span class=o>}</span> <span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>max_temperatures</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'ReadTable'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromBigQuery</span><span class=p>(</span><span class=n>table</span><span class=o>=</span><span class=n>table_spec</span><span class=p>)</span> <span class=c1># Each row is a dictionary where the keys are the BigQuery columns</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>elem</span><span class=p>:</span> <span class=n>elem</span><span class=p>[</span><span class=s1>'max_temperature'</span><span class=p>]))</span></code></pre></div></div></div><h3 id=reading-with-a-query-string>Reading with a query string</h3><p class=language-java>If you don’t want to read an entire table, you can [...] 
@@ -242,7 +242,7 @@ the <code>fromQuery</code> method.</p><p class=language-py>If you don’t wa <span class=k>return</span> <span class=n>rows</span><span class=o>;</span> <span class=o>}</span> <span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>max_temperatures</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'QueryTable'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromBigQuery</span><span class=p>(</span> <span class=n>query</span><span class=o>=</span><span class=s1>'SELECT max_temperature FROM '</span>\ <span class=s1>'[clouddataflow-readonly:samples.weather_stations]'</span><span class=p>)</span> @@ -256,7 +256,7 @@ in the following example:</p><div class="language-java snippet"><div class="note <span class=s>"SELECT max_temperature FROM `clouddataflow-readonly.samples.weather_stations`"</span><span class=o>)</span> <span class=o>.</span><span class=na>usingStandardSql</span><span class=o>()</span> <span class=o>.</span><span class=na>withCoder</span><span class=o>(</span><span class=n>DoubleCoder</span><span class=o>.</span><span class=na>of</span><span class=o>()));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py [...] 
- <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'QueryTableStdSQL'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromBigQuery</span><span class=p>(</span> <span class=n>query</span><span class=o>=</span><span class=s1>'SELECT max_temperature FROM '</span>\ <span class=s1>'`clouddataflow-readonly.samples.weather_stations`'</span><span class=p>,</span> @@ -584,7 +584,7 @@ table already exists, it will be replaced.</p><div class="language-java snippet" <span class=c1>// pipeline.run().waitUntilFinish(); </span><span class=c1></span> <span class=o>}</span> -<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>quotes</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class [...] +<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>quotes</span> <span class=o>=</span> <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><spa [...] 
<span class=p>{</span> <span class=s1>'source'</span><span class=p>:</span> <span class=s1>'Mahatma Gandhi'</span><span class=p>,</span> <span class=s1>'quote'</span><span class=p>:</span> <span class=s1>'My life is my message.'</span> <span class=p>},</span> @@ -715,8 +715,8 @@ different table for each year.</p><div class="language-java snippet"><div class= <span class=o>.</span><span class=na>set</span><span class=o>(</span><span class=s>"maxTemp"</span><span class=o>,</span> <span class=n>elem</span><span class=o>.</span><span class=na>maxTemp</span><span class=o>))</span> <span class=o>.</span><span class=na>withCreateDisposition</span><span class=o>(</span><span class=n>CreateDisposition</span><span class=o>.</span><span class=na>CREATE_IF_NEEDED</span><span class=o>)</span> <span class=o>.</span><span class=na>withWriteDisposition</span><span class=o>(</span><span class=n>WriteDisposition</span><span class=o>.</span><span class=na>WRITE_TRUNCATE</span><span class=o>));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma>< [...] 
- <span class=n>p</span> <span class=o>|</span> <span class=s1>'CreateCharacters'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([(</span><span class=s1>'Yoda'</span><span class=p>,</span> <span class=bp>True</span><span class=p>),</span> - <span class=p>(</span><span class=s1>'Obi Wan Kenobi'</span><span class=p>,</span> <span class=bp>True</span><span class=p>)]))</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'CreateCharacters'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([(</span><span class=s1>'Yoda'</span><span class=p>,</span> <span class=bp>True</span><span class=p>),</span> + <span class=p>(</span><span class=s1>'Obi Wan Kenobi'</span><span class=p>,</span> <span class=bp>True</span><span class=p>)]))</span> <span class=k>def</span> <span class=nf>table_fn</span><span class=p>(</span><span class=n>element</span><span class=p>,</span> <span class=n>fictional_characters</span><span class=p>):</span> <span class=k>if</span> <span class=n>element</span> <span class=ow>in</span> <span class=n>fictional_characters</span><span class=p>:</span> diff --git a/website/generated-content/documentation/io/developing-io-python/index.html b/website/generated-content/documentation/io/developing-io-python/index.html index b88a75b..bae8cd8 100644 --- a/website/generated-content/documentation/io/developing-io-python/index.html +++ b/website/generated-content/documentation/io/developing-io-python/index.html @@ -90,8 +90,8 @@ a wrapper.</li></ul><p>You can find these classes in the source=self, start_position=bundle_start, stop_position=bundle_stop) - bundle_start = bundle_stop</code></pre></div></div><p>To read data from the source in your pipeline, use the <code>Read</code> transform:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button 
data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline(options=PipelineOptions()) as p: - numbers = p | 'ProduceNumbers' >> beam.io.Read(CountingSource(count))</code></pre></div></div><p><strong>Note:</strong> When you create a source that end-users are going to use, we + bundle_start = bundle_stop</code></pre></div></div><p>To read data from the source in your pipeline, use the <code>Read</code> transform:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline() as pipeline: + numbers = pipeline | 'ProduceNumbers' >> beam.io.Read(CountingSource(count))</code></pre></div></div><p><strong>Note:</strong> When you create a source that end-users are going to use, we recommend that you do not expose the code for the source itself as demonstrated in the example above. Use a wrapping <code>PTransform</code> instead. <a href=#ptransform-wrappers>PTransform wrappers</a> discusses why you should avoid @@ -130,8 +130,8 @@ to <code>_CountingSource</code>. Then, create the wrapper <code>PTransform</code self._count = count def expand(self, pcoll): - return pcoll | iobase.Read(_CountingSource(self._count))</code></pre></div></div><p>Finally, read from the source:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline(options=PipelineOptions()) as p: - numbers = p | 'ProduceNumbers' >> ReadFromCountingSource(count)</code></pre></div></div><p>For the sink, rename <code>SimpleKVSink</code> to <code>_SimpleKVSink</code>. 
Then, create the wrapper <code>PTransform</code>, called <code>WriteToKVSink</code>:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><co [...] + return pcoll | iobase.Read(_CountingSource(self._count))</code></pre></div></div><p>Finally, read from the source:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline() as pipeline: + numbers = pipeline | 'ProduceNumbers' >> ReadFromCountingSource(count)</code></pre></div></div><p>For the sink, rename <code>SimpleKVSink</code> to <code>_SimpleKVSink</code>. Then, create the wrapper <code>PTransform</code>, called <code>WriteToKVSink</code>:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a>< [...] def __init__(self, simplekv, url, final_table_name): self._simplekv = simplekv super(WriteToKVSink, self).__init__() @@ -140,8 +140,8 @@ to <code>_CountingSource</code>. 
Then, create the wrapper <code>PTransform</code def expand(self, pcoll): return pcoll | iobase.Write( - _SimpleKVSink(self._simplekv, self._url, self._final_table_name))</code></pre></div></div><p>Finally, write to the sink:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline(options=PipelineOptions()) as p: - kvs = p | 'CreateKVs' >> beam.core.Create(KVs) + _SimpleKVSink(self._simplekv, self._url, self._final_table_name))</code></pre></div></div><p>Finally, write to the sink:</p><div class=snippet><div class="notebook-skip code-snippet without_switcher"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><pre><code>with beam.Pipeline(options=PipelineOptions()) as pipeline: + kvs = pipeline | 'CreateKVs' >> beam.core.Create(KVs) kvs | 'WriteToSimpleKV' >> WriteToKVSink( simplekv, 'http://url_to_simple_kv/', final_table_name)</code></pre></div></div><div class=feedback><p class=update>Last updated on 2020/10/29</p><h3>Have you found everything you were looking for?</h3><p class=description>Was it all useful and clear? Is there anything that you would like to change? Let us know!</p><button class=load-button><a href="mailto:d...@beam.apache.org?subject=Beam Website Feedback">SEND FEEDBACK</a></button></div></div></div><footer class=footer><di [...] 
<a href=http://www.apache.org>The Apache Software Foundation</a> diff --git a/website/generated-content/documentation/patterns/ai-platform/index.html b/website/generated-content/documentation/patterns/ai-platform/index.html index e18e39c..12c8310 100644 --- a/website/generated-content/documentation/patterns/ai-platform/index.html +++ b/website/generated-content/documentation/patterns/ai-platform/index.html @@ -26,9 +26,9 @@ function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi <span class=n>extract_syntax</span><span class=o>=</span><span class=bp>True</span><span class=p>,</span> <span class=p>)</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> <span class=n>responses</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span> <span class=s1>'My experience so far has been fantastic! 
'</span> <span class=s1>'I</span><span class=se>\'</span><span class=s1>d really recommend this product.'</span> diff --git a/website/generated-content/documentation/patterns/bigqueryio/index.html b/website/generated-content/documentation/patterns/bigqueryio/index.html index 246c53a..7fc7a9a 100644 --- a/website/generated-content/documentation/patterns/bigqueryio/index.html +++ b/website/generated-content/documentation/patterns/bigqueryio/index.html @@ -70,10 +70,10 @@ If you also set the <code>withExtendedErrorInfo</code> property , you will be ab <span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py> <span class=c1># Create pipeline.</span> <span class=n>schema</span> <span class=o>=</span> <span class=p>({</span><span class=s1>'fields'</span><span class=p>:</span> <span class=p>[{</span><span class=s1>'name'</span><span class=p>:</span> <span class=s1>'a'</span><span class=p>,</span> <span class=s1>'type'</span><span class=p>:</span> <span class=s1>'STRING'</span><span class=p>,</span> <span class=s1>'mode'</span><span class=p>:</span> <span class=s1>'REQUIRED'</spa [...] 
- <span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> + <span class=n>pipeline</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=n>errors</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> <span class=o>|</span> <span class=s1>'Data'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>])</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'Data'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>])</span> <span class=o>|</span> <span class=s1>'CreateBrokenData'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>src</span><span class=p>:</span> <span class=p>{</span><span class=s1>'a'</span><span class=p>:</span> <span class=n>src</span><span class=p>}</span> <span class=k>if</span> <span class=n>src</span> <span class=o>==</span> <span class=mi>2</span> <span class=k>else</span> <span class=p>{</span><span class=s1>'a'</span><span class=p>: [...] 
<span class=o>|</span> <span class=s1>'WriteToBigQuery'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>WriteToBigQuery</span><span class=p>(</span> diff --git a/website/generated-content/documentation/patterns/file-processing/index.html b/website/generated-content/documentation/patterns/file-processing/index.html index 55fd38b..5cf1727 100644 --- a/website/generated-content/documentation/patterns/file-processing/index.html +++ b/website/generated-content/documentation/patterns/file-processing/index.html @@ -44,9 +44,9 @@ function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi <span class=c1>// We can now access the file and its metadata. </span><span class=c1></span> <span class=n>LOG</span><span class=o>.</span><span class=na>info</span><span class=o>(</span><span class=s>"File Metadata resourceId is {} "</span><span class=o>,</span> <span class=n>file</span><span class=o>.</span><span class=na>getMetadata</span><span class=o>().</span><span class=na>resourceId</span><span class=o>());</span> <span class=o>}</span> - <span class=o>}));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k> [...] 
+ <span class=o>}));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k> [...] <span class=n>readable_files</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=n>fileio</span><span class=o>.</span><span class=n>MatchFiles</span><span class=p>(</span><span class=s1>'hdfs://path/to/*.txt'</span><span class=p>)</span> <span class=o>|</span> <span class=n>fileio</span><span class=o>.</span><span class=n>ReadMatches</span><span class=p>()</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Reshuffle</span><span class=p>())</span> diff --git a/website/generated-content/documentation/patterns/pipeline-options/index.html b/website/generated-content/documentation/patterns/pipeline-options/index.html index df04277..9ee865e 100644 --- a/website/generated-content/documentation/patterns/pipeline-options/index.html +++ b/website/generated-content/documentation/patterns/pipeline-options/index.html @@ -82,21 +82,21 @@ function openMenu(){addPlaceholder();blockScroll();}</script><div class="clearfi <span class=s1>'The string value is </span><span class=si>%s</span><span class=s1>'</span> <span class=o>%</span> <span class=n>RuntimeValueProvider</span><span class=o>.</span><span class=n>get_value</span><span class=p>(</span><span class=s1>'string_value'</span><span class=p>,</span> <span class=nb>str</span><span class=p>,</span> <span class=s1>''</span><span class=p>))</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span 
class=n>PipelineOptions</span><span class=p>()</span> +<span class=n>beam_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>()</span> +<span class=n>args</span> <span class=o>=</span> <span class=n>beam_options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>MyOptions</span><span class=p>)</span> + <span class=c1># Create pipeline.</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>)</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>beam_options</span><span class=p>)</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> - <span class=n>my_options</span> <span class=o>=</span> <span class=n>pipeline_options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>MyOptions</span><span class=p>)</span> <span class=c1># Add a branch for logging the ValueProvider value.</span> <span class=n>_</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=bp>None</span><span class=p>])</span> - <span class=o>|</span> <span class=s1>'LogValueProvs'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span> - <span class=n>LogValueProvidersFn</span><span class=p>(</span><span class=n>my_options</span><span class=o>.</span><span class=n>string_value</span><span class=p>)))</span> + <span 
class=o>|</span> <span class=s1>'LogValueProvs'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span><span class=n>LogValueProvidersFn</span><span class=p>(</span><span class=n>args</span><span class=o>.</span><span class=n>string_value</span><span class=p>)))</span> <span class=c1># The main pipeline.</span> <span class=n>result_pc</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s2>"main_pc"</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>([</span><span class=mi>1</span><span class=p>,</span> <span class=mi>2</span><span class=p>,</span> <span class=mi>3</span><span class=p>])</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>combiners</span><span class=o>.</span><span class=n>Sum</span><span class=o>.</span><span class=n>Globally</span><span class=p>())</span></code></pre></div></div></div><div class=feedback><p class=update>Last updated on 2020/05/28</p><h3>Have you found everything you were looking for?</h3><p class=description>Was it all useful and clear? Is there anything that you would like to change? Let us know!< [...] 
<a href=http://www.apache.org>The Apache Software Foundation</a> diff --git a/website/generated-content/documentation/patterns/side-inputs/index.html b/website/generated-content/documentation/patterns/side-inputs/index.html index 1648dc7..9739a95 100644 --- a/website/generated-content/documentation/patterns/side-inputs/index.html +++ b/website/generated-content/documentation/patterns/side-inputs/index.html @@ -156,17 +156,16 @@ PCollection element.</li><li>Apply the side input.</li></ol><div class="language <span class=k>yield</span> <span class=p>(</span><span class=n>left</span><span class=p>,</span> <span class=n>x</span><span class=p>)</span> <span class=c1># Create pipeline.</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>()</span> -<span class=n>p</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>)</span> +<span class=n>pipeline</span> <span class=o>=</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=n>side_input</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'PeriodicImpulse'</span> <span class=o>>></span> <span class=n>PeriodicImpulse</span><span class=p>(</span> <span class=n>first_timestamp</span><span class=p>,</span> <span class=n>last_timestamp</span><span class=p>,</span> <span class=n>interval</span><span class=p>,</span> <span class=bp>True</span><span class=p>)</span> <span class=o>|</span> <span class=s1>'MapToFileName'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span 
class=n>src_file_pattern</span> <span class=o>+</span> <span class=nb>str</span><span class=p>(</span><span class=n>x</span><span class=p>))</span> <span class=o>|</span> <span class=s1>'ReadFromFile'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadAllFromText</span><span class=p>())</span> <span class=n>main_input</span> <span class=o>=</span> <span class=p>(</span> - <span class=n>p</span> + <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'MpImpulse'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span class=p>(</span><span class=n>sample_main_input_elements</span><span class=p>)</span> <span class=o>|</span> <span class=s1>'MapMpToTimestamped'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>Map</span><span class=p>(</span><span class=k>lambda</span> <span class=n>src</span><span class=p>:</span> <span class=n>TimestampedValue</span><span class=p>(</span><span class=n>src</span><span class=p>,</span> <span class=n>src</span><span class=p>))</span> diff --git a/website/generated-content/documentation/programming-guide/index.html b/website/generated-content/documentation/programming-guide/index.html index f4dfb52..d3fdf98 100644 --- a/website/generated-content/documentation/programming-guide/index.html +++ b/website/generated-content/documentation/programming-guide/index.html @@ -78,9 +78,8 @@ to the <code>Pipeline</code> object when you create the object.</p><div class="l <span class=c1>// Then create the pipeline. 
</span><span class=c1></span><span class=n>Pipeline</span> <span class=n>p</span> <span class=o>=</span> <span class=n>Pipeline</span><span class=o>.</span><span class=na>create</span><span class=o>(</span><span class=n>options</span><span class=o>);</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg> [...] -<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>PipelineOptions</span><span class=p>())</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> <span class=k>pass</span> <span class=c1># build your pipeline here</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=c1>// In order to start creating the pipeline for execution, a Pipeline object and a Sco [...] </span><span class=c1></span><span class=nx>p</span><span class=p>,</span> <span class=nx>s</span> <span class=o>:=</span> <span class=nx>beam</span><span class=p>.</span><span class=nf>NewPipelineWithRoot</span><span class=p>()</span></code></pre></div></div></div><h3 id=configuring-pipeline-options>2.1. 
Configuring pipeline options</h3><p>Use the pipeline options to configure different aspects of your pipeline, such as the pipeline runner that will execute your pipeline and any runner-specific @@ -93,9 +92,8 @@ setting the fields directly, the Beam SDKs include a command-line parser that you can use to set fields in <code>PipelineOptions</code> using command-line arguments.</p><p>To read options from the command-line, construct your <code>PipelineOptions</code> object as demonstrated in the following example code:</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>PipelineOptions</span> <span class=n>options</span> <span class=o>=</span> <span class=n>PipelineOptionsFactory</span><span class=o>.</span><span class=na>fromArgs</span><span class=o>(</span><span class=n>args</span><span class=o>).</span><span class=na>withValidation</span><span class=o>().</span><span class=na>create</span><span class=o>();</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img sr [...] 
-<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>PipelineOptions</span><span class=p>())</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> <span class=k>pass</span> <span class=c1># build your pipeline here</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=c1>// If beamx or Go flags are used, flags must be parsed first. 
</span><span class=c1></span><span class=nx>flag</span><span class=p>.</span><span class=nf>Parse</span><span class=p>()</span></code></pre></div></div></div><p>This interprets command-line arguments that follow the format:</p><pre><code>--<option>=<value> </code></pre><blockquote><p><strong>Note:</strong> Appending the method <code>.withValidation</code> will check for required @@ -116,8 +114,8 @@ adding <code>input</code> and <code>output</code> custom options:</p><div class= <span class=k>class</span> <span class=nc>MyOptions</span><span class=p>(</span><span class=n>PipelineOptions</span><span class=p>):</span> <span class=nd>@classmethod</span> <span class=k>def</span> <span class=nf>_add_argparse_args</span><span class=p>(</span><span class=bp>cls</span><span class=p>,</span> <span class=n>parser</span><span class=p>):</span> - <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=s1>'--input'</span><span class=p>)</span> - <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=s1>'--output'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span c [...] 
+ <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=s1>'--input-file'</span><span class=p>)</span> + <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span><span class=s1>'--output-path'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><s [...] <span class=nx>input</span> <span class=p>=</span> <span class=nx>flag</span><span class=p>.</span><span class=nf>String</span><span class=p>(</span><span class=s>"input"</span><span class=p>,</span> <span class=s>""</span><span class=p>,</span> <span class=s>""</span><span class=p>)</span> <span class=nx>output</span> <span class=p>=</span> <span class=nx>flag</span><span class=p>.</span><span class=nf>String</span><span class=p>(</span><span class=s>"output"</span><span class=p>,</span> <span class=s>""</span><span class=p>,</span> <span class=s>""</span><span class=p>)</span> <span class=p>)</span></code></pre></div></div></div><p>You can also specify a description, which appears when a user passes <code>--help</code> as @@ -137,13 +135,13 @@ a command-line argument, and a default value.</p><p>You set the description and <span class=nd>@classmethod</span> <span class=k>def</span> <span class=nf>_add_argparse_args</span><span class=p>(</span><span class=bp>cls</span><span class=p>,</span> <span class=n>parser</span><span class=p>):</span> <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span> - <span class=s1>'--input'</span><span class=p>,</span> - <span class=n>help</span><span class=o>=</span><span class=s1>'Input for the pipeline'</span><span class=p>,</span> - <span 
class=n>default</span><span class=o>=</span><span class=s1>'gs://my-bucket/input'</span><span class=p>)</span> + <span class=s1>'--input-file'</span><span class=p>,</span> + <span class=n>default</span><span class=o>=</span><span class=s1>'gs://dataflow-samples/shakespeare/kinglear.txt'</span><span class=p>,</span> + <span class=n>help</span><span class=o>=</span><span class=s1>'The file path for the input text to process.'</span><span class=p>)</span> <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span> - <span class=s1>'--output'</span><span class=p>,</span> - <span class=n>help</span><span class=o>=</span><span class=s1>'Output for the pipeline'</span><span class=p>,</span> - <span class=n>default</span><span class=o>=</span><span class=s1>'gs://my-bucket/output'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=kd>var</span> <span class=p>(</span> + <span class=s1>'--output-path'</span><span class=p>,</span> + <span class=n>required</span><span class=o>=</span><span class=bp>True</span><span class=p>,</span> + <span class=n>help</span><span class=o>=</span><span class=s1>'The path prefix for output files.'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=kd>var</span> <span cla [...] 
<span class=nx>input</span> <span class=p>=</span> <span class=nx>flag</span><span class=p>.</span><span class=nf>String</span><span class=p>(</span><span class=s>"input"</span><span class=p>,</span> <span class=s>"gs://my-bucket/input"</span><span class=p>,</span> <span class=s>"Input for the pipeline"</span><span class=p>)</span> <span class=nx>output</span> <span class=p>=</span> <span class=nx>flag</span><span class=p>.</span><span class=nf>String</span><span class=p>(</span><span class=s>"output"</span><span class=p>,</span> <span class=s>"gs://my-bucket/output"</span><span class=p>,</span> <span class=s>"Output for the pipeline"</span><span class=p>)</span> <span class=p>)</span></code></pre></div></div></div><p class=language-java>It’s recommended that you register your interface with <code>PipelineOptionsFactory</code> @@ -188,7 +186,7 @@ a <code>PCollection</code>:</p><div class="language-java snippet"><div class="no <span class=c1>// Create the PCollection 'lines' by applying a 'Read' transform. 
</span><span class=c1></span> <span class=n>PCollection</span><span class=o><</span><span class=n>String</span><span class=o>></span> <span class=n>lines</span> <span class=o>=</span> <span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span> <span class=s>"ReadMyFile"</span><span class=o>,</span> <span class=n>TextIO</span><span class=o>.</span><span class=na>read</span><span class=o>().</span><span class=na>from</span><span class=o>(</span><span class=s>"gs://some/inputData.txt"</span><span class=o>));</span> -<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>lines</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=s1>'ReadMyFile'</span> <span class=o>&g [...] +<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=n>lines</span> <span class=o>=</span> <span class=n>pipeline</span> <span class=o>|</span> <span class=s1>'ReadMyFile'</span> <span cla [...] 
<span class=s1>'gs://some/inputData.txt'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=nx>lines</span> <span class=o>:=</span> <span class=nx>textio</span><span class=p>.< [...] various data sources supported by the Beam SDK.</p><h4 id=creating-pcollection-in-memory>3.1.2. Creating a PCollection from in-memory data</h4><p class=language-java>To create a <code>PCollection</code> from an in-memory Java <code>Collection</code>, you use the Beam-provided <code>Create</code> transform. Much like a data adapter’s <code>Read</code>, you apply @@ -213,11 +211,8 @@ itself.</p><p>The following example code shows how to create a <code>PCollection <span class=c1>// Apply Create, passing the list and the coder, to create the PCollection. 
</span><span class=c1></span> <span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=n>Create</span><span class=o>.</span><span class=na>of</span><span class=o>(</span><span class=n>LINES</span><span class=o>)).</span><span class=na>setCoder</span><span class=o>(</span><span class=n>StringUtf8Coder</span><span class=o>.</span><span class=na>of</span><span class=o>());</span> <span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>apache_beam</span> <span class=kn>as</span> <span class=nn>beam</span> -<span class=kn>from</span> <span class=nn>apache_beam.options.pipeline_options</span> <span class=kn>import</span> <span class=n>PipelineOptions</span> -<span class=c1># argv = None # if None, uses sys.argv</span> -<span class=n>pipeline_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>pipeline_options</span><span class=p>)</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>()</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> <span class=n>lines</span> <span class=o>=</span> <span class=p>(</span> <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>Create</span><span 
class=p>([</span> @@ -1278,16 +1273,16 @@ appeared in the original data.</li></ol><div class="language-java snippet"><div <span class=k>return</span> <span class=n>wordCounts</span><span class=o>;</span> <span class=o>}</span> <span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=c1># The CountWords Composite Transform inside the WordCount pipeline.</span> -<span class=k>class</span> <span class=nc>CountWords</span><span class=p>(</span><span class=n>beam</span><span class=o>.</span><span class=n>PTransform</span><span class=p>):</span> - <span class=k>def</span> <span class=nf>expand</span><span class=p>(</span><span class=bp>self</span><span class=p>,</span> <span class=n>pcoll</span><span class=p>):</span> - <span class=k>return</span> <span class=p>(</span> - <span class=n>pcoll</span> - <span class=c1># Convert lines of text into individual words.</span> - <span class=o>|</span> <span class=s1>'ExtractWords'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span><span class=n>ExtractWordsFn</span><span class=p>())</span> - <span class=c1># Count the number of times each word occurs.</span> - <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>combiners</span><span class=o>.</span><span class=n>Count</span><span class=o>.</span><span class=n>PerElement</span><span class=p>()</span> - <span class=c1># Format each word and count into a printable string.</span> - <span class=o>|</span> <span class=s1>'FormatCounts'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span><span 
class=n>FormatCountsFn</span><span class=p>()))</span></code></pre></div></div></div><blockquote><p><strong>Note:</strong> Because <code>Count</code> is itself a composite transform, +<span class=nd>@beam.ptransform_fn</span> +<span class=k>def</span> <span class=nf>CountWords</span><span class=p>(</span><span class=n>pcoll</span><span class=p>):</span> + <span class=k>return</span> <span class=p>(</span> + <span class=n>pcoll</span> + <span class=c1># Convert lines of text into individual words.</span> + <span class=o>|</span> <span class=s1>'ExtractWords'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span><span class=n>ExtractWordsFn</span><span class=p>())</span> + <span class=c1># Count the number of times each word occurs.</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>combiners</span><span class=o>.</span><span class=n>Count</span><span class=o>.</span><span class=n>PerElement</span><span class=p>()</span> + <span class=c1># Format each word and count into a printable string.</span> + <span class=o>|</span> <span class=s1>'FormatCounts'</span> <span class=o>>></span> <span class=n>beam</span><span class=o>.</span><span class=n>ParDo</span><span class=p>(</span><span class=n>FormatCountsFn</span><span class=p>()))</span></code></pre></div></div></div><blockquote><p><strong>Note:</strong> Because <code>Count</code> is itself a composite transform, <code>CountWords</code> is also a nested composite transform.</p></blockquote><h4 id=composite-transform-creation>4.6.2. Creating a composite transform</h4><p>To create your own composite transform, create a subclass of the <code>PTransform</code> class and override the <code>expand</code> method to specify the actual processing logic. You can then use this transform just as you would a built-in transform from the @@ -1346,7 +1341,8 @@ operator you provide. 
Note that glob operators are filesystem-specific and obey filesystem-specific consistency models. The following TextIO example uses a glob operator (<code>*</code>) to read all matching input files that have prefix “input-” and the suffix “.csv” in the given location:</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=s>"ReadFromText"</span><span class=o> [...] - <span class=n>TextIO</span><span class=o>.</span><span class=na>read</span><span class=o>().</span><span class=na>from</span><span class=o>(</span><span class=s>"protocol://my_bucket/path/to/input-*.csv"</span><span class=o>));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div cl [...] + <span class=n>TextIO</span><span class=o>.</span><span class=na>read</span><span class=o>().</span><span class=na>from</span><span class=o>(</span><span class=s>"protocol://my_bucket/path/to/input-*.csv"</span><span class=o>));</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div cl [...] 
+ <span class=s1>'path/to/input-*.csv'</span><span class=p>)</span></code></pre></div></div></div><p>To read data from disparate sources into a single <code>PCollection</code>, read each one independently and then use the <a href=#flatten>Flatten</a> transform to create a single <code>PCollection</code>.</p><h4 id=file-based-writing-multiple-files>5.3.2. Writing to multiple output files</h4><p>For file-based output data, write transforms write to multiple output files by default. When you pass an output file name to a write transform, the file name diff --git a/website/generated-content/get-started/index.xml b/website/generated-content/get-started/index.xml index ddbb128..312daab 100644 --- a/website/generated-content/get-started/index.xml +++ b/website/generated-content/get-started/index.xml @@ -2448,13 +2448,15 @@ sections, we will specify the pipeline&rsquo;s runner.</p> <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">options</span> <span class="o">=</span> <span class="n">PipelineOptions</span><span class="p">()</span> -<span class="n">google_cloud_options</span> <span class="o">=</span> <span class="n">options</span><span class="o">.</span><span class="n">view_as</span><span class="p">(</span><span class="n">GoogleCloudOptions</span><span class="p">)</span> -<span class="n">google_cloud_options</span><span class="o">.</span><span class="n">project</span> <span class="o">=</span> <span class="s1">&#39;my-project-id&#39;</span> -<span class="n">google_cloud_options</span><span class="o">.</span><span class="n">job_name</span> <span class="o">=</span> <span class="s1">&#39;myjob&#39;</span> -<span class="n">google_cloud_options</span><span class="o">.</span><span class="n">staging_location</span> <span class="o">=</span> <span 
class="s1">&#39;gs://your-bucket-name-here/staging&#39;</span> -<span class="n">google_cloud_options</span><span class="o">.</span><span class="n">temp_location</span> <span class="o">=</span> <span class="s1">&#39;gs://your-bucket-name-here/temp&#39;</span> -<span class="n">options</span><span class="o">.</span><span class="n">view_as</span><span class="p">(</span><span class="n">StandardOptions</span><span class="p">)</span><span class="o">.</span><span class="n">runner</span> <span class="o">=</span> <span class="s1">&#39;DataflowRunner&#39;</span></code></pre></div> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">from</span> <span class="nn">apache_beam.options.pipeline_options</span> <span class="kn">import</span> <span class="n">PipelineOptions</span> +<span class="n">input_file</span> <span class="o">=</span> <span class="s1">&#39;gs://dataflow-samples/shakespeare/kinglear.txt&#39;</span> +<span class="n">output_path</span> <span class="o">=</span> <span class="s1">&#39;gs://my-bucket/counts.txt&#39;</span> +<span class="n">beam_options</span> <span class="o">=</span> <span class="n">PipelineOptions</span><span class="p">(</span> +<span class="n">runner</span><span class="o">=</span><span class="s1">&#39;DataflowRunner&#39;</span><span class="p">,</span> +<span class="n">project</span><span class="o">=</span><span class="s1">&#39;my-project-id&#39;</span><span class="p">,</span> +<span class="n">job_name</span><span class="o">=</span><span class="s1">&#39;unique-job-name&#39;</span><span class="p">,</span> +<span class="n">temp_location</span><span class="o">=</span><span class="s1">&#39;gs://my-bucket/temp&#39;</span><span class="p">,</span> +<span class="p">)</span></code></pre></div> </div> </div> <p class="language-java language-py">The next step is to create a <code>Pipeline</code> object with the options we&rsquo;ve just @@ -2476,7 +2478,7 @@ The scope allows grouping into composite transforms.</p> 
<a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">p</span> <span class="o">=</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">options</span><span class="p">)</span></code></pre></div> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">pipeline</span> <span class="o">=</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">beam_options</span><span class="p">)</span></code></pre></div> </div> </div> <div class='language-go snippet'> @@ -2519,8 +2521,8 @@ data stored in a publicly accessible Google Cloud Storage bucket (&ldquo;gs: <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">p</span> -<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="s1">&#39;gs://dataflow-samples/shakespeare/kinglear.txt&#39;</span><span class="p">)</span></code></pre></div> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="n">pipeline</span> +<span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="n">input_file</span><span class="p">)</span></code></pre></div> </div> 
</div> <div class='language-go snippet'> @@ -2667,7 +2669,7 @@ resulting output file.</li> <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToText</span><span class="p">(</span><span class="s1">&#39;gs://my-bucket/counts.txt&#39;</span><span class="p">)</span></code></pre></div> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">WriteToText</span><span class="p">(</span><span class="n">output_path</span><span class="p">)</span></code></pre></div> </div> </div> <div class='language-go snippet'> @@ -3070,8 +3072,8 @@ is the <code>PCollection&lt;KV&lt;String, Long&gt;&gt;</co <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="k">class</span> <span class="nc">CountWords</span><span class="p">(</span><span class="n">beam</span><span class="o">.</span><span class="n">PTransform</span><span class="p">):</span> -<span class="k">def</span> <span class="nf">expand</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">pcoll</span><span class="p">):</span> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="nd">@beam.ptransform_fn</span> +<span class="k">def</span> <span class="nf">CountWords</span><span class="p">(</span><span class="n">pcoll</span><span class="p">):</span> <span 
class="k">return</span> <span class="p">(</span> <span class="n">pcoll</span> <span class="c1"># Convert lines of text into individual words.</span> @@ -3129,17 +3131,18 @@ values for them. You can then access the options values in your pipeline code.&l <a class="copy" type="button" data-bs-toggle="tooltip" data-bs-placement="bottom" title="Copy to clipboard"> <img src="/images/copy-icon.svg"/> </a> -<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="k">class</span> <span class="nc">WordCountOptions</span><span class="p">(</span><span class="n">PipelineOptions</span><span class="p">):</span> -<span class="nd">@classmethod</span> -<span class="k">def</span> <span class="nf">_add_argparse_args</span><span class="p">(</span><span class="bp">cls</span><span class="p">,</span> <span class="n">parser</span><span class="p">):</span> +<div class="highlight"><pre class="chroma"><code class="language-py" data-lang="py"><span class="kn">import</span> <span class="nn">argparse</span> +<span class="n">parser</span> <span class="o">=</span> <span class="n">argparse</span><span class="o">.</span><span class="n">ArgumentParser</span><span class="p">()</span> <span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> -<span class="s1">&#39;--input&#39;</span><span class="p">,</span> -<span class="n">help</span><span class="o">=</span><span class="s1">&#39;Input for the pipeline&#39;</span><span class="p">,</span> -<span class="n">default</span><span class="o">=</span><span class="s1">&#39;gs://my-bucket/input&#39;</span><span class="p">)</span> -<span class="n">options</span> <span class="o">=</span> <span class="n">PipelineOptions</span><span class="p">(</span><span class="n">argv</span><span class="p">)</span> -<span class="n">word_count_options</span> <span class="o">=</span> <span class="n">options</span><span class="o">.</span><span class="n">view_as</span><span 
class="p">(</span><span class="n">WordCountOptions</span><span class="p">)</span> -<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">options</span><span class="p">)</span> <span class="k">as</span> <span class="n">p</span><span class="p">:</span> -<span class="n">lines</span> <span class="o">=</span> <span class="n">p</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="n">word_count_options</span><span class="o">.</span><span class="n">input</span><span class="p">)</span></code></pre></div> +<span class="s1">&#39;--input-file&#39;</span><span class="p">,</span> +<span class="n">default</span><span class="o">=</span><span class="s1">&#39;gs://dataflow-samples/shakespeare/kinglear.txt&#39;</span><span class="p">,</span> +<span class="n">help</span><span class="o">=</span><span class="s1">&#39;The file path for the input text to process.&#39;</span><span class="p">)</span> +<span class="n">parser</span><span class="o">.</span><span class="n">add_argument</span><span class="p">(</span> +<span class="s1">&#39;--output-path&#39;</span><span class="p">,</span> <span class="n">required</span><span class="o">=</span><span class="bp">True</span><span class="p">,</span> <span class="n">help</span><span class="o">=</span><span class="s1">&#39;The path prefix for output files.&#39;</span><span class="p">)</span> +<span class="n">args</span><span class="p">,</span> <span class="n">beam_args</span> <span class="o">=</span> <span class="n">parser</span><span class="o">.</span><span class="n">parse_known_args</span><span class="p">()</span> +<span class="n">beam_options</span> <span class="o">=</span> <span class="n">PipelineOptions</span><span class="p">(</span><span 
class="n">beam_args</span><span class="p">)</span> +<span class="k">with</span> <span class="n">beam</span><span class="o">.</span><span class="n">Pipeline</span><span class="p">(</span><span class="n">options</span><span class="o">=</span><span class="n">beam_options</span><span class="p">)</span> <span class="k">as</span> <span class="n">pipeline</span><span class="p">:</span> +<span class="n">lines</span> <span class="o">=</span> <span class="n">pipeline</span> <span class="o">|</span> <span class="n">beam</span><span class="o">.</span><span class="n">io</span><span class="o">.</span><span class="n">ReadFromText</span><span class="p">(</span><span class="n">args</span><span class="o">.</span><span class="n">input_file</span><span class="p">)</span></code></pre></div> </div> </div> <div class='language-go snippet'> diff --git a/website/generated-content/get-started/wordcount-example/index.html b/website/generated-content/get-started/wordcount-example/index.html index d4b52c8..33a3376 100644 --- a/website/generated-content/get-started/wordcount-example/index.html +++ b/website/generated-content/get-started/wordcount-example/index.html @@ -51,17 +51,21 @@ sections, we will specify the pipeline’s runner.</p><div class="language-j </span><span class=c1></span> <span class=c1>// options for our pipeline, such as the runner you wish to use. This example </span><span class=c1></span> <span class=c1>// will run with the DirectRunner by default, based on the class path configured </span><span class=c1></span> <span class=c1>// in its dependencies. 
-</span><span class=c1></span> <span class=n>PipelineOptions</span> <span class=n>options</span> <span class=o>=</span> <span class=n>PipelineOptionsFactory</span><span class=o>.</span><span class=na>create</span><span class=o>();</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highli [...] -<span class=n>google_cloud_options</span> <span class=o>=</span> <span class=n>options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>GoogleCloudOptions</span><span class=p>)</span> -<span class=n>google_cloud_options</span><span class=o>.</span><span class=n>project</span> <span class=o>=</span> <span class=s1>'my-project-id'</span> -<span class=n>google_cloud_options</span><span class=o>.</span><span class=n>job_name</span> <span class=o>=</span> <span class=s1>'myjob'</span> -<span class=n>google_cloud_options</span><span class=o>.</span><span class=n>staging_location</span> <span class=o>=</span> <span class=s1>'gs://your-bucket-name-here/staging'</span> -<span class=n>google_cloud_options</span><span class=o>.</span><span class=n>temp_location</span> <span class=o>=</span> <span class=s1>'gs://your-bucket-name-here/temp'</span> -<span class=n>options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>StandardOptions</span><span class=p>)</span><span class=o>.</span><span class=n>runner</span> <span class=o>=</span> <span class=s1>'DataflowRunner'</span></code></pre></div></div></div><p class="language-java language-py">The next step is to create a <code>Pipeline</code> object with the options we’ve just +</span><span class=c1></span> <span class=n>PipelineOptions</span> <span class=n>options</span> <span class=o>=</span> <span class=n>PipelineOptionsFactory</span><span 
class=o>.</span><span class=na>create</span><span class=o>();</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highli [...] + +<span class=n>input_file</span> <span class=o>=</span> <span class=s1>'gs://dataflow-samples/shakespeare/kinglear.txt'</span> +<span class=n>output_path</span> <span class=o>=</span> <span class=s1>'gs://my-bucket/counts.txt'</span> + +<span class=n>beam_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span> + <span class=n>runner</span><span class=o>=</span><span class=s1>'DataflowRunner'</span><span class=p>,</span> + <span class=n>project</span><span class=o>=</span><span class=s1>'my-project-id'</span><span class=p>,</span> + <span class=n>job_name</span><span class=o>=</span><span class=s1>'unique-job-name'</span><span class=p>,</span> + <span class=n>temp_location</span><span class=o>=</span><span class=s1>'gs://my-bucket/temp'</span><span class=p>,</span> +<span class=p>)</span></code></pre></div></div></div><p class="language-java language-py">The next step is to create a <code>Pipeline</code> object with the options we’ve just constructed. The Pipeline object builds up the graph of transformations to be executed, associated with that particular pipeline.</p><p class=language-go>The first step is to create a <code>Pipeline</code> object. It builds up the graph of transformations to be executed, associated with that particular pipeline. 
-The scope allows grouping into composite transforms.</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>Pipeline</span> <span class=n>p</span> <span class=o>=</span> <span class=n>Pipeline</span><span class=o>.</span><span class=na>crea [...] +The scope allows grouping into composite transforms.</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>Pipeline</span> <span class=n>p</span> <span class=o>=</span> <span class=n>Pipeline</span><span class=o>.</span><span class=na>crea [...] <span class=nx>s</span> <span class=o>:=</span> <span class=nx>p</span><span class=p>.</span><span class=nf>Root</span><span class=p>()</span></code></pre></div></div></div><h3 id=applying-pipeline-transforms>Applying pipeline transforms</h3><p>The MinimalWordCount pipeline contains several transforms to read data into the pipeline, manipulate or otherwise transform the data, and write out the results. 
Transforms can consist of an individual operation, or can contain multiple @@ -71,8 +75,8 @@ input and output data is often represented by the SDK class <code>PCollection</c represent a dataset of virtually any size, including unbounded datasets.</p><img src=/images/wordcount-pipeline.svg width=800px alt="The MinimalWordCount pipeline data flow."><p><em>Figure 1: The MinimalWordCount pipeline data flow.</em></p><p>The MinimalWordCount pipeline contains five transforms:</p><ol><li>A text file <code>Read</code> transform is applied to the <code>Pipeline</code> object itself, and produces a <code>PCollection</code> as output. Each element in the output <code>PCollection</code> represents one line of text from the input file. This example uses input -data stored in a publicly accessible Google Cloud Storage bucket (“gs://").</li></ol><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=n>Text [...] -<span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromText</span><span class=p>(</span><span class=s1>'gs://dataflow-samples/shakespeare/kinglear.txt'</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/c [...] 
+data stored in a publicly accessible Google Cloud Storage bucket (“gs://").</li></ol><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=n>Text [...] +<span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromText</span><span class=p>(</span><span class=n>input_file</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre clas [...] is an individual word in Shakespeare’s collected texts. As an alternative, it would have been possible to use a <a href=/documentation/programming-guide/#pardo>ParDo</a> @@ -106,7 +110,7 @@ transform applies a function that produces exactly one output element.</p></li>< <span class=p>},</span> <span class=nx>counted</span><span class=p>)</span></code></pre></div></div></div><ol start=5><li>A text file write transform. This transform takes the final <code>PCollection</code> of formatted Strings as input and writes each element to an output text file. 
Each element in the input <code>PCollection</code> represents one line of text in the -resulting output file.</li></ol><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=n>TextIO</span><span class=o>.</span><span class=na>write</span><span class=o>().</span [...] +resulting output file.</li></ol><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=n>TextIO</span><span class=o>.</span><span class=na>write</span><span class=o>().</span [...] which in this case is ignored.</p><p class=language-go>Note that the <code>Write</code> transform returns no PCollections.</p><h3 id=running-the-pipeline>Running the pipeline</h3><p class="language-java language-py">Run the pipeline by calling the <code>run</code> method, which sends your pipeline to be executed by the pipeline runner that you specified in your <code>PipelineOptions</code>.</p><p class=language-go>Run the pipeline by passing it to a runner.</p><div class="language-java snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-java data-lang=java><span class=n>p</span><span class=o>.</spa [...] 
<span class=p>[</span><span class=n>construction</span><span class=p>]</span> @@ -224,16 +228,16 @@ is the <code>PCollection<KV<String, Long>></code> produced by the count op <span class=n>p</span><span class=o>.</span><span class=na>apply</span><span class=o>(...)</span> <span class=o>.</span><span class=na>apply</span><span class=o>(</span><span class=k>new</span> <span class=n>CountWords</span><span class=o>())</span> <span class=o>...</span> -<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=k>class</span> <span class=nc>CountWords</span><span class=p>(</span><span class=n>beam</span><span class=o>.</span><span class=n>PTransform</s [...] - <span class=k>def</span> <span class=nf>expand</span><span class=p>(</span><span class=bp>self</span><span class=p>,</span> <span class=n>pcoll</span><span class=p>):</span> - <span class=k>return</span> <span class=p>(</span> - <span class=n>pcoll</span> - <span class=c1># Convert lines of text into individual words.</span> - <span class=o>|</span> <span class=s1>'ExtractWords'</span> <span class=o>>></span> - <span class=n>beam</span><span class=o>.</span><span class=n>FlatMap</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>re</span><span class=o>.</span><span class=n>findall</span><span class=p>(</span><span class=sa>r</span><span class=s1>'[A-Za-z</span><span class=se>\'</span><span class=s1>]+'</span><span class=p>,</span> <span class=n>x</span><span class=p>))</span> +<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip 
data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=nd>@beam.ptransform_fn</span> +<span class=k>def</span> <span class=nf>CountWords</span><span class=p>(</span><span class=n>pcoll</span><span class=p>):</span> + <span class=k>return</span> <span class=p>(</span> + <span class=n>pcoll</span> + <span class=c1># Convert lines of text into individual words.</span> + <span class=o>|</span> <span class=s1>'ExtractWords'</span> <span class=o>>></span> + <span class=n>beam</span><span class=o>.</span><span class=n>FlatMap</span><span class=p>(</span><span class=k>lambda</span> <span class=n>x</span><span class=p>:</span> <span class=n>re</span><span class=o>.</span><span class=n>findall</span><span class=p>(</span><span class=sa>r</span><span class=s1>'[A-Za-z</span><span class=se>\'</span><span class=s1>]+'</span><span class=p>,</span> <span class=n>x</span><span class=p>))</span> - <span class=c1># Count the number of times each word occurs.</span> - <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>combiners</span><span class=o>.</span><span class=n>Count</span><span class=o>.</span><span class=n>PerElement</span><span class=p>())</span> + <span class=c1># Count the number of times each word occurs.</span> + <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>combiners</span><span class=o>.</span><span class=n>Count</span><span class=o>.</span><span class=n>PerElement</span><span class=p>())</span> <span class=n>counts</span> <span class=o>=</span> <span class=n>lines</span> <span class=o>|</span> <span class=n>CountWords</span><span class=p>()</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img 
src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-go data-lang=go><span class=kd>func</s [...] <span class=nx>s</span> <span class=p>=</span> <span class=nx>s</span><span class=p>.</span><span class=nf>Scope</span><span class=p>(</span><span class=s>"CountWords"</span><span class=p>)</span> @@ -260,18 +264,20 @@ values for them. You can then access the options values in your pipeline code.</ <span class=o>.</span><span class=na>as</span><span class=o>(</span><span class=n>WordCountOptions</span><span class=o>.</span><span class=na>class</span><span class=o>);</span> <span class=n>Pipeline</span> <span class=n>p</span> <span class=o>=</span> <span class=n>Pipeline</span><span class=o>.</span><span class=na>create</span><span class=o>(</span><span class=n>options</span><span class=o>);</span> <span class=o>...</span> -<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=k>class</span> <span class=nc>WordCountOptions</span><span class=p>(</span><span class=n>PipelineOptions</span><span class=p>):</span> - <span class=nd>@classmethod</span> - <span class=k>def</span> <span class=nf>_add_argparse_args</span><span class=p>(</span><span class=bp>cls</span><span class=p>,</span> <span class=n>parser</span><span class=p>):</span> - <span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span> - <span class=s1>'--input'</span><span class=p>,</span> - <span class=n>help</span><span class=o>=</span><span class=s1>'Input for the pipeline'</span><span class=p>,</span> - <span class=n>default</span><span class=o>=</span><span class=s1>'gs://my-bucket/input'</span><span class=p>)</span> - -<span 
class=n>options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>argv</span><span class=p>)</span> -<span class=n>word_count_options</span> <span class=o>=</span> <span class=n>options</span><span class=o>.</span><span class=n>view_as</span><span class=p>(</span><span class=n>WordCountOptions</span><span class=p>)</span> -<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span class=n>options</span><span class=p>)</span> <span class=k>as</span> <span class=n>p</span><span class=p>:</span> - <span class=n>lines</span> <span class=o>=</span> <span class=n>p</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromText</span><span class=p>(</span><span class=n>word_count_options</span><span class=o>.</span><span class=n>input</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs- [...] 
+<span class=o>}</span></code></pre></div></div></div><div class="language-py snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-toggle=tooltip data-bs-placement=bottom title="Copy to clipboard"><img src=/images/copy-icon.svg></a><div class=highlight><pre class=chroma><code class=language-py data-lang=py><span class=kn>import</span> <span class=nn>argparse</span> + +<span class=n>parser</span> <span class=o>=</span> <span class=n>argparse</span><span class=o>.</span><span class=n>ArgumentParser</span><span class=p>()</span> +<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span> + <span class=s1>'--input-file'</span><span class=p>,</span> + <span class=n>default</span><span class=o>=</span><span class=s1>'gs://dataflow-samples/shakespeare/kinglear.txt'</span><span class=p>,</span> + <span class=n>help</span><span class=o>=</span><span class=s1>'The file path for the input text to process.'</span><span class=p>)</span> +<span class=n>parser</span><span class=o>.</span><span class=n>add_argument</span><span class=p>(</span> + <span class=s1>'--output-path'</span><span class=p>,</span> <span class=n>required</span><span class=o>=</span><span class=bp>True</span><span class=p>,</span> <span class=n>help</span><span class=o>=</span><span class=s1>'The path prefix for output files.'</span><span class=p>)</span> +<span class=n>args</span><span class=p>,</span> <span class=n>beam_args</span> <span class=o>=</span> <span class=n>parser</span><span class=o>.</span><span class=n>parse_known_args</span><span class=p>()</span> + +<span class=n>beam_options</span> <span class=o>=</span> <span class=n>PipelineOptions</span><span class=p>(</span><span class=n>beam_args</span><span class=p>)</span> +<span class=k>with</span> <span class=n>beam</span><span class=o>.</span><span class=n>Pipeline</span><span class=p>(</span><span class=n>options</span><span class=o>=</span><span 
class=n>beam_options</span><span class=p>)</span> <span class=k>as</span> <span class=n>pipeline</span><span class=p>:</span> + <span class=n>lines</span> <span class=o>=</span> <span class=n>pipeline</span> <span class=o>|</span> <span class=n>beam</span><span class=o>.</span><span class=n>io</span><span class=o>.</span><span class=n>ReadFromText</span><span class=p>(</span><span class=n>args</span><span class=o>.</span><span class=n>input_file</span><span class=p>)</span></code></pre></div></div></div><div class="language-go snippet"><div class="notebook-skip code-snippet"><a class=copy type=button data-bs-to [...] <span class=kd>func</span> <span class=nf>main</span><span class=p>()</span> <span class=p>{</span> <span class=o>...</span>