[beam] branch asf-site updated: Publishing website 2019/09/20 15:10:43 at commit bb6f9ed

git-site-role Fri, 20 Sep 2019 08:11:16 -0700

This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/beam.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new f50c31f  Publishing website 2019/09/20 15:10:43 at commit bb6f9ed
f50c31f is described below

commit f50c31fe1bca4949f5f3edd70ff080dc54ddec09
Author: jenkins <bui...@apache.org>
AuthorDate: Fri Sep 20 15:10:43 2019 +0000

    Publishing website 2019/09/20 15:10:43 at commit bb6f9ed
---
 .../documentation/io/testing/index.html            | 368 ++-------------------
 1 file changed, 23 insertions(+), 345 deletions(-)

diff --git a/website/generated-content/documentation/io/testing/index.html 
b/website/generated-content/documentation/io/testing/index.html
index d840b62..acce2e8 100644
--- a/website/generated-content/documentation/io/testing/index.html
+++ b/website/generated-content/documentation/io/testing/index.html
@@ -466,9 +466,11 @@
     <ul>
       <li><a href="#it-goals">Goals</a></li>
       <li><a href="#integration-tests-data-stores-and-kubernetes">Integration 
tests, data stores, and Kubernetes</a></li>
-      <li><a href="#running-integration-tests">Running integration 
tests</a></li>
+      <li><a href="#running-integration-tests-on-your-machine">Running 
integration tests on your machine</a></li>
+      <li><a href="#running-integration-tests-on-pull-requests">Running 
Integration Tests on Pull Requests</a></li>
       <li><a href="#performance-testing-dashboard">Performance testing 
dashboard</a></li>
       <li><a href="#implementing-integration-tests">Implementing Integration 
Tests</a></li>
+      <li><a href="#small-scale-and-large-scale-integration-tests">Small Scale 
and Large Scale Integration Tests</a></li>
     </ul>
   </li>
 </ul>
@@ -623,120 +625,23 @@ limitations under the License.
 
 <p>However, when working locally, there is no requirement to use Kubernetes. 
All of the test infrastructure allows you to pass in connection info, so 
developers can use their preferred hosting infrastructure for local 
development.</p>
 
-<h3 id="running-integration-tests">Running integration tests</h3>
+<h3 id="running-integration-tests-on-your-machine">Running integration tests 
on your machine</h3>
 
-<p>The high level steps for running an integration test are:</p>
+<p>You can always run the IO integration tests on your own machine. The high 
level steps for running an integration test are:</p>
 <ol>
   <li>Set up the data store corresponding to the test being run.</li>
   <li>Run the test, passing it connection info from the just created data 
store.</li>
   <li>Clean up the data store.</li>
 </ol>
 
-<p>Since setting up data stores and running the tests involves a number of 
steps, and we wish to time these tests when running performance benchmarks, we 
use PerfKit Benchmarker to manage the process end to end. With a single 
command, you can go from an empty Kubernetes cluster to a running integration 
test.</p>
-
-<p>However, <strong>PerfKit Benchmarker is not required for running 
integration tests</strong>. Therefore, we have listed the steps for both using 
PerfKit Benchmarker, and manually running the tests below.</p>
-
-<h4 id="using-perfkit-benchmarker">Using PerfKit Benchmarker</h4>
-
-<p>Prerequisites:</p>
-<ol>
-  <li><a 
href="https://github.com/GoogleCloudPlatform/PerfKitBenchmarker">Install 
PerfKit Benchmarker</a></li>
-  <li>Have a running Kubernetes cluster you can connect to locally using 
kubectl. We recommend using Google Kubernetes Engine - it’s proven working for 
all the use cases we tested.</li>
-</ol>
-
-<p>You won’t need to invoke PerfKit Benchmarker directly. Run <code 
class="highlighter-rouge">./gradlew performanceTest</code> task in project’s 
root directory, passing kubernetes scripts of your choice (located in 
.test_infra/kubernetes directory). It will setup PerfKitBenchmarker for you.</p>
-
-<p>Example run with the <a href="/documentation/runners/direct/">Direct</a> 
runner:</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew 
performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=1000"]' 
-DitModule=sdks/java/io/jdbc/ 
-DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT 
-DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml"
 
-DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml"
 -DintegrationTest [...]
-</code></pre>
-</div>
-
-<p>Example run with the <a href="/documentation/runners/dataflow/">Google 
Cloud Dataflow</a> runner:</p>
-<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew 
performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=1000", 
"--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' 
-DitModule=sdks/java/io/jdbc/ 
-DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT 
-DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml"
 -DbeamITOptions="/Users/me/beam/. [...]
-</code></pre>
-</div>
-
-<p>Example run with the HDFS filesystem and Cloud Dataflow runner:</p>
-
-<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew 
performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" 
-DintegrationTestPipelineOptions='["--numberOfRecords=100000", 
"--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' 
-DitModule=sdks/java/io/file-based-io-tests/ 
-DintegrationTest=org.apache.beam.sdk.io.text.TextIOIT 
-DkubernetesScripts=".test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml,.test-infra/kubernetes
 [...]
-</code></pre>
-</div>
-
-<p>NOTE: When using Direct runner along with HDFS cluster, please set <code 
class="highlighter-rouge">export HADOOP_USER_NAME=root</code> before running 
<code class="highlighter-rouge">performanceTest</code> task.</p>
-
-<p>Parameter descriptions:</p>
-
-<table class="table">
-  <thead>
-    <tr>
-     <td>
-      <strong>Option</strong>
-     </td>
-     <td>
-       <strong>Function</strong>
-     </td>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-     <td>-DpkbLocation
-     </td>
-     <td>Path to PerfKit Benchmarker project.
-     </td>
-    </tr>
-    <tr>
-     <td>-DintegrationTestPipelineOptions
-     </td>
-     <td>Passes pipeline options directly to the test being run. Note that 
some pipeline options may be runner specific (like "--project" or 
"--tempRoot"). 
-     </td>
-    </tr>
-    <tr>
-     <td>-DitModule
-     </td>
-     <td>Specifies the project submodule of the I/O to test.
-     </td>
-    </tr>
-    <tr>
-     <td>-DintegrationTest
-     </td>
-     <td>Specifies the test to be run (fully qualified reference to class/test 
method).
-     </td>
-    </tr>
-    <tr>
-     <td>-DkubernetesScripts
-     </td>
-     <td>Paths to scripts with necessary kubernetes infrastructure.
-     </td>
-    </tr>
-    <tr>
-      <td>-DbeamITOptions
-      </td>
-      <td>Path to file with Benchmark configuration (static and dynamic 
pipeline options. See below for description).
-      </td>
-    </tr>
-    <tr>
-      <td>-DintegrationTestRunner
-      </td>
-      <td>Runner to be used for running the test. Currently possible options 
are: direct, dataflow.
-      </td>
-    </tr>
-    <tr>
-      <td>-DbeamExtraProperties
-      </td>
-      <td>Any other "extra properties" to be passed to Gradle, eg. 
"'[filesystem=hdfs]'". 
-      </td>
-    </tr>
-  </tbody>
-</table>
-
-<h4 id="without-perfkit-benchmarker">Without PerfKit Benchmarker</h4>
+<h4 id="datastore-setup-cleanup">Data store setup/cleanup</h4>
 
 <p>If you’re using Kubernetes scripts to host data stores, make sure you can 
connect to your cluster locally using kubectl. If you have your own data stores 
already setup, you just need to execute step 3 from below list.</p>
 
 <ol>
   <li>Set up the data store corresponding to the test you wish to run. You can 
find Kubernetes scripts for all currently supported data stores in <a 
href="https://github.com/apache/beam/tree/master/.test-infra/kubernetes">.test-infra/kubernetes</a>.
     <ol>
-      <li>In some cases, there is a setup script (*.sh). In other cases, you 
can just run <code class="highlighter-rouge">kubectl create -f 
[scriptname]</code> to create the data store.</li>
+      <li>In some cases, there is a dedicated setup script (*.sh). In other 
cases, you can just run <code class="highlighter-rouge">kubectl create -f 
[scriptname]</code> to create the data store. You can also let <a 
href="https://github.com/apache/beam/blob/master/.test-infra/kubernetes/kubernetes.sh">kubernetes.sh</a>
 script perform some standard steps for you.</li>
       <li>Convention dictates there will be:
         <ol>
           <li>A yml script for the data store itself, plus a <code 
class="highlighter-rouge">NodePort</code> service. The <code 
class="highlighter-rouge">NodePort</code> service opens a port to the data 
store for anyone who connects to the Kubernetes cluster’s machines from within 
same subnetwork. Such scripts are typically useful when running the scripts on 
Minikube Kubernetes Engine.</li>
@@ -766,9 +671,9 @@ limitations under the License.
   </li>
 </ol>
 
-<h5 id="integration-test-task">integrationTest Task</h5>
+<h4 id="running-a-test">Running a particular test</h4>
 
-<p>Since <code class="highlighter-rouge">performanceTest</code> task involved 
running PerfkitBenchmarker, we can’t use it to run the tests manually. For such 
purposes a more “low-level” task called <code 
class="highlighter-rouge">integrationTest</code> was introduced.</p>
+<p><code class="highlighter-rouge">integrationTest</code> is a dedicated 
gradle task for running IO integration tests.</p>
 
 <p>Example usage on Cloud Dataflow runner:</p>
 
@@ -833,9 +738,11 @@ limitations under the License.
   </tbody>
 </table>
 
-<h4 id="running-on-pull-requests">Running Integration Tests on Pull 
Requests</h4>
+<h3 id="running-integration-tests-on-pull-requests">Running Integration Tests 
on Pull Requests</h3>
 
-<p>Thanks to <a href="https://github.com/janinko/ghprb">ghprb</a> plugin it is 
possible to run Jenkins jobs when specific phrase is typed in a Github Pull 
Request’s comment. Integration tests that have Jenkins job defined can be 
triggered this way. You can run integration tests using these phrases:</p>
+<p>Most of the IO integration tests have dedicated Jenkins jobs that run 
periodically to collect metrics and avoid regressions. Thanks to <a 
href="https://github.com/janinko/ghprb">ghprb</a> plugin it is also possible to 
trigger these jobs on demand once a specific phrase is typed in a Github Pull 
Request’s comment. This way you can check if your contribution to a certain IO 
is an improvement or if it makes things worse (hopefully not!).</p>
+
+<p>To run IO Integration Tests type the following comments in your Pull 
Request:</p>
 
 <table class="table">
   <thead>
@@ -935,7 +842,7 @@ If you modified/added new Jenkins job definitions in your 
Pull Request, run the
 
 <h3 id="performance-testing-dashboard">Performance testing dashboard</h3>
 
-<p>We measure the performance of IOITs by gathering test execution times from 
Jenkins jobs that run periodically. The consequent results are stored in a 
database (BigQuery), therefore we can display them in a form of plots.</p>
+<p>As mentioned before, we measure the performance of IOITs by gathering test 
execution times from Jenkins jobs that run periodically. The consequent results 
are stored in a database (BigQuery), therefore we can display them in a form of 
plots.</p>
 
 <p>The dashboard gathering all the results is available here: <a 
href="https://s.apache.org/io-test-dashboards">Performance Testing 
Dashboard</a></p>
 
@@ -945,10 +852,10 @@ If you modified/added new Jenkins job definitions in your 
Pull Request, run the
 <ul>
   <li><strong>Test code</strong>: the code that does the actual testing: 
interacting with the I/O transform, reading and writing data, and verifying the 
data.</li>
   <li><strong>Kubernetes scripts</strong>: a Kubernetes script that sets up 
the data store that will be used by the test code.</li>
-  <li><strong>Integrate with PerfKit Benchmarker</strong>: this allows users 
to easily invoke PerfKit Benchmarker, creating the Kubernetes resources and 
running the test code.</li>
+  <li><strong>Jenkins jobs</strong>: a Jenkins Job DSL script that performs 
all necessary steps for setting up the data sources, running and cleaning up 
after the test.</li>
 </ul>
 
-<p>These three pieces are discussed in detail below.</p>
+<p>These two pieces are discussed in detail below.</p>
 
 <h4 id="test-code">Test Code</h4>
 
@@ -1022,247 +929,18 @@ If you modified/added new Jenkins job definitions in 
your Pull Request, run the
   </li>
 </ol>
 
-<h4 id="integrate-with-perfkit-benchmarker">Integrate with PerfKit 
Benchmarker</h4>
-
-<p>To allow developers to easily invoke your I/O integration test, you should 
create a PerfKit Benchmarker benchmark configuration file for the data store. 
Each pipeline option needed by the integration test should have a configuration 
entry. This is to be passed to perfkit via “beamITOptions” option in 
“performanceTest” task (described above). The goal is that a checked in config 
has defaults such that other developers can run the test without changing the 
configuration.</p>
-
-<h4 id="defining-the-benchmark-configuration-file">Defining the benchmark 
configuration file</h4>
-
-<p>The benchmark configuration file is a yaml file that defines the set of 
pipeline options for a specific data store. Some of these pipeline options are 
<strong>static</strong> - they are known ahead of time, before the data store 
is created (e.g. username/password). Others options are 
<strong>dynamic</strong> - they are only known once the data store is created 
(or after we query the Kubernetes cluster for current status).</p>
-
-<p>All known cases of dynamic pipeline options are for extracting the IP 
address that the test needs to connect to. For I/O integration tests, we must 
allow users to specify:</p>
+<h4 id="jenkins-jobs">Jenkins jobs</h4>
 
+<p>You can find examples of existing IOIT jenkins job definitions in <a 
href="https://github.com/apache/beam/tree/master/.test-infra/jenkins">.test-infra/jenkins</a>
 directory. Look for files called job_PerformanceTests_*.groovy. The most 
prominent examples are:</p>
 <ul>
-  <li>The type of the IP address to get (load balancer/node address)</li>
-  <li>The pipeline option to pass that IP address to</li>
-  <li>How to find the Kubernetes resource with that value (ie. what load 
balancer service name? what node selector?)</li>
+  <li><a 
href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy">JDBC</a>
 IOIT job</li>
+  <li><a 
href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy">MongoDB</a>
 IOIT job</li>
+  <li><a 
href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy">File-based</a>
 IOIT jobs</li>
 </ul>
 
-<p>The style of dynamic pipeline options used here should support a variety of 
other types of values derived from Kubernetes, but we do not have specific 
examples.</p>
-
-<p>The dynamic pipeline options are:</p>
-
-<table class="table">
-  <thead>
-    <tr>
-     <td>
-       <strong>Type name</strong>
-     </td>
-     <td>
-       <strong>Meaning</strong>
-     </td>
-     <td>
-       <strong>Selector field name</strong>
-     </td>
-     <td>
-       <strong>Selector field value</strong>
-     </td>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-     <td>NodePortIp
-     </td>
-     <td>We will be using the IP address of a k8s NodePort service, the value 
will be an IP address of a Pod
-     </td>
-     <td>podLabel
-     </td>
-     <td>A kubernetes label selector for a pod whose IP address can be used to 
connect to
-     </td>
-    </tr>
-    <tr>
-     <td>LoadBalancerIp
-     </td>
-     <td>We will be using the IP address of a k8s LoadBalancer, the value will 
be an IP address of the load balancer
-     </td>
-     <td>serviceName
-     </td>
-     <td>The name of the LoadBalancer kubernetes service.
-     </td>
-    </tr>
-  </tbody>
-</table>
-
-<h4 
id="benchmark-configuration-files-full-example-configuration-file">Benchmark 
configuration files: full example configuration file</h4>
-
-<p>A configuration file will look like this:</p>
-<div class="highlighter-rouge"><pre 
class="highlight"><code>static_pipeline_options:
-  -postgresUser: postgres
-  -postgresPassword: postgres
-dynamic_pipeline_options:
-  - paramName: PostgresIp
-    type: NodePortIp
-    podLabel: app=postgres
-</code></pre>
-</div>
-
-<p>and may contain the following elements:</p>
-
-<table class="table">
-  <thead>
-    <tr>
-     <td><strong>Configuration element</strong>
-     </td>
-     <td><strong>Description and how to change when adding a new test</strong>
-     </td>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-     <td>static_pipeline_options
-     </td>
-     <td>The set of preconfigured pipeline options.
-     </td>
-    </tr>
-    <tr>
-     <td>dynamic_pipeline_options
-     </td>
-     <td>The set of pipeline options that PerfKit Benchmarker will determine 
at runtime.
-     </td>
-    </tr>
-    <tr>
-     <td>dynamic_pipeline_options.name
-     </td>
-     <td>The name of the parameter to be passed to gradle's invocation of the 
I/O integration test.
-     </td>
-    </tr>
-    <tr>
-     <td>dynamic_pipeline_options.type
-     </td>
-     <td>The method of determining the value of the pipeline options.
-     </td>
-    </tr>
-    <tr>
-     <td>dynamic_pipeline_options - other attributes
-     </td>
-     <td>These vary depending on the type of the dynamic pipeline option - see 
the table of dynamic pipeline options for a description.
-     </td>
-    </tr>
-  </tbody>
-</table>
-
-<h4 id="customizing-perf-kit-benchmarker-behaviour">Customizing PerfKit 
Benchmarker behaviour</h4>
-
-<p>In most cases, to run the <em>performanceTest</em> task it is sufficient to 
pass the properties described above, which makes it easy to use. However, users 
can customize Perfkit Benchmarker’s behavior even more by passing some extra 
Gradle properties:</p>
-
-<table class="table">
-  <thead>
-    <tr>
-     <td><strong>PerfKit Benchmarker Parameter</strong>
-     </td>
-     <td><strong>Corresponding Gradle property</strong>
-     </td>
-     <td><strong>Default value</strong>
-     </td>
-     <td><strong>Description</strong>
-     </td>
-    </tr>
-  </thead>
-  <tbody>
-    <tr>
-     <td>dpb_log_level
-     </td>
-     <td>-DlogLevel
-     </td>
-     <td>INFO
-     </td>
-     <td>Data Processing Backend's log level.
-     </td>
-    </tr>
-    <tr>
-     <td>gradle_binary
-     </td>
-     <td>-DgradleBinary
-     </td>
-     <td>./gradlew
-     </td>
-     <td>Path to gradle binary.
-     </td>
-    </tr>
-    <tr>
-     <td>official
-     </td>
-     <td>-Dofficial
-     </td>
-     <td>false
-     </td>
-     <td>If true, the benchmark results are marked as "official" and can be 
displayed on PerfKitExplorer dashboards.
-     </td>
-    </tr>
-    <tr>
-     <td>benchmarks
-     </td>
-     <td>-Dbenchmarks
-     </td>
-     <td>beam_integration_benchmark
-     </td>
-     <td>Defines the PerfKit Benchmarker benchmark to run. This is same for 
all I/O integration tests.
-     </td>
-    </tr>
-    <tr>
-     <td>beam_prebuilt
-     </td>
-     <td>-DbeamPrebuilt
-     </td>
-     <td>true
-     </td>
-     <td>If false, PerfKit Benchmarker runs the build task before running the 
tests.
-     </td>
-    </tr>
-    <tr>
-     <td>beam_sdk
-     </td>
-     <td>-DbeamSdk
-     </td>
-     <td>java
-     </td>
-     <td>Beam's sdk to be used by PerfKit Benchmarker.
-     </td>
-    </tr>
-    <tr>
-     <td>beam_timeout
-     </td>
-     <td>-DitTimeout
-     </td>
-     <td>1200
-     </td>
-     <td>Timeout (in seconds) after which PerfKit Benchmarker will stop 
executing the benchmark (and will fail).
-     </td>
-    </tr>
-    <tr>
-     <td>kubeconfig
-     </td>
-     <td>-Dkubeconfig
-     </td>
-     <td>~/.kube/config
-     </td>
-     <td>Path to kubernetes configuration file.
-     </td>
-    </tr>
-    <tr>
-     <td>kubectl
-     </td>
-     <td>-Dkubectl
-     </td>
-     <td>kubectl
-     </td>
-     <td>Path to kubernetes executable.
-     </td>
-    </tr>
-    <tr>
-     <td>beam_extra_properties
-     </td>
-     <td>-DbeamExtraProperties
-     </td>
-     <td>(empty string)
-     </td>
-     <td>Any additional properties to be appended to benchmark execution 
command.
-     </td>
-    </tr>
-  </tbody>
-</table>
+<p>Notice that there is a utility class helpful in creating the jobs easily 
without forgetting important steps or repeating code. See <a 
href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/Kubernetes.groovy">Kubernetes.groovy</a>
 for more details.</p>
 
-<h4 id="small-scale-and-large-scale-integration-tests">Small Scale and Large 
Scale Integration Tests</h4>
+<h3 id="small-scale-and-large-scale-integration-tests">Small Scale and Large 
Scale Integration Tests</h3>
 
 <p>Apache Beam expects that it can run integration tests in multiple 
configurations:</p>
 <ul>
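
The three high-level steps that the updated page lists for running an integration test on your own machine (set up the data store, run the test, clean up the data store) could be wrapped in a small shell helper along the following lines. This is only a sketch under stated assumptions, not something the commit adds: the function names `run` and `run_io_it` and the `DRY_RUN` variable are hypothetical, `kubectl delete -f` is assumed to be a suitable cleanup counterpart to the `kubectl create -f` mentioned in the page, and whatever options the `integrationTest` task needs (including connection info for the data store) are passed through untouched rather than guessed here.

```shell
#!/bin/sh
# Hypothetical helper: names below are illustrative, not part of Beam.

# Default to dry-run mode: commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Usage: run_io_it <datastore.yml> [options for ./gradlew integrationTest ...]
run_io_it() {
  datastore_yaml="$1"
  shift

  # 1. Set up the data store corresponding to the test being run.
  run kubectl create -f "$datastore_yaml"

  # 2. Run the test; the caller supplies all task options, including
  #    connection info for the data store that was just created.
  run ./gradlew integrationTest "$@"
  status=$?

  # 3. Clean up the data store, even if the test failed
  #    (assumed counterpart to 'kubectl create -f').
  run kubectl delete -f "$datastore_yaml"

  return "$status"
}
```

With the default `DRY_RUN=1`, calling `run_io_it path/to/datastore.yml --info` only prints the three commands, which makes it easy to review them before setting `DRY_RUN=0` against a real cluster.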

