This is an automated email from the ASF dual-hosted git repository. git-site-role pushed a commit to branch asf-site in repository https://gitbox.apache.org/repos/asf/beam.git
The following commit(s) were added to refs/heads/asf-site by this push: new f50c31f Publishing website 2019/09/20 15:10:43 at commit bb6f9ed f50c31f is described below commit f50c31fe1bca4949f5f3edd70ff080dc54ddec09 Author: jenkins <bui...@apache.org> AuthorDate: Fri Sep 20 15:10:43 2019 +0000 Publishing website 2019/09/20 15:10:43 at commit bb6f9ed --- .../documentation/io/testing/index.html | 368 ++------------------- 1 file changed, 23 insertions(+), 345 deletions(-) diff --git a/website/generated-content/documentation/io/testing/index.html b/website/generated-content/documentation/io/testing/index.html index d840b62..acce2e8 100644 --- a/website/generated-content/documentation/io/testing/index.html +++ b/website/generated-content/documentation/io/testing/index.html @@ -466,9 +466,11 @@ <ul> <li><a href="#it-goals">Goals</a></li> <li><a href="#integration-tests-data-stores-and-kubernetes">Integration tests, data stores, and Kubernetes</a></li> - <li><a href="#running-integration-tests">Running integration tests</a></li> + <li><a href="#running-integration-tests-on-your-machine">Running integration tests on your machine</a></li> + <li><a href="#running-integration-tests-on-pull-requests">Running Integration Tests on Pull Requests</a></li> <li><a href="#performance-testing-dashboard">Performance testing dashboard</a></li> <li><a href="#implementing-integration-tests">Implementing Integration Tests</a></li> + <li><a href="#small-scale-and-large-scale-integration-tests">Small Scale and Large Scale Integration Tests</a></li> </ul> </li> </ul> @@ -623,120 +625,23 @@ limitations under the License. <p>However, when working locally, there is no requirement to use Kubernetes. 
All of the test infrastructure allows you to pass in connection info, so developers can use their preferred hosting infrastructure for local development.</p> -<h3 id="running-integration-tests">Running integration tests</h3> +<h3 id="running-integration-tests-on-your-machine">Running integration tests on your machine</h3> -<p>The high level steps for running an integration test are:</p> +<p>You can always run the IO integration tests on your own machine. The high level steps for running an integration test are:</p> <ol> <li>Set up the data store corresponding to the test being run.</li> <li>Run the test, passing it connection info from the just created data store.</li> <li>Clean up the data store.</li> </ol> -<p>Since setting up data stores and running the tests involves a number of steps, and we wish to time these tests when running performance benchmarks, we use PerfKit Benchmarker to manage the process end to end. With a single command, you can go from an empty Kubernetes cluster to a running integration test.</p> - -<p>However, <strong>PerfKit Benchmarker is not required for running integration tests</strong>. Therefore, we have listed the steps for both using PerfKit Benchmarker, and manually running the tests below.</p> - -<h4 id="using-perfkit-benchmarker">Using PerfKit Benchmarker</h4> - -<p>Prerequisites:</p> -<ol> - <li><a href="https://github.com/GoogleCloudPlatform/PerfKitBenchmarker">Install PerfKit Benchmarker</a></li> - <li>Have a running Kubernetes cluster you can connect to locally using kubectl. We recommend using Google Kubernetes Engine - it’s proven working for all the use cases we tested.</li> -</ol> - -<p>You won’t need to invoke PerfKit Benchmarker directly. Run <code class="highlighter-rouge">./gradlew performanceTest</code> task in project’s root directory, passing kubernetes scripts of your choice (located in .test_infra/kubernetes directory). 
It will setup PerfKitBenchmarker for you.</p> - -<p>Example run with the <a href="/documentation/runners/direct/">Direct</a> runner:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/.test-infra/kubernetes/postgres/pkb-config-local.yml" -DintegrationTest [...] -</code></pre> -</div> - -<p>Example run with the <a href="/documentation/runners/dataflow/">Google Cloud Dataflow</a> runner:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=1000", "--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' -DitModule=sdks/java/io/jdbc/ -DintegrationTest=org.apache.beam.sdk.io.jdbc.JdbcIOIT -DkubernetesScripts="/Users/me/beam/.test-infra/kubernetes/postgres/postgres-service-for-local-dev.yml" -DbeamITOptions="/Users/me/beam/. [...] -</code></pre> -</div> - -<p>Example run with the HDFS filesystem and Cloud Dataflow runner:</p> - -<div class="highlighter-rouge"><pre class="highlight"><code>./gradlew performanceTest -DpkbLocation="/Users/me/PerfKitBenchmarker/pkb.py" -DintegrationTestPipelineOptions='["--numberOfRecords=100000", "--project=GOOGLE_CLOUD_PROJECT", "--tempRoot=GOOGLE_STORAGE_BUCKET"]' -DitModule=sdks/java/io/file-based-io-tests/ -DintegrationTest=org.apache.beam.sdk.io.text.TextIOIT -DkubernetesScripts=".test-infra/kubernetes/hadoop/LargeITCluster/hdfs-multi-datanode-cluster.yml,.test-infra/kubernetes [...] 
-</code></pre> -</div> - -<p>NOTE: When using Direct runner along with HDFS cluster, please set <code class="highlighter-rouge">export HADOOP_USER_NAME=root</code> before runnning <code class="highlighter-rouge">performanceTest</code> task.</p> - -<p>Parameter descriptions:</p> - -<table class="table"> - <thead> - <tr> - <td> - <strong>Option</strong> - </td> - <td> - <strong>Function</strong> - </td> - </tr> - </thead> - <tbody> - <tr> - <td>-DpkbLocation - </td> - <td>Path to PerfKit Benchmarker project. - </td> - </tr> - <tr> - <td>-DintegrationTestPipelineOptions - </td> - <td>Passes pipeline options directly to the test being run. Note that some pipeline options may be runner specific (like "--project" or "--tempRoot"). - </td> - </tr> - <tr> - <td>-DitModule - </td> - <td>Specifies the project submodule of the I/O to test. - </td> - </tr> - <tr> - <td>-DintegrationTest - </td> - <td>Specifies the test to be run (fully qualified reference to class/test method). - </td> - </tr> - <tr> - <td>-DkubernetesScripts - </td> - <td>Paths to scripts with necessary kubernetes infrastructure. - </td> - </tr> - <tr> - <td>-DbeamITOptions - </td> - <td>Path to file with Benchmark configuration (static and dynamic pipeline options. See below for description). - </td> - </tr> - <tr> - <td>-DintegrationTestRunner - </td> - <td>Runner to be used for running the test. Currently possible options are: direct, dataflow. - </td> - </tr> - <tr> - <td>-DbeamExtraProperties - </td> - <td>Any other "extra properties" to be passed to Gradle, eg. "'[filesystem=hdfs]'". - </td> - </tr> - </tbody> -</table> - -<h4 id="without-perfkit-benchmarker">Without PerfKit Benchmarker</h4> +<h4 id="datastore-setup-cleanup">Data store setup/cleanup</h4> <p>If you’re using Kubernetes scripts to host data stores, make sure you can connect to your cluster locally using kubectl. 
If you have your own data stores already setup, you just need to execute step 3 from below list.</p> <ol> <li>Set up the data store corresponding to the test you wish to run. You can find Kubernetes scripts for all currently supported data stores in <a href="https://github.com/apache/beam/tree/master/.test-infra/kubernetes">.test-infra/kubernetes</a>. <ol> - <li>In some cases, there is a setup script (*.sh). In other cases, you can just run <code class="highlighter-rouge">kubectl create -f [scriptname]</code> to create the data store.</li> + <li>In some cases, there is a dedicated setup script (*.sh). In other cases, you can just run <code class="highlighter-rouge">kubectl create -f [scriptname]</code> to create the data store. You can also let <a href="https://github.com/apache/beam/blob/master/.test-infra/kubernetes/kubernetes.sh">kubernetes.sh</a> script perform some standard steps for you.</li> <li>Convention dictates there will be: <ol> <li>A yml script for the data store itself, plus a <code class="highlighter-rouge">NodePort</code> service. The <code class="highlighter-rouge">NodePort</code> service opens a port to the data store for anyone who connects to the Kubernetes cluster’s machines from within same subnetwork. Such scripts are typically useful when running the scripts on Minikube Kubernetes Engine.</li> @@ -766,9 +671,9 @@ limitations under the License. </li> </ol> -<h5 id="integration-test-task">integrationTest Task</h5> +<h4 id="running-a-test">Running a particular test</h4> -<p>Since <code class="highlighter-rouge">performanceTest</code> task involved running PerfkitBenchmarker, we can’t use it to run the tests manually. 
For such purposes a more “low-level” task called <code class="highlighter-rouge">integrationTest</code> was introduced.</p> +<p><code class="highlighter-rouge">integrationTest</code> is a dedicated gradle task for running IO integration tests.</p> <p>Example usage on Cloud Dataflow runner:</p> @@ -833,9 +738,11 @@ limitations under the License. </tbody> </table> -<h4 id="running-on-pull-requests">Running Integration Tests on Pull Requests</h4> +<h3 id="running-integration-tests-on-pull-requests">Running Integration Tests on Pull Requests</h3> -<p>Thanks to <a href="https://github.com/janinko/ghprb">ghprb</a> plugin it is possible to run Jenkins jobs when specific phrase is typed in a Github Pull Request’s comment. Integration tests that have Jenkins job defined can be triggered this way. You can run integration tests using these phrases:</p> +<p>Most of the IO integration tests have dedicated Jenkins jobs that run periodically to collect metrics and avoid regressions. Thanks to <a href="https://github.com/janinko/ghprb">ghprb</a> plugin it is also possible to trigger these jobs on demand once a specific phrase is typed in a Github Pull Request’s comment. This way you can check if your contribution to a certain IO is an improvement or if it makes things worse (hopefully not!).</p> + +<p>To run IO Integration Tests type the following comments in your Pull Request:</p> <table class="table"> <thead> @@ -935,7 +842,7 @@ If you modified/added new Jenkins job definitions in your Pull Request, run the <h3 id="performance-testing-dashboard">Performance testing dashboard</h3> -<p>We measure the performance of IOITs by gathering test execution times from Jenkins jobs that run periodically. The consequent results are stored in a database (BigQuery), therefore we can display them in a form of plots.</p> +<p>As mentioned before, we measure the performance of IOITs by gathering test execution times from Jenkins jobs that run periodically. 
The consequent results are stored in a database (BigQuery), therefore we can display them in a form of plots.</p> <p>The dashboard gathering all the results is available here: <a href="https://s.apache.org/io-test-dashboards">Performance Testing Dashboard</a></p> @@ -945,10 +852,10 @@ If you modified/added new Jenkins job definitions in your Pull Request, run the <ul> <li><strong>Test code</strong>: the code that does the actual testing: interacting with the I/O transform, reading and writing data, and verifying the data.</li> <li><strong>Kubernetes scripts</strong>: a Kubernetes script that sets up the data store that will be used by the test code.</li> - <li><strong>Integrate with PerfKit Benchmarker</strong>: this allows users to easily invoke PerfKit Benchmarker, creating the Kubernetes resources and running the test code.</li> + <li><strong>Jenkins jobs</strong>: a Jenkins Job DSL script that performs all necessary steps for setting up the data sources, running and cleaning up after the test.</li> </ul> -<p>These three pieces are discussed in detail below.</p> +<p>These three pieces are discussed in detail below.</p> <h4 id="test-code">Test Code</h4> @@ -1022,247 +929,18 @@ If you modified/added new Jenkins job definitions in your Pull Request, run the </li> </ol> -<h4 id="integrate-with-perfkit-benchmarker">Integrate with PerfKit Benchmarker</h4> - -<p>To allow developers to easily invoke your I/O integration test, you should create a PerfKit Benchmarker benchmark configuration file for the data store. Each pipeline option needed by the integration test should have a configuration entry. This is to be passed to perfkit via “beamITOptions” option in “performanceTest” task (described above). 
The goal is that a checked in config has defaults such that other developers can run the test without changing the configuration.</p> - -<h4 id="defining-the-benchmark-configuration-file">Defining the benchmark configuration file</h4> - -<p>The benchmark configuration file is a yaml file that defines the set of pipeline options for a specific data store. Some of these pipeline options are <strong>static</strong> - they are known ahead of time, before the data store is created (e.g. username/password). Others options are <strong>dynamic</strong> - they are only known once the data store is created (or after we query the Kubernetes cluster for current status).</p> - -<p>All known cases of dynamic pipeline options are for extracting the IP address that the test needs to connect to. For I/O integration tests, we must allow users to specify:</p> +<h4 id="jenkins-jobs">Jenkins jobs</h4> +<p>You can find examples of existing IOIT jenkins job definitions in <a href="https://github.com/apache/beam/tree/master/.test-infra/jenkins">.test-infra/jenkins</a> directory. Look for files called job_PerformanceTest_*.groovy. The most prominent examples are:</p> <ul> - <li>The type of the IP address to get (load balancer/node address)</li> - <li>The pipeline option to pass that IP address to</li> - <li>How to find the Kubernetes resource with that value (ie. what load balancer service name? 
what node selector?)</li> + <li><a href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_JDBC.groovy">JDBC</a> IOIT job</li> + <li><a href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_MongoDBIO_IT.groovy">MongoDB</a> IOIT job</li> + <li><a href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PerformanceTests_FileBasedIO_IT.groovy">File-based</a> IOIT jobs</li> </ul> -<p>The style of dynamic pipeline options used here should support a variety of other types of values derived from Kubernetes, but we do not have specific examples.</p> - -<p>The dynamic pipeline options are:</p> - -<table class="table"> - <thead> - <tr> - <td> - <strong>Type name</strong> - </td> - <td> - <strong>Meaning</strong> - </td> - <td> - <strong>Selector field name</strong> - </td> - <td> - <strong>Selector field value</strong> - </td> - </tr> - </thead> - <tbody> - <tr> - <td>NodePortIp - </td> - <td>We will be using the IP address of a k8s NodePort service, the value will be an IP address of a Pod - </td> - <td>podLabel - </td> - <td>A kubernetes label selector for a pod whose IP address can be used to connect to - </td> - </tr> - <tr> - <td>LoadBalancerIp - </td> - <td>We will be using the IP address of a k8s LoadBalancer, the value will be an IP address of the load balancer - </td> - <td>serviceName - </td> - <td>The name of the LoadBalancer kubernetes service. 
- </td> - </tr> - </tbody> -</table> - -<h4 id="benchmark-configuration-files-full-example-configuration-file">Benchmark configuration files: full example configuration file</h4> - -<p>A configuration file will look like this:</p> -<div class="highlighter-rouge"><pre class="highlight"><code>static_pipeline_options: - -postgresUser: postgres - -postgresPassword: postgres -dynamic_pipeline_options: - - paramName: PostgresIp - type: NodePortIp - podLabel: app=postgres -</code></pre> -</div> - -<p>and may contain the following elements:</p> - -<table class="table"> - <thead> - <tr> - <td><strong>Configuration element</strong> - </td> - <td><strong>Description and how to change when adding a new test</strong> - </td> - </tr> - </thead> - <tbody> - <tr> - <td>static_pipeline_options - </td> - <td>The set of preconfigured pipeline options. - </td> - </tr> - <tr> - <td>dynamic_pipeline_options - </td> - <td>The set of pipeline options that PerfKit Benchmarker will determine at runtime. - </td> - </tr> - <tr> - <td>dynamic_pipeline_options.name - </td> - <td>The name of the parameter to be passed to gradle's invocation of the I/O integration test. - </td> - </tr> - <tr> - <td>dynamic_pipeline_options.type - </td> - <td>The method of determining the value of the pipeline options. - </td> - </tr> - <tr> - <td>dynamic_pipeline_options - other attributes - </td> - <td>These vary depending on the type of the dynamic pipeline option - see the table of dynamic pipeline options for a description. - </td> - </tr> - </tbody> -</table> - -<h4 id="customizing-perf-kit-benchmarker-behaviour">Customizing PerfKit Benchmarker behaviour</h4> - -<p>In most cases, to run the <em>performanceTest</em> task it is sufficient to pass the properties described above, which makes it easy to use. 
However, users can customize Perfkit Benchmarker’s behavior even more by pasing some extra Gradle properties:</p> - -<table class="table"> - <thead> - <tr> - <td><strong>PerfKit Benchmarker Parameter</strong> - </td> - <td><strong>Corresponding Gradle property</strong> - </td> - <td><strong>Default value</strong> - </td> - <td><strong>Description</strong> - </td> - </tr> - </thead> - <tbody> - <tr> - <td>dpb_log_level - </td> - <td>-DlogLevel - </td> - <td>INFO - </td> - <td>Data Processing Backend's log level. - </td> - </tr> - <tr> - <td>gradle_binary - </td> - <td>-DgradleBinary - </td> - <td>./gradlew - </td> - <td>Path to gradle binary. - </td> - </tr> - <tr> - <td>official - </td> - <td>-Dofficial - </td> - <td>false - </td> - <td>If true, the benchmark results are marked as "official" and can be displayed on PerfKitExplorer dashboards. - </td> - </tr> - <tr> - <td>benchmarks - </td> - <td>-Dbenchmarks - </td> - <td>beam_integration_benchmark - </td> - <td>Defines the PerfKit Benchmarker benchmark to run. This is same for all I/O integration tests. - </td> - </tr> - <tr> - <td>beam_prebuilt - </td> - <td>-DbeamPrebuilt - </td> - <td>true - </td> - <td>If false, PerfKit Benchmarker runs the build task before running the tests. - </td> - </tr> - <tr> - <td>beam_sdk - </td> - <td>-DbeamSdk - </td> - <td>java - </td> - <td>Beam's sdk to be used by PerfKit Benchmarker. - </td> - </tr> - <tr> - <td>beam_timeout - </td> - <td>-DitTimeout - </td> - <td>1200 - </td> - <td>Timeout (in seconds) after which PerfKit Benchmarker will stop executing the benchmark (and will fail). - </td> - </tr> - <tr> - <td>kubeconfig - </td> - <td>-Dkubeconfig - </td> - <td>~/.kube/config - </td> - <td>Path to kubernetes configuration file. - </td> - </tr> - <tr> - <td>kubectl - </td> - <td>-Dkubectl - </td> - <td>kubectl - </td> - <td>Path to kubernetes executable. 
- </td> - </tr> - <tr> - <td>beam_extra_properties - </td> - <td>-DbeamExtraProperties - </td> - <td>(empty string) - </td> - <td>Any additional properties to be appended to benchmark execution command. - </td> - </tr> - </tbody> -</table> +<p>Notice that there is a utility class helpful in creating the jobs easily without forgetting important steps or repeating code. See <a href="https://github.com/apache/beam/blob/master/.test-infra/jenkins/Kubernetes.groovy">Kubernetes.groovy</a> for more details.</p> -<h4 id="small-scale-and-large-scale-integration-tests">Small Scale and Large Scale Integration Tests</h4> +<h3 id="small-scale-and-large-scale-integration-tests">Small Scale and Large Scale Integration Tests</h3> <p>Apache Beam expects that it can run integration tests in multiple configurations:</p> <ul>