This is an automated email from the ASF dual-hosted git repository. mergebot-role pushed a commit to branch mergebot in repository https://gitbox.apache.org/repos/asf/beam-site.git

commit b295157aa7e503ce06ce793b263c9fc81ae83b55
Author: Pei He <p...@apache.org>
AuthorDate: Tue Sep 12 10:25:39 2017 +0800

    fixup: address comments.
---
 src/_data/capability-matrix.yml        |   2 +-
 src/documentation/runners/mapreduce.md |  16 ++++++++--------
 src/get-started/beam-overview.md       |   2 --
 src/images/logos/runners/mapreduce.png | Bin 37095 -> 0 bytes
 4 files changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/_data/capability-matrix.yml b/src/_data/capability-matrix.yml
index c4bbb3b..191679e 100644
--- a/src/_data/capability-matrix.yml
+++ b/src/_data/capability-matrix.yml
@@ -12,7 +12,7 @@ columns:
   - class: gearpump
     name: Apache Gearpump
   - class: mapreduce
-    name: MapReduce
+    name: Apache Hadoop MapReduce
 
 categories:
   - description: What is being computed?
diff --git a/src/documentation/runners/mapreduce.md b/src/documentation/runners/mapreduce.md
index c88870e..8773025 100644
--- a/src/documentation/runners/mapreduce.md
+++ b/src/documentation/runners/mapreduce.md
@@ -13,10 +13,10 @@ The [Beam Capability Matrix]({{ site.baseurl }}/documentation/runners/capability
 ## Apache Hadoop MapReduce Runner prerequisites and setup
 You need to have an Apache Hadoop environment with either [Single Node Setup](https://hadoop.apache.org/docs/r1.2.1/single_node_setup.html) or [Cluster Setup](https://hadoop.apache.org/docs/r1.2.1/cluster_setup.html)
 
-The Apache Hadoop MapReduce runner currently supports Apache Hadoop 2.8.1 version.
+The Apache Hadoop MapReduce runner currently supports Apache Hadoop version 2.8.1.
 
-You can add a dependency on the latest version of the Apache Hadoop MapReduce runner by adding to your pom.xml the following:
-```java
+You can add a dependency on the latest version of the Apache Hadoop MapReduce runner by adding the following to your pom.xml:
+```
 <dependency>
   <groupId>org.apache.beam</groupId>
   <artifactId>beam-runners-mapreduce</artifactId>
@@ -25,7 +25,7 @@ You can add a dependency on the latest version of the Apache Hadoop MapReduce ru
 ```
 
 ## Deploying Apache Hadoop MapReduce with your application
-To execute in a local hadoop environment, use this command:
+To execute in a local Hadoop environment, use this command:
 ```
 $ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     -Pmapreduce-runner \
@@ -35,9 +35,9 @@ $ mvn exec:java -Dexec.mainClass=org.apache.beam.examples.WordCount \
     --fileOutputDir=<directory for intermediate outputs>"
 ```
 
-To execute in a hadoop cluster, you need to package your program along will all dependencies in a so-called fat jar.
+To execute in a Hadoop cluster, package your program along with all dependencies in a fat jar.
 
-If you follow along the [Beam Quickstart]({{ site.baseurl }}/get-started/quickstart/) this is the command that you can run:
+If you are following through the [Beam Java SDK Quickstart]({{ site.baseurl }}/get-started/quickstart-java/), you can run this command:
 ```
 $ mvn package -Pflink-runner
 ```
@@ -65,7 +65,7 @@ When executing your pipeline with the Apache Hadoop MapReduce Runner, you should
 <tr>
   <td><code>runner</code></td>
   <td>The pipeline runner to use. This option allows you to determine the pipeline runner at runtime.</td>
-  <td>Set to <code>MapReduceRunner</code> to run using the Apache Hadoop MapReduce.</td>
+  <td>Set to <code>MapReduceRunner</code> to run using Apache Hadoop MapReduce.</td>
 </tr>
 <tr>
   <td><code>jarClass</code></td>
@@ -74,7 +74,7 @@ When executing your pipeline with the Apache Hadoop MapReduce Runner, you should
 </tr>
 <tr>
   <td><code>fileOutputDir</code></td>
-  <td>The directory for files output.</td>
+  <td>The directory for output files.</td>
   <td>"/tmp/mapreduce/"</td>
 </tr>
 </table>
diff --git a/src/get-started/beam-overview.md b/src/get-started/beam-overview.md
index e320c3f..1d3bbc6 100644
--- a/src/get-started/beam-overview.md
+++ b/src/get-started/beam-overview.md
@@ -36,8 +36,6 @@ Beam currently supports Runners that work with the following distributed process
   alt="Apache Flink">
 * Apache Gearpump (incubating) <img src="{{ site.baseurl }}/images/logos/runners/gearpump.png"
   alt="Apache Gearpump">
-* Apache Hadoop MapReduce <img src="{{ site.baseurl }}/images/logos/runners/mapreduce.png"
-  alt="Apache Hadoop MapReduce">
 * Apache Spark <img src="{{ site.baseurl }}/images/logos/runners/spark.png"
   alt="Apache Spark">
 * Google Cloud Dataflow <img src="{{ site.baseurl }}/images/logos/runners/dataflow.png"
diff --git a/src/images/logos/runners/mapreduce.png b/src/images/logos/runners/mapreduce.png
deleted file mode 100644
index 78af2c6..0000000
Binary files a/src/images/logos/runners/mapreduce.png and /dev/null differ

-- 
To stop receiving notification emails like this one, please contact
"commits@beam.apache.org" <commits@beam.apache.org>.