This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/spark-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
     new 8555334  Fix configuration pages (2.4.1/2.3.3/2.2.3) (#192)
8555334 is described below

commit 85553345bc1735e7fa3d8b90cf822fc8758ab2fa
Author:     Dongjoon Hyun <dongj...@apache.org>
AuthorDate: Fri Apr 5 11:28:14 2019 -0700

    Fix configuration pages (2.4.1/2.3.3/2.2.3) (#192)

    The followings are the fixed screen shots.

    **2.4.1**
    <img width="948" alt="Screen Shot 2019-04-05 at 9 32 22 AM" src="https://user-images.githubusercontent.com/9700541/55642682-01dcea00-5786-11e9-94b1-412e348fe04d.png">

    **2.3.3**
    <img width="925" alt="Screen Shot 2019-04-05 at 9 33 03 AM" src="https://user-images.githubusercontent.com/9700541/55642683-05707100-5786-11e9-8904-c33124aa3e7d.png">

    **2.2.3**
    <img width="925" alt="Screen Shot 2019-04-05 at 9 33 19 AM" src="https://user-images.githubusercontent.com/9700541/55642694-086b6180-5786-11e9-830a-dc16fc20a227.png">
---
 site/docs/2.2.3/configuration.html | 152 +++++++++++++++++--------------
 site/docs/2.3.3/configuration.html | 182 ++++++++++++++++++++-----------------
 site/docs/2.4.1/configuration.html | 182 ++++++++++++++++++++-----------------
 3 files changed, 288 insertions(+), 228 deletions(-)

diff --git a/site/docs/2.2.3/configuration.html b/site/docs/2.2.3/configuration.html
index b0fa1dd..201ba6c 100644
--- a/site/docs/2.2.3/configuration.html
+++ b/site/docs/2.2.3/configuration.html
@@ -134,10 +134,33 @@
     <li><a href="#runtime-environment" id="markdown-toc-runtime-environment">Runtime Environment</a></li>
     <li><a href="#shuffle-behavior" id="markdown-toc-shuffle-behavior">Shuffle Behavior</a></li>
     <li><a href="#spark-ui" id="markdown-toc-spark-ui">Spark UI</a></li>
+    <li><a href="#compression-and-serialization" id="markdown-toc-compression-and-serialization">Compression and Serialization</a></li>
+    <li><a href="#memory-management" id="markdown-toc-memory-management">Memory Management</a></li>
+    <li><a href="#execution-behavior"
id="markdown-toc-execution-behavior">Execution Behavior</a></li> + <li><a href="#networking" id="markdown-toc-networking">Networking</a></li> + <li><a href="#scheduling" id="markdown-toc-scheduling">Scheduling</a></li> + <li><a href="#dynamic-allocation" id="markdown-toc-dynamic-allocation">Dynamic Allocation</a></li> + <li><a href="#security" id="markdown-toc-security">Security</a></li> + <li><a href="#tls--ssl" id="markdown-toc-tls--ssl">TLS / SSL</a></li> + <li><a href="#spark-sql" id="markdown-toc-spark-sql">Spark SQL</a></li> + <li><a href="#spark-streaming" id="markdown-toc-spark-streaming">Spark Streaming</a></li> + <li><a href="#sparkr" id="markdown-toc-sparkr">SparkR</a></li> + <li><a href="#graphx" id="markdown-toc-graphx">GraphX</a></li> + <li><a href="#deploy" id="markdown-toc-deploy">Deploy</a></li> + <li><a href="#cluster-managers" id="markdown-toc-cluster-managers">Cluster Managers</a> <ul> + <li><a href="#yarn" id="markdown-toc-yarn">YARN</a></li> + <li><a href="#mesos" id="markdown-toc-mesos">Mesos</a></li> + <li><a href="#standalone-mode" id="markdown-toc-standalone-mode">Standalone Mode</a></li> + </ul> + </li> </ul> </li> </ul> </li> + <li><a href="#environment-variables" id="markdown-toc-environment-variables">Environment Variables</a></li> + <li><a href="#configuring-logging" id="markdown-toc-configuring-logging">Configuring Logging</a></li> + <li><a href="#overriding-configuration-directory" id="markdown-toc-overriding-configuration-directory">Overriding configuration directory</a></li> + <li><a href="#inheriting-hadoop-cluster-configuration" id="markdown-toc-inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</a></li> </ul> <p>Spark provides three locations to configure the system:</p> @@ -940,11 +963,11 @@ of the most common options to set are:</p> <td> The maximum allowed size for a HTTP request header, in bytes unless otherwise specified. This setting applies for the Spark History Server too. 
- <td> -</tr> -</table> + </td> +</tr> +</table> -### Compression and Serialization +<h3 id="compression-and-serialization">Compression and Serialization</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1093,7 +1116,7 @@ of the most common options to set are:</p> </tr> </table> -### Memory Management +<h3 id="memory-management">Memory Management</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1193,7 +1216,7 @@ of the most common options to set are:</p> </tr> </table> -### Execution Behavior +<h3 id="execution-behavior">Execution Behavior</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1330,7 +1353,7 @@ of the most common options to set are:</p> </tr> </table> -### Networking +<h3 id="networking">Networking</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1440,7 +1463,7 @@ of the most common options to set are:</p> </tr> </table> -### Scheduling +<h3 id="scheduling">Scheduling</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1707,7 +1730,7 @@ of the most common options to set are:</p> </tr> </table> -### Dynamic Allocation +<h3 id="dynamic-allocation">Dynamic Allocation</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1789,7 +1812,7 @@ of the most common options to set are:</p> </tr> </table> -### Security +<h3 id="security">Security</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1979,7 +2002,7 @@ of the most common options to set are:</p> </tr> </table> -### TLS / SSL +<h3 id="tls--ssl">TLS / SSL</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2098,44 +2121,42 @@ of the most common options to set are:</p> </tr> </table> +<h3 id="spark-sql">Spark SQL</h3> -### Spark SQL - -Running the <code>SET 
-v</code> command will show the entire list of the SQL configuration. +<p>Running the <code>SET -v</code> command will show the entire list of the SQL configuration.</p> <div class="codetabs"> <div data-lang="scala"> - <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="n">show</span><span class="o">(</span><span class="n">numRows</span> <span class="k">=</span> <span class="mi">200</span><span class="o">,</span> <span class="n">truncate</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span></code></pre></figure> - </div> + </div> <div data-lang="java"> - <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">200</span><span class="o">,</span> <span class="kc">false</span><span class="o">);</span></code></pre></figure> - </div> + </div> <div data-lang="python"> - <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing 
SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"SET -v"</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span></code></pre></figure> - </div> + </div> <div data-lang="r"> - <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> + <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> properties <span class="o"><-</span> sql<span class="p">(</span><span class="s">"SET -v"</span><span class="p">)</span> showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span class="o">=</span> <span class="m">200</span><span class="p">,</span> truncate <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></code></pre></figure> - </div> + </div> </div> - -### Spark Streaming +<h3 id="spark-streaming">Spark Streaming</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2257,7 +2278,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### SparkR +<h3 id="sparkr">SparkR</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2307,7 +2328,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </table> -### GraphX +<h3 id="graphx">GraphX</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2321,7 +2342,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### Deploy +<h3 id="deploy">Deploy</h3> <table class="table"> 
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2343,30 +2364,28 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> +<h3 id="cluster-managers">Cluster Managers</h3> -### Cluster Managers - -Each cluster manager in Spark has additional configuration options. Configurations -can be found on the pages for each mode: +<p>Each cluster manager in Spark has additional configuration options. Configurations +can be found on the pages for each mode:</p> -#### [YARN](running-on-yarn.html#configuration) +<h4 id="yarn"><a href="running-on-yarn.html#configuration">YARN</a></h4> -#### [Mesos](running-on-mesos.html#configuration) +<h4 id="mesos"><a href="running-on-mesos.html#configuration">Mesos</a></h4> -#### [Standalone Mode](spark-standalone.html#cluster-launch-scripts) +<h4 id="standalone-mode"><a href="spark-standalone.html#cluster-launch-scripts">Standalone Mode</a></h4> -# Environment Variables +<h1 id="environment-variables">Environment Variables</h1> -Certain Spark settings can be configured through environment variables, which are read from the -`conf/spark-env.sh` script in the directory where Spark is installed (or `conf/spark-env.cmd` on +<p>Certain Spark settings can be configured through environment variables, which are read from the +<code>conf/spark-env.sh</code> script in the directory where Spark is installed (or <code>conf/spark-env.cmd</code> on Windows). In Standalone and Mesos modes, this file can give machine specific information such as -hostnames. It is also sourced when running local Spark applications or submission scripts. - -Note that `conf/spark-env.sh` does not exist by default when Spark is installed. However, you can -copy `conf/spark-env.sh.template` to create it. Make sure you make the copy executable. +hostnames. 
It is also sourced when running local Spark applications or submission scripts.</p> -The following variables can be set in `spark-env.sh`: +<p>Note that <code>conf/spark-env.sh</code> does not exist by default when Spark is installed. However, you can +copy <code>conf/spark-env.sh.template</code> to create it. Make sure you make the copy executable.</p> +<p>The following variables can be set in <code>spark-env.sh</code>:</p> <table class="table"> <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr> @@ -2399,42 +2418,43 @@ The following variables can be set in `spark-env.sh`: </tr> </table> -In addition to the above, there are also options for setting up the Spark -[standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores -to use on each machine and maximum memory. +<p>In addition to the above, there are also options for setting up the Spark +<a href="spark-standalone.html#cluster-launch-scripts">standalone cluster scripts</a>, such as number of cores +to use on each machine and maximum memory.</p> -Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might -compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface. +<p>Since <code>spark-env.sh</code> is a shell script, some of these can be set programmatically – for example, you might +compute <code>SPARK_LOCAL_IP</code> by looking up the IP of a specific network interface.</p> -Note: When running Spark on YARN in `cluster` mode, environment variables need to be set using the `spark.yarn.appMasterEnv.[EnvironmentVariableName]` property in your `conf/spark-defaults.conf` file. Environment variables that are set in `spark-env.sh` will not be reflected in the YARN Application Master process in `cluster` mode. See the [YARN-related Spark Properties](running-on-yarn.html#spark-properties) for more information. 
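The paragraphs above note that, because `spark-env.sh` is an ordinary shell script, values such as `SPARK_LOCAL_IP` can be computed rather than hard-coded. A minimal sketch of such a fragment, assuming a network interface named `eth0` (the interface name and the `ip`/`awk` pipeline are illustrative, not part of this commit):

```shell
# Hypothetical conf/spark-env.sh fragment; "eth0" is an assumed interface name.
# Derive SPARK_LOCAL_IP from a specific network interface instead of
# hard-coding an address, as the surrounding text suggests.
SPARK_LOCAL_IP="$(ip -4 -o addr show eth0 | awk '{print $4}' | cut -d/ -f1)"
export SPARK_LOCAL_IP
```

Remember that the copied `conf/spark-env.sh` must be executable for Spark's scripts to source it.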
+<p>Note: When running Spark on YARN in <code>cluster</code> mode, environment variables need to be set using the <code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code> property in your <code>conf/spark-defaults.conf</code> file. Environment variables that are set in <code>spark-env.sh</code> will not be reflected in the YARN Application Master process in <code>cluster</code> mode. See the <a href="running-on-yarn.html#spark-properties">YARN-related Spark Properties</a> for more [...] -# Configuring Logging +<h1 id="configuring-logging">Configuring Logging</h1> -Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a -`log4j.properties` file in the `conf` directory. One way to start is to copy the existing -`log4j.properties.template` located there. +<p>Spark uses <a href="http://logging.apache.org/log4j/">log4j</a> for logging. You can configure it by adding a +<code>log4j.properties</code> file in the <code>conf</code> directory. One way to start is to copy the existing +<code>log4j.properties.template</code> located there.</p> -# Overriding configuration directory +<h1 id="overriding-configuration-directory">Overriding configuration directory</h1> -To specify a different configuration directory other than the default "SPARK_HOME/conf", +<p>To specify a different configuration directory other than the default “SPARK_HOME/conf”, you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) -from this directory. 
+from this directory.</p> -# Inheriting Hadoop Cluster Configuration +<h1 id="inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</h1> -If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that -should be included on Spark's classpath: +<p>If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that +should be included on Spark’s classpath:</p> -* `hdfs-site.xml`, which provides default behaviors for the HDFS client. -* `core-site.xml`, which sets the default filesystem name. +<ul> + <li><code>hdfs-site.xml</code>, which provides default behaviors for the HDFS client.</li> + <li><code>core-site.xml</code>, which sets the default filesystem name.</li> +</ul> -The location of these configuration files varies across Hadoop versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools create -configurations on-the-fly, but offer a mechanisms to download copies of them. +<p>The location of these configuration files varies across Hadoop versions, but +a common location is inside of <code>/etc/hadoop/conf</code>. Some tools create +configurations on-the-fly, but offer a mechanisms to download copies of them.</p> -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/spark-env.sh` -to a location containing the configuration files. 
-</td></td></tr></table> +<p>To make these files visible to Spark, set <code>HADOOP_CONF_DIR</code> in <code>$SPARK_HOME/spark-env.sh</code> +to a location containing the configuration files.</p> </div> diff --git a/site/docs/2.3.3/configuration.html b/site/docs/2.3.3/configuration.html index f022f29..c284bd5 100644 --- a/site/docs/2.3.3/configuration.html +++ b/site/docs/2.3.3/configuration.html @@ -136,10 +136,35 @@ <li><a href="#runtime-environment" id="markdown-toc-runtime-environment">Runtime Environment</a></li> <li><a href="#shuffle-behavior" id="markdown-toc-shuffle-behavior">Shuffle Behavior</a></li> <li><a href="#spark-ui" id="markdown-toc-spark-ui">Spark UI</a></li> + <li><a href="#compression-and-serialization" id="markdown-toc-compression-and-serialization">Compression and Serialization</a></li> + <li><a href="#memory-management" id="markdown-toc-memory-management">Memory Management</a></li> + <li><a href="#execution-behavior" id="markdown-toc-execution-behavior">Execution Behavior</a></li> + <li><a href="#networking" id="markdown-toc-networking">Networking</a></li> + <li><a href="#scheduling" id="markdown-toc-scheduling">Scheduling</a></li> + <li><a href="#dynamic-allocation" id="markdown-toc-dynamic-allocation">Dynamic Allocation</a></li> + <li><a href="#security" id="markdown-toc-security">Security</a></li> + <li><a href="#tls--ssl" id="markdown-toc-tls--ssl">TLS / SSL</a></li> + <li><a href="#spark-sql" id="markdown-toc-spark-sql">Spark SQL</a></li> + <li><a href="#spark-streaming" id="markdown-toc-spark-streaming">Spark Streaming</a></li> + <li><a href="#sparkr" id="markdown-toc-sparkr">SparkR</a></li> + <li><a href="#graphx" id="markdown-toc-graphx">GraphX</a></li> + <li><a href="#deploy" id="markdown-toc-deploy">Deploy</a></li> + <li><a href="#cluster-managers" id="markdown-toc-cluster-managers">Cluster Managers</a> <ul> + <li><a href="#yarn" id="markdown-toc-yarn">YARN</a></li> + <li><a href="#mesos" id="markdown-toc-mesos">Mesos</a></li> + 
<li><a href="#kubernetes" id="markdown-toc-kubernetes">Kubernetes</a></li> + <li><a href="#standalone-mode" id="markdown-toc-standalone-mode">Standalone Mode</a></li> + </ul> + </li> </ul> </li> </ul> </li> + <li><a href="#environment-variables" id="markdown-toc-environment-variables">Environment Variables</a></li> + <li><a href="#configuring-logging" id="markdown-toc-configuring-logging">Configuring Logging</a></li> + <li><a href="#overriding-configuration-directory" id="markdown-toc-overriding-configuration-directory">Overriding configuration directory</a></li> + <li><a href="#inheriting-hadoop-cluster-configuration" id="markdown-toc-inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</a></li> + <li><a href="#custom-hadoophive-configuration" id="markdown-toc-custom-hadoophive-configuration">Custom Hadoop/Hive Configuration</a></li> </ul> <p>Spark provides three locations to configure the system:</p> @@ -1050,11 +1075,11 @@ of the most common options to set are:</p> <td> The maximum allowed size for a HTTP request header, in bytes unless otherwise specified. This setting applies for the Spark History Server too. 
- <td> -</tr> -</table> + </td> +</tr> +</table> -### Compression and Serialization +<h3 id="compression-and-serialization">Compression and Serialization</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1221,7 +1246,7 @@ of the most common options to set are:</p> </tr> </table> -### Memory Management +<h3 id="memory-management">Memory Management</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1363,7 +1388,7 @@ of the most common options to set are:</p> </tr> </table> -### Execution Behavior +<h3 id="execution-behavior">Execution Behavior</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1497,7 +1522,7 @@ of the most common options to set are:</p> </tr> </table> -### Networking +<h3 id="networking">Networking</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1607,7 +1632,7 @@ of the most common options to set are:</p> </tr> </table> -### Scheduling +<h3 id="scheduling">Scheduling</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1892,7 +1917,7 @@ of the most common options to set are:</p> </tr> </table> -### Dynamic Allocation +<h3 id="dynamic-allocation">Dynamic Allocation</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1974,7 +1999,7 @@ of the most common options to set are:</p> </tr> </table> -### Security +<h3 id="security">Security</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2164,7 +2189,7 @@ of the most common options to set are:</p> </tr> </table> -### TLS / SSL +<h3 id="tls--ssl">TLS / SSL</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2283,44 +2308,42 @@ of the most common options to set are:</p> </tr> </table> +<h3 id="spark-sql">Spark SQL</h3> -### Spark SQL - -Running the <code>SET 
-v</code> command will show the entire list of the SQL configuration. +<p>Running the <code>SET -v</code> command will show the entire list of the SQL configuration.</p> <div class="codetabs"> <div data-lang="scala"> - <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="n">show</span><span class="o">(</span><span class="n">numRows</span> <span class="k">=</span> <span class="mi">200</span><span class="o">,</span> <span class="n">truncate</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span></code></pre></figure> - </div> + </div> <div data-lang="java"> - <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">200</span><span class="o">,</span> <span class="kc">false</span><span class="o">);</span></code></pre></figure> - </div> + </div> <div data-lang="python"> - <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing 
SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"SET -v"</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span></code></pre></figure> - </div> + </div> <div data-lang="r"> - <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> + <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> properties <span class="o"><-</span> sql<span class="p">(</span><span class="s">"SET -v"</span><span class="p">)</span> showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span class="o">=</span> <span class="m">200</span><span class="p">,</span> truncate <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></code></pre></figure> - </div> + </div> </div> - -### Spark Streaming +<h3 id="spark-streaming">Spark Streaming</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2442,7 +2465,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### SparkR +<h3 id="sparkr">SparkR</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2492,7 +2515,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </table> -### GraphX +<h3 id="graphx">GraphX</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2506,7 +2529,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### Deploy +<h3 id="deploy">Deploy</h3> <table class="table"> 
<tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2528,32 +2551,30 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> +<h3 id="cluster-managers">Cluster Managers</h3> -### Cluster Managers - -Each cluster manager in Spark has additional configuration options. Configurations -can be found on the pages for each mode: +<p>Each cluster manager in Spark has additional configuration options. Configurations +can be found on the pages for each mode:</p> -#### [YARN](running-on-yarn.html#configuration) +<h4 id="yarn"><a href="running-on-yarn.html#configuration">YARN</a></h4> -#### [Mesos](running-on-mesos.html#configuration) +<h4 id="mesos"><a href="running-on-mesos.html#configuration">Mesos</a></h4> -#### [Kubernetes](running-on-kubernetes.html#configuration) +<h4 id="kubernetes"><a href="running-on-kubernetes.html#configuration">Kubernetes</a></h4> -#### [Standalone Mode](spark-standalone.html#cluster-launch-scripts) +<h4 id="standalone-mode"><a href="spark-standalone.html#cluster-launch-scripts">Standalone Mode</a></h4> -# Environment Variables +<h1 id="environment-variables">Environment Variables</h1> -Certain Spark settings can be configured through environment variables, which are read from the -`conf/spark-env.sh` script in the directory where Spark is installed (or `conf/spark-env.cmd` on +<p>Certain Spark settings can be configured through environment variables, which are read from the +<code>conf/spark-env.sh</code> script in the directory where Spark is installed (or <code>conf/spark-env.cmd</code> on Windows). In Standalone and Mesos modes, this file can give machine specific information such as -hostnames. It is also sourced when running local Spark applications or submission scripts. +hostnames. It is also sourced when running local Spark applications or submission scripts.</p> -Note that `conf/spark-env.sh` does not exist by default when Spark is installed. 
However, you can -copy `conf/spark-env.sh.template` to create it. Make sure you make the copy executable. - -The following variables can be set in `spark-env.sh`: +<p>Note that <code>conf/spark-env.sh</code> does not exist by default when Spark is installed. However, you can +copy <code>conf/spark-env.sh.template</code> to create it. Make sure you make the copy executable.</p> +<p>The following variables can be set in <code>spark-env.sh</code>:</p> <table class="table"> <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr> @@ -2586,64 +2607,64 @@ The following variables can be set in `spark-env.sh`: </tr> </table> -In addition to the above, there are also options for setting up the Spark -[standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores -to use on each machine and maximum memory. +<p>In addition to the above, there are also options for setting up the Spark +<a href="spark-standalone.html#cluster-launch-scripts">standalone cluster scripts</a>, such as number of cores +to use on each machine and maximum memory.</p> -Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might -compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface. +<p>Since <code>spark-env.sh</code> is a shell script, some of these can be set programmatically – for example, you might +compute <code>SPARK_LOCAL_IP</code> by looking up the IP of a specific network interface.</p> -Note: When running Spark on YARN in `cluster` mode, environment variables need to be set using the `spark.yarn.appMasterEnv.[EnvironmentVariableName]` property in your `conf/spark-defaults.conf` file. Environment variables that are set in `spark-env.sh` will not be reflected in the YARN Application Master process in `cluster` mode. See the [YARN-related Spark Properties](running-on-yarn.html#spark-properties) for more information. 
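The YARN note above states that, in `cluster` mode, environment variables must be routed through `spark.yarn.appMasterEnv.[EnvironmentVariableName]` in `conf/spark-defaults.conf` rather than `spark-env.sh`. A hypothetical fragment (both property values are placeholders, not taken from this commit):

```shell
# Hypothetical conf/spark-defaults.conf fragment for YARN cluster mode.
# The spark.yarn.appMasterEnv.* prefix comes from the note above; the
# right-hand values are placeholders for illustration only.
spark.yarn.appMasterEnv.JAVA_HOME        /usr/lib/jvm/java-8-openjdk
spark.yarn.appMasterEnv.PYSPARK_PYTHON   /usr/bin/python3
```

Variables set this way reach the YARN Application Master process, which `spark-env.sh` settings do not in `cluster` mode.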
+<p>Note: When running Spark on YARN in <code>cluster</code> mode, environment variables need to be set using the <code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code> property in your <code>conf/spark-defaults.conf</code> file. Environment variables that are set in <code>spark-env.sh</code> will not be reflected in the YARN Application Master process in <code>cluster</code> mode. See the <a href="running-on-yarn.html#spark-properties">YARN-related Spark Properties</a> for more [...] -# Configuring Logging +<h1 id="configuring-logging">Configuring Logging</h1> -Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a -`log4j.properties` file in the `conf` directory. One way to start is to copy the existing -`log4j.properties.template` located there. +<p>Spark uses <a href="http://logging.apache.org/log4j/">log4j</a> for logging. You can configure it by adding a +<code>log4j.properties</code> file in the <code>conf</code> directory. One way to start is to copy the existing +<code>log4j.properties.template</code> located there.</p> -# Overriding configuration directory +<h1 id="overriding-configuration-directory">Overriding configuration directory</h1> -To specify a different configuration directory other than the default "SPARK_HOME/conf", +<p>To specify a different configuration directory other than the default “SPARK_HOME/conf”, you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) -from this directory. - -# Inheriting Hadoop Cluster Configuration +from this directory.</p> -If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that -should be included on Spark's classpath: +<h1 id="inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</h1> -* `hdfs-site.xml`, which provides default behaviors for the HDFS client. -* `core-site.xml`, which sets the default filesystem name. 
+<p>If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that +should be included on Spark’s classpath:</p> -The location of these configuration files varies across Hadoop versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools create -configurations on-the-fly, but offer a mechanism to download copies of them. +<ul> + <li><code>hdfs-site.xml</code>, which provides default behaviors for the HDFS client.</li> + <li><code>core-site.xml</code>, which sets the default filesystem name.</li> +</ul> -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` -to a location containing the configuration files. +<p>The location of these configuration files varies across Hadoop versions, but +a common location is inside of <code>/etc/hadoop/conf</code>. Some tools create +configurations on-the-fly, but offer a mechanism to download copies of them.</p> -# Custom Hadoop/Hive Configuration +<p>To make these files visible to Spark, set <code>HADOOP_CONF_DIR</code> in <code>$SPARK_HOME/conf/spark-env.sh</code> +to a location containing the configuration files.</p> -If your Spark application is interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive -configuration files in Spark's classpath. +<h1 id="custom-hadoophive-configuration">Custom Hadoop/Hive Configuration</h1> -Multiple running applications might require different Hadoop/Hive client side configurations. -You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, `hive-site.xml` in -Spark's classpath for each application. In a Spark cluster running on YARN, these configuration -files are set cluster-wide, and cannot safely be changed by the application. +<p>If your Spark application is interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark’s classpath.</p> -The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`. 
-They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defaults.conf` +<p>Multiple running applications might require different Hadoop/Hive client side configurations. +You can copy and modify <code>hdfs-site.xml</code>, <code>core-site.xml</code>, <code>yarn-site.xml</code>, <code>hive-site.xml</code> in +Spark’s classpath for each application. In a Spark cluster running on YARN, these configuration +files are set cluster-wide, and cannot safely be changed by the application.</p> -In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For -instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties. +<p>The better choice is to use spark hadoop properties in the form of <code>spark.hadoop.*</code>. +They can be considered as same as normal spark properties which can be set in <code>$SPARK_HOME/conf/spark-defaults.conf</code></p> +<p>In some cases, you may want to avoid hard-coding certain configurations in a <code>SparkConf</code>.
For +instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties.</p> <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">set</span><span class="o">(</span><span class="s">"spark.hadoop.abc.def"</span><span class="o">,</span><span class="s">"xyz"</span><span class="o">)</span> <span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure> - -Also, you can modify or add configurations at runtime: +<p>Also, you can modify or add configurations at runtime:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-submit <span class="se">\ </span> --name <span class="s2">"My app"</span> <span class="se">\ </span> @@ -2653,7 +2674,6 @@ Also, you can modify or add configurations at runtime: --conf spark.hadoop.abc.def<span class="o">=</span>xyz <span class="se">\ </span> myApp.jar</code></pre></figure> -</td></td></tr></table> </div> diff --git a/site/docs/2.4.1/configuration.html b/site/docs/2.4.1/configuration.html index c50ad00..7903ed9 100644 --- a/site/docs/2.4.1/configuration.html +++ b/site/docs/2.4.1/configuration.html @@ -136,10 +136,34 @@ <li><a href="#runtime-environment" id="markdown-toc-runtime-environment">Runtime Environment</a></li> <li><a href="#shuffle-behavior" id="markdown-toc-shuffle-behavior">Shuffle Behavior</a></li> <li><a href="#spark-ui" id="markdown-toc-spark-ui">Spark UI</a></li> + <li><a href="#compression-and-serialization" id="markdown-toc-compression-and-serialization">Compression and Serialization</a></li> + <li><a href="#memory-management" 
id="markdown-toc-memory-management">Memory Management</a></li> + <li><a href="#execution-behavior" id="markdown-toc-execution-behavior">Execution Behavior</a></li> + <li><a href="#networking" id="markdown-toc-networking">Networking</a></li> + <li><a href="#scheduling" id="markdown-toc-scheduling">Scheduling</a></li> + <li><a href="#dynamic-allocation" id="markdown-toc-dynamic-allocation">Dynamic Allocation</a></li> + <li><a href="#security" id="markdown-toc-security">Security</a></li> + <li><a href="#spark-sql" id="markdown-toc-spark-sql">Spark SQL</a></li> + <li><a href="#spark-streaming" id="markdown-toc-spark-streaming">Spark Streaming</a></li> + <li><a href="#sparkr" id="markdown-toc-sparkr">SparkR</a></li> + <li><a href="#graphx" id="markdown-toc-graphx">GraphX</a></li> + <li><a href="#deploy" id="markdown-toc-deploy">Deploy</a></li> + <li><a href="#cluster-managers" id="markdown-toc-cluster-managers">Cluster Managers</a> <ul> + <li><a href="#yarn" id="markdown-toc-yarn">YARN</a></li> + <li><a href="#mesos" id="markdown-toc-mesos">Mesos</a></li> + <li><a href="#kubernetes" id="markdown-toc-kubernetes">Kubernetes</a></li> + <li><a href="#standalone-mode" id="markdown-toc-standalone-mode">Standalone Mode</a></li> + </ul> + </li> </ul> </li> </ul> </li> + <li><a href="#environment-variables" id="markdown-toc-environment-variables">Environment Variables</a></li> + <li><a href="#configuring-logging" id="markdown-toc-configuring-logging">Configuring Logging</a></li> + <li><a href="#overriding-configuration-directory" id="markdown-toc-overriding-configuration-directory">Overriding configuration directory</a></li> + <li><a href="#inheriting-hadoop-cluster-configuration" id="markdown-toc-inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</a></li> + <li><a href="#custom-hadoophive-configuration" id="markdown-toc-custom-hadoophive-configuration">Custom Hadoop/Hive Configuration</a></li> </ul> <p>Spark provides three locations to configure 
the system:</p> @@ -1072,11 +1096,11 @@ of the most common options to set are:</p> <td> The maximum allowed size for a HTTP request header, in bytes unless otherwise specified. This setting applies for the Spark History Server too. - <td> -</tr> -</table> + </td> +</tr> +</table> -### Compression and Serialization +<h3 id="compression-and-serialization">Compression and Serialization</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1243,7 +1267,7 @@ of the most common options to set are:</p> </tr> </table> -### Memory Management +<h3 id="memory-management">Memory Management</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1385,7 +1409,7 @@ of the most common options to set are:</p> </tr> </table> -### Execution Behavior +<h3 id="execution-behavior">Execution Behavior</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1528,7 +1552,7 @@ of the most common options to set are:</p> </tr> </table> -### Networking +<h3 id="networking">Networking</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1647,7 +1671,7 @@ of the most common options to set are:</p> </tr> </table> -### Scheduling +<h3 id="scheduling">Scheduling</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -1941,7 +1965,7 @@ of the most common options to set are:</p> </tr> </table> -### Dynamic Allocation +<h3 id="dynamic-allocation">Dynamic Allocation</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2041,48 +2065,47 @@ of the most common options to set are:</p> </tr> </table> -### Security +<h3 id="security">Security</h3> -Please refer to the [Security](security.html) page for available options on how to secure different -Spark subsystems. 
+<p>Please refer to the <a href="security.html">Security</a> page for available options on how to secure different +Spark subsystems.</p> -### Spark SQL +<h3 id="spark-sql">Spark SQL</h3> -Running the <code>SET -v</code> command will show the entire list of the SQL configuration. +<p>Running the <code>SET -v</code> command will show the entire list of the SQL configuration.</p> <div class="codetabs"> <div data-lang="scala"> - <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="n">show</span><span class="o">(</span><span class="n">numRows</span> <span class="k">=</span> <span class="mi">200</span><span class="o">,</span> <span class="n">truncate</span> <span class="k">=</span> <span class="kc">false</span><span class="o">)</span></code></pre></figure> - </div> + </div> <div data-lang="java"> - <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-java" data-lang="java"><span></span><span class="c1">// spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="na">sql</span><span class="o">(</span><span class="s">"SET -v"</span><span class="o">).</span><span class="na">show</span><span class="o">(</span><span class="mi">200</span><span class="o">,</span> <span class="kc">false</span><span class="o">);</span></code></pre></figure> - </div> + </div> <div data-lang="python"> - <figure class="highlight"><pre><code class="language-python" 
data-lang="python"><span></span><span class="c1"># spark is an existing SparkSession</span> + <figure class="highlight"><pre><code class="language-python" data-lang="python"><span></span><span class="c1"># spark is an existing SparkSession</span> <span class="n">spark</span><span class="o">.</span><span class="n">sql</span><span class="p">(</span><span class="s2">"SET -v"</span><span class="p">)</span><span class="o">.</span><span class="n">show</span><span class="p">(</span><span class="n">n</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">truncate</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span></code></pre></figure> - </div> + </div> <div data-lang="r"> - <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> + <figure class="highlight"><pre><code class="language-r" data-lang="r"><span></span>sparkR.session<span class="p">()</span> properties <span class="o"><-</span> sql<span class="p">(</span><span class="s">"SET -v"</span><span class="p">)</span> showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span class="o">=</span> <span class="m">200</span><span class="p">,</span> truncate <span class="o">=</span> <span class="kc">FALSE</span><span class="p">)</span></code></pre></figure> - </div> + </div> </div> - -### Spark Streaming +<h3 id="spark-streaming">Spark Streaming</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2212,7 +2235,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### SparkR +<h3 id="sparkr">SparkR</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2262,7 +2285,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </table> -### GraphX +<h3 id="graphx">GraphX</h3> <table class="table"> <tr><th>Property 
Name</th><th>Default</th><th>Meaning</th></tr> @@ -2276,7 +2299,7 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> -### Deploy +<h3 id="deploy">Deploy</h3> <table class="table"> <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr> @@ -2298,32 +2321,30 @@ showDF<span class="p">(</span>properties<span class="p">,</span> numRows <span c </tr> </table> +<h3 id="cluster-managers">Cluster Managers</h3> -### Cluster Managers - -Each cluster manager in Spark has additional configuration options. Configurations -can be found on the pages for each mode: +<p>Each cluster manager in Spark has additional configuration options. Configurations +can be found on the pages for each mode:</p> -#### [YARN](running-on-yarn.html#configuration) +<h4 id="yarn"><a href="running-on-yarn.html#configuration">YARN</a></h4> -#### [Mesos](running-on-mesos.html#configuration) +<h4 id="mesos"><a href="running-on-mesos.html#configuration">Mesos</a></h4> -#### [Kubernetes](running-on-kubernetes.html#configuration) +<h4 id="kubernetes"><a href="running-on-kubernetes.html#configuration">Kubernetes</a></h4> -#### [Standalone Mode](spark-standalone.html#cluster-launch-scripts) +<h4 id="standalone-mode"><a href="spark-standalone.html#cluster-launch-scripts">Standalone Mode</a></h4> -# Environment Variables +<h1 id="environment-variables">Environment Variables</h1> -Certain Spark settings can be configured through environment variables, which are read from the -`conf/spark-env.sh` script in the directory where Spark is installed (or `conf/spark-env.cmd` on +<p>Certain Spark settings can be configured through environment variables, which are read from the +<code>conf/spark-env.sh</code> script in the directory where Spark is installed (or <code>conf/spark-env.cmd</code> on Windows). In Standalone and Mesos modes, this file can give machine specific information such as -hostnames. 
It is also sourced when running local Spark applications or submission scripts. +hostnames. It is also sourced when running local Spark applications or submission scripts.</p> -Note that `conf/spark-env.sh` does not exist by default when Spark is installed. However, you can -copy `conf/spark-env.sh.template` to create it. Make sure you make the copy executable. - -The following variables can be set in `spark-env.sh`: +<p>Note that <code>conf/spark-env.sh</code> does not exist by default when Spark is installed. However, you can +copy <code>conf/spark-env.sh.template</code> to create it. Make sure you make the copy executable.</p> +<p>The following variables can be set in <code>spark-env.sh</code>:</p> <table class="table"> <tr><th style="width:21%">Environment Variable</th><th>Meaning</th></tr> @@ -2356,64 +2377,64 @@ The following variables can be set in `spark-env.sh`: </tr> </table> -In addition to the above, there are also options for setting up the Spark -[standalone cluster scripts](spark-standalone.html#cluster-launch-scripts), such as number of cores -to use on each machine and maximum memory. +<p>In addition to the above, there are also options for setting up the Spark +<a href="spark-standalone.html#cluster-launch-scripts">standalone cluster scripts</a>, such as number of cores +to use on each machine and maximum memory.</p> -Since `spark-env.sh` is a shell script, some of these can be set programmatically -- for example, you might -compute `SPARK_LOCAL_IP` by looking up the IP of a specific network interface. +<p>Since <code>spark-env.sh</code> is a shell script, some of these can be set programmatically – for example, you might +compute <code>SPARK_LOCAL_IP</code> by looking up the IP of a specific network interface.</p> -Note: When running Spark on YARN in `cluster` mode, environment variables need to be set using the `spark.yarn.appMasterEnv.[EnvironmentVariableName]` property in your `conf/spark-defaults.conf` file. 
Environment variables that are set in `spark-env.sh` will not be reflected in the YARN Application Master process in `cluster` mode. See the [YARN-related Spark Properties](running-on-yarn.html#spark-properties) for more information. +<p>Note: When running Spark on YARN in <code>cluster</code> mode, environment variables need to be set using the <code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code> property in your <code>conf/spark-defaults.conf</code> file. Environment variables that are set in <code>spark-env.sh</code> will not be reflected in the YARN Application Master process in <code>cluster</code> mode. See the <a href="running-on-yarn.html#spark-properties">YARN-related Spark Properties</a> for more information.</p> -# Configuring Logging +<h1 id="configuring-logging">Configuring Logging</h1> -Spark uses [log4j](http://logging.apache.org/log4j/) for logging. You can configure it by adding a -`log4j.properties` file in the `conf` directory. One way to start is to copy the existing -`log4j.properties.template` located there. +<p>Spark uses <a href="http://logging.apache.org/log4j/">log4j</a> for logging. You can configure it by adding a +<code>log4j.properties</code> file in the <code>conf</code> directory. One way to start is to copy the existing +<code>log4j.properties.template</code> located there.</p> -# Overriding configuration directory +<h1 id="overriding-configuration-directory">Overriding configuration directory</h1> -To specify a different configuration directory other than the default "SPARK_HOME/conf", +<p>To specify a different configuration directory other than the default “SPARK_HOME/conf”, you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) -from this directory.
- -# Inheriting Hadoop Cluster Configuration +from this directory.</p> -If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that -should be included on Spark's classpath: +<h1 id="inheriting-hadoop-cluster-configuration">Inheriting Hadoop Cluster Configuration</h1> -* `hdfs-site.xml`, which provides default behaviors for the HDFS client. -* `core-site.xml`, which sets the default filesystem name. +<p>If you plan to read and write from HDFS using Spark, there are two Hadoop configuration files that +should be included on Spark’s classpath:</p> -The location of these configuration files varies across Hadoop versions, but -a common location is inside of `/etc/hadoop/conf`. Some tools create -configurations on-the-fly, but offer a mechanism to download copies of them. +<ul> + <li><code>hdfs-site.xml</code>, which provides default behaviors for the HDFS client.</li> + <li><code>core-site.xml</code>, which sets the default filesystem name.</li> +</ul> -To make these files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` -to a location containing the configuration files. +<p>The location of these configuration files varies across Hadoop versions, but +a common location is inside of <code>/etc/hadoop/conf</code>. Some tools create +configurations on-the-fly, but offer a mechanism to download copies of them.</p> -# Custom Hadoop/Hive Configuration +<p>To make these files visible to Spark, set <code>HADOOP_CONF_DIR</code> in <code>$SPARK_HOME/conf/spark-env.sh</code> +to a location containing the configuration files.</p> -If your Spark application is interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive -configuration files in Spark's classpath. +<h1 id="custom-hadoophive-configuration">Custom Hadoop/Hive Configuration</h1> -Multiple running applications might require different Hadoop/Hive client side configurations. 
-You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, `hive-site.xml` in -Spark's classpath for each application. In a Spark cluster running on YARN, these configuration -files are set cluster-wide, and cannot safely be changed by the application. +<p>If your Spark application is interacting with Hadoop, Hive, or both, there are probably Hadoop/Hive +configuration files in Spark’s classpath.</p> -The better choice is to use spark hadoop properties in the form of `spark.hadoop.*`. -They can be considered as same as normal spark properties which can be set in `$SPARK_HOME/conf/spark-defaults.conf` +<p>Multiple running applications might require different Hadoop/Hive client side configurations. +You can copy and modify <code>hdfs-site.xml</code>, <code>core-site.xml</code>, <code>yarn-site.xml</code>, <code>hive-site.xml</code> in +Spark’s classpath for each application. In a Spark cluster running on YARN, these configuration +files are set cluster-wide, and cannot safely be changed by the application.</p> -In some cases, you may want to avoid hard-coding certain configurations in a `SparkConf`. For -instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties. +<p>The better choice is to use spark hadoop properties in the form of <code>spark.hadoop.*</code>. +They can be considered as same as normal spark properties which can be set in <code>$SPARK_HOME/conf/spark-defaults.conf</code></p> +<p>In some cases, you may want to avoid hard-coding certain configurations in a <code>SparkConf</code>. 
For +instance, Spark allows you to simply create an empty conf and set spark/spark hadoop properties.</p> <figure class="highlight"><pre><code class="language-scala" data-lang="scala"><span></span><span class="k">val</span> <span class="n">conf</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkConf</span><span class="o">().</span><span class="n">set</span><span class="o">(</span><span class="s">"spark.hadoop.abc.def"</span><span class="o">,</span><span class="s">"xyz"</span><span class="o">)</span> <span class="k">val</span> <span class="n">sc</span> <span class="k">=</span> <span class="k">new</span> <span class="nc">SparkContext</span><span class="o">(</span><span class="n">conf</span><span class="o">)</span></code></pre></figure> - -Also, you can modify or add configurations at runtime: +<p>Also, you can modify or add configurations at runtime:</p> <figure class="highlight"><pre><code class="language-bash" data-lang="bash"><span></span>./bin/spark-submit <span class="se">\ </span> --name <span class="s2">"My app"</span> <span class="se">\ </span> @@ -2423,7 +2444,6 @@ Also, you can modify or add configurations at runtime: --conf spark.hadoop.abc.def<span class="o">=</span>xyz <span class="se">\ </span> myApp.jar</code></pre></figure> -</td></td></tr></table> </div>