viirya commented on a change in pull request #29779:
URL: https://github.com/apache/spark/pull/29779#discussion_r491235435



##########
File path: python/docs/source/getting_started/install.rst
##########
@@ -0,0 +1,138 @@
+..  Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements.  See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership.  The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License.  You may obtain a copy of the License at
+
+..    http://www.apache.org/licenses/LICENSE-2.0
+
+..  Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied.  See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+============
+Installation
+============
+
+PySpark is included in the official releases of Spark available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+For Python users, PySpark also provides ``pip`` installation from PyPI. This is usually for local usage or as
+a client to connect to a cluster instead of setting up a cluster itself.
+
+This page includes instructions for installing PySpark by using pip, Conda, downloading manually,
+and building from source.
+
+
+Python Version Supported
+------------------------
+
+PySpark supports Python 3.6 and above.
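+You can check which Python version is on your ``PATH``, for example, as below:
+
+.. code-block:: bash
+
+    python --version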
+
+
+Using PyPI
+----------
+
+PySpark installation using `PyPI <https://pypi.org/project/pyspark/>`_ is as follows:
+
+.. code-block:: bash
+
+    pip install pyspark
+
+If you want to install extra dependencies for a specific component, you can install them as below:
+
+.. code-block:: bash
+
+    pip install pyspark[sql]
+
+
+Using Conda
+-----------
+
+Conda is an open-source package management and environment management system which is a part of
+the `Anaconda <https://docs.continuum.io/anaconda/>`_ distribution. It is both cross-platform and
+language agnostic. In practice, Conda can replace both `pip <https://pip.pypa.io/en/latest/>`_ and
+`virtualenv <https://virtualenv.pypa.io/en/latest/>`_.
+
+Create a new virtual environment from your terminal as shown below:
+
+.. code-block:: bash
+
+    conda create -n pyspark_env
+
+After the virtual environment is created, it should be visible under the list of Conda environments
+which can be seen using the following command:
+
+.. code-block:: bash
+
+    conda env list
+
+Now activate the newly created environment with the following command:
+
+.. code-block:: bash
+
+    conda activate pyspark_env
+
+You can install PySpark in the newly created environment by `Using PyPI <#using-pypi>`_, for example as below.
+It will install PySpark under the new virtual environment ``pyspark_env`` created above.
+
+.. code-block:: bash
+
+    pip install pyspark
+
+Alternatively, you can install PySpark from Conda itself as below:
+
+.. code-block:: bash
+
+    conda install pyspark
+
+However, note that `PySpark at Conda <https://anaconda.org/conda-forge/pyspark>`_ is not necessarily
+in sync with the PySpark release cycle because it is maintained by the community separately.
+
+
+Manually Downloading
+--------------------
+
+PySpark is included in the distributions available at the `Apache Spark website <https://spark.apache.org/downloads.html>`_.
+You can download the distribution you want from the site. After that, uncompress the tar file into the directory where you want
+to install Spark as below:
+
+.. code-block:: bash
+
+    tar xzvf spark-3.0.0-bin-hadoop2.7.tgz
+
+Ensure the ``SPARK_HOME`` environment variable points to the directory where the code has been extracted.
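+For example, if the tar file above was extracted in the current directory, ``SPARK_HOME`` can be set
+as below (adjust the path to wherever you extracted the distribution):
+
+.. code-block:: bash
+
+    export SPARK_HOME=`pwd`/spark-3.0.0-bin-hadoop2.7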

Review comment:
       `where the tar file has been extracted.`?



