Hi, I want to use PySpark, but I can't understand how it works, and the
documentation doesn't provide enough information.
1) How is Python shipped to the cluster? Should the machines in the cluster
already have Python installed?
2) What happens when I write some Python code in a map function - is it
shipped to the cluster and executed there?
Hi Egor,
Here are a few answers to your questions:
1) Python needs to be installed on all machines, but PySpark does not. How
the executors get the PySpark code depends on which cluster manager you
use. In standalone mode, your executors need to have the actual Python
files in their working directory.
Also take a look at this:
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals
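To illustrate the mechanism behind question 2, here is a minimal
standard-library sketch of what happens to a function passed to map: the
driver serializes it, ships the bytes to an executor, and the executor
deserializes and applies it per record. Note this is a simplified
illustration, not PySpark's actual code path - PySpark uses cloudpickle,
which can also serialize lambdas and closures, whereas plain pickle only
handles module-level functions by reference. The function name `double` is
just a hypothetical example.

```python
import pickle

# Hypothetical stand-in for a function you might pass to rdd.map(...).
def double(x):
    return x * 2

# "Driver" side: serialize the function into bytes to send to executors.
payload = pickle.dumps(double)

# "Executor" side: reconstruct the function and apply it to the records
# of a partition.
remote_fn = pickle.loads(payload)
result = [remote_fn(x) for x in [1, 2, 3]]
print(result)  # [2, 4, 6]
```

This is also why the objects your mapped function touches must be
serializable: they travel from the driver to the executors the same way.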
On Fri, Jul 11, 2014 at 10:29 AM, Andrew Or and...@databricks.com wrote: