Thanks

andy

From:  Davies Liu <dav...@databricks.com>
Date:  Friday, November 13, 2015 at 3:42 PM
To:  Andrew Davidson <a...@santacruzintegration.com>
Cc:  "user @spark" <user@spark.apache.org>
Subject:  Re: bin/pyspark SparkContext is missing?

> You forgot to create a SparkContext instance:
> 
> sc = SparkContext()
> 
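> For example, a minimal first notebook cell could look like the sketch below; the
> app name is just a placeholder and the master URL should match however you launch
> pyspark (local[2] in your launch command):
>
>     # hypothetical first cell: create the SparkContext the notebook session does not set up for you
>     from pyspark import SparkContext, SparkConf
>
>     conf = SparkConf().setAppName("notebook").setMaster("local[2]")
>     sc = SparkContext(conf=conf)
>
>     textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>     textFile.take(1)
>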
> On Tue, Nov 3, 2015 at 9:59 AM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>>  I am having a heck of a time getting IPython notebooks to work on my 1.5.1
>>  AWS cluster I created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2
>> 
>>  I have read the instructions for using IPython notebook on
>>  http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
>> 
>>  I want to run the notebook server on my master and use an ssh tunnel to
>>  connect from a web browser running on my Mac.
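>>
>>  The tunnel is the standard ssh local port forward, along these lines (the key
>>  file and hostname are placeholders for whatever the cluster actually uses):
>>
>>  ssh -i my-key.pem -N -L 7000:localhost:7000 ec2-user@<master-public-dns>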
>> 
>>  I am confident the cluster is set up correctly because the SparkPi example
>>  runs.
>> 
>>  I am able to use IPython notebooks on my local Mac and work with Spark and
>>  local files without any problems.
>> 
>>  I know the ssh tunnel is working.
>> 
>>  On my cluster I am able to use the Python shell in general:
>> 
>>  [ec2-user@ip-172-31-29-60 dataScience]$ /root/spark/bin/pyspark --master
>>  local[2]
>> 
>> 
>>>>>  from pyspark import SparkContext
>> 
>>>>>  textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>> 
>>>>>  textFile.take(1)
>> 
>> 
>> 
>>  When I run the exact same code in an IPython notebook I get:
>> 
>>  ---------------------------------------------------------------------------
>>  NameError                                 Traceback (most recent call last)
>>  <ipython-input-1-ba11b935529e> in <module>()
>>       11 from pyspark import SparkContext, SparkConf
>>       12
>>  ---> 13 textFile =
>>  sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>>       14
>>       15 textFile.take(1)
>> 
>>  NameError: name 'sc' is not defined
>> 
>> 
>> 
>> 
>>  To try and debug, I wrote a script to launch pyspark and added 'set -x' to
>>  pyspark so I could see what the script was doing.
>> 
>>  Any idea how I can debug this?
>> 
>>  Thanks in advance
>> 
>>  Andy
>> 
>>  $ cat notebook.sh
>> 
>>  set -x
>> 
>>  export PYSPARK_DRIVER_PYTHON=ipython
>> 
>>  export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000"
>> 
>>  /root/spark/bin/pyspark --master local[2]
>> 
>> 
>> 
>> 
>>  [ec2-user@ip-172-31-29-60 dataScience]$ ./notebook.sh
>> 
>>  ++ export PYSPARK_DRIVER_PYTHON=ipython
>> 
>>  ++ PYSPARK_DRIVER_PYTHON=ipython
>> 
>>  ++ export 'PYSPARK_DRIVER_PYTHON_OPTS=notebook --no-browser --port=7000'
>> 
>>  ++ PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port=7000'
>> 
>>  ++ /root/spark/bin/pyspark --master 'local[2]'
>> 
>>  +++ dirname /root/spark/bin/pyspark
>> 
>>  ++ cd /root/spark/bin/..
>> 
>>  ++ pwd
>> 
>>  + export SPARK_HOME=/root/spark
>> 
>>  + SPARK_HOME=/root/spark
>> 
>>  + source /root/spark/bin/load-spark-env.sh
>> 
>>  ++++ dirname /root/spark/bin/pyspark
>> 
>>  +++ cd /root/spark/bin/..
>> 
>>  +++ pwd
>> 
>>  ++ FWDIR=/root/spark
>> 
>>  ++ '[' -z '' ']'
>> 
>>  ++ export SPARK_ENV_LOADED=1
>> 
>>  ++ SPARK_ENV_LOADED=1
>> 
>>  ++++ dirname /root/spark/bin/pyspark
>> 
>>  +++ cd /root/spark/bin/..
>> 
>>  +++ pwd
>> 
>>  ++ parent_dir=/root/spark
>> 
>>  ++ user_conf_dir=/root/spark/conf
>> 
>>  ++ '[' -f /root/spark/conf/spark-env.sh ']'
>> 
>>  ++ set -a
>> 
>>  ++ . /root/spark/conf/spark-env.sh
>> 
>>  +++ export JAVA_HOME=/usr/java/latest
>> 
>>  +++ JAVA_HOME=/usr/java/latest
>> 
>>  +++ export SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
>> 
>>  +++ SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
>> 
>>  +++ export SPARK_MASTER_OPTS=
>> 
>>  +++ SPARK_MASTER_OPTS=
>> 
>>  +++ '[' -n 1 ']'
>> 
>>  +++ export SPARK_WORKER_INSTANCES=1
>> 
>>  +++ SPARK_WORKER_INSTANCES=1
>> 
>>  +++ export SPARK_WORKER_CORES=2
>> 
>>  +++ SPARK_WORKER_CORES=2
>> 
>>  +++ export HADOOP_HOME=/root/ephemeral-hdfs
>> 
>>  +++ HADOOP_HOME=/root/ephemeral-hdfs
>> 
>>  +++ export
>>  SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> 
>>  +++ SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> 
>>  ++++ cat /root/spark-ec2/cluster-url
>> 
>>  +++ export
>>  MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
>> 
>>  +++ MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
>> 
>>  +++ export SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
>> 
>>  +++ SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
>> 
>>  +++ export SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
>> 
>>  +++ SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
>> 
>>  ++++ wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname
>> 
>>  +++ export
>>  SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> 
>>  +++ SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> 
>>  +++ export YARN_CONF_DIR=/root/ephemeral-hdfs/conf
>> 
>>  +++ YARN_CONF_DIR=/root/ephemeral-hdfs/conf
>> 
>>  ++++ id -u
>> 
>>  +++ '[' 222 == 0 ']'
>> 
>>  ++ set +a
>> 
>>  ++ '[' -z '' ']'
>> 
>>  ++ ASSEMBLY_DIR2=/root/spark/assembly/target/scala-2.11
>> 
>>  ++ ASSEMBLY_DIR1=/root/spark/assembly/target/scala-2.10
>> 
>>  ++ [[ -d /root/spark/assembly/target/scala-2.11 ]]
>> 
>>  ++ '[' -d /root/spark/assembly/target/scala-2.11 ']'
>> 
>>  ++ export SPARK_SCALA_VERSION=2.10
>> 
>>  ++ SPARK_SCALA_VERSION=2.10
>> 
>>  + export '_SPARK_CMD_USAGE=Usage: ./bin/pyspark [options]'
>> 
>>  + _SPARK_CMD_USAGE='Usage: ./bin/pyspark [options]'
>> 
>>  + hash python2.7
>> 
>>  + DEFAULT_PYTHON=python2.7
>> 
>>  + [[ -n '' ]]
>> 
>>  + [[ '' == \1 ]]
>> 
>>  + [[ -z ipython ]]
>> 
>>  + [[ -z '' ]]
>> 
>>  + [[ ipython == *ipython* ]]
>> 
>>  + [[ python2.7 != \p\y\t\h\o\n\2\.\7 ]]
>> 
>>  + PYSPARK_PYTHON=python2.7
>> 
>>  + export PYSPARK_PYTHON
>> 
>>  + export PYTHONPATH=/root/spark/python/:
>> 
>>  + PYTHONPATH=/root/spark/python/:
>> 
>>  + export
>>  PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
>> 
>>  +
>>  PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
>> 
>>  + export OLD_PYTHONSTARTUP=
>> 
>>  + OLD_PYTHONSTARTUP=
>> 
>>  + export PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
>> 
>>  + PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
>> 
>>  + [[ -n '' ]]
>> 
>>  + export PYSPARK_DRIVER_PYTHON
>> 
>>  + export PYSPARK_DRIVER_PYTHON_OPTS
>> 
>>  + exec /root/spark/bin/spark-submit pyspark-shell-main --name PySparkShell
>>  --master 'local[2]'
>> 
>>  [NotebookApp] Using existing profile dir:
>>  u'/home/ec2-user/.ipython/profile_default'
>> 
>>  [NotebookApp] Serving notebooks from /home/ec2-user/dataScience
>> 
>>  [NotebookApp] The IPython Notebook is running at: http://127.0.0.1:7000/
>> 
>>  [NotebookApp] Use Control-C to stop this server and shut down all kernels.
>> 
>>  [NotebookApp] Using MathJax from CDN:
>>  http://cdn.mathjax.org/mathjax/latest/MathJax.js
>> 
>>  [NotebookApp] Kernel started: 2ba6864b-0dc8-4814-8e05-4f532cb40e2b
>> 
>>  [NotebookApp] Connecting to: tcp://127.0.0.1:55099
>> 
>>  [NotebookApp] Connecting to: tcp://127.0.0.1:48994
>> 
>>  [NotebookApp] Connecting to: tcp://127.0.0.1:57214
>> 
>>  [IPKernelApp] To connect another client to this kernel, use:
>> 
>>  [IPKernelApp] --existing kernel-2ba6864b-0dc8-4814-8e05-4f532cb40e2b.json
>> 
>> 
> 

