Thanks

Andy
From: Davies Liu <dav...@databricks.com>
Date: Friday, November 13, 2015 at 3:42 PM
To: Andrew Davidson <a...@santacruzintegration.com>
Cc: "user @spark" <user@spark.apache.org>
Subject: Re: bin/pyspark SparkContext is missing?

> You forgot to create a SparkContext instance:
>
>     sc = SparkContext()
>
> On Tue, Nov 3, 2015 at 9:59 AM, Andy Davidson
> <a...@santacruzintegration.com> wrote:
>> I am having a heck of a time getting IPython notebooks to work on my 1.5.1
>> AWS cluster, which I created using spark-1.5.1-bin-hadoop2.6/ec2/spark-ec2.
>>
>> I have read the instructions for using IPython notebook at
>> http://spark.apache.org/docs/latest/programming-guide.html#using-the-shell
>>
>> I want to run the notebook server on my master and use an ssh tunnel to
>> connect from a web browser running on my Mac.
>>
>> I am confident the cluster is set up correctly because the SparkPi example
>> runs.
>>
>> I am able to use IPython notebooks on my local Mac and work with Spark and
>> local files without any problems.
>>
>> I know the ssh tunnel is working.
>>
>> On my cluster I am able to use the Python shell in general:
>>
>> [ec2-user@ip-172-31-29-60 dataScience]$ /root/spark/bin/pyspark --master local[2]
>>
>> >>> from pyspark import SparkContext
>> >>> textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>> >>> textFile.take(1)
>>
>> When I run the exact same code in an IPython notebook I get:
>>
>> ---------------------------------------------------------------------------
>> NameError                                 Traceback (most recent call last)
>> <ipython-input-1-ba11b935529e> in <module>()
>>      11 from pyspark import SparkContext, SparkConf
>>      12
>> ---> 13 textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
>>      14
>>      15 textFile.take(1)
>>
>> NameError: name 'sc' is not defined
>>
>> To try and debug this, I wrote a script to launch pyspark and added `set -x`
>> to pyspark so I could see what the script was doing.
>>
>> Any idea how I can debug this?
>>
>> Thanks in advance
>>
>> Andy
>>
>> $ cat notebook.sh
>> set -x
>> export PYSPARK_DRIVER_PYTHON=ipython
>> export PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --port=7000"
>> /root/spark/bin/pyspark --master local[2]
>>
>> [ec2-user@ip-172-31-29-60 dataScience]$ ./notebook.sh
>> ++ export PYSPARK_DRIVER_PYTHON=ipython
>> ++ PYSPARK_DRIVER_PYTHON=ipython
>> ++ export 'PYSPARK_DRIVER_PYTHON_OPTS=notebook --no-browser --port=7000'
>> ++ PYSPARK_DRIVER_PYTHON_OPTS='notebook --no-browser --port=7000'
>> ++ /root/spark/bin/pyspark --master 'local[2]'
>> +++ dirname /root/spark/bin/pyspark
>> ++ cd /root/spark/bin/..
>> ++ pwd
>> + export SPARK_HOME=/root/spark
>> + SPARK_HOME=/root/spark
>> + source /root/spark/bin/load-spark-env.sh
>> ++++ dirname /root/spark/bin/pyspark
>> +++ cd /root/spark/bin/..
>> +++ pwd
>> ++ FWDIR=/root/spark
>> ++ '[' -z '' ']'
>> ++ export SPARK_ENV_LOADED=1
>> ++ SPARK_ENV_LOADED=1
>> ++++ dirname /root/spark/bin/pyspark
>> +++ cd /root/spark/bin/..
>> +++ pwd
>> ++ parent_dir=/root/spark
>> ++ user_conf_dir=/root/spark/conf
>> ++ '[' -f /root/spark/conf/spark-env.sh ']'
>> ++ set -a
>> ++ . /root/spark/conf/spark-env.sh
>> +++ export JAVA_HOME=/usr/java/latest
>> +++ JAVA_HOME=/usr/java/latest
>> +++ export SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
>> +++ SPARK_LOCAL_DIRS=/mnt/spark,/mnt2/spark
>> +++ export SPARK_MASTER_OPTS=
>> +++ SPARK_MASTER_OPTS=
>> +++ '[' -n 1 ']'
>> +++ export SPARK_WORKER_INSTANCES=1
>> +++ SPARK_WORKER_INSTANCES=1
>> +++ export SPARK_WORKER_CORES=2
>> +++ SPARK_WORKER_CORES=2
>> +++ export HADOOP_HOME=/root/ephemeral-hdfs
>> +++ HADOOP_HOME=/root/ephemeral-hdfs
>> +++ export SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> +++ SPARK_MASTER_IP=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> ++++ cat /root/spark-ec2/cluster-url
>> +++ export MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
>> +++ MASTER=spark://ec2-54-215-207-132.us-west-1.compute.amazonaws.com:7077
>> +++ export SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
>> +++ SPARK_SUBMIT_LIBRARY_PATH=:/root/ephemeral-hdfs/lib/native/
>> +++ export SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
>> +++ SPARK_SUBMIT_CLASSPATH=::/root/ephemeral-hdfs/conf
>> ++++ wget -q -O - http://169.254.169.254/latest/meta-data/public-hostname
>> +++ export SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> +++ SPARK_PUBLIC_DNS=ec2-54-215-207-132.us-west-1.compute.amazonaws.com
>> +++ export YARN_CONF_DIR=/root/ephemeral-hdfs/conf
>> +++ YARN_CONF_DIR=/root/ephemeral-hdfs/conf
>> ++++ id -u
>> +++ '[' 222 == 0 ']'
>> ++ set +a
>> ++ '[' -z '' ']'
>> ++ ASSEMBLY_DIR2=/root/spark/assembly/target/scala-2.11
>> ++ ASSEMBLY_DIR1=/root/spark/assembly/target/scala-2.10
>> ++ [[ -d /root/spark/assembly/target/scala-2.11 ]]
>> ++ '[' -d /root/spark/assembly/target/scala-2.11 ']'
>> ++ export SPARK_SCALA_VERSION=2.10
>> ++ SPARK_SCALA_VERSION=2.10
>> + export '_SPARK_CMD_USAGE=Usage: ./bin/pyspark [options]'
>> + _SPARK_CMD_USAGE='Usage: ./bin/pyspark [options]'
>> + hash python2.7
>> + DEFAULT_PYTHON=python2.7
>> + [[ -n '' ]]
>> + [[ '' == \1 ]]
>> + [[ -z ipython ]]
>> + [[ -z '' ]]
>> + [[ ipython == *ipython* ]]
>> + [[ python2.7 != \p\y\t\h\o\n\2\.\7 ]]
>> + PYSPARK_PYTHON=python2.7
>> + export PYSPARK_PYTHON
>> + export PYTHONPATH=/root/spark/python/:
>> + PYTHONPATH=/root/spark/python/:
>> + export PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
>> + PYTHONPATH=/root/spark/python/lib/py4j-0.8.2.1-src.zip:/root/spark/python/:
>> + export OLD_PYTHONSTARTUP=
>> + OLD_PYTHONSTARTUP=
>> + export PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
>> + PYTHONSTARTUP=/root/spark/python/pyspark/shell.py
>> + [[ -n '' ]]
>> + export PYSPARK_DRIVER_PYTHON
>> + export PYSPARK_DRIVER_PYTHON_OPTS
>> + exec /root/spark/bin/spark-submit pyspark-shell-main --name PySparkShell --master 'local[2]'
>>
>> [NotebookApp] Using existing profile dir: u'/home/ec2-user/.ipython/profile_default'
>> [NotebookApp] Serving notebooks from /home/ec2-user/dataScience
>> [NotebookApp] The IPython Notebook is running at: http://127.0.0.1:7000/
>> [NotebookApp] Use Control-C to stop this server and shut down all kernels.
>> [NotebookApp] Using MathJax from CDN: http://cdn.mathjax.org/mathjax/latest/MathJax.js
>> [NotebookApp] Kernel started: 2ba6864b-0dc8-4814-8e05-4f532cb40e2b
>> [NotebookApp] Connecting to: tcp://127.0.0.1:55099
>> [NotebookApp] Connecting to: tcp://127.0.0.1:48994
>> [NotebookApp] Connecting to: tcp://127.0.0.1:57214
>> [IPKernelApp] To connect another client to this kernel, use:
>> [IPKernelApp]     --existing kernel-2ba6864b-0dc8-4814-8e05-4f532cb40e2b.json
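For anyone who finds this thread with the same NameError: the trace above shows pyspark exporting PYTHONSTARTUP=/root/spark/python/pyspark/shell.py, which is the script that creates `sc` in the interactive shell; notebook kernels do not execute PYTHONSTARTUP, which is likely why `sc` is undefined there. Per Davies's answer, the context can be created by hand in the first notebook cell. A minimal sketch (the master URL, app name, and file path are illustrative, and it assumes a Spark 1.x installation on PYTHONPATH):

```python
# First notebook cell: create the SparkContext that pyspark/shell.py
# would normally create for you. Requires pyspark importable on PYTHONPATH.
from pyspark import SparkContext, SparkConf

# "local[2]" and the app name are example settings; use your cluster URL
# (e.g. the contents of /root/spark-ec2/cluster-url) to run on the cluster.
conf = SparkConf().setMaster("local[2]").setAppName("notebook")
sc = SparkContext(conf=conf)

textFile = sc.textFile("file:///home/ec2-user/dataScience/readme.txt")
textFile.take(1)
```

Note that only one SparkContext can be active per kernel, so re-running this cell without first calling sc.stop() will raise an error.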