PySpark definitely works for me in the IPython notebook. A good way to debug
is to call setMaster("local") on the SparkConf behind your Python sc object
and see if that works. Then, from there, modify it to point to the real
Spark server.
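
A rough sketch of that two-step debugging approach. The helper names, the
app name, and the host are placeholders I made up for illustration; 7077 is
the default port of a standalone Spark master, so adjust it if your cluster
differs.

```python
def master_url(use_local, host=None, port=7077):
    """Return "local" for debugging, otherwise a standalone spark:// URL."""
    if use_local:
        return "local"
    if not host:
        raise ValueError("host is required when not running locally")
    return "spark://%s:%d" % (host, port)


def make_context(master, app_name="notebook-debug"):
    """Create a SparkContext for the given master (pyspark must be importable)."""
    # Imported lazily so master_url() can be used without Spark installed.
    from pyspark import SparkConf, SparkContext
    conf = SparkConf().setMaster(master).setAppName(app_name)
    return SparkContext(conf=conf)
```

Start with sc = make_context(master_url(True)). Once a small job such as
sc.parallelize(range(5)).count() works, call sc.stop() and retry with
master_url(False, host="your-master-host").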
Also, I added a hack where I call sys.path.insert with the path to pyspark
in my notebook to get it working properly.
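
Roughly, the hack looks like the sketch below. The function name is
hypothetical, and it assumes the usual layout of a Spark binary
distribution, where the Python sources live under python/ and py4j ships as
a zip under python/lib/.

```python
import glob
import os
import sys


def add_pyspark_to_path(spark_home):
    """Prepend Spark's Python sources (and bundled py4j, if found) to sys.path."""
    python_dir = os.path.join(spark_home, "python")
    # py4j is bundled as a zip; the version in the file name varies by release.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    added = []
    for path in [python_dir] + py4j_zips:
        if path not in sys.path:
            sys.path.insert(0, path)
            added.append(path)
    return added
```

In a notebook cell you would call something like
add_pyspark_to_path("/root/spark") before "from pyspark import SparkContext";
/root/spark is the install location from Andy's command below, so substitute
your own.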
You can try these instructions if you want. I recently put them together
based on some other material online plus a few minor modifications:
http://jayunit100.blogspot.com/2014/07/ipython-on-spark.html
On Thu, Oct 9, 2014 at 2:50 PM, Andy Davidson <[email protected]> wrote:
> I wonder if I am starting the IPython notebook incorrectly. The example in
> my original email does not work. It looks like stdout is not configured
> correctly. If I submit it as a Python .py file, it works correctly.
>
> Any idea what the problem is?
>
>
> Thanks
>
> Andy
>
>
> From: Andrew Davidson <[email protected]>
> Date: Tuesday, October 7, 2014 at 4:23 PM
> To: "[email protected]" <[email protected]>
> Subject: bug with IPython notebook?
>
> Hi
>
> I think I found a bug in the IPython notebook integration. I am not sure
> how to report it.
>
> I am running spark-1.1.0-bin-hadoop2.4 on an AWS EC2 cluster. I start the
> cluster using the launch script provided by Spark.
>
> I start the IPython notebook on my cluster master as follows and use an ssh
> tunnel to open the notebook in a browser running on my local computer:
>
> [ec2-user@ip-172-31-20-107 ~]$ IPYTHON_OPTS="notebook --pylab inline --no-browser --port=7000" /root/spark/bin/pyspark
>
> Below is the code my notebook executes.
>
>
> Bug list:
>
> 1. Why do I need to create a SparkContext? If I run pyspark
> interactively, the context is created automatically for me.
> 2. The print statement causes the output to be displayed in the
> terminal where I started pyspark, not in the notebook's output.
>
> Any comments or suggestions would be greatly appreciated
>
> Thanks
>
> Andy
>
>
> import sys
> from operator import add
>
> from pyspark import SparkContext
>
> # only stand alone jobs should create a SparkContext
> sc = SparkContext(appName="pyStreamingSparkRDDPipe")
>
> data = [1, 2, 3, 4, 5]
> rdd = sc.parallelize(data)
>
> def echo(data):
>     # output winds up in the shell console in my cluster
>     # (i.e. the machine I launched pyspark from)
>     print "python received: %s" % (data)
>
> rdd.foreach(echo)
> print "we are done"
>
>
>
--
jay vyas