Hello friends:
I recently compiled and installed Spark v0.9 from the Apache distribution. Note: I have the Cloudera/CDH5 Spark RPMs co-installed as well (actually, the entire big-data suite from CDH is installed), but for the moment I'm using my manually built Apache Spark for 'ground-up' learning purposes.
Now, prior to compilation I specified the following (and then ran 'sbt/sbt clean compile'):
export SPARK_YARN=true
export SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0
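Consolidated, the build looked like this (note: the v0.9 build docs call for the 'assembly' target, which produces the jar that the examples and pyspark load, so I include it here):

user$ export SPARK_YARN=true
user$ export SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0
user$ sbt/sbt clean assembly   # 'assembly' target per the Spark 0.9 build docs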
The resulting examples ran fine locally as well as on YARN. I'm not interested in YARN here; I mention it only for completeness, in case it matters for my question below. Here is my issue/question:
I start pyspark locally (on one machine, for API learning purposes) as shown below, and attempt to interact with a local text file (not in HDFS). Unfortunately, the SparkContext (sc) tries to connect to an HDFS Name Node, which I don't currently have enabled because I don't need it. The SparkContext cleverly inspects the configuration in my '/etc/hadoop/conf/' directory to learn where my Name Node is, but I don't want it to do that in this case. I just want to run a one-machine, local instance of 'pyspark'.
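One workaround I'm aware of, though I haven't verified it in this setup: Hadoop's FileSystem layer resolves paths by their URI scheme, so an explicit 'file://' prefix should force the local filesystem regardless of what '/etc/hadoop/conf/' points to. For example:

>>> distData = sc.textFile('file:///home/user/Download/ml-10M100K/ratings.dat')  # explicit local-fs scheme
>>> distData.count()

Still, I'd prefer a configuration-level fix over prefixing every path.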
Did I miss something in my invocation/use of 'pyspark' below? Do I need to add something else? (By the way: I searched but could not find a solution, and the documentation, while good, doesn't quite get me there.)
See below, and thank you all in advance!
user$ export PYSPARK_PYTHON=/usr/bin/bpython
user$ export MASTER=local[8]
user$ /home/user/APPS.d/SPARK.d/latest/bin/pyspark
#
===========================================================================================
>>> sc
<pyspark.context.SparkContext object at 0x24f0f50>
>>>
>>> distData = sc.textFile('/home/user/Download/ml-10M100K/ratings.dat')
>>> distData.count()
[ ... snip ... ]
Py4JJavaError: An error occurred while calling o21.collect.
: java.net.ConnectException: Call From server01/192.168.0.15 to
namenode:8020 failed on connection exception:
java.net.ConnectException: Connection refused; For more details
see: http://wiki.apache.org/hadoop/ConnectionRefused
[ ... snip ... ]
>>>
>>>
#
===========================================================================================
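The only other lead I have: I believe my standalone build picks up '/etc/hadoop/conf/' through the HADOOP_CONF_DIR environment variable (which, I assume, the CDH packages set globally), so clearing it before launch might suffice. This is a guess on my part:

user$ unset HADOOP_CONF_DIR   # speculative: keep pyspark from finding the CDH configs
user$ /home/user/APPS.d/SPARK.d/latest/bin/pyspark

Is that the sanctioned approach, or is there a proper Spark-side setting for a purely local run?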
--
Sincerely,
DiData