Hello friends:

I recently compiled and installed Spark v0.9 from the Apache distribution.

Note: I also have the Cloudera CDH5 Spark RPMs installed (actually, the entire CDH big-data suite), but for the moment I'm using my manually built Apache Spark for 'ground-up' learning purposes.

Now, prior to compilation (i.e., 'sbt/sbt clean compile'), I specified the following:

      export SPARK_YARN=true
      export SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0

The resulting examples ran fine locally as well as on YARN.

I'm not interested in YARN here; I mention it only for completeness, in case it matters for
my upcoming question. Here is my issue/question:

I start pyspark locally (on one machine, for API learning purposes) as shown below, and attempt to interact with a local text file that is not in HDFS. Unfortunately, the SparkContext (sc) tries to connect to an HDFS NameNode, which I don't currently have running because I don't need it.

The SparkContext cleverly inspects the configuration files in my '/etc/hadoop/conf/' directory to learn where my NameNode is, but I don't want it to do that in this case. I just want it to run a
one-machine local version of 'pyspark'.
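If I understand the mechanism correctly, it's picking up the default filesystem from 'fs.defaultFS' (or the older 'fs.default.name') in core-site.xml, which in my conf directory points at the NameNode. Presumably something along these lines (fragment is illustrative, reconstructed from the error below):

```xml
<!-- /etc/hadoop/conf/core-site.xml (illustrative fragment) -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode:8020</value>
</property>
```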

Did I miss something in my invocation/use of 'pyspark' below? Do I need to add something else?

(Btw: I searched but could not find a solution, and the documentation, while good, doesn't
quite get me there.)

See below, and thank you all in advance!


user$ export PYSPARK_PYTHON=/usr/bin/bpython
user$ export MASTER=local[8]
user$ /home/user/APPS.d/SPARK.d/latest/bin/pyspark
# ===========================================================================================
  >>> sc
  <pyspark.context.SparkContext object at 0x24f0f50>
  >>>
  >>> distData = sc.textFile('/home/user/Download/ml-10M100K/ratings.dat')
  >>> distData.count()
  [ ... snip ... ]
*Py4JJavaError: An error occurred while calling o21.collect.
: java.net.ConnectException: Call From server01/192.168.0.15 to namenode:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused*
  [ ... snip ... ]
  >>>
  >>>
# ===========================================================================================
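For what it's worth, one workaround I suspect might help (though I haven't confirmed it's the intended approach) is giving textFile() an explicit 'file://' URI so the path bypasses the default filesystem from the Hadoop conf. A small sketch; 'to_local_uri' is just a hypothetical helper of my own, not a Spark API:

```python
# Hypothetical helper (mine, not part of Spark): build an explicit
# file:// URI so Hadoop resolves the path locally instead of consulting
# fs.defaultFS from /etc/hadoop/conf.
from pathlib import Path

def to_local_uri(path):
    # as_uri() requires an absolute path, so resolve first
    return Path(path).resolve().as_uri()

print(to_local_uri('/home/user/Download/ml-10M100K/ratings.dat'))
# file:///home/user/Download/ml-10M100K/ratings.dat

# In the pyspark shell, I imagine the call would then become:
#   distData = sc.textFile(to_local_uri('/home/user/Download/ml-10M100K/ratings.dat'))
#   distData.count()
```

Alternatively, I wonder whether unsetting HADOOP_CONF_DIR (or pointing it at an empty directory) before launching pyspark would keep it from picking up the cluster configuration at all, but I haven't tried that yet.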

--
Sincerely,
DiData
