NullPointerException inside RDD when calling sc.textFile

2015-07-21 Thread MorEru
I have a number of CSV files and need to combine them into an RDD by part of their filenames. For example, given the files below: $ ls 20140101_1.csv 20140101_3.csv 20140201_2.csv 20140301_1.csv 20140301_3.csv 20140101_2.csv 20140201_1.csv 20140201_3.csv I need to combine files with names
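
A minimal sketch of the grouping step described in this preview, assuming (the preview is cut off before it says so) that the shared part of each filename is the date-like prefix before the underscore. The helper name `group_paths_by_prefix` is hypothetical. The sketch only builds the path strings on the driver; the Spark call is left as a comment because `SparkContext` cannot be used from inside an RDD transformation, which is one plausible reading of the NullPointerException in the thread title.

```python
from collections import defaultdict


def group_paths_by_prefix(filenames, sep="_"):
    """Map each filename prefix (the text before `sep`) to a comma-separated path string.

    Hypothetical helper: the grouping key is an assumption, not taken from the thread.
    """
    groups = defaultdict(list)
    for name in sorted(filenames):
        prefix = name.split(sep, 1)[0]
        groups[prefix].append(name)
    return {prefix: ",".join(paths) for prefix, paths in groups.items()}


# File names copied from the `ls` output in the message preview.
files = [
    "20140101_1.csv", "20140101_3.csv", "20140201_2.csv", "20140301_1.csv",
    "20140301_3.csv", "20140101_2.csv", "20140201_1.csv", "20140201_3.csv",
]
grouped = group_paths_by_prefix(files)

# On the driver, each value could then be handed to sc.textFile, which accepts
# a comma-separated list of paths (assumes an existing SparkContext named sc):
# rdds = {prefix: sc.textFile(paths) for prefix, paths in grouped.items()}
```

This yields one path string per prefix, so each group can be loaded as its own RDD without calling `sc.textFile` from worker-side code.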

Re: Spark standalone cluster - Output file stored in temporary directory in worker

2015-07-07 Thread MorEru
core-site.xml - <configuration> <property> <name>fs.default.name</name> <value>hdfs://localhost:9000</value> </property> </configuration> hdfs-site.xml - <configuration> <property> <name>dfs.replication</name> <value>1</value> </property> <property> <name>dfs.namenode.name.dir</name>

Re: Spark Standalone Cluster - Slave not connecting to Master

2015-07-07 Thread MorEru
Hi Himanshu, I am using spark_core_2.10 in my Maven dependency. There were no issues with that. The problem was that the Spark master was running on localhost inside the VM, so the slave was not able to connect to it. I changed the Spark master to run on the private IP address

Spark standalone cluster - Output file stored in temporary directory in worker

2015-07-06 Thread MorEru
I have a Spark standalone cluster with 2 workers. The master and one slave run on a single machine (Machine 1). Another slave runs on a separate machine (Machine 2). I am running a Spark shell on the 2nd machine that reads a file from HDFS, does some calculations on it, and stores the