I have a number of CSV files and need to combine them into an RDD by part of
their filenames.
For example, for the below files
$ ls
20140101_1.csv 20140101_3.csv 20140201_2.csv 20140301_1.csv
20140301_3.csv 20140101_2.csv 20140201_1.csv 20140201_3.csv
I need to combine the files whose names share the same date prefix (20140101,
20140201, 20140301) into one RDD per date.
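One way to do this (a minimal sketch, assuming the files sit in a local
directory called data/ and that a SparkContext named sc is already available,
as in spark-shell): sc.textFile accepts glob patterns, so each date prefix can
be loaded into its own RDD.

    // Hypothetical directory; prefixes taken from the listing above.
    val datePrefixes = Seq("20140101", "20140201", "20140301")

    // One RDD of lines per date; the glob pulls in every part file for that date.
    val rddsByDate = datePrefixes.map { prefix =>
      prefix -> sc.textFile(s"data/${prefix}_*.csv")
    }.toMap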
core-site.xml

    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

hdfs-site.xml

    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
Hi Himanshu,

I am using spark-core_2.10 in my Maven dependencies, and there were no issues
with that. The problem I had was that the Spark master was running on localhost
inside the VM, so the slave was not able to connect to it. I changed the Spark
master to run on the private IP address instead.
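For reference, a minimal sketch of pointing an application at a master bound to
a private address rather than localhost; 192.168.1.10 below is only a
placeholder for whatever the VM's private IP actually is:

    import org.apache.spark.{SparkConf, SparkContext}

    // Placeholder private IP; use the address the master actually binds to.
    val conf = new SparkConf()
      .setAppName("connectivity-check")
      .setMaster("spark://192.168.1.10:7077")
    val sc = new SparkContext(conf)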
I have a Spark standalone cluster with 2 workers:
- the master and one slave run on a single machine -- Machine 1
- another slave runs on a separate machine -- Machine 2

I am running a spark-shell on the 2nd machine that reads a file from HDFS, does
some calculations on it, and stores the
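A hedged sketch of that kind of job, run from spark-shell on Machine 2 (the
HDFS paths and the word-count computation are assumptions, not details from the
post):

    // sc is provided by spark-shell; the namenode host and paths are placeholders.
    val lines = sc.textFile("hdfs://namenode-host:9000/user/data/input.txt")

    val counts = lines
      .flatMap(_.split("\\s+"))      // split each line into words
      .map(word => (word, 1))
      .reduceByKey(_ + _)            // tally occurrences per word

    counts.saveAsTextFile("hdfs://namenode-host:9000/user/data/output")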