Hi Chris,

Did you ever figure this out? It should just work, provided that your HDFS is set up correctly. If you don't call setMaster, it uses spark://[master-node-ip]:7077 by default (this is configured in your conf/spark-env.sh). However, even if you use a local master, it should still work (I just tried this on my own EC2 cluster). By the way, SPARK_MASTER is deprecated; please use bin/spark-submit --master [your master] instead.
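For reference, a minimal sketch of the preferred invocation (the master URL, class name, and jar path below are placeholders, not taken from your setup):

```shell
# Old, deprecated form:
#   SPARK_MASTER=spark://masterip:7077 ./bin/spark-shell

# Preferred: pass the master URL explicitly with --master.
# Interactive shell against a standalone cluster (placeholder master URL):
./bin/spark-shell --master spark://masterip:7077

# The same flag works for submitting a packaged application
# (placeholder class and jar names):
./bin/spark-submit --master spark://masterip:7077 \
  --class com.example.MyApp \
  my-app.jar
```

This is just a command sketch; it requires a local Spark installation and a running standalone master at the given URL.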
Andrew

2014-07-16 23:46 GMT-07:00 Akhil Das <ak...@sigmoidanalytics.com>:

> You can try the following in the spark-shell:
>
> 1. Run it in *cluster mode* by going inside the spark directory:
>
>     $ SPARK_MASTER=spark://masterip:7077 ./bin/spark-shell
>
>     val textFile = sc.textFile("hdfs://masterip/data/blah.csv")
>     textFile.take(10).foreach(println)
>
> 2. Now try running in *local mode*:
>
>     $ SPARK_MASTER=local ./bin/spark-shell
>
>     val textFile = sc.textFile("hdfs://masterip/data/blah.csv")
>     textFile.take(10).foreach(println)
>
> Both should print the first 10 lines from your blah.csv file.