You should use a distributed filesystem such as HDFS. If you want to use the local filesystem, you have to copy the file to the same path on every node, because each executor tries to read the file locally on whichever worker it runs on.
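For example, a minimal sketch of the HDFS approach (the namenode host/port and target path below are placeholders, not values from your setup):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // hdfs://namenode:8020/data/... is a placeholder; use your cluster's
    // namenode address and the path where you uploaded the CSV, e.g. via:
    //   hdfs dfs -put /home/username/sampleflightdata /data/
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:8020/data/sampleflightdata")

    flightDF.printSchema()

If you instead copy the CSV to the same local path on every worker, your original path will work too, since each executor can then find the file locally.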
> On 29. Sep 2017, at 12:05, Gaurav1809 <gauravhpan...@gmail.com> wrote:
>
> Hi All,
>
> I have a multi-node Spark cluster (1 master, 2 workers). The job reads
> CSV file data and works fine when run in local mode (local[*]).
> However, when the same job is run in cluster mode (spark://HOST:PORT),
> it is not able to read the file.
> I want to know how to reference the files, or where to store them.
> Currently the CSV data file is on the master (from where the job is
> submitted).
>
> The following code works fine in local mode but not in cluster mode.
>
> val spark = SparkSession
>   .builder()
>   .appName("SampleFlightsApp")
>   .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>   .getOrCreate()
>
> val flightDF = spark.read
>   .option("header", true)
>   .csv("/home/username/sampleflightdata")
> flightDF.printSchema()
>
> Error: FileNotFoundException: File file:/home/username/sampleflightdata does not exist