Yes, you need to store the file at a location where it is equally retrievable ("same path") for the master and all nodes in the cluster. A simple solution (apart from HDFS) that does not scale too well, but might be OK with only 3 nodes as in your configuration, is network-accessible storage (a NAS or a shared folder, for example).
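For example, a minimal sketch of the shared-folder approach (/mnt/shared is a hypothetical mount point; it must be mounted at the same path on the master and both workers):

    // Copy the CSV onto the shared mount first, e.g.:
    //   cp /home/username/sampleflightdata /mnt/shared/sampleflightdata
    // The file:// scheme makes Spark read from each node's local
    // filesystem, which here resolves to the same shared mount everywhere.
    val flightDF = spark.read
      .option("header", true)
      .csv("file:///mnt/shared/sampleflightdata")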
Hope this helps,
Alexander

On Fri, Sep 29, 2017 at 12:05 PM, Sathishkumar Manimoorthy <mrsathishkuma...@gmail.com> wrote:

> Place it in HDFS and give the reference path in your code.
>
> Thanks,
> Sathish
>
> On Fri, Sep 29, 2017 at 3:31 PM, Gaurav1809 <gauravhpan...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have a multi-node Spark cluster (1 master, 2 workers). The job reads
>> CSV file data, and it works fine when run in local mode (local[*]).
>> However, when the same job is run in cluster mode (spark://HOST:PORT),
>> it is not able to read the file. I want to know how to reference the
>> files, or where to store them. Currently the CSV data file is on the
>> master (from where the job is submitted).
>>
>> The following code works fine in local mode but not in cluster mode:
>>
>> val spark = SparkSession
>>   .builder()
>>   .appName("SampleFlightsApp")
>>   .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
>>   .getOrCreate()
>>
>> val flightDF = spark.read.option("header", true).csv("/home/username/sampleflightdata")
>> flightDF.printSchema()
>>
>> Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does not exist
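For the HDFS route Sathish suggests above, a minimal sketch (hdfs://namenode:8020 is a placeholder for your actual NameNode address, and the target directory /data/sampleflightdata is hypothetical):

    // First upload the file to HDFS from a shell on any node:
    //   hdfs dfs -put /home/username/sampleflightdata /data/sampleflightdata
    // Then reference it by its HDFS URI, which every worker can resolve:
    val flightDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:8020/data/sampleflightdata")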