Hi all, I have a multi-node Spark cluster (1 master, 2 workers). The job reads data from a CSV file and works fine when run in local mode (local[*]). However, when the same job is run in cluster mode (spark://HOST:PORT), it is unable to read the file. I want to know how to reference the file, or where to store it. Currently the CSV data file is on the master (from where the job is submitted).
The following code works fine in local mode but not in cluster mode:

val spark = SparkSession
  .builder()
  .appName("SampleFlightsApp")
  .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
  .getOrCreate()

val flightDF = spark.read.option("header", true).csv("/home/username/sampleflightdata")
flightDF.printSchema()

Error:

FileNotFoundException: File file:/home/username/sampleflightdata does not exist
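For reference, here is a minimal sketch of the two approaches I have seen suggested, assuming that in cluster mode each executor resolves a file: path against its own local filesystem. The hdfs:// host, port, and paths below are placeholders, not my actual setup:

import org.apache.spark.sql.SparkSession

object SampleFlightsApp {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // Option A (hypothetical): copy the CSV to the same path on every worker,
    // so that each executor can find it on its own local filesystem.
    val localDF = spark.read
      .option("header", true)
      .csv("file:///home/username/sampleflightdata")

    // Option B (hypothetical): put the CSV on shared storage reachable from
    // all nodes; the hdfs:// URI here is a placeholder for illustration only.
    val sharedDF = spark.read
      .option("header", true)
      .csv("hdfs://namenode:9000/data/sampleflightdata")

    localDF.printSchema()
    sharedDF.printSchema()
    spark.stop()
  }
}

Is one of these the recommended way, or is there a better place to store the file?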