Hi All,

I have a multi-node Spark cluster (1 master, 2 workers). The job reads
CSV file data and works fine when run in local mode (local[*]). However,
when the same job is run in cluster mode (spark://HOST:PORT), it is not
able to read the file. I want to know how to reference the files, or
where to store them. Currently the CSV data file is on the master (from
where the job is submitted).

The following code works fine in local mode but not in cluster mode.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077") // change to .master("local[*]") for local mode
      .getOrCreate()

    val flightDF = spark.read
      .option("header", true) // first line is the header row
      .csv("/home/username/sampleflightdata")
    flightDF.printSchema()

Error: FileNotFoundException: File file:/home/gaurav/sampleflightdata does not exist
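
In case it clarifies what I'm after, below is a sketch of the two
approaches I think might work; the HDFS namenode address
(hdfs://masterIP:9000) and the /data path are guesses on my part, and I
haven't confirmed either:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession
      .builder()
      .appName("SampleFlightsApp")
      .master("spark://masterIP:7077")
      .getOrCreate()

    // Option 1: put the file on shared storage so every executor can
    // read it. The namenode host/port below are hypothetical.
    val fromHdfs = spark.read
      .option("header", true)
      .csv("hdfs://masterIP:9000/data/sampleflightdata")

    // Option 2: copy the file to the same path on the master AND both
    // workers, then reference it with an explicit file:// URI.
    val fromLocal = spark.read
      .option("header", true)
      .csv("file:///home/username/sampleflightdata")

    fromHdfs.printSchema()

Is one of these the recommended way, or is there a better place to store
the data?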


