Try something like the following:
1) Put your csv, say cities.csv, in HDFS:
hdfs dfs -put cities.csv /data/stg/test
2) Read it into a DataFrame in PySpark:
csv_file = "hdfs://:PORT/data/stg/test/cities.csv"
# read it in spark
listing_df = (
    spark.read.format("com.databricks.spark.csv")
         .option("inferSchema", "true")
         # add .option("header", "true") if the file has a header row
         .load(csv_file)
)
Put the file on HDFS if you have a Hadoop cluster?
On Thu, Mar 9, 2023 at 3:02 PM sam smith wrote:
Hello,
I use Yarn client mode to submit my driver program to Hadoop. The dataset I
load is from the local file system; when I invoke load("file://path"), Spark
complains about the csv file not being found, which I totally understand,
since the dataset is not on any of the workers or the application