Re: How to share a dataset file across nodes

2023-03-09 Thread Mich Talebzadeh
Try something like below

1) Put your csv say cities.csv in HDFS as below

   hdfs dfs -put cities.csv /data/stg/test

2) Read it into dataframe in PySpark as below

   csv_file="hdfs://:PORT/data/stg/test/cities.csv"
   # read it in spark
   listing_df = spark.read.format("com.databricks.spark.csv").option("infe
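
The two steps in this reply can be sketched end to end in PySpark. This is a minimal sketch and not the code from the message, which is cut off in the archive preview. The namenode host `namenode.example.com`, the port `8020`, and the `header` and `inferSchema` options are assumptions made for illustration, as are the helper names `build_hdfs_uri`, `read_cities` and `main`. The sketch uses the `csv` source built into Spark since 2.0 instead of the older `com.databricks.spark.csv` format name.

```python
def build_hdfs_uri(host: str, port: int, path: str) -> str:
    """Build a fully qualified HDFS URI that the driver and all executors can resolve."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute, e.g. /data/stg/test/cities.csv")
    return f"hdfs://{host}:{port}{path}"


def read_cities(spark, uri: str):
    """Read the CSV at `uri` into a DataFrame with Spark's built-in csv source."""
    return (
        spark.read.format("csv")
        .option("header", "true")       # assumption: the first row holds column names
        .option("inferSchema", "true")  # assumption: let Spark guess column types
        .load(uri)
    )


def main():
    # Imported here so the helpers above stay importable without PySpark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-cities").getOrCreate()
    # Hypothetical host and port; replace with your cluster's namenode address.
    uri = build_hdfs_uri("namenode.example.com", 8020, "/data/stg/test/cities.csv")
    read_cities(spark, uri).show(5)
    spark.stop()
```

Call `main()` from a script submitted with `spark-submit` after the file has been uploaded with `hdfs dfs -put`. Because the path is an `hdfs://` URI, it does not matter which node an executor runs on.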

Re: How to share a dataset file across nodes

2023-03-09 Thread Sean Owen
Put the file on HDFS, if you have a Hadoop cluster?

On Thu, Mar 9, 2023 at 3:02 PM sam smith wrote:
> Hello,
>
> I use Yarn client mode to submit my driver program to Hadoop, the dataset
> I load is from the local file system, when i invoke load("file://path")
> Spark complains about the csv fil

How to share a dataset file across nodes

2023-03-09 Thread sam smith
Hello, I use YARN client mode to submit my driver program to Hadoop. The dataset I load is from the local file system. When I invoke load("file://path"), Spark complains that the CSV file is not found, which I totally understand, since the dataset is not in any of the workers or the application
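
For a small dataset, one alternative to shared storage is to read the file on the driver and hand the rows to Spark. This approach is not proposed anywhere in the thread. In YARN client mode the driver runs on the submitting machine, so it can open the local file even though the executors cannot. The following is a minimal sketch under that assumption; the helper names `read_local_rows` and `to_dataframe` are hypothetical, and every column arrives as a string because the standard `csv` module does no type inference.

```python
import csv


def read_local_rows(path):
    """Read a CSV from the driver's local file system into (header, rows)."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)              # first row holds the column names
        rows = [tuple(r) for r in reader]  # remaining rows, all values as strings
    return header, rows


def to_dataframe(spark, path):
    """Ship driver-side rows to the cluster as a DataFrame (small files only)."""
    header, rows = read_local_rows(path)
    # With a list of column names as the schema, Spark infers the types
    # from the data, so the file needs at least one data row.
    return spark.createDataFrame(rows, schema=header)
```

This only suits data that fits comfortably in driver memory. For anything larger, putting the file on HDFS as suggested in the replies is the better fit.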