Hi Ryan,

Since the driver is on your laptop, I guess you need to specify a URL in order to access a remote file.

For example, when I use Spark over HDFS, I specify the file as hdfs://blablabla, which contains the URL where the namenode can answer. I believe something similar must be done here.
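As a sketch of that suggestion: a path with no scheme, such as /opt/data/transactions.parquet, is resolved against the driver's default filesystem, which on a laptop is normally the local disk. The small Scala helper below is hypothetical (not a Spark API). It prefixes a bare path with the URI of a filesystem that every machine can reach. The address hdfs://namenode-host:8020 is an assumption, and this only helps if the data actually lives on HDFS rather than on the server's local disk.

```scala
import java.net.URI

// Hypothetical helper, not part of Spark: turn a bare path into a fully
// qualified URI on a filesystem that every node (including a laptop driver)
// can reach.
object PathQualifier {
  def qualify(path: String, sharedFs: String): String = {
    val uri = new URI(path)
    // Paths that already carry a scheme (hdfs://, s3a://, file:/...) are kept as-is.
    if (uri.getScheme != null) path
    else sharedFs.stripSuffix("/") + "/" + path.stripPrefix("/")
  }
}

// Assumed namenode address; replace with the cluster's real one.
val sharedFs = "hdfs://namenode-host:8020"
val table = PathQualifier.qualify("/opt/data/transactions.parquet", sharedFs)
println(table) // prints hdfs://namenode-host:8020/opt/data/transactions.parquet
```

The qualified string could then be used in the query, e.g. spark.sql(s"SELECT * FROM parquet.`$table`"). A directory that exists only on the server's local disk stays invisible to a driver on the laptop unless it is moved to a shared store (HDFS, S3, or an NFS mount available at the same path on every machine).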

All the best,

Apostolos


On 25/11/20 16:51, Ryan Victory wrote:
Hello!

I have been tearing my hair out trying to solve this problem. Here is my setup:

1. I have Spark running on a server in standalone mode, with data on the filesystem of the server itself (/opt/data/).
2. I have an instance of a Hive Metastore server running (backed by MariaDB) on the same server.
3. I have a laptop where I am developing my spark jobs (Scala)

I have configured Spark to use the metastore and set the warehouse directory to be in /opt/data/warehouse/. What I am trying to accomplish are a couple of things:

1. I am trying to submit Spark jobs (via JARs) using spark-submit, but have the driver run on my local machine (my laptop). I want the jobs to use the data ON THE SERVER and not try to reference it from my local machine. If I do something like this:

val df = spark.sql("SELECT * FROM parquet.`/opt/data/transactions.parquet`")

I get an error that the path doesn't exist (because it's trying to find it on my laptop). If I run the same thing in a spark-shell on the Spark server itself, there is no issue because the driver has access to the data. If I submit the job with --deploy-mode cluster, it also works because the driver is on the cluster. I don't want this; I want to get the results on my laptop.

How can I force Spark to read the data from the cluster's filesystem and not the driver's?

2. I have set up a Hive Metastore and created a table (in the spark-shell on the Spark server itself). The data in the warehouse is on the server's local filesystem. When I create a Spark application JAR and try to run it from my laptop, I get the same problem as #1, namely that it tries to find the warehouse directory on my laptop itself.
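One way to confirm that this is the same problem is to look at the table location the metastore recorded (shown in the Location row of DESCRIBE TABLE EXTENDED). The Scala check below is a hypothetical diagnostic, not a Spark or Hive API. It flags locations that use the file: scheme or no scheme at all, assuming fs.defaultFS is left at its file:/// default on the machine running the driver. Such locations resolve against the local disk of whichever process reads them, i.e. the laptop in client mode. The table name and namenode address are placeholders.

```scala
import java.net.URI

// Hypothetical diagnostic: true if a table location points at storage that
// only exists on the local disk of the reading process (file: scheme, or no
// scheme with fs.defaultFS assumed to be file:///).
object WarehouseCheck {
  def isNodeLocal(location: String): Boolean = {
    val scheme = Option(new URI(location).getScheme)
    scheme.forall(_.equalsIgnoreCase("file"))
  }
}

val locations = Seq(
  "file:/opt/data/warehouse/transactions",            // typical for a local warehouse dir
  "/opt/data/warehouse/transactions",                 // scheme-less, resolved locally
  "hdfs://namenode-host:8020/warehouse/transactions"  // hypothetical shared location
)
locations.foreach(l => println(s"$l -> node-local: ${WarehouseCheck.isNodeLocal(l)}"))
```

If the location turns out to be node-local, pointing spark.sql.warehouse.dir at a shared filesystem before creating tables (and moving existing tables there) is one possible way out.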

Am I crazy? Perhaps this isn't a supported way to use Spark? Any help or insights are much appreciated!

-Ryan Victory

--
Apostolos N. Papadopoulos, Associate Professor
Department of Informatics
Aristotle University of Thessaloniki
Thessaloniki, GREECE
tel: ++0030312310991918
email: papad...@csd.auth.gr
twitter: @papadopoulos_ap
web: http://datalab.csd.auth.gr/~apostol

