NFS is a simple option for this kind of usage, yes.
But --files makes N copies of the data; you may not want that for large
data, or for data that you need to mutate.
On Wed, Nov 25, 2020 at 9:16 PM Artemis User wrote:
Ah, I almost forgot that there is an even easier solution for your
problem, namely to use the --files option in spark-submit. Usage as follows:
  --files FILES               Comma-separated list of files to be placed in the working
                              directory of each executor. File paths of these files in
                              executors can be accessed via SparkFiles.get(fileName).
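For a concrete picture, here is a minimal sketch of how a job could read a
file shipped with --files; the master URL, file name, and script name are
placeholders rather than anything from this thread:

# Hypothetical submit command (host, paths, and file names are made up):
#   spark-submit --master spark://my-server:7077 \
#       --files /opt/data/lookup.csv my_job.py

from pyspark import SparkFiles
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("files-example").getOrCreate()
sc = spark.sparkContext

def count_lines(_):
    # On each executor, SparkFiles.get() resolves the local copy of the file
    # that --files placed in that executor's working directory.
    with open(SparkFiles.get("lookup.csv")) as f:
        return [sum(1 for _ in f)]

# Run the function once per partition just to show the file is reachable.
print(sc.parallelize(range(2), 2).mapPartitions(count_lines).collect())

Because --files drops a copy into each executor's working directory, the
file is opened with plain Python I/O inside the task rather than through
spark.read.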
This is a typical file-sharing problem in Spark. Just setting up HDFS
won't solve the problem unless you make your local machine part of
the cluster. The Spark server doesn't share files with your local machine
without mounting drives to each other. The best/easiest way to share
the data between your laptop and the cluster is a shared network mount
such as NFS.
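As an illustration of that kind of sharing, here is a rough sketch assuming
a hypothetical NFS export mounted at the same path, /mnt/shared, on the
laptop and on every worker:

from pyspark.sql import SparkSession

# Assumption for illustration: /mnt/shared is an NFS export mounted at the
# same path on the laptop (driver) and on every Spark worker node.
spark = (SparkSession.builder
         .master("spark://my-server:7077")   # placeholder master URL
         .appName("shared-mount-example")
         .getOrCreate())

# A plain file:// path only works when every node sees the same path,
# which is exactly what the shared mount provides.
df = spark.read.parquet("file:///mnt/shared/events.parquet")
df.show(5)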
In your situation, I'd try to do one of the following (in decreasing order
of personal preference):
1. Restructure things so that you can operate on a local data file, at
least for the purpose of developing your driver logic. Don't rely on the
Metastore or HDFS until you have to; structure the job so the data source
is easy to swap out later (a rough sketch follows below).
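A rough sketch of what that restructuring could look like; the DATA_PATH
environment variable and the column name are made up for illustration:

import os
from pyspark.sql import SparkSession

# Point DATA_PATH at a small local sample while developing on the laptop,
# and at the real location when running against the cluster.
data_path = os.environ.get("DATA_PATH", "file:///tmp/sample.parquet")

spark = SparkSession.builder.appName("portable-driver").getOrCreate()

def build_report(df):
    # All of the driver logic lives here and never cares where the data is.
    return df.groupBy("category").count()

build_report(spark.read.parquet(data_path)).show()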
A key part of what I'm trying to do involves NOT having to bring the data
"through" the driver in order to get the cluster to work on it (which would
involve a network hop from server to laptop and another from laptop to
server). I'd rather have the data stay on the server and the driver stay on
my laptop.
I'm also curious if this is possible, so while I can't offer a solution,
maybe you could try the following.
The driver and executor nodes need to have access to the same
(distributed) file system, so you could try to mount that file system on
your laptop locally, and then try to submit jobs and/or read the data
through the mounted path.
Thanks Apostolos,
I'm trying to avoid standing up HDFS just for this use case (single node).
-Ryan
On Wed, Nov 25, 2020 at 8:56 AM Apostolos N. Papadopoulos <
papad...@csd.auth.gr> wrote:
Hi Ryan,
since the driver is at your laptop, in order to access a remote file you
need to specify the URL for it, I guess.
For example, when I am using Spark over HDFS, I specify the file like
hdfs://blablabla, which contains the URL where the namenode can answer.
I believe that something similar applies in your case.
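For illustration, a tiny PySpark sketch of reading through such a URL; the
host, port, and path are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hdfs-url-example").getOrCreate()

# Placeholder host and port: point the URL at whatever host the HDFS
# namenode answers on (commonly port 8020 or 9000).
df = spark.read.text("hdfs://namenode-host:8020/data/events.txt")
print(df.count())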
Hello!
I have been tearing my hair out trying to solve this problem. Here is my
setup:
1. I have Spark running on a server in standalone mode with data on the
filesystem of the server itself (/opt/data/).
2. I have an instance of a Hive Metastore server running (backed by
MariaDB) on the same server.
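For reference, a rough sketch of what a driver on the laptop connecting to
such a setup might look like; the hostnames, ports, and paths below are
placeholders, not the actual configuration:

from pyspark.sql import SparkSession

# Driver runs on the laptop; master and metastore run on the server.
spark = (SparkSession.builder
         .master("spark://my-server:7077")                          # standalone master
         .config("hive.metastore.uris", "thrift://my-server:9083")  # Hive Metastore
         .enableHiveSupport()
         .appName("remote-driver")
         .getOrCreate())

# The catch discussed in this thread: a plain local path such as
# file:///opt/data/... generally has to be visible to the driver as well as
# to the executors, so it cannot live only on the server's filesystem.
spark.sql("SHOW DATABASES").show()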