The error message points to a mismatch between the configured warehouse
directory and the location actually accessible to the Spark application
running in the container.
You have configured the SparkSession with
spark.sql.warehouse.dir="file:/data/hive/warehouse". This tells Spark where
to store the data for managed databases and tables.
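For reference, a minimal sketch of how that setting is applied when
building the session (the app name is a placeholder; the warehouse path is
the one from the thread, and it has to resolve to storage that actually
exists wherever the driver and executors run):

from pyspark.sql import SparkSession

# Build a Hive-enabled session with an explicit warehouse directory.
# "file:/data/hive/warehouse" is the path from the thread; it must be
# reachable from inside the container, not just on the client machine.
spark = (
    SparkSession.builder
    .appName("warehouse-dir-example")  # hypothetical app name
    .config("spark.sql.warehouse.dir", "file:/data/hive/warehouse")
    .enableHiveSupport()
    .getOrCreate()
)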
Okay, that was some caching issue. Now there is a shared mount point between
the place where the pyspark code is executed and the Spark nodes it runs on.
Hrmph, I was hoping that wouldn't be the case. Fair enough!
On Thu, Mar 7, 2024 at 11:23 PM Tom Barber wrote:
Okay interesting, maybe my assumption was incorrect, although I'm still
confused.
I tried to mount a central mount point that would be the same on my local
machine and in the container. Same error, although I moved the path to
/tmp/hive/data/hive/, but when I rerun the test code to save a table,
th…
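(The test in question is presumably something like the following sketch:
create a tiny DataFrame and save it as a managed table, pointing the
warehouse at the moved path. The table and column names are made up for
illustration.)

from pyspark.sql import SparkSession

# Point the warehouse at the shared/moved location mentioned above.
spark = (
    SparkSession.builder
    .config("spark.sql.warehouse.dir", "file:/tmp/hive/data/hive/")
    .enableHiveSupport()
    .getOrCreate()
)

# Write a small DataFrame as a managed table; the files should land under
# the warehouse directory if that path is visible to the Spark nodes.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
df.write.mode("overwrite").saveAsTable("test_table")  # hypothetical name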
Wonder if anyone can just sort my brain out here as to what's possible or
not.
I have a container running Spark, with Hive and a ThriftServer. I want to
run code against it remotely.
If I take something simple like this:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType