Hi Matt,

Thanks for coming back to me. Unfortunately, that doesn't work. Basically,
in the properties I set the volume and mount point as below:

spark.kubernetes.driver.volumes.persistentVolumeClaim.checkvolume.mount.path=/checkpoint
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkvolume.mount.readOnly=false
spark.kubernetes.driver.volumes.persistentVolumeClaim.checkvolume.options.claimName=sparkstorage

spark.kubernetes.executor.volumes.persistentVolumeClaim.checkvolume.mount.path=/checkpoint
spark.kubernetes.executor.volumes.persistentVolumeClaim.checkvolume.mount.readOnly=false
spark.kubernetes.executor.volumes.persistentVolumeClaim.checkvolume.options.claimName=sparkstorage

That works as expected, and the PVC is mounted in the driver and executor
pods at the /checkpoint directory.
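
For completeness, I pass these the usual way as --conf flags on
spark-submit (the master URL and image name below are just placeholders):

  spark-submit \
    --master k8s://https://<k8s-apiserver-host>:443 \
    --deploy-mode cluster \
    --conf spark.kubernetes.container.image=<my-spark-image> \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.checkvolume.mount.path=/checkpoint \
    --conf spark.kubernetes.driver.volumes.persistentVolumeClaim.checkvolume.options.claimName=sparkstorage \
    ... (plus the matching executor properties) \
    <application-jar>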

As you suggested, the first thing I tried was setting spark.local.dir (or
the env var SPARK_LOCAL_DIRS) to the /checkpoint directory, expecting that
spills would then land on my PVC. However, this throws the following
error:

"spark-kube-driver" is invalid:
spec.containers[0].volumeMounts[3].mountPath: Invalid value: "/checkpoint":
must be unique"
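
For reference, this is what I'm setting when that happens (either form
produces the same error):

  --conf spark.local.dir=/checkpoint

  # or, via the environment variable:
  --conf spark.kubernetes.driverEnv.SPARK_LOCAL_DIRS=/checkpoint \
  --conf spark.executorEnv.SPARK_LOCAL_DIRS=/checkpoint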

It seems Spark is trying to mount an emptyDir at "/checkpoint", but it
can't, because "/checkpoint" is the directory where the PVC is already
mounted.
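
If I read the error right, the driver pod spec that spark-submit generates
ends up with two volumeMounts at the same path, roughly like this
(reconstructed by hand from the error, not an actual dump; the emptyDir
volume name is my guess):

  volumeMounts:
    - name: checkvolume          # my PVC, from the properties above
      mountPath: /checkpoint
    - name: spark-local-dir-1    # emptyDir Spark adds for spark.local.dir
      mountPath: /checkpoint     # duplicate path -> "must be unique"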

At the moment it looks to me like the emptyDir is always used for spilling
data. The question is how to back it with the PVC, unless I'm missing
something here.
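
For what it's worth, on a run without spark.local.dir set I can see that
the scratch space really is an emptyDir by dumping the driver pod's
volumes (the pod name is a placeholder):

  kubectl get pod spark-kube-driver -o jsonpath='{.spec.volumes}'

That lists a spark-local-dir-* volume of type emptyDir, which corresponds
to the /var/data/spark-1xxx directory mentioned below.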
Because of that I can't really run any bigger jobs at the moment. I'd
appreciate any feedback :)

Thanks

Tom

On Thu, 28 Feb 2019 at 17:23, Matt Cheah <mch...@palantir.com> wrote:

> I think we want to change the value of spark.local.dir to point to where
> your PVC is mounted. Can you give that a try and let us know if that moves
> the spills as expected?
>
>
>
> -Matt Cheah
>
>
>
> *From: *Tomasz Krol <patric...@gmail.com>
> *Date: *Wednesday, February 27, 2019 at 3:41 AM
> *To: *"user@spark.apache.org" <user@spark.apache.org>
> *Subject: *Spark on k8s - map persistentStorage for data spilling
>
>
>
> Hey Guys,
>
>
>
> I hope someone will be able to help me, as I've been stuck with this for a
> while :) Basically I am running some jobs on Kubernetes as per the
> documentation:
>
>
>
> https://spark.apache.org/docs/latest/running-on-kubernetes.html
>
>
>
> All works fine; however, if I run queries on a bigger data volume, the
> jobs fail because there is not enough space in the /var/data/spark-1xxx
> directory.
>
>
>
> Obviously the reason for this is that the mounted emptyDir doesn't have
> enough space.
>
>
>
> I also mounted a PVC to the driver and executor pods, which I can see
> during runtime. I am wondering if someone knows how to arrange for data
> to be spilled to a different directory (i.e. my persistent storage
> directory) instead of the emptyDir with its limited space, or whether I
> can somehow back the emptyDir with my PVC. Basically, at the moment I
> can't run any jobs, as they fail due to insufficient space in that
> /var/data directory.
>
>
>
> Thanks
>
> --
>
> Tomasz Krol
> patric...@gmail.com
>


-- 
Tomasz Krol
patric...@gmail.com
