Spark on k8s: Mount config map in executor

2019-08-26 Thread Steven Stetzler
Hello everyone, I am wondering if there is a way to mount a Kubernetes ConfigMap into a directory in a Spark executor on Kubernetes. Poking around the docs, the only volume mounting options I can find are for a PVC, a directory on the host machine, and an empty volume. I am trying to pass in confi…
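As the poster found, Spark's built-in `spark.kubernetes.*.volumes.*` options only cover hostPath, emptyDir, and persistentVolumeClaim volumes. Spark 3.0 added pod templates, which let you attach arbitrary Kubernetes volume types, including ConfigMaps. A minimal sketch of an executor pod template (the ConfigMap name and mount path are hypothetical placeholders):

```yaml
# executor-pod-template.yaml -- names here are hypothetical examples
apiVersion: v1
kind: Pod
spec:
  containers:
    - name: spark-kubernetes-executor
      volumeMounts:
        - name: app-config
          mountPath: /etc/spark-app-config
          readOnly: true
  volumes:
    - name: app-config
      configMap:
        name: my-app-config
```

The template would then be passed at submit time via `--conf spark.kubernetes.executor.podTemplateFile=executor-pod-template.yaml` (Spark 3.0+ only; at the time of this thread, Spark 2.4 had no such option).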

Re: PGP Encrypt using spark Scala

2019-08-26 Thread Roland Johann
I want to add that the major Hadoop distributions also offer additional encryption possibilities (for example Ranger from Hortonworks). Roland Johann Software Developer/Data Engineer phenetic GmbH Lütticher Straße 10, 50674 Köln, Germany Mobile: +49 172 365 26 46 Mail: roland.joh...@phenetic.io W…

Re: PGP Encrypt using spark Scala

2019-08-26 Thread Roland Johann
Hi all, instead of handling encryption explicitly at the application level, I suggest that you look into the topic of "encryption at rest", for example encryption at the HDFS level: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html
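The transparent encryption Roland links to is configured per directory ("encryption zone"), backed by a key in the Hadoop KMS; files written under the zone are then encrypted and decrypted transparently for authorized clients. A rough sketch of the admin-side setup (key and path names are hypothetical, and the commands require a Hadoop cluster with a configured KMS, so they are shown for illustration only):

```
# create a key in the Hadoop KMS (key name is a placeholder)
hadoop key create mykey

# create an encryption zone over an empty HDFS directory
hdfs dfs -mkdir /secure
hdfs crypto -createZone -keyName mykey -path /secure
```

After this, a plain `hdfs dfs -put file /secure/` stores the file encrypted at rest, with no change to the Spark application itself.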

Re: PGP Encrypt using spark Scala

2019-08-26 Thread Sachit Murarka
Hi Deepak, Thanks for the reply. Yes, that is the option I am considering now, because even Apache Camel needs the data to be local. I might need to copy data from HDFS to local if I go with Apache Camel (to get rid of the shell script). Thanks Sachit On Mon, 26 Aug 2019, 21:11 Deepak Sharma wrote: > Hi Sachit > PGP…
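The pipeline being discussed (pull the file out of HDFS, encrypt it locally with `gpg`, push the result back) can be sketched as follows. All paths and the recipient are hypothetical placeholders, and the sketch only builds the command lines so the shape of the pipeline is easy to see:

```python
# Sketch of the HDFS -> local -> gpg -> HDFS pipeline discussed above.
# Paths and recipient are hypothetical placeholders.

def build_pipeline(hdfs_path, local_path, recipient):
    """Return the shell commands (as argv lists) for copy-then-encrypt."""
    return [
        # copy the file from HDFS to the local filesystem
        ["hdfs", "dfs", "-get", hdfs_path, local_path],
        # encrypt it for `recipient`; gpg writes local_path + ".gpg"
        ["gpg", "--encrypt", "--recipient", recipient, local_path],
        # push the encrypted file back to HDFS
        ["hdfs", "dfs", "-put", local_path + ".gpg", hdfs_path + ".gpg"],
    ]

if __name__ == "__main__":
    for cmd in build_pipeline("/data/in.csv", "/tmp/in.csv", "ops@example.com"):
        print(" ".join(cmd))
```

On a real cluster, each argv list would be handed to `subprocess.run(cmd, check=True)` from the driver, which matches Deepak's suggestion of invoking the encryption step from the Spark Scala (or Python) program rather than inside executors.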

Re: PGP Encrypt using spark Scala

2019-08-26 Thread Deepak Sharma
Hi Sachit, PGP encryption is not built into Spark. I would suggest writing a shell script that does the PGP encryption and calling it from the Spark Scala program, running it from the driver. Thanks Deepak On Mon, Aug 26, 2019 at 8:10 PM Sachit Murarka wrote: > Hi All, > > I want to encrypt…

PGP Encrypt using spark Scala

2019-08-26 Thread Sachit Murarka
Hi All, I want to encrypt my files available at an HDFS location using PGP encryption. How can I do this in Spark? I saw Apache Camel, but it seems Camel is used when the source files are in a local location rather than HDFS. Kind Regards, Sachit Murarka

GraphFrames cluster staling with a large dataset and pyspark

2019-08-26 Thread Alexander Czech
Hey All, I'm trying to run PageRank with GraphFrames on a large graph (about 90 billion edges and 1.4 TB total disk size) and I'm running into some issues. The code is very simple: it just loads DataFrames from S3 and puts them into the GraphFrames PageRank function. But when I run it, the clust…
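The setup being described looks roughly like the sketch below. It requires a Spark cluster with the GraphFrames package on the classpath (e.g. via `--packages`), so it is shown for illustration only; the bucket paths are hypothetical placeholders. At this scale, using a fixed `maxIter` rather than a convergence tolerance keeps the number of iterations (and the lineage) bounded:

```python
# Sketch of the pyspark + GraphFrames PageRank job described above.
# Requires a cluster with the graphframes package; paths are placeholders.
from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("pagerank").getOrCreate()

# GraphFrames expects vertices with an `id` column
# and edges with `src` and `dst` columns.
vertices = spark.read.parquet("s3a://my-bucket/vertices/")
edges = spark.read.parquet("s3a://my-bucket/edges/")

g = GraphFrame(vertices, edges)

# Fixed-iteration PageRank; resetProbability is the damping complement.
result = g.pageRank(resetProbability=0.15, maxIter=10)
result.vertices.select("id", "pagerank").show()
```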