Hey,
I am using Spark to distribute the execution of a binary tool and to do some further calculations downstream. I want to distribute the binary tool using either the --files option or SparkContext.addFile so that it is available on each worker node. The log tells me that the file was added:
2018-05-09 07:42:19 INFO SparkContext:54 - Added file
s3a://executables/blastp at s3a://executables/foo with timestamp
1525851739972
2018-05-09 07:42:20 INFO Utils:54 - Fetching s3a://executables/foo to
/tmp/spark-54931ea6-b3d6-419b-997b-a498da898b77/userFiles-5e4b66e5-de4a-4420-a641-4453b9ea2ead/fetchFileTemp3437582648265876247.tmp
However, when I try to execute the tool via pipe(), it does not work. My current assumption is that the file is only downloaded to the master node, but I am not sure whether I misunderstood the concept of adding files in Spark or whether I did something wrong.
I resolve the path with SparkFiles.get(). The call itself succeeds, but the binary is not at the returned path.
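
For context, the relevant driver code looks roughly like this (a simplified sketch; the actual input data and processing differ, and the input path below is hypothetical):

import org.apache.spark.SparkFiles
import org.apache.spark.sql.SparkSession

object Run {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("sparkBlast").getOrCreate()
    val sc = spark.sparkContext

    // The binary is shipped via --files; equivalently one could call
    // sc.addFile("s3a://executables/tool") here.
    // SparkFiles.get resolves the file name against the local download
    // directory of whichever JVM evaluates it.
    val toolPath = SparkFiles.get("tool")

    // Pipe each partition through the external binary.
    val input = sc.textFile("s3a://database/input.fasta") // hypothetical input
    input.pipe(toolPath).collect().foreach(println)
  }
}
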
This is my call:
spark-submit \
  --class de.jlu.bioinfsys.sparkBlast.run.Run \
  --master $master \
  --jars ${awsPath},${awsJavaSDK} \
  --files s3a://database/a.a.z,s3a://database/a.a.y,s3a://database/a.a.x,s3a://executables/tool \
  --conf spark.executor.extraClassPath=${awsPath}:${awsJavaSDK} \
  --conf spark.driver.extraClassPath=${awsPath}:${awsJavaSDK} \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.computational.bio.uni-giessen.de/ \
  --conf spark.hadoop.fs.s3a.access.key=$s3Access \
  --conf spark.hadoop.fs.s3a.secret.key=$s3Secret \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  ${execJarPath}
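
To test my assumption that the file only lands on the driver, I could run a small probe on the executors, something like the following sketch (assuming sc is the SparkContext from above; the partition count is arbitrary):

import java.io.File
import java.net.InetAddress
import org.apache.spark.SparkFiles

// One task per partition; each reports, for its executor host, whether
// the shipped file exists at the path SparkFiles.get resolves to there.
val report = sc.parallelize(1 to 12, 12).mapPartitions { _ =>
  val path = SparkFiles.get("tool")
  val host = InetAddress.getLocalHost.getHostName
  Iterator(s"$host: $path exists=${new File(path).exists()}")
}.distinct().collect()

report.foreach(println)
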
I am using Spark v2.3.0 with Scala in standalone cluster mode with three workers.
Cheers
Marius