This is a long-standing bug in Spark: --jars and --files don't work in
Standalone mode.
https://issues.apache.org/jira/browse/SPARK-4160

From: Marius <m.die0...@gmail.com>
Date: Wednesday, May 9, 2018 at 3:51 AM
To: "user@spark.apache.org" <user@spark.apache.org>
Subject: Spark 2.3.0 --files vs. addFile()

Hey,

I am using Spark to distribute the execution of a binary tool and to do some
further calculation downstream. I want to distribute the binary using either
the --files option or addFile() so that it is available on each worker node.
However, although Spark reports that it added the file:
2018-05-09 07:42:19 INFO  SparkContext:54 - Added file s3a://executables/blastp 
at s3a://executables/foo with timestamp 1525851739972
2018-05-09 07:42:20 INFO  Utils:54 - Fetching s3a://executables/foo to 
/tmp/spark-54931ea6-b3d6-419b-997b-a498da898b77/userFiles-5e4b66e5-de4a-4420-a641-4453b9ea2ead/fetchFileTemp3437582648265876247.tmp

However, when I try to execute the tool via pipe() it does not work. I
currently assume that the file is only downloaded to the master node, but I am
not sure whether I misunderstood the concept of adding files in Spark or did
something wrong.
I am getting the path with SparkFiles.get(). The call succeeds, but the binary
is not at that location.
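
[Editor's note: for comparison, the pattern that is normally expected to work is to
register the binary with addFile() on the driver and to resolve the local copy with
SparkFiles.get() inside the task closure rather than on the driver. A minimal sketch of
that pattern follows, assuming the same s3a://executables/tool binary from the call
below; the record handling is illustrative only and is not the original sparkBlast code.]

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object PipeToolSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("pipe-tool-sketch"))

    // Ship the binary; every executor fetches it before running its first task.
    sc.addFile("s3a://executables/tool")

    val records = sc.parallelize(Seq("record-1", "record-2"))

    // Resolve the path *inside* the closure so it points at the executor's
    // local copy, not at the driver's temporary userFiles directory.
    val results = records.mapPartitions { part =>
      import scala.sys.process._
      val tool = new java.io.File(SparkFiles.get("tool"))
      tool.setExecutable(true) // fetched files are not necessarily executable
      part.map { record =>
        // feed one record to the tool on stdin and capture its stdout
        val input = new java.io.ByteArrayInputStream(record.getBytes("UTF-8"))
        (Seq(tool.getAbsolutePath) #< input).!!
      }
    }

    results.collect().foreach(println)
    sc.stop()
  }
}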

This is my call:

spark-submit \
  --class de.jlu.bioinfsys.sparkBlast.run.Run \
  --master $master \
  --jars ${awsPath},${awsJavaSDK} \
  --files s3a://database/a.a.z,s3a://database/a.a.y,s3a://database/a.a.x,s3a://executables/tool \
  --conf spark.executor.extraClassPath=${awsPath}:${awsJavaSDK} \
  --conf spark.driver.extraClassPath=${awsPath}:${awsJavaSDK} \
  --conf spark.hadoop.fs.s3a.endpoint=https://s3.computational.bio.uni-giessen.de/ \
  --conf spark.hadoop.fs.s3a.access.key=$s3Access \
  --conf spark.hadoop.fs.s3a.secret.key=$s3Secret \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  ${execJarPath}
I am using Spark v2.3.0 with Scala in Standalone cluster mode with three
workers.
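
[Editor's note: a quick way to test the assumption that the file only reaches the
master is to run a small probe from the driver (e.g. in spark-shell) that reports, per
executor host, whether the fetched copy exists. The file name "tool" and the partition
count below are assumptions for illustration only.]

import org.apache.spark.SparkFiles

// Rough check: does each executor see the file shipped with --files / addFile()?
// Using more partitions than cores makes it likely every worker runs at least one task.
val report = sc.parallelize(1 to 24, 24).map { _ =>
  val host = java.net.InetAddress.getLocalHost.getHostName
  val path = SparkFiles.get("tool")               // executor-local path
  (host, path, new java.io.File(path).exists)
}.distinct().collect()

report.foreach(println)  // expect one entry with exists = true per worker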

Cheers
Marius



