Hi,

If you start spark-shell or pyspark from the command line with the --jars
option and see that things are working fine, then it means that you will
have to add the jar either to the SPARK_HOME jars directory or modify the
spark-env file to include the path to the location where the jar file is
stored. This location has to be accessible by all the worker nodes.
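
For example (the jar path below is just a placeholder for wherever your
Alluxio client jar actually lives), you can either drop the jar into the
jars directory:

    cp /path/to/alluxio-client.jar $SPARK_HOME/jars/

or, instead of spark-env, put the equivalent classpath settings in
conf/spark-defaults.conf:

    spark.driver.extraClassPath    /path/to/alluxio-client.jar
    spark.executor.extraClassPath  /path/to/alluxio-client.jar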


Regards,
Gourav Sengupta

On Sat, Apr 14, 2018 at 6:02 PM, Jason Boorn <jbo...@gmail.com> wrote:

> Ok great I’ll give that a shot -
>
> Thanks for all the help
>
>
> On Apr 14, 2018, at 12:08 PM, Gene Pang <gene.p...@gmail.com> wrote:
>
> Yes, I think that is the case. I haven't tried that before, but it should
> work.
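>
> As a rough, untested sketch (the paths and class name are placeholders):
> once your application is packaged, launching it with the client jar on the
> classpath should be enough, e.g.
>
>     java -cp myapp.jar:/path/to/alluxio-client.jar com.example.MyApp
>
> In master=local mode everything runs in that one JVM, so Spark should then
> be able to see the client.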
>
> Thanks,
> Gene
>
> On Fri, Apr 13, 2018 at 11:32 AM, Jason Boorn <jbo...@gmail.com> wrote:
>
>> Hi Gene -
>>
>> Are you saying that I just need to figure out how to get the Alluxio jar
>> into the classpath of my parent application?  If it shows up in the
>> classpath then Spark will automatically know that it needs to use it when
>> communicating with Alluxio?
>>
>> Apologies for going back-and-forth on this - I feel like my particular
>> use case is clouding what is already a tricky issue.
>>
>> On Apr 13, 2018, at 2:26 PM, Gene Pang <gene.p...@gmail.com> wrote:
>>
>> Hi Jason,
>>
>> Alluxio does work with Spark in master=local mode. This is because both
>> spark-submit and spark-shell have command-line options to set the classpath
>> for the JVM that is being started.
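>>
>> For example (the jar path is just a placeholder for your actual client jar),
>> something along the lines of
>>
>>     spark-shell --master local[*] --driver-class-path /path/to/alluxio-client.jar
>>
>> puts the client on the classpath of the JVM that spark-shell starts.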
>>
>> If you are not using spark-submit or spark-shell, you will have to figure
>> out how to configure that JVM instance with the proper properties.
>>
>> Thanks,
>> Gene
>>
>> On Fri, Apr 13, 2018 at 10:47 AM, Jason Boorn <jbo...@gmail.com> wrote:
>>
>>> Ok thanks - I was basing my design on this:
>>>
>>> https://databricks.com/blog/2016/08/15/how-to-use-sparksession-in-apache-spark-2-0.html
>>>
>>> Wherein it says:
>>> Once the SparkSession is instantiated, you can configure Spark’s runtime
>>> config properties.
>>> Apparently the suite of runtime configs you can change does not include
>>> classpath.
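>>>
>>> (As a contrast, and only as a sketch: a runtime setting such as
>>>
>>>     sparkSession.conf.set("spark.sql.shuffle.partitions", "64")
>>>
>>> does take effect on an existing session, whereas the extraClassPath
>>> settings do not.)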
>>>
>>> So the answer to my original question is basically this:
>>>
>>> When using local (pseudo-cluster) mode, there is no way to add external
>>> jars to the spark instance.  This means that Alluxio will not work with
>>> Spark when Spark is run in master=local mode.
>>>
>>> Thanks again - often getting a definitive “no” is almost as good as a
>>> yes.  Almost ;)
>>>
>>> On Apr 13, 2018, at 1:21 PM, Marcelo Vanzin <van...@cloudera.com> wrote:
>>>
>>> There are two things you're doing wrong here:
>>>
>>> On Thu, Apr 12, 2018 at 6:32 PM, jb44 <jbo...@gmail.com> wrote:
>>>
>>> Then I can add the alluxio client library like so:
>>> sparkSession.conf.set("spark.driver.extraClassPath", ALLUXIO_SPARK_CLIENT)
>>>
>>>
>>> First one: you can't modify the JVM configuration after the JVM has already
>>> started, so this line does nothing; Spark can't re-launch your application
>>> with a new JVM.
>>>
>>> sparkSession.conf.set("spark.executor.extraClassPath", ALLUXIO_SPARK_CLIENT)
>>>
>>>
>>> There is a lot of configuration that you cannot set after the
>>> application has already started. For example, after the session is
>>> created, most probably this option will be ignored, since executors
>>> will already have started.
>>>
>>> I'm not so sure about what happens when you use dynamic allocation,
>>> but these post-hoc config changes in general are not expected to take
>>> effect.
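>>>
>>> If you do need to set them programmatically, the pattern is to put them on
>>> the builder before the session (and therefore the executors) exist, not via
>>> conf.set afterwards. A rough sketch, with the jar path as a placeholder:
>>>
>>>     import org.apache.spark.sql.SparkSession
>>>
>>>     val alluxioJar = "/path/to/alluxio-client.jar"  // placeholder path
>>>     val spark = SparkSession.builder()
>>>       .config("spark.executor.extraClassPath", alluxioJar)  // read when executors launch
>>>       .getOrCreate()
>>>
>>> That helps for executor JVMs launched later, but it still cannot change the
>>> classpath of the driver JVM you are already running in; in master=local mode
>>> there is no separate executor JVM at all, so the jar has to be on the
>>> launching application's classpath to begin with.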
>>>
>>> The documentation could be clearer about this (especially stuff that
>>> only applies to spark-submit), but that's the gist of it.
>>>
>>>
>>> --
>>> Marcelo
