$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0

It will download the package and its dependencies for you and register them with the 
JVM. If you want to use it in production, just package the dependency with Maven. 
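
If you would rather not pass the flag on every start, one option (a sketch; it 
assumes your Spark version supports the spark.jars.packages property, which is 
what --packages sets under the hood) is to put the coordinate in 
conf/spark-defaults.conf so that spark-shell, pyspark and spark-submit all pick it up:

    # conf/spark-defaults.conf
    # resolved when each application starts, not when the standalone daemons start
    spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0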

> On 15/02/2016, at 12:14, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> 
> Hi,
> 
> How do we include the following package: 
> https://github.com/databricks/spark-csv 
> while starting a SPARK standalone cluster, as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html ?
> 
> 
> 
> Thanks and Regards,
> Gourav Sengupta
> 
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com> wrote:
> Hi Gourav, 
> 
> If your question is how to distribute Python package dependencies across the 
> Spark cluster programmatically, here is an example - 
> 
>         $ export PYTHONPATH='path/to/thrift.zip:path/to/happybase.zip:path/to/your/py/application'
> 
> And in code:
> 
>         sc.addPyFile('/path/to/thrift.zip')
>         sc.addPyFile('/path/to/happybase.zip')
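> 
> A quick way to confirm that the shipped zips are importable on the executors 
> (a sketch, reusing the happybase.zip from above):
> 
>         def check(_):
>             import happybase          # resolved from the shipped happybase.zip
>             return happybase.__name__
>         sc.parallelize(range(2), 2).map(check).collect()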
> 
> Regards, 
> Ram
> 
> 
> 
> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
> 
> So far no one has got my question at all. I know what it takes to load 
> packages via the SPARK shell or SPARK submit. 
> 
> How do I load packages when starting a SPARK cluster, as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html ?
> 
> 
> Regards,
> Gourav Sengupta
> 
> 
> 
> 
> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
> You can pass configuration with the --conf option: 
> 
> spark-submit --conf 'key=value'
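> 
> For example (a sketch; spark.jars.packages is the property behind the --packages 
> flag, and your_app.py is just a placeholder for your application):
> 
> spark-submit --conf spark.jars.packages=com.databricks:spark-csv_2.10:1.3.0 your_app.py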
> 
> Hope that helps you.
> 
> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
> Hi Gourav,
> You can use the command below to load packages at the start of the spark shell.
> 
> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
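> 
> The pyspark shell accepts the same flag, so the Python side can be started the 
> same way:
> 
> pyspark --packages com.databricks:spark-csv_2.10:1.1.0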
> 
> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
> 
> I was interested in knowing how to load the packages into a SPARK cluster 
> started locally. Can someone pass me the links for setting the conf file so 
> that the packages can be loaded? 
> 
> Regards,
> Gourav
> 
> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
> Hello Gourav,
> 
> The packages need to be loaded BEFORE you start the JVM, therefore you won't 
> be able to add packages dynamically in code. You should use --packages 
> with pyspark before you start your application.
> One option is to add a `conf` entry that will load those packages if you are 
> going to use them constantly.
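> 
> For a SparkContext created from a plain Python script or notebook, one workaround 
> (a sketch; it assumes your Spark build honours the PYSPARK_SUBMIT_ARGS environment 
> variable, and the spark://hostname:7077 master URL is copied from the code quoted 
> below) is to set the submit arguments before the context is created:
> 
>         import os
>         # must be set before the first SparkContext is constructed; the trailing
>         # 'pyspark-shell' token is what the pyspark launcher expects
>         os.environ['PYSPARK_SUBMIT_ARGS'] = (
>             '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell')
> 
>         from pyspark import SparkConf, SparkContext
>         sc = SparkContext(conf=SparkConf().setMaster('spark://hostname:7077'))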
> 
> Best,
> Burak
> 
> 
> 
> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
> Hi,
> 
> I am creating a SparkContext in a SPARK standalone cluster, as mentioned here: 
> http://spark.apache.org/docs/latest/spark-standalone.html, using the 
> following code:
> 
> --------------------------------------------------------------------------------------------------------------------------
> from pyspark import SparkConf, SparkContext
> import multiprocessing
> 
> sc.stop()
> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>                   .setMaster("spark://hostname:7077") \
>                   .set('spark.shuffle.service.enabled', True) \
>                   .set('spark.dynamicAllocation.enabled', 'true') \
>                   .set('spark.executor.memory', '20g') \
>                   .set('spark.driver.memory', '4g') \
>                   .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
> conf.getAll()
> sc = SparkContext(conf=conf)
> 
> -----(we should definitely be able to optimise the configuration but that is 
> not the point here) ---
> 
> I am not able to use packages (a list of which is available at 
> http://spark-packages.org) using this method. 
> 
> Whereas if I use the standard "pyspark --packages" option, the packages 
> load just fine.
> 
> I will be grateful if someone could kindly let me know how to load packages 
> when starting a cluster as mentioned above.
> 
> 
> Regards,
> Gourav Sengupta
> 