$SPARK_HOME/bin/spark-shell --packages com.databricks:spark-csv_2.10:1.3.0

It will download everything for you and register it with your JVM. If you want to use it in production, just package it with Maven.
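For instance, declaring the same artifact as a dependency in your pom.xml should bundle it at build time (a sketch; adjust the Scala suffix and version to match your build):

<!-- pom.xml excerpt: pulls spark-csv from Maven Central into your application -->
<dependency>
  <groupId>com.databricks</groupId>
  <artifactId>spark-csv_2.10</artifactId>
  <version>1.3.0</version>
</dependency>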
> On 15/02/2016, at 12:14, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> How do we include the following package:
> https://github.com/databricks/spark-csv
> while starting a SPARK standalone cluster as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html ?
>
> Thanks and Regards,
> Gourav Sengupta
>
> On Mon, Feb 15, 2016 at 10:32 AM, Ramanathan R <ramanatha...@gmail.com> wrote:
>
> Hi Gourav,
>
> If your question is how to distribute Python package dependencies across the
> Spark cluster programmatically, here is an example -
>
> $ export PYTHONPATH='/path/to/thrift.zip:/path/to/happybase.zip:/path/to/your/py/application'
>
> And in code:
>
> sc.addPyFile('/path/to/thrift.zip')
> sc.addPyFile('/path/to/happybase.zip')
>
> Regards,
> Ram
>
> On 15 February 2016 at 15:16, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> So far no one has understood my question at all. I know what it takes to load
> packages via the SPARK shell or SPARK submit.
>
> How do I load packages when starting a SPARK cluster, as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html ?
>
> Regards,
> Gourav Sengupta
>
> On Mon, Feb 15, 2016 at 3:25 AM, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> With the --conf option:
>
> spark-submit --conf 'key=value'
>
> Hope that helps you.
>
> On 15 February 2016 at 11:21, Divya Gehlot <divya.htco...@gmail.com> wrote:
>
> Hi Gourav,
>
> You can use the below to load packages at the start of the spark shell:
>
> spark-shell --packages com.databricks:spark-csv_2.10:1.1.0
>
> On 14 February 2016 at 03:34, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> I was interested in knowing how to load the packages into a SPARK cluster
> started locally. Can someone pass me the links on setting the conf file so
> that the packages can be loaded?
>
> Regards,
> Gourav
>
> On Fri, Feb 12, 2016 at 6:52 PM, Burak Yavuz <brk...@gmail.com> wrote:
>
> Hello Gourav,
>
> The packages need to be loaded BEFORE you start the JVM, therefore you won't
> be able to add packages dynamically in code. You should use --packages with
> pyspark before you start your application.
> One option is to add a `conf` that will load some packages if you are
> constantly going to use them.
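> For example, a minimal sketch in conf/spark-defaults.conf (assuming a Spark
> version that supports the spark.jars.packages property, which takes a
> comma-separated list of Maven coordinates):
>
>   spark.jars.packages  com.databricks:spark-csv_2.10:1.3.0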
>
> Best,
> Burak
>
> On Fri, Feb 12, 2016 at 4:22 AM, Gourav Sengupta <gourav.sengu...@gmail.com> wrote:
>
> Hi,
>
> I am creating a SparkContext in a SPARK standalone cluster as mentioned here:
> http://spark.apache.org/docs/latest/spark-standalone.html
> using the following code:
>
> --------------------------------------------------------------------------
> import multiprocessing
> from pyspark import SparkConf, SparkContext
>
> sc.stop()  # stop the context the pyspark shell created
> conf = SparkConf().set('spark.driver.allowMultipleContexts', False) \
>     .setMaster("spark://hostname:7077") \
>     .set('spark.shuffle.service.enabled', True) \
>     .set('spark.dynamicAllocation.enabled', 'true') \
>     .set('spark.executor.memory', '20g') \
>     .set('spark.driver.memory', '4g') \
>     .set('spark.default.parallelism', multiprocessing.cpu_count() - 1)
> conf.getAll()
> sc = SparkContext(conf=conf)
> --------------------------------------------------------------------------
> (we should definitely be able to optimise the configuration, but that is
> not the point here)
>
> I am not able to use packages, a list of which is mentioned here:
> http://spark-packages.org
> using this method.
>
> Whereas if I use the standard "pyspark --packages" option, the packages
> load just fine.
>
> I will be grateful if someone could kindly let me know how to load packages
> when starting a cluster as mentioned above.
>
> Regards,
> Gourav Sengupta
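For reference, one way to get the --packages effect from code in PySpark is to set PYSPARK_SUBMIT_ARGS before the SparkContext (and hence the JVM) is created. A minimal sketch, assuming PySpark 1.x is importable as a plain Python module and reusing the master URL and coordinates from this thread; "pyspark-shell" must be the last token:

import os

# Packages are resolved when the JVM launches, so this must be set
# before the SparkContext is created.
os.environ['PYSPARK_SUBMIT_ARGS'] = (
    '--packages com.databricks:spark-csv_2.10:1.3.0 pyspark-shell'
)

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster('spark://hostname:7077')
sc = SparkContext(conf=conf)
# spark-csv should now be on the driver and executor classpaths.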