You have to edit conf/zeppelin-env.sh and add the following line:

export SPARK_SUBMIT_OPTIONS="--packages org.elasticsearch:elasticsearch-spark_2.10:2.2.0.BUILD-SNAPSHOT"

(note the single-colon Maven coordinates: spark-submit does not expand the sbt-style "::" form, so the Scala suffix has to be spelled out; _2.10 is my assumption here, match it to the Scala version of your Spark build)
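
Since 2.2.0.BUILD-SNAPSHOT lives in the Sonatype snapshot repository, spark-submit will probably also need to be told where to find it; I haven't verified this with the ES artifact, but adding --repositories to the same variable should do it:

export SPARK_SUBMIT_OPTIONS="--repositories https://oss.sonatype.org/content/repositories/snapshots --packages org.elasticsearch:elasticsearch-spark_2.10:2.2.0.BUILD-SNAPSHOT"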

and then restart Zeppelin:

zeppelin-daemon.sh stop
zeppelin-daemon.sh start

From now on, every time Zeppelin starts it will load the additional packages
automatically.
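
That also answers the question of defining the dependency centrally: with the
package loaded at startup there is no %dep block to run in each notebook and
no per-notebook interpreter restart. A quick smoke test in a notebook
("index/type" being a placeholder for one of your own indices) would be:

%spark

import org.elasticsearch.spark._
val rdd = sc.esJsonRDD("index/type")  // resolves only if the package was loaded at startup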

One last note: I did not test this with the Elasticsearch libs; I have done it
with other libs (the Amazon Kinesis libs) and it worked. But it should work
here too, as the principle is the same :)

Hope that helps,
Josef


On 16 November 2015 at 21:27, SiS <n...@cht3.com> wrote:

> Hi
>
> thanks a lot for your help. It worked just as you described. But I have
> another question: would it be possible to define the dependency somewhere
> central, so that it is not necessary to insert the %dep block in every
> notebook, and especially so that I don't need to restart the interpreter
> every time I open a notebook?
>
> BR
>
>
> > On 10.11.2015, at 23:06, Jeff Steinmetz <jeffrey.steinm...@gmail.com> wrote:
> >
> > You can load the hadoop dependency directly from the Elasticsearch
> > maven repository if you like.
> > I'm using the snapshot builds since they fix a few issues that I've
> > been testing with Costin @ Elastic recently.
> >
> > In your interpreter, you will want to set a new es.nodes property
> > and list your comma-separated Elasticsearch IP addresses or host names.
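> >
> > If I read the es-hadoop API correctly, the settings can also be passed
> > per call as a config map; a sketch (the host names are placeholders):
> >
> > val cfg = Map("es.nodes" -> "es-host-1,es-host-2")
> > val rdd = sc.esRDD("evo-session-reader/session", cfg)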
> >
> > I.e., you can do this (assuming you want to use the native
> > elasticsearch-spark approach, which is preferred over the hadoop
> > readers/writers):
> >
> >
> > %dep
> >
> > z.addRepo("Sonatype snapshot").url("https://oss.sonatype.org/content/repositories/snapshots").snapshot
> > z.load("org.elasticsearch::elasticsearch-spark:2.2.0.BUILD-SNAPSHOT")
> >
> >
> >
> > %spark
> >
> > import org.elasticsearch.spark._  // brings the esRDD / esJsonRDD implicits into scope
> >
> > val query = "{ some ES style query string here }"
> >
> > val RDD = sc.esJsonRDD("evo-session-reader/session", query)  // returns the original JSON; if you omit query, it assumes match_all
> > val RDD2 = sc.esRDD("evo-session-reader/session", query)  // returns a Map[String, AnyRef] per document
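> >
> > And since your plan is to end up with DataFrames: the same artifact also
> > ships a Spark SQL integration. I haven't run this exact snippet, but it
> > should look roughly like this:
> >
> > import org.elasticsearch.spark.sql._
> >
> > val df = sqlContext.esDF("evo-session-reader/session")  // the index/type as a DataFrame
> > df.printSchema()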
> >
> > Best
> > Jeff Steinmetz
> >
> >
> > On 11/10/15, 1:45 PM, "SiS" <n...@cht3.com> wrote:
> >
> >> Hi Everybody,
> >>
> >> Since I'm new to Spark and Zeppelin, I hope my question is in the
> >> right place here.
> >> I played around with Zeppelin and Spark and tried to load data by
> >> connecting to an Elasticsearch cluster.
> >> But to be honest, I have no clue how to set up Zeppelin or the
> >> notebook to use the elasticsearch-hadoop/spark library (jar) so that
> >> I'm able to connect using pyspark.
> >> Do I have to copy the jar somewhere into the Zeppelin folders?
> >>
> >> My plan is to transfer an index/type from Elasticsearch to DataFrames
> >> in Spark.
> >>
> >> Could somebody give me a short explanation of how to set this up, or
> >> point me to the right documentation?
> >>
> >> Any help would be appreciated.
> >>
> >> Thanks a lot
> >> Sven
> >
>
>
