Hi,

Please see inline

On 2/11/2016 8:48 AM, thesrc2016 wrote:
Hi, I'm new to Ignite and just trying to get my head around how exactly it
can integrate with Spark.
I have been looking through the overview and the diagrams for IgniteRDD but
still things are a little unclear to me.

I guess my query comes down to - can Ignite RDD simply be configured for use
by Spark as its internal RDD implementation, given IgniteRDD implements the
RDD abstraction?  The examples given appear to require explicit coding in
order to be able to make use of IgniteRDD...
IgniteRDD is a special implementation of the Spark RDD abstraction that lets you keep the results of Spark jobs in memory and reuse them in other Spark jobs. Under the hood, IgniteRDD is backed by an Ignite Cache [1], so you have to obtain a reference to it in a specific way [2]:

import org.apache.ignite.spark.IgniteContext

val igniteContext = new IgniteContext[Integer, Integer](sparkContext,
    "examples/config/example-cache.xml")

val cacheRdd = igniteContext.fromCache("partitioned")


Once obtained, you can work with this RDD through the standard Spark RDD API.
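For example, a minimal sketch (assuming the IgniteContext and "partitioned" cache from the snippet above, and using IgniteRDD's savePairs method to write key-value pairs into the cache):

    // Write key-value pairs into the underlying Ignite cache through the RDD...
    cacheRdd.savePairs(
      sparkContext.parallelize(1 to 1000, 10).map(i => (Integer.valueOf(i), Integer.valueOf(i))))

    // ...and read them back with ordinary Spark transformations and actions.
    println(cacheRdd.filter(_._2 > 500).count())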

In my use case I want to access Spark capabilities through its SparkR API,
and I want to accelerate processing with Spark's DataFrame SQL context
available through this API so that it can use Ignite's in-memory indexing
and other in-memory capabilities.
Here you should use IgniteRDD's 'sql' and 'objectSql' methods [3]:

val result = cacheRdd.sql(
    "select _val from Integer where _val > ? and _val < ?", 10, 100)


Indexing is configured using Ignite's CacheConfiguration [4].
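For instance, a programmatic sketch of what the XML cache configuration does (the cache name and Integer types are taken from the examples above; setIndexedTypes registers the key and value classes for SQL indexing):

    import org.apache.ignite.configuration.{CacheConfiguration, IgniteConfiguration}

    // Index the Integer key/value type so SQL queries like the one above can use indexes.
    val cacheCfg = new CacheConfiguration[Integer, Integer]("partitioned")
    cacheCfg.setIndexedTypes(classOf[Integer], classOf[Integer])

    // The configuration-closure variant of IgniteContext can be used instead of the XML file.
    val customContext = new IgniteContext[Integer, Integer](sparkContext,
      () => new IgniteConfiguration().setCacheConfiguration(cacheCfg))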

  I'd also like to persist the loaded RDD in
memory between different Spark Application sessions in order to speed up
start-up and also share pre-loaded in-memory RDDs between different R
applications. Is any of this possible in relation to how Ignite is
implemented or intended to operate?
As I mentioned above, IgniteRDD is backed by an Ignite Cache, which distributes the data across the available cluster nodes and keeps it there independently of any particular Spark application. That means pre-loaded data can be reused across Spark sessions and shared between different applications.
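For illustration, a sketch of what that looks like from a second application (assuming the Ignite nodes run as a standalone cluster so the cache and its data outlive any single Spark application; sparkContext2 is a hypothetical name for the other application's SparkContext):

    // Hypothetical second Spark application attaching to the same Ignite cluster.
    val otherContext = new IgniteContext[Integer, Integer](sparkContext2,
      "examples/config/example-cache.xml")

    // The "partitioned" cache, and whatever was loaded into it earlier, is visible here too.
    val sharedRdd = otherContext.fromCache("partitioned")
    println(sharedRdd.count())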

Does it make sense to you?

Regards,
Denis


[1] https://apacheignite.readme.io/docs/data-grid
[2] https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#ignitecontext
[3] https://apacheignite-fs.readme.io/docs/ignitecontext-igniterdd#section-running-sql-queries-against-ignite-cache
[4] https://apacheignite.readme.io/docs/sql-queries#configuring-sql-indexes-using-queryentity
