Hi Stan,
Thank you for providing the information.
Yes, it is exactly the 'embedded deployment' mode, but it looks like it has been deprecated
because of consistency and performance issues.

Stanislav Lukyanov wrote
> Embedded Mode Deprecation
> Embedded mode implies starting Ignite server nodes within Spark executors
> which can cause unexpected rebalancing or even data loss. Therefore this
> mode is currently deprecated and will be eventually discontinued. Consider
> starting a separate Ignite cluster and using standalone mode to avoid data
> consistency and performance issues.

Sorry for my carelessness, I didn't notice that information before; it almost
answers my question.
But what if I use an earlier Ignite (1.9 or below), ensure the cache is
on-heap, and let Ignite handle the GC?
My case is a straightforward streaming process (local processing via hash
partitioning and affinity collocation to the Ignite cache).
Are there any further potential problems? Your suggestions are appreciated.
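For clarity, the on-heap setup I have in mind is roughly the following. This is a sketch only: the `CacheMemoryMode` names follow the Ignite 1.x API, and the cache name is just the one from my example below.

```scala
import org.apache.ignite.cache.CacheMemoryMode
import org.apache.ignite.configuration.CacheConfiguration

// Sketch only, assuming the Ignite 1.x API: ONHEAP_TIERED keeps entries in
// the Java heap first, which is the "on-heap cache" setup I mean above.
val cacheCfg = new CacheConfiguration[String, String]("test")
cacheCfg.setMemoryMode(CacheMemoryMode.ONHEAP_TIERED)
```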

My concerns:
This approach makes deployment very easy. Compared to the old SharedRDD
solution, I don't have to persuade the DBAs to set up an Ignite cluster node
by node, or maintain POJO classes. Collecting data from the 'cluster' to the
'driver' and then to 'ignite' is also a real bottleneck, so node-local
computing is preferable: all I need to do is ship the library to the Spark
cluster from the driver node.

If the Spark job stops, the Ignite cluster is terminated and the data is
lost (I saw this behavior in the Spark executor logs). If that is the case,
the 'risk' is acceptable to me; I don't care about the data loss issue. My
project receives concurrent Kafka traffic data via Spark Streaming, and I
need Ignite to act as a fast k/v dictionary to reduce the deduplication
computation (before saving to HBase), while the Ignite cache also provides
an SQL query/monitoring service for curious users.
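To illustrate the dedup semantics I need, here is a plain-Scala stand-in: a `ConcurrentHashMap` plays the role that `IgniteCache.putIfAbsent` would play in the real pipeline, and the keys/values are just illustrative.

```scala
import java.util.concurrent.ConcurrentHashMap

// Plain-Scala stand-in for IgniteCache.putIfAbsent: the first writer of a key
// wins, so a duplicate can be detected with a single atomic operation.
val seen = new ConcurrentHashMap[String, String]()

def isDuplicate(key: String, value: String): Boolean =
  seen.putIfAbsent(key, value) != null

// Simulated Kafka traffic with repeated keys:
val events = Seq("a", "b", "a", "c", "b")
val fresh = events.filterNot(e => isDuplicate("k" + e, "value of [" + e + "]"))
println(fresh.mkString(","))  // a,b,c
```

Only the first occurrence of each key survives, which is exactly the filtering I want to do before writing to HBase.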

Given the information above, do you have any suggestions?
We had been confused about how to integrate a bunch of open source projects
until I found that Apache Spark has a big enough heart to supply everything.

Your help is appreciated.

Marco


Code:
# ship the Ignite 1.9 libraries to the Spark cluster
spark-shell --master yarn --deploy-mode client --conf spark.ui.enabled=true \
  --num-executors 2 \
  --jars "/opt/ignite/assembly/1.9/Ignite-1.9-jar-with-dependencies.jar"


class LazyAgent(isClient: Boolean) extends Serializable {
    // lazy vals: each executor JVM starts its Ignite node on first use, only once
    lazy val conf = ic_conf()
    lazy val ignite = org.apache.ignite.Ignition.start(conf)
    lazy val cache = ignite.getOrCreateCache(gen_cache("test"))

    def ic_conf(): org.apache.ignite.configuration.IgniteConfiguration = {
        val spi = new org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi()
        spi.setLocalPortRange(10)
        // discover peers through a shared JDBC table instead of multicast
        val ipFinder = new org.apache.ignite.spi.discovery.tcp.ipfinder.jdbc.TcpDiscoveryJdbcIpFinder()
        ipFinder.setDataSource(JDBCDataSource())
        spi.setIpFinder(ipFinder)
        val cfg = new org.apache.ignite.configuration.IgniteConfiguration()
        cfg.setClientMode(isClient)
        cfg.setDiscoverySpi(spi)
        cfg
    }

    def JDBCDataSource(): javax.sql.DataSource = {
        val dataSource = new com.mysql.jdbc.jdbc2.optional.MysqlDataSource()
        dataSource.setServerName({hostip})
        ......
        dataSource
    }

    def gen_cache(cache_name: String): org.apache.ignite.configuration.CacheConfiguration[String, String] = {
        val cache_cfg = new org.apache.ignite.configuration.CacheConfiguration[String, String](cache_name)
        ......
        cache_cfg
    }
}

// on the driver:
val instance = sc.broadcast(new LazyAgent(false))

// on each data node:
rdd.foreachPartition(iter => {
    // on first access here, each executor starts its embedded Ignite node, only once
    val cache = instance.value.cache
    iter.foreach(el => {
        cache.put("k" + el.toString(), "value of [" + el.toString() + "]")
    })
})
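The reason the `lazy val`s above start Ignite only once per executor can be shown in plain Scala. The counter below is a hypothetical stand-in for `Ignition.start(conf)`; everything else mirrors the broadcast-a-serializable-holder pattern from my code.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Counts how many times the "heavyweight" initializer actually runs.
val initCount = new AtomicInteger(0)

// Hypothetical stand-in for LazyAgent: `resource` plays the role of the
// Ignite node, and its lazy initializer runs once per JVM, on first access.
class LazyAgentSketch extends Serializable {
  lazy val resource: String = {
    initCount.incrementAndGet() // stands in for Ignition.start(conf)
    "started"
  }
}

val agent = new LazyAgentSketch
// Several partitions on the same executor all touch the agent:
(1 to 3).foreach(_ => agent.resource)
println(initCount.get())  // 1
```

Serialization ships only the construction arguments, so each executor JVM deserializes its own copy and pays the startup cost exactly once, no matter how many partitions it processes.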








--
Sent from: http://apache-ignite-users.70518.x6.nabble.com/
