RE: Spark on Apache Ingnite?

Boavida, Rodrigo Tue, 12 Jan 2016 06:36:30 -0800

I also had a quick look and agree it’s not very clear. I believe reading 
through the clustering logic and the replication settings would give a good 
idea of how it works.
https://apacheignite.readme.io/docs/cluster
I believe it integrates with Hadoop and other file-based systems for persisting 
when needed. I'm not sure about the details of how it recovers.
Also, a resource manager such as Mesos can add recoverability, at least for 
scenarios where there isn’t any state to recover.

Resilience is a feature and not every use case needs it. For example, I’m 
currently considering Ignite for caching transient data, where we need to 
share RDDs between different Spark Contexts: one context produces data and 
the other consumes it.

From: Koert Kuipers [mailto:ko...@tresata.com]
Sent: 11 January 2016 16:08
To: Boavida, Rodrigo <rodrigo.boav...@aspect.com>
Cc: user@spark.apache.org
Subject: Re: Spark on Apache Ingnite?

where is ignite's resilience/fault-tolerance design documented?
i can not find it. i would generally stay away from it if fault-tolerance is an 
afterthought.

On Mon, Jan 11, 2016 at 10:31 AM, RodrigoB 
<rodrigo.boav...@aspect.com> wrote:
Although I haven't worked explicitly with either, they do seem to differ in
design and consequently in usage scenarios.

Ignite is claimed to be a pure in-memory distributed database.
With Ignite, updates to existing keys are handled by the store itself,
unlike with Tachyon. In Tachyon, once a value is created for a given key it
becomes immutable, so you either delete and insert again, or need to
manage/update the Tachyon keys yourself.
Also, Tachyon's resilience design is based on the underlying file system
(typically Hadoop), which means that if a node goes down, the lost data can
only be recovered if it was first persisted to the corresponding file
partition.
Ignite has no master dependency, whereas my understanding is that API calls
in Tachyon depend on the master's availability. I believe Ignite has some
options for replication, which would be more aligned with an in-memory
datastore.
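
To make the key-update difference described above concrete, here is a minimal
toy sketch in plain Python. It does not use the Ignite or Tachyon APIs: the
names WriteOnceStore, MutableStore, KeyExistsError and update are hypothetical
and only model the semantics as I understand them (write-once values versus
in-place updates).

```python
class KeyExistsError(Exception):
    """Raised when a write-once store is asked to overwrite a key."""


class WriteOnceStore:
    """Toy model (assumption): once written, a key's value is immutable."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if key in self._data:
            raise KeyExistsError(f"{key!r} exists; delete it before re-inserting")
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

    def get(self, key):
        return self._data[key]


class MutableStore:
    """Toy model (assumption): the store handles updates to existing keys."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # overwrites any existing value in place

    def get(self, key):
        return self._data[key]


def update(store, key, value):
    """Update a key, falling back to delete-and-insert on a write-once store."""
    try:
        store.put(key, value)
    except KeyExistsError:
        store.delete(key)
        store.put(key, value)
```

With the write-once model the caller has to carry the delete-and-insert step
(here hidden in update), while with the mutable model a second put is enough.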

If you are looking to persist some RDD's output into an in-memory store
and query it outside of Spark, on paper Ignite sounds like the better
solution.

Since you asked about Ignite's benefits, that was the focus of my
response. Tachyon has its own benefits, such as community support and
integration with Spark's lineage-based persistence. If you are doing
batch-based processing and want to persist Spark RDDs quickly, Tachyon is
your friend.

Hope this helps.

Thanks,
Rod

--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-on-Apache-Ingnite-tp25884p25933.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.