Hi Oleg,

Connectors don't deal with HA; they rely on Spark for that, so none of the DataStax connector, Stratio Deep or Calliope has anything to do with Spark's HA. You should have already configured Spark so that it meets your high-availability needs. Furthermore, as I mentioned in a previous answer, Spark can be configured for high availability without Mesos; there is more information at <https://spark.apache.org/docs/latest/spark-standalone.html#high-availability>.

The three connectors have similar features, so all of them seem good choices. One of the highlights of Stratio Deep is that it can connect to multiple databases, not just Cassandra (currently Cassandra and MongoDB, with more on the roadmap). Also take into account that Stratio Deep's integration with Cassandra was developed from the ground up, making no use of Hadoop at all.
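
As a rough illustration of the standalone HA setup described on the page linked above (a sketch under assumptions, not a tested deployment): the masters are started with ZooKeeper recovery enabled, and the application lists every master in its master URL. The host names master1, master2, zk1 and zk2, the port numbers, the /spark directory and the application name are placeholders, not values from this thread.

```scala
import org.apache.spark.SparkConf

object HaMasterSketch {
  // Master side, as documented for standalone mode: set on each master in
  // conf/spark-env.sh (written on a single line in the real file):
  //   SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER
  //     -Dspark.deploy.zookeeper.url=zk1:2181,zk2:2181
  //     -Dspark.deploy.zookeeper.dir=/spark"

  // Application side: list all masters so the driver can register with
  // whichever one is currently the leader. Host names are placeholders.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("ha-master-sketch")
      .setMaster("spark://master1:7077,master2:7077")

  def main(args: Array[String]): Unit =
    // Only prints the configured master URL; no cluster connection is made here.
    println(buildConf().get("spark.master"))
}
```

None of this involves the Cassandra connector; whichever connector you pick simply runs on top of the Spark cluster configured this way.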

On the other hand, Spark does in-memory computation, but this doesn't mean it can't process data that doesn't fit in memory. It will use disk if told to. Quoting the official Spark FAQ: "Spark can either spill it to disk or recompute the partitions that don't fit in RAM each time they are requested. By default, it uses recomputation, but you can set a dataset's storage level to MEMORY_AND_DISK to avoid this."
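
To make the storage-level point concrete, here is a minimal sketch of setting MEMORY_AND_DISK on a dataset before aggregating it. The object and method names, the local[2] master and the generated numbers are illustrative assumptions; in your case the RDD would come from whichever Cassandra connector you choose.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object SpillToDiskSketch {
  // Runs two actions over the same persisted RDD and returns (count, sum).
  def countAndSum(sc: SparkContext, n: Int): (Long, Long) = {
    // Stand-in dataset (assumption); replace with an RDD read from Cassandra.
    val values = sc.parallelize(1 to n).map(_.toLong)

    // Partitions that don't fit in RAM are written to disk
    // instead of being recomputed on each use.
    values.persist(StorageLevel.MEMORY_AND_DISK)

    val count = values.count()     // first action materialises the RDD
    val sum = values.reduce(_ + _) // second action reuses the persisted partitions

    values.unpersist()
    (count, sum)
  }

  def main(args: Array[String]): Unit = {
    // local[2] is only for trying the sketch on one machine.
    val conf = new SparkConf().setAppName("spill-to-disk-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(countAndSum(sc, 1000000))
    finally sc.stop()
  }
}
```

The storage level is a property of the RDD, not of the connector, so it applies equally to data loaded through Calliope, the DataStax connector or Stratio Deep.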

On 11/09/14 at #4, Oleg Ruchovets wrote:
Ok.
DataStax and Stratio require Mesos, Hadoop YARN or another third party to get Spark cluster HA.

What about Calliope?
Is it sufficient to have Cassandra + Calliope + Spark to be able to process aggregations? In my case we have quite a lot of data, so doing the aggregation only in memory is impossible.

Does Calliope support a non-in-memory mode for Spark?

Thanks
Oleg.

On Thu, Sep 11, 2014 at 9:23 PM, abhinav chowdary <abhinav.chowd...@gmail.com> wrote:

    Adding to conversation...

    There are 3 great open-source options available:

    1. Calliope http://tuplejump.github.io/calliope/
        This is the first library that came out, some time late last
    year (as I recall), and I have been using it for a while. It is
    mostly very stable and uses Hadoop I/O in Cassandra (note that it
    doesn't require Hadoop).

    2. DataStax Spark Cassandra connector
    https://github.com/datastax/spark-cassandra-connector: The main
    difference is that it uses CQL3. Again a great library, but it has
    a few issues. It is also very actively developed, and it still uses
    Thrift for minor stuff, with all the heavy lifting done in CQL3.

    3. Stratio Deep https://github.com/Stratio/stratio-deep: Has a lot
    more to offer if you use the whole Stratio stack. Deep is for Spark,
    Stratio Streaming is built on top of Spark Streaming, Stratio Meta
    is something similar to Shark or Spark SQL, and finally Stratio
    Cassandra is a fork of Cassandra with advanced Lucene-based
    indexing.


