Do you have any remarks or corrections on my assumptions?

On 4 Oct 2016, 9:54 AM, "vincent gromakowski" <vincent.gromakow...@gmail.com> wrote:
> Hi,
> I know that Ignite has SQL support, but:
> - The ODBC driver doesn't seem to provide HTTP(S) support, which is easier
>   to integrate on corporate networks with rules, firewalls and proxies.
> - The SQL engine doesn't seem to scale like Spark SQL would. For instance,
>   Spark won't generate an OOM if the dataset (source or result) doesn't fit
>   in memory. On the Ignite side, it's not clear...
> - Spark thrift can manage multi-tenancy: different users can connect to
>   the same SQL engine and share the cache. In Ignite it's one cache per
>   user, so a big waste of RAM.
>
> What I want to achieve is:
> - Use Cassandra as the data store, as it provides idempotence (HDFS/Hive
>   doesn't), resulting in exactly-once semantics without any duplicates.
> - Use the Spark SQL thriftserver in multi-tenancy for large-scale ad hoc
>   analytics queries (> TB) from an ODBC driver through HTTP(S).
> - Accelerate Cassandra reads when the data model of the Cassandra table
>   doesn't fit the queries. Queries would be OLAP style: targeting multiple
>   C* partitions, with group-bys or filters on lots of dimensions that
>   aren't necessarily in the C* table key.
>
> Thanks for your advice
>
> 2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfra...@gmail.com>:
>
>> I am not sure that this will be performant. What do you want to achieve
>> here? Fast lookups? Then the Cassandra Ignite store might be the right
>> solution. If you want to do more analytic-style queries, then you can put
>> the data on HDFS/Hive and use the Ignite HDFS cache to cache certain
>> partitions/tables in Hive in memory. If you want to go to iterative
>> machine learning algorithms, you can go for Spark on top of this. You can
>> then also use the Ignite cache for Spark RDDs.
>>
>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznet...@gridgain.com>
>> wrote:
>>
>> Hi, Vincent!
>>
>> Ignite also has SQL support (also scalable). I think it will be much
>> faster to query directly from Ignite than to query from Spark.
>> Also, please keep in mind that before executing queries you should load
>> all the needed data into the cache.
>> To load data from Cassandra into Ignite you may use the Cassandra store [1].
>>
>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>
>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <
>> vincent.gromakow...@gmail.com> wrote:
>>
>>> Hi,
>>> I am evaluating the possibility of using Spark SQL (and its scalability)
>>> over an Ignite cache with a Cassandra persistent store to accelerate read
>>> workloads such as OLAP-style analytics.
>>> Is there any way to configure the Spark thriftserver to load an external
>>> table in Ignite, like we can do with Cassandra?
>>> Here is an example config for Spark backed by Cassandra:
>>>
>>> CREATE EXTERNAL TABLE MyHiveTable
>>>   ( id int, data string )
>>> STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>> TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>>   "cassandra.ks.name" = "test",
>>>   "cassandra.cf.name" = "mytable",
>>>   "cassandra.ks.repfactor" = "1",
>>>   "cassandra.ks.strategy" =
>>>     "org.apache.cassandra.locator.SimpleStrategy" );
>>
>> --
>> Alexey Kuznetsov
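
For readers following the idempotence argument made earlier in the thread (replayed writes to a keyed store such as Cassandra overwrite the same row, while replayed appends to a file-based store create duplicates), here is a minimal sketch of that reasoning. It is a plain Python simulation only: KeyedStore, AppendOnlyLog and deliver_with_retry are hypothetical names invented for illustration, and nothing here calls Cassandra, HDFS, Spark or Ignite.

```python
# Illustrative simulation only; these classes are hypothetical stand-ins,
# not client code for Cassandra or HDFS.


class KeyedStore:
    """Upsert-by-key store: writing an existing key overwrites the row."""

    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def count(self):
        return len(self.rows)


class AppendOnlyLog:
    """Append-only store: every write adds a new record, even a repeat."""

    def __init__(self):
        self.records = []

    def write(self, key, value):
        self.records.append((key, value))

    def count(self):
        return len(self.records)


def deliver_with_retry(store, batch):
    """Simulate at-least-once delivery: the whole batch is sent twice,
    as if the acknowledgement of the first attempt had been lost."""
    for _attempt in range(2):
        for key, value in batch:
            store.write(key, value)


batch = [(1, "a"), (2, "b"), (3, "c")]

keyed = KeyedStore()
log = AppendOnlyLog()
deliver_with_retry(keyed, batch)
deliver_with_retry(log, batch)

print("keyed store rows:", keyed.count())      # 3: the replay is absorbed
print("append-only records:", log.count())     # 6: the replay is duplicated
```

With the keyed store, at-least-once delivery plus idempotent writes gives an effectively exactly-once result; with the append-only store, the consumer would have to deduplicate afterwards.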