Hi Vincent,

Can you please explain what you mean by HTTP(S) support for the ODBC driver?
I'm not quite sure I get it.

Best Regards,
Igor

On Thu, Oct 6, 2016 at 9:59 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:

> Thanks.
>
> Starting the Thrift server with IgniteRDD tables doesn't seem very hard.
> Implementing a security layer over the Ignite cache may be harder, as I need to:
> - get the username from the Thrift server
> - intercept each request and check permissions
> Maybe Spark will also be able to handle permissions...
>
> I will keep you informed.
>
> On 6 Oct 2016, at 00:12, "Denis Magda" <dma...@gridgain.com> wrote:
>
>> Vincent,
>>
>> Please see below.
>>
>> On Oct 5, 2016, at 4:31 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>
>> Hi,
>> thanks for your explanations. Please find more questions inline.
>>
>> Vincent
>>
>> 2016-10-05 3:33 GMT+02:00 Denis Magda <dma...@gridgain.com>:
>>
>>> Hi Vincent,
>>>
>>> See my answers inline.
>>>
>>> On Oct 4, 2016, at 12:54 AM, vincent gromakowski <vincent.gromakow...@gmail.com> wrote:
>>>
>>> Hi,
>>> I know that Ignite has SQL support, but:
>>> - The ODBC driver doesn't seem to provide HTTP(S) support, which is easier to integrate on corporate networks with rules, firewalls, and proxies.
>>>
>>> *Igor Sapego*, what URIs are supported presently?
>>>
>>> - The SQL engine doesn't seem to scale like Spark SQL would. For instance, Spark won't generate an OOM if the dataset (source or result) doesn't fit in memory. On the Ignite side, it's not clear…
>>>
>>> OOM is not related to the scalability topic at all; this is about the application's logic.
>>>
>>> The Ignite SQL engine scales out along with your cluster. Moreover, Ignite supports indexes, which gives you O(log N) running time for your SQL queries, while with Spark you will face full scans (O(N)) all the time.
>>>
>>> However, to benefit from Ignite SQL queries you have to put all the data in memory.
>>> Ignite doesn't go to a CacheStore (Cassandra, a relational database, MongoDB, etc.) while a SQL query is executed, and it won't preload anything from an underlying CacheStore. Automatic preloading works for key-value queries like cache.get(key).
>>
>> This is an issue because I will potentially have to query TBs of data. If I use the Spark Thrift server backed by an IgniteRDD, does it solve this point, and can I get automatic preloading from C*?
>>
>> An IgniteRDD will load missing (key, value) tuples from Cassandra, because essentially an IgniteRDD is an IgniteCache and Cassandra is a CacheStore. The only thing left to check is whether the Spark Thrift server can work with IgniteRDDs. I hope you will be able to figure this out and share your feedback with us.
>>
>>> - The Spark Thrift server can manage multi-tenancy: different users can connect to the same SQL engine and share the cache. In Ignite it's one cache per user, so a big waste of RAM.
>>>
>>> Everyone can connect to an Ignite cluster and work with the same set of distributed caches. I'm not sure why you need to create caches with the same content for every user.
>>
>> It's a security issue: an Ignite cache doesn't provide multiple user accounts per cache. I am thinking of using Spark to authenticate multiple users, and then having Spark use a shared account on the Ignite cache.
>>
>> Basically, Ignite provides basic security interfaces, and some implementations, which you can rely on when building your secure solution.
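Coming back to the open question above (whether the Spark Thrift server can work with IgniteRDDs): a rough sketch of exposing a shared RDD to Spark SQL might look like the following. This is illustrative only; it assumes a running Ignite cluster, a cache named "vincentCache" (hypothetical), and the Ignite 1.x-era ignite-spark API, whose signatures vary between versions.

```scala
// Sketch: expose an Ignite shared RDD to Spark SQL so that JDBC/ODBC
// clients of the Thrift server can query it. `sc` is the SparkContext
// of the application hosting the Thrift server.
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.sql.SQLContext

val ic = new IgniteContext[Long, String](sc,
  () => new IgniteConfiguration())  // or a path to a Spring XML config

// A shared RDD is a view over an Ignite cache; key-value reads that miss
// in memory go through the cache's CacheStore (e.g. Cassandra).
val sharedRdd = ic.fromCache("vincentCache")

// Register it as a temp table visible to SQL clients.
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
val df = sharedRdd.map { case (k, v) => (k, v) }.toDF("id", "data")
df.registerTempTable("my_ignite_table")
```

Whether tables registered this way are actually visible to an already-running Thrift server session is exactly the point Vincent offered to verify.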
>> This article can be useful for your case:
>> http://smartkey.co.uk/development/securing-an-apache-ignite-cluster/
>>
>> —
>> Denis
>>
>>> If you need real multi-tenancy support, where cacheA may be accessed only by users from group A and cacheB only by users from group B, then you can take a look at GridGain, which is built on top of Ignite:
>>> https://gridgain.readme.io/docs/multi-tenancy
>>
>> OK, but I am evaluating open-source-only solutions (Kylin, Druid, Alluxio...); it's a constraint from my hierarchy.
>>
>>> What I want to achieve is:
>>> - use Cassandra as the data store, as it provides idempotence (HDFS/Hive doesn't), resulting in exactly-once semantics without any duplicates;
>>> - use the Spark SQL Thrift server in multi-tenancy mode for large-scale ad-hoc analytics queries (> 1 TB) from an ODBC driver over HTTP(S);
>>> - accelerate Cassandra reads when the data modeling of the Cassandra table doesn't fit the queries. Queries would be OLAP-style: they target multiple C* partitions, with group-bys or filters on lots of dimensions that aren't necessarily in the C* table key.
>>>
>>> As was mentioned, Ignite uses Cassandra as a CacheStore; you should keep this in mind. Before trying to assemble the whole chain, I would recommend you try connecting the Spark SQL Thrift server directly to Ignite and working with its shared RDDs [1]. A shared RDD (basically an Ignite cache) can be backed by Cassandra. Probably this chain will work for you, but I can't give more precise guidance on this.
>>
>> I will try to make it work and give you feedback.
>>
>>> [1] https://apacheignite-fs.readme.io/docs/ignite-for-spark
>>>
>>> —
>>> Denis
>>>
>>> Thanks for your advice.
>>>
>>> 2016-10-04 6:51 GMT+02:00 Jörn Franke <jornfra...@gmail.com>:
>>>
>>>> I am not sure that this will be performant. What do you want to achieve here? Fast lookups? Then the Cassandra Ignite store might be the right solution.
>>>> If you want to do more analytic-style queries, then you can put the data on HDFS/Hive and use the Ignite HDFS cache to cache certain partitions/tables of Hive in memory. If you want to run iterative machine-learning algorithms, you can go for Spark on top of this. You can then also use the Ignite cache for Spark RDDs.
>>>>
>>>> On 4 Oct 2016, at 02:24, Alexey Kuznetsov <akuznet...@gridgain.com> wrote:
>>>>
>>>> Hi, Vincent!
>>>>
>>>> Ignite also has SQL support (also scalable); I think it will be much faster to query Ignite directly than to query through Spark.
>>>> Also please mind that before executing queries you should load all the needed data into the cache.
>>>> To load data from Cassandra into Ignite you may use the Cassandra store [1].
>>>>
>>>> [1] https://apacheignite.readme.io/docs/ignite-with-apache-cassandra
>>>>
>>>> On Tue, Oct 4, 2016 at 4:19 AM, vincent gromakowski <vincent.gromakowsk...@gmail.com> wrote:
>>>>
>>>>> Hi,
>>>>> I am evaluating the possibility of using Spark SQL (and its scalability) over an Ignite cache with a Cassandra persistent store to speed up read workloads like OLAP-style analytics.
>>>>> Is there any way to configure the Spark Thrift server to load an external table in Ignite, as we can do with Cassandra?
>>>>> Here is an example of a config for Spark backed by Cassandra:
>>>>>
>>>>> CREATE EXTERNAL TABLE MyHiveTable
>>>>>   ( id int, data string )
>>>>>   STORED BY 'org.apache.hadoop.hive.cassandra.cql.CqlStorageHandler'
>>>>>   TBLPROPERTIES ("cassandra.host" = "x.x.x.x",
>>>>>     "cassandra.ks.name" = "test",
>>>>>     "cassandra.cf.name" = "mytable",
>>>>>     "cassandra.ks.repfactor" = "1",
>>>>>     "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy");
>>>>
>>>> --
>>>> Alexey Kuznetsov
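For reference on the Cassandra store mentioned in the thread: it is configured on the Ignite cache itself via the ignite-cassandra module, roughly as in the Spring XML fragment below. The cache name is illustrative, and the data-source and persistence-settings beans referenced here are assumed to be defined elsewhere in the same Spring context.

```xml
<bean class="org.apache.ignite.configuration.CacheConfiguration">
  <property name="name" value="vincentCache"/>
  <!-- Read-through lets cache.get(key) fall back to Cassandra on a miss;
       note this applies to key-value access, not to SQL queries. -->
  <property name="readThrough" value="true"/>
  <property name="writeThrough" value="true"/>
  <property name="cacheStoreFactory">
    <bean class="org.apache.ignite.cache.store.cassandra.CassandraCacheStoreFactory">
      <!-- Names of beans defined elsewhere in the Spring context. -->
      <property name="dataSourceBean" value="cassandraDataSource"/>
      <property name="persistenceSettingsBean" value="cassandraPersistenceSettings"/>
    </bean>
  </property>
</bean>
```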