Please consider using a NoSQL engine such as HBase. Cheers
> On Nov 9, 2015, at 3:03 PM, Andrés Ivaldi <iaiva...@gmail.com> wrote:
>
> Hi,
> I'm also considering something similar. Plain Spark is too slow for my case.
> A possible solution is to use Spark as a multiple-source connector and basic
> transformation layer, then persist the information (currently in an RDBMS).
> After that, our engine builds a kind of cube query, and the result is
> processed again by Spark, adding machine learning.
> Our missing part is replacing the RDBMS with something more suitable and
> scalable. We don't mind pre-processing the information if the queries are
> fast afterwards.
>
> Regards
>
>> On Mon, Nov 9, 2015 at 3:56 PM, tsh <t...@timshenkao.su> wrote:
>> Hi,
>>
>> I'm in the same position right now: we are going to implement something like
>> OLAP BI + Machine Learning explorations on the same cluster.
>> Well, the question is quite ambivalent: on the one hand, we have terabytes
>> of versatile data and the need to build something like cubes (Hive and
>> Hive on HBase are unsatisfactory). On the other hand, our users are
>> accustomed to Tableau + Vertica.
>> So, right now I am considering the following choices:
>> 1) Platfora (not free, I don't know the price right now) + Spark
>> 2) AtScale + Tableau (not free, I don't know the price right now) + Spark
>> 3) Apache Kylin (young project?) + Spark on YARN + Kafka + Flume + some
>> storage
>> 4) Apache Phoenix + Apache HBase + Mondrian + Spark on YARN + Kafka + Flume
>> (has anybody used it in production?)
>> 5) Spark + Tableau (cubes?)
>>
>> For myself, I decided not to dive into Mesos. Cassandra is hard to
>> configure; you'll have to dedicate a special employee to support it.
>>
>> I'll be glad to hear other ideas & propositions, as we are at the beginning
>> of the process too.
>>
>> Sincerely yours, Tim Shenkao
>>
>>
>>> On 11/09/2015 09:46 AM, fightf...@163.com wrote:
>>> Hi,
>>>
>>> Thanks for suggesting.
>>> Actually we are now evaluating and stress-testing Spark SQL on Cassandra
>>> while trying to define business models. FWIW, the solution mentioned here
>>> is different from a traditional OLAP cube engine, right? So we are
>>> hesitating over the general direction to take for our OLAP architecture.
>>>
>>> We would be happy to hear more use cases from this community.
>>>
>>> Best,
>>> Sun.
>>>
>>> fightf...@163.com
>>>
>>> From: Jörn Franke
>>> Date: 2015-11-09 14:40
>>> To: fightf...@163.com
>>> CC: user; dev
>>> Subject: Re: OLAP query using spark dataframe with cassandra
>>>
>>> Is there any distributor supporting these software components in
>>> combination? If not, and your core business is not software, then you may
>>> want to look for something else, because it might not make sense to build
>>> up internal know-how in all of these areas.
>>>
>>> In any case, it all depends highly on your data and queries. You will have
>>> to do your own experiments.
>>>
>>> On 09 Nov 2015, at 07:02, "fightf...@163.com" <fightf...@163.com> wrote:
>>>
>>>> Hi, community
>>>>
>>>> We are especially interested in this feature integration, based on
>>>> some slides from [1]. The SMACK (Spark+Mesos+Akka+Cassandra+Kafka) stack
>>>> seems a good implementation of the lambda architecture in the open-source
>>>> world, especially for non-Hadoop cluster environments. As we see it,
>>>> the advantages are:
>>>>
>>>> 1 the feasibility and scalability of the Spark DataFrame API, which can
>>>> also complement Apache Cassandra's native CQL feature.
>>>>
>>>> 2 both streaming and batch processing are available across the whole
>>>> stack, cool.
>>>>
>>>> 3 we can achieve both compacity and usability for Spark with Cassandra,
>>>> including seamless integration with job scheduling and resource
>>>> management.
>>>>
>>>> Our one concern is OLAP query performance, which is mainly affected by
>>>> frequent aggregation across large tables that grow daily, for both
>>>> Spark SQL and Cassandra. I can see that the use case in [1] uses FiloDB
>>>> to achieve columnar storage and query performance, but we know nothing
>>>> more about it.
>>>>
>>>> The question is: has anyone had such a use case, especially in a
>>>> production environment? We would be interested in your architecture for
>>>> designing this OLAP engine using Spark + Cassandra. How do you think this
>>>> scenario compares with a traditional OLAP cube design, like Apache Kylin
>>>> or Pentaho Mondrian?
>>>>
>>>> Best Regards,
>>>>
>>>> Sun.
>>>>
>>>>
>>>> [1]
>>>> http://www.slideshare.net/planetcassandra/cassandra-summit-2014-interactive-olap-queries-using-apache-cassandra-and-spark
>>>>
>>>> fightf...@163.com
>
> --
> Ing. Ivaldi Andres
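
As a rough illustration of the trade-off raised in the thread (spend time pre-processing so that queries are fast afterwards), here is a minimal pure-Python sketch of cube-style pre-aggregation. It is not how Kylin, Mondrian, or any other engine named above implements cubes, and it uses neither Spark nor Cassandra; the names `build_cube` and `query`, the sample rows, and the `sales` measure are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the sum of `measure` for every subset of `dimensions`.

    The result maps (dimension names, dimension values) -> total, so a
    later query is a single dictionary lookup instead of a table scan.
    """
    cube = defaultdict(float)
    for row in rows:
        # Every subset of the dimensions, from the grand total (empty
        # subset) up to the full combination of all dimensions.
        for size in range(len(dimensions) + 1):
            for dims in combinations(dimensions, size):
                key = (dims, tuple(row[d] for d in dims))
                cube[key] += row[measure]
    return dict(cube)


def query(cube, dimensions, **filters):
    """Look up a pre-computed total; returns 0.0 for an unseen combination."""
    # Keep the same dimension ordering that build_cube used for its keys.
    dims = tuple(d for d in dimensions if d in filters)
    key = (dims, tuple(filters[d] for d in dims))
    return cube.get(key, 0.0)


# Hypothetical sample data, for illustration only.
DIMENSIONS = ("region", "product")
ROWS = [
    {"region": "EU", "product": "A", "sales": 10.0},
    {"region": "EU", "product": "B", "sales": 5.0},
    {"region": "US", "product": "A", "sales": 7.0},
]

cube = build_cube(ROWS, DIMENSIONS, "sales")
print(query(cube, DIMENSIONS))                 # grand total
print(query(cube, DIMENSIONS, region="EU"))    # total for one region
print(query(cube, DIMENSIONS, product="A"))    # total for one product
```

The cost is visible in `build_cube`: each row touches 2^n keys for n dimensions, which is why full cubes are usually limited to a handful of dimensions or pruned to the combinations users actually query.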