Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-15 Thread David Morales
Hi there, For sure, the new release does support SparkSQL, so you can use SparkSQL and Stratio Deep together just out of the box. About cross-data, it's not itself related to Spark but can use Spark-Deep. It's an interactive SQL like Hive, for example. Regards. 2014-12-12 21:29 GMT+01:00

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-15 Thread Niranda Perera
Hi David, Could you point me to an example where SparkSQL is used in Stratio Deep? Rgds On Mon, Dec 15, 2014 at 2:20 PM, David Morales dmora...@stratio.com wrote: Hi there, For sure, the new release does support SparkSQL, so you can use SparkSQL and Stratio Deep together just out of the

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-15 Thread David Morales
Of course, here you have a simple example: //reading from mongoDB val config : ExtractorConfig[Cells] = new ExtractorConfig(); config.setExtractorImplClass(classOf[MongoNativeCellExtractor]); config.putValue(ExtractorConstants.DATABASE, test) config.putValue(ExtractorConstants.COLLECTION,

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-12 Thread Niranda Perera
Hi David, I have been going through the Deep-Spark examples. It looks very promising. As a follow-up query, does Deep-spark/deep-cassandra support SQL-like operations on RDDs (like SparkSQL)? Example (from Datastax Cassandra connector demos): object SQLDemo extends DemoApp { val cc = new

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-02 Thread Niranda Perera
Hi David, Sorry to re-initiate this thread. But may I know if you have done any benchmarking on Datastax Spark cassandra connector and Stratio Deep-spark cassandra integration? Would love to take a look at it. I recently checked deep-spark github repo and noticed that there is no activity since

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-12-02 Thread David Morales
Hi! Please, check the develop branch if you want to see a more realistic view of our development path. Last commit was about two hours ago :) Stratio Deep is one of our core modules so there is a core team in Stratio fully devoted to spark + noSQL integration. In these last months, for example,

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-26 Thread David Morales
Yes, it is already included in our benchmarks. It could be a nice idea to share our findings, let me talk about it here. Meanwhile, you can ask us any question by using my mail or this thread, we are glad to help you. Best regards. 2014-08-24 15:49 GMT+02:00 Niranda Perera nira...@wso2.com:

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-21 Thread Niranda Perera
Hi Srinath, Yes, I am working on deploying it on a multi-node cluster with the debs dataset. I will keep architecture@ posted on the progress. Hi David, Thank you very much for the detailed insight you've provided. A few quick questions, 1. Do you have experience in using storage handlers in

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-20 Thread Niranda Perera
Hi Anjana and Srinath, After the discussion I had with Anjana, I researched more on the continuation of Shark project by Databricks. Here's what I found out, - Shark was built on the Hive codebase and achieved performance improvements by swapping out the physical execution engine part of Hive.

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-20 Thread Maninda Edirisooriya
If the Shark project is being discontinued, IMO we should not move to Shark at all. And it seems better to go with Spark SQL as we are already using Spark for CEP. But I am not sure of the difference between Spark SQL and the Siddhi queries on the Spark engine. And we have to figure out how Spark

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-20 Thread Niranda Perera
@Maninda, +1 for suggesting Spark SQL. Quoting Databricks, "Spark SQL provides state-of-the-art SQL performance and maintains compatibility with Shark/Hive. In particular, like Shark, Spark SQL supports all existing Hive data formats, user-defined functions (UDF), and the Hive metastore." [1] But I

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-20 Thread Sriskandarajah Suhothayan
On Wed, Aug 20, 2014 at 1:36 PM, Niranda Perera nira...@wso2.com wrote: @Maninda, +1 for suggesting Spark SQL. Quote Databricks, Spark SQL provides state-of-the-art SQL performance and maintains compatibility with Shark/Hive. In particular, like Shark, Spark SQL supports all existing Hive

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-13 Thread Anjana Fernando
Hi Niranda, Excellent analysis of Hive vs Shark! This gives a lot of insight into how both operate in different scenarios. As the next step, we will need to run this in an actual cluster of computers. Since you've used a subset of the dataset of 2014 DEBS challenge, we should use the full

Re: [Architecture] [POC] Performance evaluation of Hive vs Shark

2014-08-13 Thread Anjana Fernando
On Wed, Aug 13, 2014 at 3:51 PM, Sumedha Rubasinghe sume...@wso2.com wrote: After these are done, we should also do a trial run of our own APIM Hive scripts, migrated to Shark. Do we need to migrate? I thought existing Hive scripts can run as they are. First of all we need to create a large