I didn't mean that. With the approach above, only one client will have
access to the cached data.

But when you expose your data through a Thrift server, the case is quite
different.

In the case of the Thrift server, all requests go to the Thrift server,
and Spark is able to take advantage of caching.

That is, the Thrift server becomes the sole client to your Spark cluster.

Check this link:
http://spark.apache.org/docs/1.1.0/sql-programming-guide.html#running-the-thrift-jdbc-server
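
In short, per that guide: you start the server with
./sbin/start-thriftserver.sh and connect with ./bin/beeline (then !connect
jdbc:hive2://localhost:10000, the default endpoint). A table cached from one
connection (for example with a CACHE TABLE statement) is then served from
memory to every client.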

Your applications can connect to your Spark cluster through a JDBC driver.
It works much like the Hive Thrift server.
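
For example, here is a minimal sketch of such a client using the standard
Hive (HiveServer2) JDBC driver. The localhost:10000 endpoint and the table
name my_table are assumptions for illustration; adjust them for your setup.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThriftServerClient {
  public static void main(String[] args) throws Exception {
    // The Spark Thrift server speaks the HiveServer2 protocol,
    // so the ordinary Hive JDBC driver can talk to it.
    Class.forName("org.apache.hive.jdbc.HiveDriver");
    Connection conn = DriverManager.getConnection(
        "jdbc:hive2://localhost:10000/default", "", "");
    Statement stmt = conn.createStatement();
    // Cache the table once; every JDBC client then shares it.
    stmt.execute("CACHE TABLE my_table");
    ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM my_table");
    while (rs.next()) {
      System.out.println("rows: " + rs.getLong(1));
    }
    rs.close();
    stmt.close();
    conn.close();
  }
}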

Thanks,
Vishnu

On Wed, Feb 11, 2015 at 10:31 PM, Ashish Mukherjee <
ashish.mukher...@gmail.com> wrote:

> Thanks for your reply, Vishnu.
>
> I assume you are suggesting I build Hive tables and cache them in memory
> and query on top of that for fast, real-time querying.
>
> Perhaps I should write a generic piece of code like this and submit it as
> a Spark job, with the SQL clause as an argument based on user selections
> on the Web interface -
>
> String sqlClause = args[0];
> ...
> // Queries are expressed in HiveQL.
> JavaHiveContext sqlContext =
>     new org.apache.spark.sql.hive.api.java.JavaHiveContext(sc);
> Row[] results = sqlContext.sql(sqlClause).collect();
>
>
> Is my understanding right?
>
> Regards,
> Ashish
>
> On Wed, Feb 11, 2015 at 4:42 PM, VISHNU SUBRAMANIAN <
> johnfedrickena...@gmail.com> wrote:
>
>> Hi Ashish,
>>
>> In order to answer your question, I assume that you are planning to
>> process data and cache it in memory. If you are using the Thrift server
>> that comes with Spark, then you can query on top of it, and multiple
>> applications can use the cached data, since internally all the requests
>> go to the Thrift server.
>>
>> Spark exposes the Hive query language and allows you to access its data
>> through Spark, so you can consider using HiveQL for querying.
>>
>> Thanks,
>> Vishnu
>>
>> On Wed, Feb 11, 2015 at 4:12 PM, Ashish Mukherjee <
>> ashish.mukher...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am planning to use Spark for a Web-based ad hoc reporting tool on
>>> massive data-sets on S3. Real-time queries with filters, aggregations
>>> and joins could be constructed from UI selections.
>>>
>>> Online documentation seems to suggest that SharkQL is deprecated and
>>> users should move away from it. I understand Hive is generally not used
>>> for real-time querying, and for Spark SQL to work with other data
>>> stores, tables need to be registered explicitly in code. This would not
>>> be suitable for a dynamic query construction scenario.
>>>
>>> For a real-time, dynamic querying scenario like mine, what is the
>>> proper tool to be used with Spark SQL?
>>>
>>> Regards,
>>> Ashish
>>>
>>
>>
>
