Re: Subsecond queries possible?

Debasish Das Wed, 01 Jul 2015 09:49:56 -0700

If you take bitmap indices out of Sybase, then I am guessing Spark SQL will
be on par with Sybase?
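To make the bitmap-index point concrete, here is a minimal sketch of the idea in plain Python. It is not how Sybase IQ or Spark SQL implements anything; the table, the column names, and the helper functions are all made up for illustration. Each distinct column value gets one integer used as a bitmap, and a conjunctive filter turns into a bitwise AND instead of a row scan.

```python
from collections import defaultdict


def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set when row i holds that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)


def matching_rows(bitmap):
    """Yield the row positions whose bits are set in `bitmap`."""
    i = 0
    while bitmap:
        if bitmap & 1:
            yield i
        bitmap >>= 1
        i += 1


# Hypothetical toy table; column names and values are illustrative only.
rows = [
    {"region": "east", "status": "open"},
    {"region": "west", "status": "open"},
    {"region": "east", "status": "closed"},
    {"region": "east", "status": "open"},
]

region_idx = build_bitmap_index(rows, "region")
status_idx = build_bitmap_index(rows, "status")

# WHERE region = 'east' AND status = 'open' becomes one bitwise AND of two bitmaps.
hits = list(matching_rows(region_idx["east"] & status_idx["open"]))
```

Without such an index, the same predicate has to be evaluated against every row, which is roughly the position Spark SQL is in today.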


On that note, are there plans to integrate IndexedRDD ideas into Spark SQL
to build indices? Is there a JIRA tracking it?

On Jun 30, 2015 7:29 PM, "Eric Pederson" <eric...@gmail.com> wrote:

> Hi Debasish:
>
> We have the same dataset running on Sybase IQ, and after the caches are warm
> the queries come back in about 300ms.  We're looking at options to relieve
> overutilization and to bring down licensing costs.  I realize that Spark
> may not be the best fit for this use case but I'm interested to see how far
> it can be pushed.
>
> Thanks for your help!
>
>
> -- Eric
>
> On Tue, Jun 30, 2015 at 5:28 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
>> I got a good runtime improvement from Hive partitioning, caching the
>> dataset, and increasing the cores through repartition. I think for your
>> case MySQL-style indexing would help further, but it is not supported
>> in Spark SQL yet.
>>
>> I know the dataset might be too big for single-node MySQL, but do you
>> have a runtime estimate from running the same query on MySQL with
>> appropriate column indexing? That should give us a good baseline number.
>>
>> In my case at least, I could not put the data on single-node MySQL as it
>> was too big.
>>
>> If you can express the problem as a document view, you can use a document
>> store like Solr/Elasticsearch to improve runtime. Their inverted indices
>> can get you subsecond latencies. Again, the schema design matters for that,
>> and you might have to give up some SQL expressiveness (for example, finding
>> balances in a predefined bucket might be fine, but looking for an exact
>> number might be slow).
>>
>
>
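The bucket-versus-exact-value trade-off in the quoted message can be sketched with a toy inverted index in plain Python. This is only an illustration under made-up assumptions (the bucket edges, the `balance` field, and the documents are hypothetical, and balances are assumed non-negative); it is not how Solr or Elasticsearch index numeric fields.

```python
from bisect import bisect_right
from collections import defaultdict

# Hypothetical bucket boundaries for a non-negative balance field.
BUCKET_EDGES = [0, 1_000, 10_000, 100_000]


def bucket_of(balance):
    """Return the index of the predefined bucket that contains `balance`."""
    return bisect_right(BUCKET_EDGES, balance) - 1


def build_inverted_index(docs):
    """Map each bucket id to the set of document ids whose balance falls in it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        index[bucket_of(doc["balance"])].add(doc_id)
    return index


# Hypothetical documents, keyed by id.
docs = {
    "a": {"balance": 250},
    "b": {"balance": 4_200},
    "c": {"balance": 7_500},
    "d": {"balance": 250_000},
}
index = build_inverted_index(docs)

# Bucket query: a single lookup in the index, no scan over the documents.
in_1k_to_10k = sorted(index[bucket_of(5_000)])

# Exact-value query: the index only narrows it to one bucket,
# and the rest is a scan over every document in that bucket.
exact_4200 = sorted(d for d in index[bucket_of(4_200)] if docs[d]["balance"] == 4_200)
```

With large buckets the exact-value path degrades toward a scan, which is the loss of SQL expressiveness mentioned above.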
