If you take bitmap indices out of Sybase, then I'm guessing Spark SQL would be on par with Sybase?
On that note, are there plans to integrate IndexedRDD ideas into Spark SQL to build indices? Is there a JIRA tracking it?

On Jun 30, 2015 7:29 PM, "Eric Pederson" <eric...@gmail.com> wrote:

> Hi Debasish:
>
> We have the same dataset running on SybaseIQ and after the caches are warm
> the queries come back in about 300ms. We're looking at options to relieve
> overutilization and to bring down licensing costs. I realize that Spark
> may not be the best fit for this use case but I'm interested to see how far
> it can be pushed.
>
> Thanks for your help!
>
>
> -- Eric
>
> On Tue, Jun 30, 2015 at 5:28 PM, Debasish Das <debasish.da...@gmail.com>
> wrote:
>
>> I got good runtime improvement from Hive partitioning, caching the
>> dataset and increasing the cores through repartition...I think for your
>> case generating MySQL-style indexing will help further...it is not supported
>> in Spark SQL yet...
>>
>> I know the dataset might be too big for 1-node MySQL but do you have a
>> runtime estimate from running the same query on MySQL with appropriate
>> column indexing? That should give us a good baseline number...
>>
>> For my case at least I could not put the data on 1-node MySQL as it was
>> too big...
>>
>> If you can write the problem in a document view you can use a document
>> store like Solr/Elasticsearch to boost runtime...the reverse indices can get
>> you subsecond latencies...again the schema design matters for that and you
>> might have to let go of some SQL expressiveness (like balance in a
>> predefined bucket might be fine but looking for the exact number might be
>> slow)
>>
>
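
To make the bucket point in the quoted message concrete, here is a minimal sketch in plain Python of a reverse index keyed on predefined balance buckets. It is not Spark, Solr or Elasticsearch code, and the names (BUCKETS, bucket_of, build_index) and the sample rows are made up for illustration. It only shows why "balance in a bucket" is a single lookup, while "balance equals an exact number" still needs a filter over the bucket's rows.

```python
from collections import defaultdict

# Hypothetical bucket boundaries for a balance column (an assumption,
# not taken from any real schema): [low, high) ranges.
BUCKETS = [(0, 1_000), (1_000, 10_000), (10_000, 100_000)]


def bucket_of(balance):
    """Return the index of the predefined bucket containing balance, or None."""
    for i, (low, high) in enumerate(BUCKETS):
        if low <= balance < high:
            return i
    return None


def build_index(rows):
    """Map each bucket index to the set of row ids whose balance falls in it."""
    index = defaultdict(set)
    for row_id, balance in rows:
        index[bucket_of(balance)].add(row_id)
    return index


# Made-up sample data: (row_id, balance).
rows = [(1, 250), (2, 4_800), (3, 999), (4, 52_000), (5, 7_200)]
index = build_index(rows)

# Bucket query: a single dictionary lookup, no scan over the rows.
in_bucket = index[bucket_of(4_800)]

# Exact-value query: the index only narrows the search to one bucket,
# so the candidate rows still have to be filtered one by one.
balances = dict(rows)
exact = {row_id for row_id in in_bucket if balances[row_id] == 4_800}
```

With wide buckets the exact-value filter touches many rows, which is roughly the trade-off described above; narrower buckets shift the cost into a larger index.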