You can access cached data in Spark through the Thrift JDBC server:

http://spark.apache.org/docs/latest/sql-programming-guide.html#running-the-thrift-jdbc-server
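For example, once the Thrift server is running (sbin/start-thriftserver.sh), any HiveServer2 JDBC client can cache the table and run the self-join against memory. Here is a minimal sketch in Scala, assuming the default host/port and a hypothetical feature_weights(item_id, feature, weight) table; names and the input feature are illustrative only:

import java.sql.DriverManager

object ThriftClientSketch {
  def main(args: Array[String]): Unit = {
    // The Thrift JDBC server speaks the HiveServer2 protocol, so the
    // standard Hive JDBC driver is used. Host, port, and table name are
    // assumptions for illustration.
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default")
    val stmt = conn.createStatement()

    // Pin the table in memory so subsequent scans avoid disk IO.
    stmt.execute("CACHE TABLE feature_weights")

    // Self-join on item_id to find features co-occurring with the input one.
    val rs = stmt.executeQuery(
      "SELECT b.feature, COUNT(*) AS cooccur " +
      "FROM feature_weights a JOIN feature_weights b ON a.item_id = b.item_id " +
      "WHERE a.feature = 'input_feature' AND b.feature <> 'input_feature' " +
      "GROUP BY b.feature ORDER BY cooccur DESC")
    while (rs.next()) {
      println(rs.getString(1) + "\t" + rs.getLong(2))
    }
    conn.close()
  }
}

After the CACHE TABLE statement, later scans read from Spark's in-memory store rather than disk, which should remove most of the full-table-scan latency described below.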

On Mon, Oct 27, 2014 at 1:47 PM, Ron Ayoub <ronalday...@live.com> wrote:

> We have a table containing 25 features per item id along with feature
> weights. A correlation matrix can be constructed for every feature pair
> based on co-occurrence. If a user inputs a feature, they can find the
> features correlated with it via a self-join, which requires a single full
> table scan. This results in high latency (10+ seconds) on big data due to
> the IO involved in the full table scan. My idea is that, for this feature,
> the data can be loaded into an RDD and transformations and actions applied
> to find the correlated features for each query.
>
> I'm pretty sure Spark can do this sort of thing. Since I'm new, what I'm
> not sure about is whether Spark is appropriate as a server application.
> For instance, the driver application would have to load the RDD and then
> listen for requests and return results, perhaps using a socket. Are there
> any libraries to facilitate this sort of Spark server app? I understand
> how Spark can be used to grab data, run algorithms, and put results back,
> but is it appropriate as the engine of a server app, and what are the
> general patterns involved?
>
>
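To sketch the pattern the question describes: a long-running driver program builds and caches the RDD once, then answers each request with a filter/aggregate over the cached data. Everything below (input path, schema, the request-loop stub) is hypothetical, but the shape is the usual one:

import org.apache.spark.{SparkConf, SparkContext}

object CorrelationServerSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("CorrelationServer"))

    // Hypothetical input: CSV lines of "itemId,feature,weight".
    val pairs = sc.textFile("hdfs:///data/feature_weights.csv")
      .map(_.split(","))
      .map(f => (f(0), f(1)))   // (itemId, feature)
      .cache()                  // keep in memory across queries

    // One "query": features co-occurring with the given feature.
    def correlated(feature: String): Array[(String, Int)] = {
      // Items that carry the query feature, broadcast to the workers.
      val items = pairs.filter(_._2 == feature).map(_._1).collect().toSet
      val bcItems = sc.broadcast(items)
      pairs.filter { case (item, f) => bcItems.value.contains(item) && f != feature }
        .map { case (_, f) => (f, 1) }
        .reduceByKey(_ + _)
        .collect()
        .sortBy(-_._2)
    }

    // A real server would accept requests here (socket, embedded HTTP, etc.)
    // and call correlated() per request; printing one example instead.
    correlated("example_feature").take(10).foreach(println)

    sc.stop()
  }
}

The driver stays alive between requests, so the cost of caching is paid once; a socket listener or embedded HTTP server (or the Thrift JDBC server above) fills the "listen for requests" role.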
