Hi, Have you seen this ticket? https://issues.apache.org/jira/browse/SPARK-12449
// maropu

On Thu, Jul 28, 2016 at 2:13 AM, Timothy Potter <thelabd...@gmail.com> wrote:
> I'm not looking for a one-off solution for a specific query that can
> be solved on the client side as you suggest, but rather a generic
> solution that can be implemented within the DataSource impl itself
> when it knows a sub-query can be pushed down into the engine. In other
> words, I'd like to intercept the query planning process to be able to
> push down computation into the engine when it makes sense.
>
> On Wed, Jul 27, 2016 at 8:04 AM, Marco Colombo
> <ing.marco.colo...@gmail.com> wrote:
> > Why don't you create a filtered dataframe, register it as a temporary
> > table, and then use it in your query? You can also cache it if
> > multiple queries against the same inner query are expected.
> >
> > On Wednesday, July 27, 2016, Timothy Potter <thelabd...@gmail.com>
> > wrote:
> >>
> >> Take this simple join:
> >>
> >> SELECT m.title as title, solr.aggCount as aggCount FROM movies m INNER
> >> JOIN (SELECT movie_id, COUNT(*) as aggCount FROM ratings WHERE rating
> >> >= 4 GROUP BY movie_id ORDER BY aggCount desc LIMIT 10) as solr ON
> >> solr.movie_id = m.movie_id ORDER BY aggCount DESC
> >>
> >> I would like the ability to push the inner sub-query aliased as "solr"
> >> down into the data source engine, in this case Solr, as it will
> >> greatly reduce the amount of data that has to be transferred from
> >> Solr into Spark. I would imagine this issue comes up frequently if the
> >> underlying engine is a JDBC data source as well ...
> >>
> >> Is this possible? Of course, my example is a bit cherry-picked, so
> >> determining whether a sub-query can be pushed down into the data source
> >> engine is probably not a trivial task, but I'm wondering if Spark has
> >> the hooks to allow me to try ;-)
> >>
> >> Cheers,
> >> Tim
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >
> > --
> > Ing. Marco Colombo

--
---
Takeshi Yamamuro
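[For readers of the archive: a minimal sketch of the client-side workaround Marco describes, assuming a `SparkSession` named `spark` with `movies` and `ratings` already registered as tables (names taken from Tim's example; in Spark 1.x the equivalent of `createOrReplaceTempView` was `registerTempTable`). This does not push work into Solr; it just materializes the inner query once so it can be cached and reused.]

```scala
// Run the inner sub-query once as its own DataFrame.
val topRated = spark.sql(
  """SELECT movie_id, COUNT(*) AS aggCount
    |FROM ratings
    |WHERE rating >= 4
    |GROUP BY movie_id
    |ORDER BY aggCount DESC
    |LIMIT 10""".stripMargin)

topRated.cache()                          // reuse across multiple outer queries
topRated.createOrReplaceTempView("solr")  // stands in for the aliased sub-query

// The outer join now references the cached temp view instead of the sub-query.
val joined = spark.sql(
  """SELECT m.title AS title, solr.aggCount AS aggCount
    |FROM movies m INNER JOIN solr ON solr.movie_id = m.movie_id
    |ORDER BY aggCount DESC""".stripMargin)
```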