[ https://issues.apache.org/jira/browse/SPARK-11261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-11261.
-------------------------------
    Resolution: Won't Fix

> Provide a more flexible alternative to Jdbc RDD
> -----------------------------------------------
>
>                 Key: SPARK-11261
>                 URL: https://issues.apache.org/jira/browse/SPARK-11261
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>            Reporter: Richard Marscher
>
> The existing JdbcRDD covers only a limited set of use cases, because it requires the query's semantics to operate on upper- and lower-bound predicates, e.g.:
> "select title, author from books where ? <= id and id <= ?"
> However, many use cases cannot be expressed this way, or become much less efficient when forced into it.
> For example, we have a MySQL table partitioned on a partition key. We have no range values to look up; instead we want to fetch all entries matching a predicate, with Spark running one query per RDD partition against each logical partition of the MySQL table, e.g.:
> "select * from devices where partition_id = ? and app_id = 'abcd'"
> Another use case is looking up a distinct set of identifiers that have no natural ordering, e.g.:
> "select * from users where user_id in (?,?,?,?,?,?,?)"
> The number of identifiers may be large and/or dynamic.
> Solution:
> Instead of addressing each use case with a new RDD type, provide an alternate, general-purpose RDD that gives the user direct control over how the query is partitioned across Spark and how the placeholders are filled in.
> The user should be able to control which placeholder values are available on each partition of the RDD and how they are bound to the PreparedStatement. Ideally it would also support dynamic placeholder values, such as inserting a set of values for an IN clause.
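For reference, a rough sketch of what such an RDD could look like follows. This is not an existing Spark API: the names FlexibleJdbcRDD, placeholderGroups, bindPlaceholders, and mapRow are hypothetical, and the sketch materializes each partition's rows in memory rather than streaming them the way JdbcRDD does. One RDD partition is created per group of placeholder values, and the caller decides how each group is bound to the PreparedStatement.

    import java.sql.{Connection, DriverManager, PreparedStatement, ResultSet}
    import scala.reflect.ClassTag

    import org.apache.spark.{Partition, SparkContext, TaskContext}
    import org.apache.spark.rdd.RDD

    // One partition per group of placeholder values supplied by the caller.
    private class JdbcPlaceholderPartition(val index: Int, val values: Seq[Any])
      extends Partition

    // Hypothetical general-purpose JDBC RDD sketching the proposal above.
    class FlexibleJdbcRDD[T: ClassTag](
        sc: SparkContext,
        getConnection: () => Connection,           // caller-supplied connection factory
        sql: String,                               // query with '?' placeholders
        placeholderGroups: Seq[Seq[Any]],          // one group of values per partition
        bindPlaceholders: (PreparedStatement, Seq[Any]) => Unit, // caller controls binding
        mapRow: ResultSet => T)                    // row mapper
      extends RDD[T](sc, Nil) {

      override def getPartitions: Array[Partition] =
        placeholderGroups.zipWithIndex.map { case (values, i) =>
          new JdbcPlaceholderPartition(i, values): Partition
        }.toArray

      override def compute(split: Partition, context: TaskContext): Iterator[T] = {
        val part = split.asInstanceOf[JdbcPlaceholderPartition]
        val conn = getConnection()
        val stmt = conn.prepareStatement(sql)
        bindPlaceholders(stmt, part.values)        // the caller decides how '?' are filled
        val rs = stmt.executeQuery()

        // Simplification: collect the partition's rows, then close JDBC resources.
        val rows = scala.collection.mutable.ArrayBuffer.empty[T]
        while (rs.next()) rows += mapRow(rs)
        rs.close(); stmt.close(); conn.close()
        rows.iterator
      }
    }

A hypothetical usage for the "one query per logical MySQL partition" case described above (the connection URL, column name, and the existing SparkContext sc are assumed for illustration):

    val partitionIds = (0 until 16).map(id => Seq(id))   // one RDD partition per MySQL partition
    val devices = new FlexibleJdbcRDD[String](
      sc,
      () => DriverManager.getConnection("jdbc:mysql://host/db", "user", "pass"),
      "select * from devices where partition_id = ? and app_id = 'abcd'",
      partitionIds,
      (stmt, values) => stmt.setInt(1, values.head.asInstanceOf[Int]),
      rs => rs.getString("device_id"))

The IN-clause case would follow the same pattern: pass each distinct chunk of identifiers as one placeholder group and have bindPlaceholders set one statement parameter per identifier.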