Re: [DISCUSS] Multiple catalog support

2018-08-01 Thread Wenchen Fan
For the first question: this is what we already support. A data source can implement `ReadSupportProvider` (based on my API improvement) so that it can create a `ReadSupport` by reflection. I agree wi
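The "by reflection" step mentioned above — turning a source name into a provider class at runtime — can be sketched in plain Python with `importlib`. This is an illustrative analogue of the idea, not Spark's actual code path; the standard-library class below merely stands in for a data source provider.

```python
import importlib

def load_class(qualified_name: str):
    """Resolve a class from its fully qualified name, the same
    reflection idea used to turn a source/provider name into a
    concrete implementation class."""
    module_name, _, class_name = qualified_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)

# Demo: resolve a standard-library class by name, standing in for
# a ReadSupportProvider implementation looked up by its class name.
decoder_cls = load_class("json.JSONDecoder")
```

In Spark itself the equivalent lookup goes through the JVM class loader and a service/registry mechanism, but the shape is the same: a string names the implementation, and reflection produces the class.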

Re: [DISCUSS] Multiple catalog support

2018-07-31 Thread Ryan Blue
Wenchen, I think the misunderstanding is around how the v2 API should work with multiple catalogs. Data sources are read/write implementations that resolve to a single JVM class. When we consider how these implementations should work with multiple table catalogs, I think it is clear that the catal
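The separation being argued for — a catalog answers metadata questions, while a data source is just the read/write implementation a table points at — can be sketched as follows. All names here (`Table`, `TableCatalog`, the `provider` field) are hypothetical illustrations, not the proposed Java interface.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    provider: str                 # which read/write implementation backs this table
    properties: dict = field(default_factory=dict)

class TableCatalog:
    """A table catalog only tracks metadata; it never reads or
    writes data itself. The provider named in the metadata is what
    resolves to a single implementation class."""
    def __init__(self):
        self._tables = {}

    def create_table(self, ident: str, provider: str, **props):
        self._tables[ident] = Table(ident, provider, props)
        return self._tables[ident]

    def load_table(self, ident: str) -> Table:
        return self._tables[ident]

# One catalog can hold tables backed by different implementations:
cat = TableCatalog()
cat.create_table("db.events", provider="parquet")
cat.create_table("db.users", provider="cassandra")
```

The point of the separation: the catalog is chosen by the table identifier, and the implementation is chosen by the table's own metadata, so the two never need to be conflated.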

Re: [DISCUSS] Multiple catalog support

2018-07-31 Thread Wenchen Fan
Here is my interpretation of your proposal, please correct me if something is wrong. End users can read/write a data source with its name and some options. e.g. `df.read.format("xyz").option(...).load`. This is currently the only end-user API for data source v2, and is widely used by Spark users t
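The end-user API described above is a fluent builder: the chain collects a source name plus options and only resolves them at `load`. A toy mock (not PySpark — just the shape of the chain) looks like this:

```python
class DataFrameReader:
    """Minimal mock of the df.read.format(...).option(...).load()
    chain, showing how the end-user API carries a source name and
    a bag of options to whatever implementation gets resolved."""
    def __init__(self):
        self._format = None
        self._options = {}

    def format(self, source: str):
        self._format = source
        return self            # return self so calls can be chained

    def option(self, key: str, value: str):
        self._options[key] = value
        return self

    def load(self, path: str = None):
        # A real reader would resolve self._format to a data source
        # here; we just return what that source would receive.
        return {"format": self._format, "options": self._options, "path": path}

result = DataFrameReader().format("xyz").option("k", "v").load("/tmp/data")
```

Note that nothing in this chain names a catalog — which is exactly the gap the thread is discussing.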

Re: [DISCUSS] Multiple catalog support

2018-07-29 Thread Ryan Blue
Wenchen, what I'm suggesting is a bit of both of your proposals. I think that USING should be optional, like your first option. USING (or `format(...)` on the DataFrame side) should configure the source or implementation, while the catalog should be part of the table identifier. They serve two different pur
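"The catalog should be part of the table identifier" suggests a resolution rule like the following sketch: if the first part of a multi-part name matches a known catalog, peel it off; otherwise fall back to a default. This is a hypothetical illustration of the rule (Spark's actual resolution is more involved, and the default name is assumed here).

```python
def split_identifier(ident: str, known_catalogs: set, default: str = "default_catalog"):
    """Split a multi-part table name into (catalog, remaining parts).
    The catalog lives in the identifier; USING/format() separately
    names the implementation."""
    parts = ident.split(".")
    if parts[0] in known_catalogs:
        return parts[0], parts[1:]
    return default, parts

# `prod.db.events` → catalog `prod`, table `db.events`
catalog, name = split_identifier("prod.db.events", known_catalogs={"prod", "test"})
```

Under this scheme `CREATE TABLE prod.db.events USING parquet` reads naturally: `prod` picks the catalog, `parquet` picks the implementation, and the two concerns stay independent.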

Re: [DISCUSS] Multiple catalog support

2018-07-27 Thread Wenchen Fan
I think the major issue is that users now have 2 ways to create a specific data source table: 1) use the USING syntax, or 2) create the table in the specific catalog. It can be super confusing if users create a Cassandra table in an HBase data source. Also, we can't drop the USING syntax as data source v1 sti

Re: [DISCUSS] Multiple catalog support

2018-07-25 Thread Ryan Blue
Quick update: I've updated my PR to add the table catalog API to implement this proposal. Here's the PR: https://github.com/apache/spark/pull/21306 On Mon, Jul 23, 2018 at 5:01 PM Ryan Blue wrote: > Lately, I’ve been working on implementing the new SQL logical plans. I’m > currently blocked work

[DISCUSS] Multiple catalog support

2018-07-23 Thread Ryan Blue
Lately, I’ve been working on implementing the new SQL logical plans. I’m currently blocked working on the plans that require table metadata operations. For example, CTAS will be implemented as a create table and a write using DSv2 (and a drop table if anything goes wrong). That requires something t
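The CTAS decomposition described here — create the table, run the write, and drop the table again if the write fails — can be sketched as a small rollback pattern. The `InMemoryCatalog` and `ctas` names below are toy illustrations, not Spark's actual API; the only assumption is a catalog object exposing create/drop operations.

```python
class InMemoryCatalog:
    """Toy metadata store standing in for a table catalog."""
    def __init__(self):
        self.tables = {}

    def create_table(self, ident, provider):
        self.tables[ident] = provider

    def drop_table(self, ident):
        self.tables.pop(ident, None)

def ctas(catalog, ident, provider, write_fn):
    """CTAS as three steps: create the table's metadata, run the
    data write, and drop the metadata again if the write fails."""
    catalog.create_table(ident, provider)
    try:
        write_fn(ident)
    except Exception:
        catalog.drop_table(ident)   # roll back metadata on failure
        raise
```

This is exactly why the plan is blocked on a metadata API: without `create_table`/`drop_table` operations behind a common interface, the logical plan has nowhere to direct the first and last steps.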