Re: [discuss] Data Source V2 write path

2017-09-25 Thread Wenchen Fan
> I think it is a bad idea to let this problem leak into the new storage API. Well, I think using data source options is a good compromise here. We can't avoid this problem until catalog federation is done, and that may not happen within Spark 2.3, but we definitely need a data source write API
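The compromise described above is to smuggle table metadata through the free-form string options a writer already receives, since no catalog entry is guaranteed to exist yet. A minimal sketch of that pattern in plain Python (the real API is Scala/Java; the function and option names here are purely illustrative):

```python
# Toy sketch: the writer reconstructs table metadata from options
# because, without catalog federation, no catalog entry may exist.
def write(rows, options):
    # Hypothetical option keys; a real source defines its own.
    keyspace = options.get("keyspace")
    table = options.get("table")
    if keyspace is None or table is None:
        raise ValueError("table metadata must be supplied via options")
    return f"writing {len(rows)} rows to {keyspace}.{table}"

result = write([1, 2, 3], {"keyspace": "ks", "table": "t"})
```

The downside Ryan raises below is visible even in the sketch: every writer must validate and interpret this loose metadata itself.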

Announcing Spark on Kubernetes release 0.4.0

2017-09-25 Thread Erik Erlandson
The Spark on Kubernetes development community is pleased to announce release 0.4.0 of Apache Spark with a native Kubernetes scheduler back-end! The dev community is planning to use this release as the reference for upstreaming native Kubernetes capability over the Spark 2.3 release cycle. This

Re: [discuss] Data Source V2 write path

2017-09-25 Thread Ryan Blue
I think it is a bad idea to let this problem leak into the new storage API. By not setting the expectation that metadata for a table will exist, this will needlessly complicate writers just to support the existing problematic design. Why can't we use an in-memory catalog to store the configuration

Re: [discuss] Data Source V2 write path

2017-09-25 Thread Wenchen Fan
Catalog federation is about publishing a Spark catalog API (kind of a data source API for metadata), so that Spark is able to read/write metadata from external systems. (SPARK-15777) Currently Spark can only read/write the Hive metastore, which means for other systems like Cassandra, we can only
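To make the "catalog API for metadata" idea concrete, here is a toy sketch of what a pluggable catalog interface could look like, with an in-memory implementation of the kind Ryan suggests elsewhere in the thread. All class and method names are hypothetical, not Spark's actual API:

```python
# Illustrative only: a pluggable catalog abstraction (the tracking
# ticket is SPARK-15777); external systems would provide their own
# implementation instead of going through the Hive metastore.
from abc import ABC, abstractmethod

class ExternalCatalog(ABC):
    @abstractmethod
    def table_exists(self, db: str, table: str) -> bool: ...

    @abstractmethod
    def create_table(self, db: str, table: str, schema: dict) -> None: ...

class InMemoryCatalog(ExternalCatalog):
    """Holds table metadata in a dict; enough for tests and sketches."""
    def __init__(self):
        self.tables = {}

    def table_exists(self, db, table):
        return (db, table) in self.tables

    def create_table(self, db, table, schema):
        self.tables[(db, table)] = schema

cat = InMemoryCatalog()
cat.create_table("ks", "t", {"id": "int"})
```

With such an interface in place, Spark could ask the catalog whether a table exists instead of pushing that decision into every writer.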

Re: [Spark Core] Custom Catalog. Integration between Apache Ignite and Apache Spark

2017-09-25 Thread Reynold Xin
It's probably just an indication of lack of interest (or at least there isn't a substantial overlap between Ignite users and Spark users). A new catalog implementation is also pretty fundamental to Spark and the bar for that would be pretty high. See my comment in SPARK-17767. Guys - while I

Re: [discuss] Data Source V2 write path

2017-09-25 Thread Ryan Blue
However, without catalog federation, Spark doesn’t have an API to ask an external system (like Cassandra) to create a table. Currently it’s all done by the data source write API. Data source implementations are responsible for creating or inserting into a table according to the save mode. What’s catalog
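The save-mode dispatch mentioned above is the logic every writer currently has to implement itself. A simplified sketch of those semantics (the mode names mirror `org.apache.spark.sql.SaveMode`; the dict-backed "store" is a stand-in for a real external system):

```python
# Hedged sketch of V1-style save-mode handling; real sources perform
# these checks against their own storage, not an in-memory dict.
from enum import Enum

class SaveMode(Enum):
    ERROR_IF_EXISTS = "ErrorIfExists"  # fail if the table exists
    APPEND = "Append"                  # add rows to an existing table
    OVERWRITE = "Overwrite"            # replace existing contents
    IGNORE = "Ignore"                  # silently skip if table exists

def save(store, table, rows, mode):
    exists = table in store
    if mode is SaveMode.ERROR_IF_EXISTS and exists:
        raise RuntimeError(f"table {table} already exists")
    if mode is SaveMode.IGNORE and exists:
        return  # no-op by contract
    if mode is SaveMode.OVERWRITE:
        store[table] = []
    store.setdefault(table, []).extend(rows)

store = {}
save(store, "t", [1, 2], SaveMode.ERROR_IF_EXISTS)  # creates the table
save(store, "t", [3], SaveMode.APPEND)
```

Ryan's objection in this thread is essentially that this create-or-insert decision belongs in a catalog, not duplicated inside every data source.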

Re: [discuss] Data Source V2 write path

2017-09-25 Thread Wenchen Fan
We still need to support low-level data sources like pure Parquet files, which do not have a metastore. BTW I think we should leave the metadata management to the catalog API after catalog federation. The data source API should only care about data. On Mon, Sep 25, 2017 at 11:14 AM, Reynold Xin
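The point about metastore-less sources is that for a pure file source the "table" is just files, and the metadata (schema) travels with the data itself rather than living in a catalog. A toy CSV-style reader makes this concrete (illustrative only; real Parquet embeds the schema in the file footer):

```python
# Sketch: a file source needs no catalog because each file carries
# its own schema. Here the first line of each file is the header.
def read_table(files):
    header = files[0].splitlines()[0].split(",")
    rows = []
    for f in files:
        for line in f.splitlines()[1:]:
            rows.append(dict(zip(header, line.split(","))))
    return rows

data = read_table(["id,name\n1,a\n2,b"])
```

This is why the thread distinguishes the two cases: catalog-backed tables can delegate metadata to a catalog API, while file sources must keep working without one.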