Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Dongjoon Hyun
Instead of that, I believe you are looking for `spark.sql.sources.useV1SourceList` if the question is about "Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 data sources?". Here is the code.
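The code linked in the message above is not reproduced in this archive snippet. Separately, as a rough illustration of how `spark.sql.sources.useV1SourceList` is typically used: the setting holds a comma-separated list of formats that stay on the v1 path, so moving a format to v2 means removing it from the list. The sketch below only builds that string. The helper `drop_from_v1_list` is hypothetical (not a Spark API), and the default list shown is an assumption based on Spark 3.x that should be checked against the build in use.

```python
# Hypothetical helper, not part of Spark: it only builds the string value
# for the spark.sql.sources.useV1SourceList setting.

# Assumed default for Spark 3.x. Verify on your own build, for example with
# spark.conf.get("spark.sql.sources.useV1SourceList").
ASSUMED_DEFAULT_V1_LIST = "avro,csv,json,kafka,orc,parquet,text"


def drop_from_v1_list(current, formats_to_v2):
    """Return `current` minus the given formats, preserving the original order."""
    drop = {f.strip().lower() for f in formats_to_v2}
    kept = [
        f.strip()
        for f in current.split(",")
        if f.strip() and f.strip().lower() not in drop
    ]
    return ",".join(kept)


# Example: keep everything on v1 except Parquet and ORC.
new_value = drop_from_v1_list(ASSUMED_DEFAULT_V1_LIST, ["parquet", "orc"])
print(new_value)
```

With an existing SparkSession named `spark`, the resulting string could then be applied with `spark.conf.set("spark.sql.sources.useV1SourceList", new_value)`. Whether the v2 path behaves acceptably for a given workload is exactly the open question in this thread.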

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. To think out loud: The big gaps between built-in v1 and v2 data sources are support for bucketing and

Re: Plans for built-in v2 data sources in Spark 4

2023-09-20 Thread Will Raschkowski
Thank you for linking that, Dongjoon! I found SPARK-44518 in that list which wants to turn Spark’s Hive integration into a data source. IIUC, that’s very related but I’m curious if I’m thinking about this correctly: Big gaps between built-in

Re: Plans for built-in v2 data sources in Spark 4

2023-09-14 Thread Dongjoon Hyun
Hi, Will. According to the following JIRA, as of now, there is no plan or ongoing discussion to switch it. https://issues.apache.org/jira/browse/SPARK-44111 (Prepare Apache Spark 4.0.0) Thanks, Dongjoon. On Wed, Sep 13, 2023 at 9:02 AM Will Raschkowski wrote: > Hey everyone, > I was

Plans for built-in v2 data sources in Spark 4

2023-09-13 Thread Will Raschkowski
Hey everyone, I was wondering what the plans are for Spark's built-in v2 file data sources in Spark 4. Concretely, is the plan for Spark 4 to continue defaulting to the built-in v1 data sources? And if yes, what are the blockers for defaulting to v2? I see, just as an example, that writing