Hi,

I am trying to understand the state of datasource v2, and I'm a bit lost. On the one hand, it is supposed to be a more flexible approach, as described for example here:
https://www.slideshare.net/databricks/apache-spark-data-source-v2-with-wenchen-fan-and-gengliang-wang

On the other hand, it appears that both the Parquet and ORC file readers are still not using the v2 interface. There's an umbrella issue to address that:
https://issues.apache.org/jira/browse/SPARK-23507
but it does not have any sub-issues addressing Parquet, and the issue about ORC:
https://issues.apache.org/jira/browse/SPARK-23817
includes this text: "Not supported( due to limitation of data source V2): (1) Read multiple file path (2) Read bucketed file.".

Is there any up-to-date information on whether datasource v2 will indeed become the primary datasource, whether the Parquet reader will be converted to V2, and whether the limitations above will be fixed?

Thanks in advance,

--
Vladimir Prus
http://vladimirprus.com