Sure. My point was that Delta Lake is itself one of those 3rd-party libraries, so there is no way for Apache Spark to provide that shorthand itself. Delta Lake has its own group, and the request is better raised there.
On Mon, Oct 5, 2020 at 9:54 PM Enrico Minack <m...@enrico.minack.dev> wrote:

> Though spark.read.<format> refers to "built-in" data sources, there is
> nothing that prevents 3rd-party libraries from "extending" spark.read in
> Scala or Python. As users know the Spark way to read built-in data
> sources, it feels natural to hook 3rd-party data sources under the same
> scheme, to give users a holistic and integrated feel.
>
> One Scala example
> (https://github.com/G-Research/spark-dgraph-connector#spark-dgraph-connector):
>
>     import uk.co.gresearch.spark.dgraph.connector._
>     val triples = spark.read.dgraph.triples("localhost:9080")
>
> and in Python:
>
>     from gresearch.spark.dgraph.connector import *
>     triples = spark.read.dgraph.triples("localhost:9080")
>
> I agree that 3rd parties should also support the official
> spark.read.format() and the new catalog approaches.
>
> Enrico
>
> On 05.10.20 at 14:03, Jungtaek Lim wrote:
>
> Hi,
>
> "spark.read.<format>" is a shorthand for "built-in" data sources, not
> for external data sources; spark.read.format() is still the official way
> to use them. Delta Lake is not included in Apache Spark, so it is indeed
> not possible for Spark to refer to it.
>
> Starting from Spark 3.0, the concept of a "catalog" is introduced: you
> can simply refer to a table through the catalog (if the external data
> source provides a catalog implementation), with no need to specify the
> format explicitly, as the catalog knows about it.
>
> This session explains the catalog and how the Cassandra connector
> leverages it. I see some external data sources starting to support
> catalogs, and within Spark itself there is an effort to support catalogs
> for JDBC.
>
> https://databricks.com/fr/session_na20/datasource-v2-and-cassandra-a-whole-new-world
>
> Hope this helps.
>
> Thanks,
> Jungtaek Lim (HeartSaVioR)
>
>
> On Mon, Oct 5, 2020 at 8:53 PM Moser, Michael
> <michael.mo...@siemens-healthineers.com> wrote:
>
>> Hi there,
>>
>> I'm just wondering if there is any incentive to implement read/write
>> methods in the DataFrameReader/DataFrameWriter for Delta, similar to
>> e.g. parquet?
>>
>> For example, using PySpark, "spark.read.parquet" is available, but
>> "spark.read.delta" is not (same for write).
>>
>> In my opinion, "spark.read.delta" feels cleaner and more pythonic
>> compared to "spark.read.format('delta').load()", especially if more
>> options are chained, like "mode".
>>
>> Can anyone explain the reasoning behind this? Is it due to the Java
>> nature of Spark?
>>
>> From a pythonic point of view, I could also imagine a single read/write
>> method, with the format as an arg and kwargs related to the different
>> file format options.
>>
>> Best,
>> Michael
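[Editor's note] For reference, the hook Enrico describes is a small implicit
class in Scala. The sketch below is hypothetical (the DeltaReadSyntax object
and the delta method name are made up, and it assumes Spark 3.x with the Delta
Lake jar on the classpath); under the hood it only delegates to the official
format()/load() API:

    import org.apache.spark.sql.{DataFrame, DataFrameReader}

    object DeltaReadSyntax {
      // Enrich DataFrameReader so that spark.read.delta(path) compiles
      // once this implicit class is in scope.
      implicit class DeltaDataFrameReader(reader: DataFrameReader) {
        // Plain delegation to the official API; "delta" is resolvable
        // only if the Delta Lake jar is on the classpath.
        def delta(path: String): DataFrame =
          reader.format("delta").load(path)
      }
    }

    // Usage:
    //   import DeltaReadSyntax._
    //   val df = spark.read.delta("/tmp/events")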
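[Editor's note] And a minimal sketch of the Spark 3.0 catalog route Jungtaek
mentions, using the Cassandra connector from the linked session as an example
(the catalog name "cass" and the keyspace/table names are made up; it assumes
the spark-cassandra-connector 3.x jar is on the classpath):

    // Register a catalog backed by the Cassandra connector.
    spark.conf.set(
      "spark.sql.catalog.cass",
      "com.datastax.spark.connector.datasource.CassandraCatalog")

    // No format() call needed: the catalog resolves the table and
    // already knows which data source backs it.
    val df = spark.table("cass.my_keyspace.my_table")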
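[Editor's note] Finally, on Michael's point about chaining options: the
official, format-agnostic API already reads close to the single-method,
kwargs-style call he imagines (the path is made up; versionAsOf is a Delta
Lake time-travel read option):

    val df = spark.read
      .format("delta")
      .option("versionAsOf", 0)  // read an older snapshot of the table
      .load("/tmp/events")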