Re: [Spark Core] Why no spark.read.delta / df.write.delta?

2020-10-05 Thread Jungtaek Lim
Sure. My point was that Delta Lake is also a third-party library, so there's no way for Apache Spark to do that. Delta Lake has its own group, and the request would be better directed there. On Mon, Oct 5, 2020 at 9:54 PM Enrico Minack wrote: > Though spark.read. refers to "built-in" data …

Re: [Spark Core] Why no spark.read.delta / df.write.delta?

2020-10-05 Thread Enrico Minack
Though spark.read. refers to "built-in" data sources, there is nothing that prevents third-party libraries from "extending" spark.read in Scala or Python. Since users know the Spark way of reading built-in data sources, it feels natural to hook third-party data sources into the same scheme, to give users a …
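
(To make the idea concrete, here is a minimal PySpark sketch of such an "extension" via monkey-patching; the helper names and behavior are my own illustration, not an existing Delta Lake API, and it assumes the Delta data source is already on the classpath.)

    from pyspark.sql.readwriter import DataFrameReader, DataFrameWriter

    def _read_delta(reader, path):
        # Shorthand that delegates to the generic format() API
        return reader.format("delta").load(path)

    def _write_delta(writer, path):
        # Same idea on the write side
        writer.format("delta").save(path)

    # Attach the shorthands so spark.read.delta(...) and
    # df.write.delta(...) become available in this session
    DataFrameReader.delta = _read_delta
    DataFrameWriter.delta = _write_delta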

Re: [Spark Core] Why no spark.read.delta / df.write.delta?

2020-10-05 Thread Jungtaek Lim
Hi, "spark.read." is a "shorthand" for "built-in" data sources, not for external data sources. spark.read.format() is still an official way to use it. Delta Lake is not included in Apache Spark so that is indeed not possible for Spark to refer to. Starting from Spark 3.0, the concept of

[Spark Core] Why no spark.read.delta / df.write.delta?

2020-10-05 Thread Moser, Michael
Hi there, I'm just wondering if there are any plans to implement read/write methods in DataFrameReader/DataFrameWriter for delta, similar to e.g. parquet. For example, in PySpark, "spark.read.parquet" is available, but "spark.read.delta" is not (the same holds for write). In my opinion, …
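
(A quick PySpark sketch of the asymmetry being asked about; the paths are made up, an existing SparkSession named spark is assumed, and the Delta calls assume the delta-core package is on the classpath.)

    # Built-in source: a dedicated shorthand exists on DataFrameReader
    df = spark.read.parquet("/data/events.parquet")

    # Delta: no such shorthand, so the generic format() API is required
    df = spark.read.format("delta").load("/data/events_delta")
    df.write.format("delta").save("/data/events_delta_copy")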