Re: Breaking API changes in Spark 3.0

2020-02-21 Thread Holden Karau
So my view of how common & stable API removal should go (in general, I want to be clear, exceptions can and do make sense): 1) Deprecate the API 2) Release the replacement API 3) Provide migration guidance (ideally in the deprecated annotation, but possibly in release notes or elsewhere) 4) Remove the old API. I think ...
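To make step 3 concrete, here is a minimal sketch (the names are made up, not an actual Spark API) of how migration guidance can travel with the deprecation itself via Scala's @deprecated annotation, so callers see the replacement directly in compiler warnings:

    object ExampleApi {
      // Step 2: the replacement API, released alongside (or before) the deprecation.
      def createSession(): String = "session"

      // Steps 1 and 3: the old API stays, and the annotation names the
      // replacement plus the version in which it was deprecated.
      @deprecated("Use createSession() instead", "2.4.0")
      def getOrCreateContext(): String = createSession()
    }

Step 4 (the actual removal) then happens only at the next major version boundary.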

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Jungtaek Lim
I think I was too rushed in reading and focused only on the first sentence of Karen's input. Sorry about that. As I said, I'm not sure I can agree with the point about deprecation and breaking API changes, but the thread has another topic which seems to be good input - the practice for proposing new APIs. I feel ...

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Jungtaek Lim
Apache Spark 2.0 was released in July 2016. Assuming the project has been trying its best to follow semantic versioning, that is "more than three years" of waiting for breaking changes. Whatever necessary breaking changes the community misses addressing now will become technical debt for another ...

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Dongjoon Hyun
Sure. I understand the background of the following requests. So, it's a good time to decide the criteria in order to start the discussion. 1. "to provide a reasonable migration path we’d want the replacement of the deprecated API to also exist in 2.4" 2. "We need to discuss the APIs case by case ..."

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Xiao Li
Like in https://github.com/apache/spark/pull/23131, where we added back unionAll, we might need to double-check whether we removed any widely used APIs in this release before the RC. If the maintenance costs are small, keeping some deprecated APIs looks reasonable to me. This can help the adoption of Spark 3.0.
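For illustration, the general pattern of keeping a deprecated API as a thin alias that simply forwards to its replacement (a simplified sketch, not the actual Spark source for unionAll):

    class MiniDataset[T](private val rows: Seq[T]) {
      // The current API.
      def union(other: MiniDataset[T]): MiniDataset[T] =
        new MiniDataset(rows ++ other.rows)

      // The old name kept as a one-line alias: cheap to maintain, and it keeps
      // existing callers compiling while they migrate.
      @deprecated("Use union() instead", "2.0.0")
      def unionAll(other: MiniDataset[T]): MiniDataset[T] = union(other)
    }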

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Holden Karau
So my understanding would be that, to provide a reasonable migration path, we’d want the replacement for the deprecated API to also exist in 2.4; this way libraries and programs can dual-target during the migration process. Now that isn’t always going to be doable, but it is certainly worth looking at the ...
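As a sketch of what dual targeting can look like (assuming the replacement API is available in both lines), library code that sticks to SparkSession rather than the removed SQLContext.getOrCreate compiles unchanged against both 2.4 and 3.0; the object and method names below are hypothetical:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object DualTargetExample {
      // Uses only APIs present in both Spark 2.4 and 3.0, so the same source
      // can be cross-built for both versions during the migration window.
      def loadJson(path: String): DataFrame = {
        val spark = SparkSession.builder().getOrCreate()
        spark.read.json(path)
      }
    }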

Re: Breaking API changes in Spark 3.0

2020-02-19 Thread Dongjoon Hyun
Hi, Karen. Are you saying that Spark 3 has to keep all deprecated 2.x APIs? Could you tell us what your criteria are for `unnecessarily` or `necessarily`? > the migration process from Spark 2 to Spark 3 unnecessarily painful. Bests, Dongjoon. On Tue, Feb 18, 2020 at 4:55 PM Karen Feng wrote: ...

Breaking API changes in Spark 3.0

2020-02-18 Thread Karen Feng
Hi all, I am concerned that the API-breaking changes in SPARK-25908 (as well as SPARK-16775, and potentially others) will make the migration process from Spark 2 to Spark 3 unnecessarily painful. For example, the removal of SQLContext.getOrCreate will break a large number of libraries currently ...
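For context, this is roughly the change affected libraries would need to make (a sketch; SQLContext.getOrCreate was deprecated in 2.0 in favor of the SparkSession builder, and a SQLContext is still reachable from the session if an old signature must be preserved):

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{SQLContext, SparkSession}

    object MigrationSketch {
      // Before (removed in 3.0): SQLContext.getOrCreate(sc)
      // After: build or reuse a SparkSession and take its sqlContext if needed.
      def sqlContextFor(sc: SparkContext): SQLContext =
        SparkSession.builder().config(sc.getConf).getOrCreate().sqlContext
    }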