Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Aljoscha Krettek Mon, 08 Oct 2018 08:21:27 -0700

Yes, but I think we would pretty much have to do that. I don't think we can 
stop doing 2.11 releases.


> On 8. Oct 2018, at 15:37, Chesnay Schepler <ches...@apache.org> wrote:
> 
> The infrastructure would only be required if we opt for releasing 2.11 and 
> 2.12 builds simultaneously, correct?
> 
> On 08.10.2018 15:04, Aljoscha Krettek wrote:
>> Breaking the API (or not breaking it but requiring explicit types when using 
>> Scala 2.12) and the Maven infrastructure to actually build a 2.12 release.
>> 
>>> On 8. Oct 2018, at 13:00, Chesnay Schepler <ches...@apache.org> wrote:
>>> 
>>> And the remaining parts would only be about breaking the API?
>>> 
>>> On 08.10.2018 12:24, Aljoscha Krettek wrote:
>>>> I have an open PR that does everything we can do for preparing the code 
>>>> base for Scala 2.12 without breaking the API: 
>>>> https://github.com/apache/flink/pull/6784
>>>> 
>>>>> On 8. Oct 2018, at 09:56, Chesnay Schepler <ches...@apache.org> wrote:
>>>>> 
>>>>> I'd rather not maintain 2 master branches. Beyond the maintenance 
>>>>> overhead I'm
>>>>> wondering about the benefit, as the API break still has to happen at some 
>>>>> point.
>>>>> 
>>>>> @Aljoscha how much work for supporting scala 2.12 can be merged without 
>>>>> breaking the API?
>>>>> If this is the only blocker I suggest to make the breaking change in 1.8.
>>>>> 
>>>>> On 05.10.2018 10:31, Till Rohrmann wrote:
>>>>>> Thanks Aljoscha for starting this discussion. The described problem 
>>>>>> brings
>>>>>> us indeed a bit into a pickle. Even with option 1) I think it is somewhat
>>>>>> API breaking because everyone who used lambdas without types needs to add
>>>>>> them now. Consequently, I only see two real options out of the ones 
>>>>>> you've
>>>>>> proposed:
>>>>>> 
>>>>>> 1) Disambiguate the API (either by removing
>>>>>> reduceGroup(GroupReduceFunction) or by renaming it to reduceGroupJ)
>>>>>> 2) Maintain a 2.11 and 2.12 master branch until we phase 2.11 completely 
>>>>>> out
>>>>>> 
>>>>>> Removing the reduceGroup(GroupReduceFunction) in option 1 is a bit
>>>>>> problematic because then all Scala API users who have implemented a
>>>>>> GroupReduceFunction need to convert it into a Scala lambda. Moreover, I
>>>>>> think it will be problematic with RichGroupReduceFunction which you need 
>>>>>> to
>>>>>> get access to the RuntimeContext.
>>>>>> 
>>>>>> Maintaining two master branches puts a lot of burden onto the developers 
>>>>>> to
>>>>>> always keep the two branches in sync. Ideally I would like to avoid this.
>>>>>> 
>>>>>> I also played a little bit around with implicit conversions to add the
>>>>>> lambda methods in Scala 2.11 on demand, but I was not able to get it work
>>>>>> smoothly.
>>>>>> 
>>>>>> I'm cross posting this thread to user as well to get some more user
>>>>>> feedback.
>>>>>> 
>>>>>> Cheers,
>>>>>> Till
>>>>>> 
>>>>>> On Thu, Oct 4, 2018 at 7:36 PM Elias Levy <fearsome.lucid...@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> The second alternative, with the addition of methods that take functions
>>>>>>> with Scala types, seems the most sensible.  I wonder if there is a need
>>>>>>> then to maintain the *J Java parameter methods, or whether users could 
>>>>>>> just
>>>>>>> access the functionality by converting the Scala DataStreams to Java via
>>>>>>> .javaStream and whatever the equivalent is for DataSets.
>>>>>>> 
>>>>>>> On Thu, Oct 4, 2018 at 8:10 AM Aljoscha Krettek <aljos...@apache.org>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I'm currently working on
>>>>>>> https://issues.apache.org/jira/browse/FLINK-7811,
>>>>>>>> with the goal of adding support for Scala 2.12. There is a bit of a
>>>>>>> hurdle
>>>>>>>> and I have to explain some context first.
>>>>>>>> 
>>>>>>>> With Scala 2.12, lambdas are implemented using the lambda mechanism of
>>>>>>>> Java 8, i.e. Scala lambdas are now SAMs (Single Abstract Method). This
>>>>>>>> means that the following two method definitions can both take a lambda:
>>>>>>>> 
>>>>>>>> def map[R](mapper: MapFunction[T, R]): DataSet[R]
>>>>>>>> def map[R](fun: T => R): DataSet[R]
>>>>>>>> 
>>>>>>>> The Scala compiler gives precedence to the lambda version when you call
>>>>>>>> map() with a lambda in simple cases, so it works here. You could still
>>>>>>> call
>>>>>>>> map() with a lambda if the lambda version of the method weren't here
>>>>>>>> because they are now considered the same. For Scala 2.11 we need both
>>>>>>>> signatures, though, to allow calling with a lambda and with a
>>>>>>> MapFunction.
>>>>>>>> The problem is with more complicated method signatures, like:
>>>>>>>> 
>>>>>>>> def reduceGroup[R](fun: (scala.Iterator[T], Collector[R]) => Unit):
>>>>>>>> DataSet[R]
>>>>>>>> 
>>>>>>>> def reduceGroup[R](reducer: GroupReduceFunction[T, R]): DataSet[R]
>>>>>>>> 
>>>>>>>> (for reference, GroupReduceFunction is a SAM with void
>>>>>>>> reduce(java.lang.Iterable<T> values, Collector<O> out))
>>>>>>>> 
>>>>>>>> These two signatures are not the same but similar enough for the Scala
>>>>>>>> 2.12 compiler to "get confused". In Scala 2.11, I could call
>>>>>>> reduceGroup()
>>>>>>>> with a lambda that doesn't have parameter type definitions and things
>>>>>>> would
>>>>>>>> be fine. With Scala 2.12 I can't do that because the compiler can't
>>>>>>> figure
>>>>>>>> out which method to call and requires explicit type definitions on the
>>>>>>>> lambda parameters.
>>>>>>>> 
>>>>>>>> I see some solutions for this:
>>>>>>>> 
>>>>>>>> 1. Keep the methods as is, this would force people to always explicitly
>>>>>>>> specify parameter types on their lambdas.
>>>>>>>> 
>>>>>>>> 2. Rename the second method to reduceGroupJ() to signal that it takes a
>>>>>>>> user function that takes Java-style interfaces (the first parameter is
>>>>>>>> java.lang.Iterable while the Scala lambda takes a scala.Iterator). This
>>>>>>>> disambiguates the code, users can use lambdas without specifying 
>>>>>>>> explicit
>>>>>>>> parameter types but breaks the API.
>>>>>>>> 
>>>>>>>> One effect of 2. would be that we can add a reduceGroup() method that
>>>>>>>> takes a api.scala.GroupReduceFunction that takes proper Scala types, 
>>>>>>>> thus
>>>>>>>> it would allow people to implement user functions without having to 
>>>>>>>> cast
>>>>>>>> the various Iterator/Iterable parameters.
>>>>>>>> 
>>>>>>>> Either way, people would have to adapt their code when moving to Scala
>>>>>>>> 2.12 in some way, depending on what style of methods they use.
>>>>>>>> 
>>>>>>>> There is also solution 2.5:
>>>>>>>> 
>>>>>>>> 2.5 Rename the methods only in the Scala 2.12 build of Flink and keep 
>>>>>>>> the
>>>>>>>> old method names for Scala 2.11. This would require some infrastructure
>>>>>>> and
>>>>>>>> I don't yet know how it can be done in a sane way.
>>>>>>>> 
>>>>>>>> What do you think? I personally would be in favour of 2. but it breaks
>>>>>>> the
>>>>>>>> existing API.
>>>>>>>> 
>>>>>>>> Best,
>>>>>>>> Aljoscha
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>> 
>

Re: [DISCUSS] Breaking the Scala API for Scala 2.12 Support

Reply via email to