Hey Till,

You've got a good point here. Removing some of the methods would cause
breaking the stability guarantees. I do understand if we decide not to
remove them for that reason, let me explain though why I am thinking it
might make sense to remove them already. First of all I am a bit afraid
it might take a long time before we arrive at the 2.0 version. We have
not ever discussed that in the community. At the same time a lot of the
methods already don't work or are buggy, and we do not fix them any more.

Methods which removing would not break the Public guarantees:

  * ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
  * RuntimeContext#getAllAccumulators (deprecated in 0.10)
  * ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
  * 
StreamExecutionEnvironment#setNumberOfExecutionRetries/getNumberOfExecutionRetries
    (not the equivalent in the ExecutionConfig)
  * StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
    (deprecated in 1.5)

Methods which removing would break the Public guarantees:

which have no effect:

  * ExecutionConfig#disable/enableSysoutLogging (deprecated in 1.10)
  * ExecutionConfig#set/isFailTaskOnCheckpointError (deprecated in 1.9)

which are buggy or discouraged and thus we do not support fixing them:

  * DataStream#split (deprecated in 1.8)
  * DataStream#fold and all related classes and methods such as
    FoldFunction, FoldingState, FoldingStateDescriptor ... (deprecated
    in 1.3/1.4)

The methods like:

  * 
StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...),

  * methods in (Connected)DataStream that specify keys as either indices
    or field names
  * ExecutionConfig#setNumberOfExecutionRetries/getNumberOfExecutionRetries

should be working just fine and I feel the least eager to remove those.

I'd suggest I will open PRs for removing the methods that will not cause
breakage of the Public guarantees as the general feedback was rather
positive. For the rest I do understand the resentment to do so and will
not do it in the 1.x branch. Still I think it is valuable to have the
discussion.

Best,

Dawid


On 18/08/2020 09:26, Till Rohrmann wrote:
> Having looked at the proposed set of methods to remove I've noticed
> that some of them are actually annotated with @Public. According to
> our stability guarantees, only major releases (1.0, 2.0, etc.) can
> break APIs with this annotation. Hence, I believe that we cannot
> simply remove them unless the community decides to change the
> stability guarantees we give or by making the next release a major
> release (Flink 2.0).
>
> Cheers,
> Till
>
> On Tue, Aug 18, 2020 at 5:57 AM Yun Gao <yungao...@aliyun.com
> <mailto:yungao...@aliyun.com>> wrote:
>
>     +1 for removing the methods that are deprecated for a while & have
>     alternative methods.
>
>     One specific thing is that if we remove the DataStream#split, do
>     we consider enabling side-output in more operators in the future ?
>     Currently it should be only available in ProcessFunctions, but not
>     available to other commonly used UDF like Source or AsyncFunction[1].
>
>     One temporary solution occurs to me is to add a ProcessFunction
>     after the operators want to use side-output. But I think the
>     solution is not very direct to come up with and if it really works
>     we might add it to the document of side-output. 
>
>     [1] https://issues.apache.org/jira/browse/FLINK-7954
>
>     Best,
>      Yun
>
>         ------------------Original Mail ------------------
>         *Sender:*Kostas Kloudas <kklou...@gmail.com
>         <mailto:kklou...@gmail.com>>
>         *Send Date:*Tue Aug 18 03:52:44 2020
>         *Recipients:*Dawid Wysakowicz <dwysakow...@apache.org
>         <mailto:dwysakow...@apache.org>>
>         *CC:*dev <d...@flink.apache.org <mailto:d...@flink.apache.org>>,
>         user <user@flink.apache.org <mailto:user@flink.apache.org>>
>         *Subject:*Re: [DISCUSS] Removing deprecated methods from
>         DataStream API
>
>             +1 for removing them.
>
>
>
>             From a quick look, most of them (not all) have been
>             deprecated a long time ago.
>
>
>
>             Cheers,
>
>             Kostas
>
>
>
>             On Mon, Aug 17, 2020 at 9:37 PM Dawid Wysakowicz wrote:
>
>             >
>
>             > @David Yes, my idea was to remove any use of fold method
>             and all related classes including WindowedStream#fold
>
>             >
>
>             > @Klou Good idea to also remove the deprecated
>             enableCheckpointing() &
>             StreamExecutionEnvironment#readFile and alike. I did
>             another pass over some of the classes and thought we could
>             also drop:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode
>
>             > ExecutionConfig#disable/enableSysoutLogging
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>
>             > ExecutionConfig#isLatencyTrackingEnabled
>
>             >
>
>             > As for the `forceCheckpointing` I am not fully convinced
>             to doing it. As far as I know iterations still do not
>             participate in checkpointing correctly. Therefore it still
>             might make sense to force it. In other words there is no
>             real alternative to that method. Unless we only remove the
>             methods from StreamExecutionEnvironment and redirect to
>             the setter in CheckpointConfig. WDYT?
>
>             >
>
>             > An updated list of methods I suggest to remove:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             >
>             
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > Bear in mind that majority of the options listed above
>             in ExecutionConfig take no effect. They were left there
>             purely to satisfy the binary compatibility. Personally I
>             don't see any benefit of leaving a method and silently
>             dropping the underlying feature. The only configuration
>             that is respected is setting the number of execution retries.
>
>             >
>
>             > I also wanted to make it explicit that most of the
>             changes above would result in a binary incompatibility and
>             require additional exclusions in the japicmp. Those are:
>
>             >
>
>             > ExecutionConfig#set/getCodeAnalysisMode (deprecated in 1.9)
>
>             > ExecutionConfig#disable/enableSysoutLogging (deprecated
>             in 1.10)
>
>             > ExecutionConfig#set/isFailTaskOnCheckpointError
>             (deprecated in 1.9)
>
>             > ExecutionConfig#isLatencyTrackingEnabled (deprecated in 1.7)
>
>             > ExecutionConfig#(get)/setNumberOfExecutionRetries()
>             (deprecated in 1.1?)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>             
> StreamExecutionEnvironment#readFile,readFileStream(...),socketTextStream(...),socketTextStream(...)
>             (deprecated in 1.2)
>
>             >
>
>             > Looking forward to more opinions on the issue.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>
>             >
>
>             >
>
>             > On 17/08/2020 12:49, Kostas Kloudas wrote:
>
>             >
>
>             > Thanks a lot for starting this Dawid,
>
>             >
>
>             > Big +1 for the proposed clean-up, and I would also add
>             the deprecated
>
>             > methods of the StreamExecutionEnvironment like:
>
>             >
>
>             > enableCheckpointing(long interval, CheckpointingMode
>             mode, boolean force)
>
>             > enableCheckpointing()
>
>             > isForceCheckpointing()
>
>             >
>
>             > readFile(FileInputFormat inputFormat,String
>
>             > filePath,FileProcessingMode watchType,long interval,
>             FilePathFilter
>
>             > filter)
>
>             > readFileStream(...)
>
>             >
>
>             > socketTextStream(String hostname, int port, char
>             delimiter, long maxRetry)
>
>             > socketTextStream(String hostname, int port, char delimiter)
>
>             >
>
>             > There are more, like the
>             (get)/setNumberOfExecutionRetries() that were
>
>             > deprecated long ago, but I have not investigated to see
>             if they are
>
>             > actually easy to remove.
>
>             >
>
>             > Cheers,
>
>             > Kostas
>
>             >
>
>             > On Mon, Aug 17, 2020 at 10:53 AM Dawid Wysakowicz
>
>             > wrote:
>
>             >
>
>             > Hi devs and users,
>
>             >
>
>             > I wanted to ask you what do you think about removing
>             some of the deprecated APIs around the DataStream API.
>
>             >
>
>             > The APIs I have in mind are:
>
>             >
>
>             > RuntimeContext#getAllAccumulators (deprecated in 0.10)
>
>             > DataStream#fold and all related classes and methods such
>             as FoldFunction, FoldingState, FoldingStateDescriptor ...
>             (deprecated in 1.3/1.4)
>
>             >
>             StreamExecutionEnvironment#setStateBackend(AbstractStateBackend)
>             (deprecated in 1.5)
>
>             > DataStream#split (deprecated in 1.8)
>
>             > Methods in (Connected)DataStream that specify keys as
>             either indices or field names such as DataStream#keyBy,
>             DataStream#partitionCustom, ConnectedStream#keyBy, ....
>             (deprecated in 1.11)
>
>             >
>
>             > I think the first three should be straightforward. They
>             are long deprecated. The getAccumulators method is not
>             used very often in my opinion. The same applies to the
>             DataStream#fold which additionally is not very performant.
>             Lastly the setStateBackend has an alternative with a class
>             from the AbstractStateBackend hierarchy, therefore it will
>             be still code compatible. Moreover if we remove the
>             #setStateBackend(AbstractStateBackend) we will get rid off
>             warnings users have right now when setting a statebackend
>             as the correct method cannot be used without an explicit
>             casting.
>
>             >
>
>             > As for the DataStream#split I know there were some
>             objections against removing the #split method in the past.
>             I still believe the output tags can replace the split
>             method already.
>
>             >
>
>             > The only problem in the last set of methods I propose to
>             remove is that they were deprecated only in the last
>             release and those method were only partially deprecated.
>             Moreover some of the methods were not deprecated in
>             ConnectedStreams. Nevertheless I'd still be inclined to
>             remove the methods in this release.
>
>             >
>
>             > Let me know what do you think about it.
>
>             >
>
>             > Best,
>
>             >
>
>             > Dawid
>

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to