Hi Aljoscha,

Thanks for your response.

With all this preliminary information collected, I’ll start a formal process.

Thanks, everybody, for your attention.

Best,
Xingcan

> On Jul 8, 2019, at 10:17 AM, Aljoscha Krettek <aljos...@apache.org> wrote:
> 
> I think this would benefit from a FLIP that neatly sums up the options and 
> also gives us a point where we can vote on and ratify a decision.
> 
> Going by gut feeling, I like option 3) best. Initially, I would have preferred 
> option 1) (out of a sense of API purity), but by now I think it’s good 
> that users have this simpler option.
> 
> Aljoscha 
> 
>> On 8. Jul 2019, at 06:39, Xingcan Cui <xingc...@gmail.com> wrote:
>> 
>> Hi all,
>> 
>> Thanks for your participation.
>> 
>> In this thread, we got one +1 each for option 1 and option 3. In 
>> the original thread [1], we got two +1 for option 1, one +1 for option 2, and 
>> five +1 and one -1 for option 3.
>> 
>> To summarize,
>> 
>> Option 1 (port side output to flatMap and deprecate split/select): three +1
>> Option 2 (introduce a new split/select and deprecate existing one): one +1
>> Option 3 ("correct" the existing split/select): six +1 and one -1
>> 
>> It seems that most people involved are in favor of "correcting" the existing 
>> split/select. However, this will definitely break API compatibility, albeit in 
>> a subtle way.
>> 
>> IMO, the actual behavior of consecutive split/select calls has never been 
>> thoroughly clarified. Even within the community, it is hard to say that we have 
>> reached a consensus on its real semantics [2-4]. Though the initial design is 
>> not ambiguous, there's no doubt that the concept has drifted. 
>> 
>> As split/select is quite an old API, I have cc'ed more members on this. 
>> It would be great if you could share your opinions.
>> 
>> Thanks,
>> Xingcan
>> 
>> [1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
>> [2] https://issues.apache.org/jira/browse/FLINK-1772
>> [3] https://issues.apache.org/jira/browse/FLINK-5031
>> [4] https://issues.apache.org/jira/browse/FLINK-11084
>> 
>> 
>>> On Jul 5, 2019, at 12:04 AM, 杨力 <bill.le...@gmail.com> wrote:
>>> 
>>> I prefer approach 1). I used to carry fields that are needed only for 
>>> splitting in the outputs of flatMap functions. Replacing them with 
>>> outputTags would simplify the data structures.
>>> 
>>> On Fri, Jul 5, 2019, at 2:20 AM, Xingcan Cui <xingc...@gmail.com> wrote:
>>> Hi folks,
>>> 
>>> Two weeks ago, I started a thread [1] discussing whether we should discard 
>>> the split/select methods (which have been marked as deprecated since v1.7) 
>>> in the DataStream API. 
>>> 
>>> The fact is, these methods cause "unexpected" results when used 
>>> consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or 
>>> multiple times on the same target (e.g., ds.split(a).select(b), 
>>> ds.split(c).select(d)). The reason is that, following the initial design, 
>>> a new split/select logic always overrides the existing one on the 
>>> same target operator rather than appending to it. Some users may not be 
>>> aware of that; for those who are, the current workaround is to use the more 
>>> powerful side output feature [2].
>>> 
>>> FLINK-11084 <https://issues.apache.org/jira/browse/FLINK-11084> added some 
>>> restrictions to the existing split/select logic and suggested replacing it 
>>> with side output in the future. However, considering that side output 
>>> is currently only available at the process function layer and that 
>>> split/select may already be widely used in real-world applications, 
>>> we'd like to start a vote and listen to the community on how to deal with 
>>> them.
>>> 
>>> In the discussion thread [1], we proposed three solutions as follows. All 
>>> of them are feasible but have different impacts on the public API.
>>> 
>>> 1) Port the side output feature to DataStream API's flatMap and replace 
>>> split/select with it.
>>> 
>>> 2) Introduce a dedicated function in DataStream API (with the "correct" 
>>> behavior but a different name) that can be used to replace the existing 
>>> split/select.
>>> 
>>> 3) Keep split/select but change the behavior/semantics to be "correct".
>>> 
>>> Note that this is just a vote for gathering information, so feel free to 
>>> participate and share your opinions.
>>> 
>>> Voting will end on July 7th at 17:00 EDT.
>>> 
>>> Thanks,
>>> Xingcan
>>> 
>>> [1] https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
>>> [2] https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/side_output.html
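
The "override rather than append" behavior described in the original vote message can be illustrated with a toy model in plain Java (9+). This is a hedged sketch, not Flink code: streams are modeled as lists, and `SplitSelectModel`, `OverridingOperator`, `PARITY`, and `SIZE` are hypothetical names. It contrasts a composing interpretation of "correct" semantics (an assumption; the thread does not pin the semantics down) with an operator that keeps only the most recently registered selector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Toy model of the split/select semantics discussed in this thread.
 * NOT Flink code: a "stream" is a plain list, and a selector maps each
 * element to the set of output names it should be routed to.
 */
public class SplitSelectModel {

    // Hypothetical selectors used for illustration only.
    static final Function<Integer, Set<String>> PARITY =
            x -> x % 2 == 0 ? Set.of("even") : Set.of("odd");
    static final Function<Integer, Set<String>> SIZE =
            x -> x > 2 ? Set.of("big") : Set.of("small");

    /** Composing semantics (assumed): split(selector).select(name) only filters its own input. */
    static List<Integer> splitSelect(List<Integer> input,
                                     Function<Integer, Set<String>> selector,
                                     String name) {
        List<Integer> out = new ArrayList<>();
        for (Integer x : input) {
            if (selector.apply(x).contains(name)) {
                out.add(x);
            }
        }
        return out;
    }

    /**
     * Overriding semantics as described in the vote message: the upstream
     * operator keeps a single selector, and a later split() replaces it
     * instead of appending to it.
     */
    static final class OverridingOperator {
        private final List<Integer> input;
        private Function<Integer, Set<String>> selector;

        OverridingOperator(List<Integer> input) {
            this.input = input;
        }

        /** Registers a selector, silently replacing any earlier one. */
        OverridingOperator split(Function<Integer, Set<String>> newSelector) {
            this.selector = newSelector;
            return this;
        }

        /**
         * Returns a lazy handle that is evaluated only when get() is called.
         * The lambda reads the selector field at that time, so whichever
         * selector was registered last wins.
         */
        Supplier<List<Integer>> select(String name) {
            return () -> splitSelect(input, selector, name);
        }
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5, 6);

        // Chained split/select with composing semantics: even AND big.
        System.out.println(splitSelect(splitSelect(input, PARITY, "even"), SIZE, "big")); // [4, 6]

        // Two split/select pairs on the same operator with overriding semantics.
        OverridingOperator op = new OverridingOperator(input);
        Supplier<List<Integer>> evens = op.split(PARITY).select("even");
        Supplier<List<Integer>> bigs = op.split(SIZE).select("big"); // replaces PARITY
        System.out.println(evens.get()); // [] because "even" is now evaluated by SIZE
        System.out.println(bigs.get());  // [3, 4, 5, 6]
    }
}
```

In this model, the `evens` handle silently returns nothing, which is the kind of "unexpected" result the thread describes; under the composing interpretation, each split/select pair filters independently.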
