I think what you're looking for is a side output. Change the logic to a
process function: records for which the condition is true go to the
collector, and the rest go to a side output. That gives you two streams.
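
Roughly, it could look like the sketch below. This is only a minimal
illustration, assuming the Flink DataStream API (flink-streaming-java plus
flink-clients for local runs) is on the classpath. The Event class, the
class name SplitByExist, the tag id "not-exist", and the two map steps are
all made up for the example; only the "exist" property comes from your
description.

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

public class SplitByExist {

    // Hypothetical record type; only the "exist" property is from the thread.
    public static class Event {
        public String id;
        public boolean exist;

        public Event() {}

        public Event(String id, boolean exist) {
            this.id = id;
            this.exist = exist;
        }

        @Override
        public String toString() {
            return id + "(exist=" + exist + ")";
        }
    }

    // Created as an anonymous subclass so Flink can capture the element type.
    // The tag id "not-exist" is an arbitrary name chosen for this sketch.
    static final OutputTag<Event> NOT_EXIST = new OutputTag<Event>("not-exist") {};

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // (A) stand-in source with a few sample records
        DataStream<Event> source = env.fromElements(
                new Event("a", true), new Event("b", false), new Event("c", true));

        // The condition is checked once per record: true goes to the
        // collector (main output), false goes to the side output.
        SingleOutputStreamOperator<Event> existing = source.process(
                new ProcessFunction<Event, Event>() {
                    @Override
                    public void processElement(Event value, Context ctx, Collector<Event> out) {
                        if (value.exist) {
                            out.collect(value);
                        } else {
                            ctx.output(NOT_EXIST, value);
                        }
                    }
                });

        DataStream<Event> missing = existing.getSideOutput(NOT_EXIST);

        // (B) and (C): placeholder transformations standing in for "do something"
        DataStream<Event> branchB = existing.map(e -> new Event(e.id + "-B", e.exist));
        DataStream<Event> branchC = missing.map(e -> new Event(e.id + "-C", e.exist));

        // (D) both branches carry the same type, so they can be unioned directly
        branchB.union(branchC).print();

        env.execute("split-by-exist sketch");
    }
}
```

With this shape the condition is evaluated once per record, whereas the
two-filter variant evaluates one predicate per branch for every record.
Whether that difference matters for your job is something you would have to
measure.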

On Mon, May 10, 2021, 8:14 PM Nikola Hrusov <n.hru...@gmail.com> wrote:

> Hi Arvid,
>
> In my case it's the latter, thus I have also thought about using the
> filter (map is not useful in my case).
>
> What I am not sure about is which one is better to use.
> In what case would you split a stream with side output and in what case
> with filter?
> Would there be any performance gain/pain based on which is used?
>
> Regards,
> Nikola
>
>
> On Mon, May 10, 2021 at 6:00 PM Arvid Heise <ar...@apache.org> wrote:
>
>> Hi Nikola,
>>
>> If you just want to apply a different user function to the records
>> depending on the property "exist", the simplest way is to use
>>
>> source -> map(if exist do this else that) -> sink
>>
>> If it turns out that you want to apply a different subgraph, you can do
>>
>> source -> filter(if exist) -> do this -> union -> sink
>> source -> filter(if not exist) -> do that -^
>>
>> On Mon, May 10, 2021 at 3:07 PM Nikola Hrusov <n.hru...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to find some information on what is the best way to split a
>>> stream of the same data.
>>>
>>> For the given scenario: I have an object which has a property "exist"
>>>
>>> I want to split the stream based on this property, do something, and
>>> afterwards join it again into a single stream.
>>>
>>> Initial (A) -> Split stream based on exist (B) or not (C) -> union both
>>> streams (D)
>>>
>>> I could find some similar topics on StackOverflow:
>>> -
>>> https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream
>>> -
>>> https://stackoverflow.com/questions/61752728/how-to-get-output-of-the-values-that-are-not-matched-in-filter-function-in-apach
>>>
>>> but none of them really gives a definitive answer.
>>>
>>> What I am thinking about is using 1) filter or 2) side output.
>>>
>>> I know that one of the use cases of side output is that it can have
>>> different data types. That is not my case as it will be the same object
>>> going through the whole pipeline.
>>>
>>> So both options look more or less the same to me, however I do not know
>>> the Flink internals as well as I would like to at this point.
>>>
>>> Can some of you guys shed some light and perhaps tell me if I am
>>> mistaken in my thoughts?
>>>
>>> Thanks.
>>>
>>> Regards,
>>> Nikola
>>>
>>
