I think what your looking for is a a side output. Change the logic to a process function. What is true goes to collector false can go to a side output. Which now gives you 2 streams
On Mon, May 10, 2021, 8:14 PM Nikola Hrusov <n.hru...@gmail.com> wrote: > Hi Arvid, > > In my case it's the latter, thus I have also thought about using the > filter (map is not useful in my case). > > What I am not sure which is better to be used? > In what case would you split a stream with side output and in what case > with filter? > Would there be any performance gain/pain based on which is used? > > Regards > , > Nikola > <%28%2B45%29%2060%2054%2032%2016> > > > On Mon, May 10, 2021 at 6:00 PM Arvid Heise <ar...@apache.org> wrote: > >> Hi Nikola, >> >> if you just want to apply a different user function to the records >> depending on the property "exist" the simplest way is to use >> >> source -> map(if exist do this else that) -> sink >> >> If it turns out that you want to apply a different subgraph, you can do >> >> source -> filter(if exist) -> do this -> union -> sink >> source -> filter(if not exist) -> do that -^ >> >> On Mon, May 10, 2021 at 3:07 PM Nikola Hrusov <n.hru...@gmail.com> wrote: >> >>> Hi, >>> >>> I am trying to find some information on what is the best way to split a >>> stream of the same data. >>> >>> For the given scenario: I have an object which has a property "exist" >>> >>> I want to split the stream based on this property, do something, and >>> afterwards join it again into a single stream. >>> >>> Initial (A) -> Split stream based on exist (B) or not (C) -> union both >>> streams (D) >>> >>> I could find some similar topics on StackOverflow: >>> - >>> https://stackoverflow.com/questions/53588554/apache-flink-using-filter-or-split-to-split-a-stream >>> - >>> https://stackoverflow.com/questions/61752728/how-to-get-output-of-the-values-that-are-not-matched-in-filter-function-in-apach >>> >>> but none of them really gives a definitive answer. >>> >>> What I am thinking about is using 1) filter or 2) side output. >>> >>> I know that one of the use cases of side output is that it can have >>> different data types. That is not my case as it will be the same object >>> going through the whole pipeline. >>> >>> So both options look more or less the same to me, however I do not know >>> the flink internals as good as I would like to as of this point. >>> >>> Can some of you guys shed some light and perhaps tell me if I am >>> mistaken in my thoughts? >>> >>> Thanks. >>> >>> Regards >>> , >>> Nikola >>> >>