Hi again,

something that I don't find (easily) in the documentation is what the
recommended method is to discard data from the stream.

On one hand, I could always use flatMap(), even if it is "per message"
since that allows me to return zero or one objects.

DataStream<MyType> stream =
    env.addSource( source )
       .flatMap( new MyFunction() )


But that seems a bit misleading, as the casual observer will get the idea
that MyFunction 'branches' out, but it doesn't.

The other "obvious" choice is to return null and follow with a filter...

DataStream<MyType> stream =
    env.addSource( source )
               .map( new MyFunction() )
               .filter( Objects::nonNull )

BUT, that doesn't work with Java 8 method references like above, so I have
to create my own filter to get the type information correct to Flink;

DataStream<MyType> stream =
    env.addSource( source )
               .map( new MyFunction() )
               .filter( new DiscardNullFilter<>() )


And in my opinion, that ends up looking ugly as the streams/pipeline (not
used to terminology yet) quickly have many transformations and branches,
and having a null check after each seems to put the burden of knowledge in
the wrong spot ("Can this function return null?")

Throwing an exception is shutting down the entire stream, which seems
overly aggressive for many data related discards.

Any other choices?

Cheers
-- 
Niclas Hedhman, Software Developer
http://zest.apache.org - New Energy for Java

Reply via email to