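Hey! Careful: in SQL, the semantics of a "not-equal" join are quite different from those of a NOT IN statement. A not-equal join pairs each element with every element on the other side that it differs from, so the same element can appear many times, and it appears even if an equal element exists on the other side. NOT IN keeps each element at most once, and only if it matches no element on the other side.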
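A small plain-Java illustration of that difference. It uses in-memory lists rather than Flink DataSets, and the class and method names are made up for this sketch:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NotEqualJoinVsNotIn {

    // Not-equal join: emits every (a, b) pair whose values differ,
    // so one left element can produce many output rows.
    static List<String> notEqualJoin(List<Integer> left, List<Integer> right) {
        List<String> pairs = new ArrayList<>();
        for (Integer a : left) {
            for (Integer b : right) {
                if (!a.equals(b)) {
                    pairs.add("(" + a + ", " + b + ")");
                }
            }
        }
        return pairs;
    }

    // NOT IN: emits each left element at most once,
    // and only if no right element is equal to it.
    static List<Integer> notIn(List<Integer> left, List<Integer> right) {
        Set<Integer> excluded = new HashSet<>(right);
        List<Integer> result = new ArrayList<>();
        for (Integer a : left) {
            if (!excluded.contains(a)) {
                result.add(a);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> left = List.of(1, 2, 3);
        List<Integer> right = List.of(2, 4);
        System.out.println(notEqualJoin(left, right)); // [(1, 2), (1, 4), (2, 4), (3, 2), (3, 4)]
        System.out.println(notIn(left, right));        // [1, 3]
    }
}
```

With left = [1, 2, 3] and right = [2, 4], the not-equal join yields five pairs, including (2, 4) even though 2 is in the right list, while NOT IN yields just [1, 3].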
Here is how you can do the equivalent of NOT IN:

If the list of elements is small and known up front, create a hash set and pass it to a filter function (via a closure or the constructor). The filter function then looks up each element in the set and keeps only those that are not contained in it.

If the elements are not known up front, use a broadcast variable that you attach to a RichFilterFunction. In the filter function's open() method, grab the broadcast variable and turn it into a hash set. The filter logic is then the same as above.

See the API guides for examples of how to use broadcast variables.

Stephan

On 11.12.2014 12:17, "Malte Schwarzer" wrote:
> Hi,
>
> is there an easy way to do a NOT IN or something like
> join().where().notEquals() on two datasets with Flink?
>
> Cheers
> Malte
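For reference, the two hash-set filter variants from the reply above can be sketched in plain Java. This is a minimal sketch, assuming a local stand-in interface with the same shape as Flink's FilterFunction<T> (boolean filter(T value) throws Exception) so it runs without Flink on the classpath. The Supplier passed to BroadcastNotInFilter is a hypothetical stand-in for the broadcast variable lookup; in Flink's DataSet API that would be getRuntimeContext().getBroadcastVariable(name) inside a RichFilterFunction's open(), with the set attached via withBroadcastSet(...). All class and method names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

public class NotInFilterSketch {

    // Local stand-in with the same shape as Flink's FilterFunction<T>.
    interface FilterFunction<T> {
        boolean filter(T value) throws Exception;
    }

    // Case 1: excluded elements are small and known up front,
    // so they are handed to the filter through its constructor.
    static class NotInFilter<T> implements FilterFunction<T> {
        private final Set<T> excluded;

        NotInFilter(Collection<T> excluded) {
            this.excluded = new HashSet<>(excluded);
        }

        @Override
        public boolean filter(T value) {
            // Keep the element only if it is NOT in the set.
            return !excluded.contains(value);
        }
    }

    // Case 2: excluded elements are only available at runtime.
    // The Supplier is a hypothetical stand-in for reading a broadcast variable.
    static class BroadcastNotInFilter<T> implements FilterFunction<T> {
        private final Supplier<List<T>> broadcastVariable;
        private Set<T> excluded;

        BroadcastNotInFilter(Supplier<List<T>> broadcastVariable) {
            this.broadcastVariable = broadcastVariable;
        }

        // Meant to be called once before any filter() call, mirroring open().
        void open() {
            excluded = new HashSet<>(broadcastVariable.get());
        }

        @Override
        public boolean filter(T value) {
            return !excluded.contains(value);
        }
    }

    // Applies a filter to every element, roughly what a filter operator does.
    static <T> List<T> applyFilter(List<T> input, FilterFunction<T> f) {
        List<T> out = new ArrayList<>();
        for (T v : input) {
            try {
                if (f.filter(v)) {
                    out.add(v);
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> data = List.of("a", "b", "c", "d");
        System.out.println(applyFilter(data, new NotInFilter<>(List.of("b", "d")))); // [a, c]

        BroadcastNotInFilter<String> rich = new BroadcastNotInFilter<>(() -> List.of("a"));
        rich.open();
        System.out.println(applyFilter(data, rich)); // [b, c, d]
    }
}
```

Building the HashSet once (in the constructor or in open()) keeps each per-element check a constant-time lookup instead of a scan over the excluded list.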
