Hi there, I just started investigating Flink and I'm curious if I'm approaching my issue in the right way.
My current use case is modeling a series of transformations: I start with some transformations which, when done, can yield another transformation, a result to output to some sink, or a join operation that extracts data from some other data set, combines it with existing data, and outputs a transformation that should be processed like any other. The transformations and results are easy to deal with, but the joins are more troubling. Here is my current solution, which I got to work:

    //initialSolution contains all external data to join on.
    initialSolution.iterateDelta(initialWorkset, 10000, Array("id")) {
      (solution: DataSet[Work[String]], workset: DataSet[Work[String]]) => {
        //handle joins separately
        val joined = handleJoins(solution, workset.filter(isJoin(_)))
        //transformations are handled separately as well
        val transformed = handleTransformations(workset.filter(isTransform(_)))
        val nextWorkSet = transformed.filter(isWork(_)).union(joined)
        val solutionUpdate = transformed.filter(isResult(_))
        (solutionUpdate, nextWorkSet)
      }
    }

My questions are:

1. Is this the right way to use Flink? Based on the documentation (correct me if I'm wrong), it seems that in the iterative case the external data (to be used in the join) should live in the solution DataSet, so if a use case has multiple external data sources to join on, they all end up collected in the initial solution DataSet. Would having all of this different data in the solution have bad repercussions for partitioning/performance?

2. Doing the joins as part of the iteration seems a bit wrong to me (I might just be thinking about the issue in the wrong way). I alternatively tried to model this approach as a series of DataStreams, where the code is pretty much the same as above, but the iteration occurs on stream T, which splits off into two streams J and R: R is just the result sink, and J holds the logic that joins incoming data and, after the join, sends the result back to stream T.
But I didn't see a good way to say "send the result of J back to T, and run all the standard iterative logic on it" using the DataStream API. I could manually create endpoints for these streams to hit and achieve this behavior that way, but is there an easier way I'm missing that can achieve this via the Flink API?

Thanks,
Li
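P.S. In case the routing inside my step function is unclear, here is a Flink-free sketch of what one iteration step is meant to do, written over plain Scala collections. Everything here (the `Work` variants, the handlers, the "solution" as a `Map`) is a made-up stand-in for the real types in the snippet above, just to show the shape of the logic:

```scala
// Hypothetical stand-ins for the Work[T] elements flowing through the iteration.
sealed trait Work
case class Transform(id: Int) extends Work              // still needs processing
case class Join(id: Int) extends Work                   // needs external data
case class Result(id: Int, value: String) extends Work  // ready for the sink

// One iteration step, modeled on plain collections: joins are enriched
// against the "solution" (the external data), transforms may yield
// new work or results.
def step(solution: Map[Int, String], workset: List[Work]): (List[Result], List[Work]) = {
  // handle joins separately: look up external data, emit a new transform
  val joined: List[Work] = workset.collect {
    case Join(id) if solution.contains(id) => Transform(id)
  }
  // transformations are handled separately; here each one simply yields a result
  val transformed: List[Work] = workset.collect {
    case Transform(id) => Result(id, s"done-$id")
  }
  val nextWorkset = transformed.collect { case t: Transform => t } ++ joined
  val results    = transformed.collect { case r: Result => r }
  (results, nextWorkset)
}

val solution = Map(1 -> "external")
val (results, next) = step(solution, List(Transform(0), Join(1)))
// results == List(Result(0, "done-0")), next == List(Transform(1))
```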
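The closest thing I found to the feedback edge I want is `DataStream.iterate`, which in the Scala API takes a step function returning a (feedback, output) pair of streams; the first is fed back into the loop, the second is emitted downstream. Below is a sketch of what I mean, with the same hypothetical `Work` type and helper predicates as above (so the types and helpers are mine, not Flink's) — is this the intended tool for this pattern?

    import org.apache.flink.streaming.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val input: DataStream[Work[String]] = env.fromCollection(initialWork)

    val results: DataStream[Work[String]] = input.iterate { iteration =>
      // J: join incoming work against external data,
      // feed the joined product back into T
      val joined      = iteration.filter(isJoin(_)).map(joinWithExternal(_))
      val transformed = iteration.filter(isTransform(_)).map(transform(_))
      // first element of the tuple is fed back into the loop,
      // second element is emitted as the iteration's output (R)
      val feedback = joined.union(transformed.filter(isWork(_)))
      val output   = transformed.filter(isResult(_))
      (feedback, output)
    }

If I read the docs right, by default the loop waits indefinitely for feedback records; `iterate` also accepts a max-wait timeout after which the iteration terminates when no more feedback arrives.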