Another possibility would be injecting pseudo events into the source and having a stateful filter. The event would be something like "key X is now owned by green". I can do that because getting a list of keys seen in the past X minutes is cheap (we have it already), but it's unclear what impact adding such state to the filter would have.
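Roughly what I have in mind for that stateful filter (a sketch only: Event and OwnershipEvent stand in for the real payload and control-event types, and I'm assuming blue owns every key until a control event says otherwise) is the data stream connected to a control stream, with per-key state:

    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.co.RichCoFlatMapFunction;
    import org.apache.flink.util.Collector;

    // Hypothetical placeholder types; the real ones would be proper POJOs.
    class Event { public String key; }
    class OwnershipEvent { public String key; public String newOwner; }

    public class OwnershipFilter
            extends RichCoFlatMapFunction<Event, OwnershipEvent, Event> {

        private final String myColor;               // "blue" or "green", fixed per topology
        private transient ValueState<String> owner;  // current owner of this key

        public OwnershipFilter(String myColor) {
            this.myColor = myColor;
        }

        @Override
        public void open(Configuration parameters) {
            owner = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("owner", String.class));
        }

        // Data stream: pass the event through only if our color owns its key.
        @Override
        public void flatMap1(Event event, Collector<Event> out) throws Exception {
            String current = owner.value();
            boolean owned = (current == null)
                    ? "blue".equals(myColor)  // assumed default before any control event
                    : current.equals(myColor);
            if (owned) {
                out.collect(event);
            }
        }

        // Control stream: "key X is now owned by <color>" updates the state.
        @Override
        public void flatMap2(OwnershipEvent change, Collector<Event> out) throws Exception {
            owner.update(change.newOwner);
        }
    }

Both topologies would run the same filter, differing only in the color they are constructed with:

    DataStream<Event> mine = events
            .connect(ownershipChanges)
            .keyBy(e -> e.key, c -> c.key)  // same key on both streams
            .flatMap(new OwnershipFilter("green"));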
On Mon 11 Feb 2019 at 13:33, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:

>
> On Mon, 11 Feb 2019 at 13:26, Stephen Connolly <stephen.alan.conno...@gmail.com> wrote:
>
>> I have my main application updating with a blue-green deployment strategy whereby a new version (always called green) starts receiving an initial fraction of the web traffic, and then - based on the error rates - we progress the % of traffic until 100% of traffic is being handled by the green version. At which point we decommission blue, and green is the new blue when the next version comes along.
>>
>> Applied to Flink, my initial thought is that you would run the two topologies in parallel, but the first action of each topology would be a filter based on the key.
>>
>> You basically would use a consistent transformation of the key into a number between 0 and 100, and the filter would be:
>>
>> (key) -> color == green ? f(key) < level : f(key) >= level
>>
>> Then I can use a suitable metric to determine if the new topology is working and ramp the level up or down.
>>
>> One issue I foresee is what happens if the level changes mid-window: I will have output from both topologies when the window ends.
>>
>> In the case of my output, which is aggregatable, I will get the same results from two rows as from one row *provided* that the switch from blue to green is synchronized between the two topologies. That sounds like a hard problem, though.
>>
>> Another thought I had was to let the web front-end decide, based on the same key-vs-level approach. Rather than submit the raw event, I would add the target topology to the event, and the filter just selects based on whether it is the target topology. This has the advantage that I know each event will only ever be processed by one of green or blue. Heck, I could even use the main web application's blue-green deployment to drive the Flink blue-green deployment.
>>
>
> In other words, if a blue web node receives an event upload it adds "blue", whereas if a green web node receives an event upload it adds "green" (not quite those strings, but rather the web deployment sequence number). This has the advantage that the web nodes do not need to parse the event payload. The % of web traffic will result in the matching % of events being sent to blue and green. Also, this means that all keys get processed at the target % during the deployment, which can help flush out bugs.
>
> I can therefore stop the old topology more than one window after the green web node started getting 100% of traffic, in order to allow any existing windows in flight to flush all the way to the datastore...
>
> Out-of-order events would be tagged as green once green is at 100% of traffic, and so can be processed correctly...
>
> And I can completely ignore topology migration serialization issues...
>
> Sounding very tempting... there must be something wrong...
>
> (or maybe my data storage plan just allows me to make this kind of optimization!)
>
>> as due to the way I structure my results I don't care if I get two rows of counts for a time window or one row of counts, because I'm adding up the total counts across multiple rows and sum is sum!
>>
>> Anyone else had to try and deal with this type of thing?
>>
>> -stephenc
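For the f(key) filter quoted above, the consistent transformation just needs to be a stable mapping of the key into [0, 100) that both topologies compute identically. A minimal sketch (reusing the hypothetical Event type from the earlier sketch; level is fixed at construction here, whereas in practice it has to change at runtime, which is exactly where the pseudo ownership events or the web-side tagging come in):

    import org.apache.flink.api.common.functions.FilterFunction;

    // Sketch only: both topologies must use the same f(key) so that every
    // key lands on exactly one side of the level at any given moment.
    public class BlueGreenFilter implements FilterFunction<Event> {

        private final boolean green;  // which color this topology runs as
        private final int level;      // % of keys routed to green, 0..100

        public BlueGreenFilter(boolean green, int level) {
            this.green = green;
            this.level = level;
        }

        // Consistent transformation of the key into a number between 0 and 100.
        // String.hashCode() is stable by specification; floorMod handles negatives.
        private static int f(String key) {
            return Math.floorMod(key.hashCode(), 100);
        }

        @Override
        public boolean filter(Event event) {
            return green ? f(event.key) < level : f(event.key) >= level;
        }
    }

The web-front-end variant collapses this to comparing a color tag on the event against the topology's own color, which sidesteps the mid-window level change entirely.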