Another possibility would be injecting pseudo-events into the source and
having a stateful filter.

The event would be something like “key X is now owned by green”.

I can do that because getting a list of keys seen in the past X minutes is
cheap (we have it already).

But it’s unclear what impact adding such state to the filter would have.
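
A minimal sketch of what I mean (hypothetical Event and Ownership types;
Flink's broadcast state is one way to fan the pseudo-events out to every
parallel instance of the filter):

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class OwnershipFilter extends BroadcastProcessFunction<Event, Ownership, Event> {

        // key -> owning color; also used to wire up the broadcast stream
        static final MapStateDescriptor<String, String> OWNERSHIP =
                new MapStateDescriptor<>("ownership", Types.STRING, Types.STRING);

        private final String myColor; // the color this topology runs as

        public OwnershipFilter(String myColor) {
            this.myColor = myColor;
        }

        @Override
        public void processElement(Event event, ReadOnlyContext ctx,
                                   Collector<Event> out) throws Exception {
            // pass the event through only if this topology owns its key;
            // keys with no recorded owner default to blue
            String owner = ctx.getBroadcastState(OWNERSHIP).get(event.getKey());
            if (owner == null) {
                owner = "blue";
            }
            if (myColor.equals(owner)) {
                out.collect(event);
            }
        }

        @Override
        public void processBroadcastElement(Ownership change, Context ctx,
                                            Collector<Event> out) throws Exception {
            // a pseudo-event: "key X is now owned by <color>"
            ctx.getBroadcastState(OWNERSHIP).put(change.getKey(), change.getColor());
        }
    }

wired up with something like
events.connect(changes.broadcast(OwnershipFilter.OWNERSHIP)).process(new
OwnershipFilter("green")).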

On Mon 11 Feb 2019 at 13:33, Stephen Connolly <
stephen.alan.conno...@gmail.com> wrote:

>
>
> On Mon, 11 Feb 2019 at 13:26, Stephen Connolly <
> stephen.alan.conno...@gmail.com> wrote:
>
>> I have my main application updating with a blue-green deployment strategy,
>> whereby a new version (always called green) starts receiving an initial
>> fraction of the web traffic and then, based on the error rates, we
>> increase the % of traffic until 100% of traffic is being handled by the
>> green version. At that point we decommission blue, and green becomes the
>> new blue when the next version comes along.
>>
>> Applied to Flink, my initial thought is that you would run the two
>> topologies in parallel, but the first action of each topology would be a
>> filter based on the key.
>>
>> You basically would use a consistent transformation of the key into a
>> number between 0 and 100 and the filter would be:
>>
>>     (key) -> color == green ? f(key) < level : f(key) >= level
>>
>> Then I can use a suitable metric to determine whether the new topology is
>> working and ramp the level up or down.
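>>
>> A minimal sketch of that filter (assuming a hypothetical Event type with
>> a getKey() accessor, and hashing as the consistent transformation; the
>> level is fixed at job submission here, since changing it at runtime is
>> the open question below):
>>
>>     import org.apache.flink.api.common.functions.FilterFunction;
>>
>>     public class LevelFilter implements FilterFunction<Event> {
>>         private final boolean green; // which color this topology runs as
>>         private final int level;     // % of keys currently routed to green
>>
>>         public LevelFilter(boolean green, int level) {
>>             this.green = green;
>>             this.level = level;
>>         }
>>
>>         // consistent transformation of the key into 0..99
>>         private static int f(String key) {
>>             return Math.floorMod(key.hashCode(), 100);
>>         }
>>
>>         @Override
>>         public boolean filter(Event event) {
>>             return green ? f(event.getKey()) < level
>>                          : f(event.getKey()) >= level;
>>         }
>>     }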
>>
>> One issue I foresee is what happens if the level changes mid-window: I
>> will have output from both topologies when the window ends.
>>
>> In the case of my output, which is aggregatable, I will get the same
>> results from two rows as from one row *provided* that the switch from blue
>> to green is synchronized between the two topologies. That sounds like a
>> hard problem though.
>>
>>
>> Another thought I had was to let the web front-end decide, based on the
>> same key-vs-level approach. Rather than submitting the raw event, I would
>> add the target topology to the event, and the filter would just select
>> events whose target matches its own topology. This has the advantage that
>> I know each event will only ever be processed by one of green or blue.
>> Heck, I could even use the main web application's blue-green deployment to
>> drive the Flink blue-green deployment
>>
>
> In other words, if a blue web node receives an event upload it tags it
> "blue", whereas if a green web node receives an event upload it tags it
> "green" (not quite those strings, but rather the web deployment sequence
> number). This has the advantage that the web nodes do not need to parse
> the event payload. The % of web traffic will result in the matching % of
> events being sent to blue and green. This also means that all keys get
> processed at the target % during the deployment, which can help flush out
> bugs.
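>
> As a sketch, with the tag approach the per-topology filter collapses to a
> one-liner (hypothetical getDeployment() accessor; myDeployment would be
> the deployment sequence number this topology was launched for):
>
>     // events is the incoming DataStream<Event>; keep only the events
>     // stamped with this topology's deployment sequence number
>     final String myDeployment = args[0]; // e.g. passed as a job argument
>     DataStream<Event> mine = events
>             .filter((Event e) -> myDeployment.equals(e.getDeployment()));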
>
> I can therefore stop the old topology more than one window length after
> the green web nodes start getting 100% of traffic, in order to allow any
> windows still in flight to flush all the way to the datastore...
>
> Out-of-order events would be tagged as green once green is receiving 100%
> of traffic, and so can be processed correctly...
>
> And I can completely ignore topology migration serialization issues...
>
> Sounding very tempting... there must be something wrong...
>
> (or maybe my data storage plan just allows me to make this kind of
> optimization!)
>
>
>> as, due to the way I structure my results, I don't care whether I get two
>> rows of counts for a time window or one, because I'm adding up the total
>> counts across multiple rows and sum is sum!
>>
>
>>
>> Anyone else had to try and deal with this type of thing?
>>
>> -stephenc
>>
>>
>> --
Sent from my phone
