Hi,

I've very recently come across Flink and I'm trying to use it to solve a
problem that I have.

I have a stream of user-settings updates coming through a Kafka topic. I
need to store the most recent settings, along with a history of settings
changes for each user, in Redshift, which then feeds into analytics
dashboards.
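
For concreteness, the events look roughly like this (field names are
simplified stand-ins for the real schema; public fields and a no-arg
constructor so Flink treats them as POJOs, each class in its own file):

public class SettingsUpdate {
    public String userId;
    public String settingName;
    public String settingValue;
    public long updatedAt;        // event time from the producer
    public SettingsUpdate() {}
}

public class HistoryRecord {
    public String userId;
    public String settingName;
    public String oldValue;       // null when a setting is first seen
    public String newValue;
    public long changedAt;
    public HistoryRecord() {}
    public HistoryRecord(String userId, String settingName,
                         String oldValue, String newValue, long changedAt) {
        this.userId = userId; this.settingName = settingName;
        this.oldValue = oldValue; this.newValue = newValue;
        this.changedAt = changedAt;
    }
}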

I've been contemplating using Flink for this problem and wanted some
guidance from people experienced with Flink: is Flink suited to this
problem, and if so, what approach might work best? I am considering the
following approaches:

1. Create a secondary key-value database with each user's latest settings,
and after grouping the stream with keyBy(userId), look up those settings to
check whether a setting has changed and, if so, create a history record
(see the first sketch after this list). I came across this StackOverflow
thread that seems relevant to this approach:
http://stackoverflow.com/questions/38866078/how-to-look-up-and-update-the-state-of-a-record-from-a-database-in-apache-flink

2. Pull the current snapshot of user settings from Redshift at program
start and keep it as state in the Flink program (the snapshot isn't huge,
~1 GB). Subsequently look up from this state and update it when processing
events (see the second sketch below).
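
Here's a rough sketch of what I have in mind for approach 1. KvClient is a
placeholder for whatever store I'd pick (Redis, DynamoDB, ...); a
synchronous lookup per event is the simplest form, and Flink's async I/O
API would presumably be the next step if lookups become the bottleneck:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.util.Collector;

// Placeholder for the real key-value client.
interface KvClient extends java.io.Serializable {
    String get(String userId, String settingName);
    void put(String userId, String settingName, String value);
}

public class ExternalLookupDetector extends RichFlatMapFunction<SettingsUpdate, HistoryRecord> {
    private final KvClient kv;

    public ExternalLookupDetector(KvClient kv) {
        this.kv = kv;
    }

    @Override
    public void flatMap(SettingsUpdate u, Collector<HistoryRecord> out) {
        String previous = kv.get(u.userId, u.settingName);
        if (previous == null || !previous.equals(u.settingValue)) {
            // Setting changed (or is new): record the transition, update the store.
            out.collect(new HistoryRecord(u.userId, u.settingName,
                    previous, u.settingValue, u.updatedAt));
            kv.put(u.userId, u.settingName, u.settingValue);
        }
    }
}

Wired up as: updates.keyBy("userId").flatMap(new ExternalLookupDetector(client)).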
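
And a sketch of approach 2, using Flink's keyed state to hold the latest
value (keying by userId and settingName so each key holds one setting's
last value). The part I'm unsure about is bootstrapping this state from the
existing Redshift snapshot at startup:

import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

public class ChangeDetector extends RichFlatMapFunction<SettingsUpdate, HistoryRecord> {
    private transient ValueState<String> lastValue;

    @Override
    public void open(Configuration parameters) {
        lastValue = getRuntimeContext().getState(
                new ValueStateDescriptor<>("last-value", String.class));
    }

    @Override
    public void flatMap(SettingsUpdate u, Collector<HistoryRecord> out) throws Exception {
        String previous = lastValue.value();   // null on the first event for this key
        if (previous == null || !previous.equals(u.settingValue)) {
            out.collect(new HistoryRecord(u.userId, u.settingName,
                    previous, u.settingValue, u.updatedAt));
            lastValue.update(u.settingValue);
        }
    }
}

Wired up as: updates.keyBy("userId", "settingName").flatMap(new ChangeDetector()).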

In both these cases I plan to create a Redshift sink that batches updates
to the history as well as the latest state and persists them to Redshift in
batches (through S3 and the COPY command for the history, and through an
UPDATE on a join for the snapshot).
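
For the history side of that sink, I'm imagining something along these
lines: buffer rows, stage each batch as a CSV object in S3, then issue a
Redshift COPY over JDBC. The bucket, table, IAM role, batch size, and
connection URL below are all placeholders, and a real version would also
need to flush on checkpoint so buffered rows aren't lost on failure:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class RedshiftHistorySink extends RichSinkFunction<String> {
    private static final int BATCH_SIZE = 10_000;        // placeholder
    private final List<String> buffer = new ArrayList<>();
    private transient AmazonS3 s3;

    @Override
    public void open(Configuration parameters) {
        s3 = AmazonS3ClientBuilder.defaultClient();
    }

    @Override
    public void invoke(String csvRow) throws Exception {
        buffer.add(csvRow);
        if (buffer.size() >= BATCH_SIZE) {
            flush();
        }
    }

    private void flush() throws Exception {
        // Stage the batch in S3, then load it into Redshift with COPY.
        String key = "history/" + System.currentTimeMillis() + ".csv";
        s3.putObject("my-staging-bucket", key, String.join("\n", buffer));
        try (Connection conn = DriverManager.getConnection("jdbc:redshift://host:5439/db");
             Statement stmt = conn.createStatement()) {
            stmt.execute("COPY settings_history FROM 's3://my-staging-bucket/" + key + "'"
                    + " IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'"
                    + " FORMAT AS CSV");
        }
        buffer.clear();
    }
}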

Is one of these designs more suited to working with Flink? Is there an
alternative I should consider?

Thanks!

-H
