Hi,

My current application makes use of a DynamoDB database to map a key to a 
value.  As each record enters the system, the async I/O operator calls this DB 
and requests a value for the key; if that value doesn't exist, a new value is 
generated and inserted.  I have managed to do all this in one update operation 
against DynamoDB, so performance isn't too bad.  This is usable for our current 
load, but our load will increase considerably in the near future, and as writes 
are expensive (each update is classed as a write, even if it only returns the 
existing value), this could be a cost factor going forward.
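
To make the cost concern concrete, here is a minimal sketch in plain Python 
of how the current lookup behaves.  It is only a model: MappingTable, 
get_or_create and write_count are hypothetical names, and nothing here calls 
the real DynamoDB API.

```python
import itertools


class MappingTable:
    """In-memory stand-in for the DynamoDB table (hypothetical, not the AWS SDK)."""

    def __init__(self):
        self._items = {}
        self._ids = itertools.count(1)
        self.write_count = 0  # every get_or_create call counts as one write

    def get_or_create(self, key):
        # Models the single update call: keep the stored value if present,
        # otherwise generate and store a new one. Either way it is a write.
        self.write_count += 1
        if key not in self._items:
            self._items[key] = next(self._ids)
        return self._items[key]


table = MappingTable()
values = [table.get_or_create(k) for k in ["a", "b", "a", "a"]]
print(values, table.write_count)
```

Four records cost four writes here even though only two mappings are new, 
which is the overhead I would like to avoid.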

Broadcast state looks like it might be the answer.  DynamoDB allows 
'streams' of table modification events to be output to what is essentially a 
Kinesis stream, so it might be possible to avoid the majority of write calls by 
storing local copies of the mapping.  I should also point out that these 
mappings are essentially capped.  The majority of events that come through will 
have an existing mapping.

My idea is to try the following:

1. On application startup, request the entire dataset from the DB (this is 
~5 million key:value pairs)
2. Inject this data into Flink state somehow, possibly via broadcast state?
3. Subscribe to the DynamoDB stream via broadcast state to capture updates to 
this table and update the Flink state
4. When a record is processed, check Flink state for an existing mapping and 
proceed if found.  If not, then call the async I/O operator as before to 
generate a new mapping
5. DynamoDB writes the new value to the stream, so all operators get the new 
value via broadcast state
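
The five steps above can be sketched roughly as follows.  This is plain 
Python rather than Flink code, and every name in it (FakeTable, Subtask, 
on_change_event, and so on) is hypothetical; the list called stream merely 
stands in for the DynamoDB stream, and the final loop stands in for the 
broadcast.

```python
class FakeTable:
    """Stand-in for the DynamoDB table plus its change stream (hypothetical)."""

    def __init__(self, initial):
        self.items = dict(initial)
        self.stream = []      # change events, in the order they were made
        self.write_count = 0  # every get_or_create call counts as one write

    def scan(self):
        # Step 1: fetch the entire dataset.
        return dict(self.items)

    def get_or_create(self, key):
        self.write_count += 1
        if key not in self.items:
            value = f"v{len(self.items) + 1}"
            self.items[key] = value
            self.stream.append((key, value))  # only real changes are emitted
        return self.items[key]


class Subtask:
    """One parallel operator instance holding a local copy of the mapping."""

    def __init__(self, table):
        self.table = table
        self.local = table.scan()  # steps 1-2: load the initial mapping

    def on_change_event(self, key, value):
        # Steps 3 and 5: apply an update received from the stream.
        self.local[key] = value

    def process(self, key):
        # Step 4: use the local mapping if present, else fall back to the DB.
        if key in self.local:
            return self.local[key]
        return self.table.get_or_create(key)


table = FakeTable({"a": "v1", "b": "v2"})
subtasks = [Subtask(table), Subtask(table)]

# Only the unknown key "c" reaches the table.
results = [subtasks[0].process(k) for k in ["a", "b", "c"]]

# Deliver every change event to every subtask (the "broadcast").
for key, value in table.stream:
    for subtask in subtasks:
        subtask.on_change_event(key, value)

# The second subtask now resolves "c" locally, with no further write.
later = subtasks[1].process("c")
print(results, later, table.write_count)
```

In this model a subtask does not cache the value it just created; it waits 
for the change event like every other subtask.  Until that event arrives, 
repeated lookups for the same key still go to the DB, which stays correct 
because the update returns the existing value.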

Is this idea workable?  I am unsure about the initial DB fetch and about the 
async I/O step when a new value needs to be inserted.

Any thoughts appreciated.

Thanks

O
