Hi, 

What would you say is the best way to persist data to multiple states?
Currently I have 3 options in mind:

1- Process the data and use the same stream to send it to both states
Stream stream = ...each...filter...bla.... 
stream.partitionPersist(state1, ...) 
stream.partitionPersist(state2, ...) 

2- Process data and chain the persists 
Stream stream = ...each...filter...bla.... 
stream.partitionPersist(state1, ...)
      .newValuesStream()
      .partitionPersist(state2, ...)

3- Build one topology per state; they would all do mostly the same thing
except for the persist part.

My main concerns here are failure handling and efficiency.

In my use case I actually have 3 states. 2 of them can store data in a
non-transactional way. The third should be opaque transactional but actually
can't be, as it's just an API call that doesn't recognize duplicates.
That's no big deal as long as we can make sure it isn't affected by failures of
the other states (meaning that if another state fails, we're sure this one
hasn't processed the data yet).

This makes option 1 a bit tricky, as I'm never sure of the order in which the
states will be processed. Or is there a way to be sure?
Option 2 would do, I guess, but I have to pass along through the first state
all the data needed for the second. Potentially I would also like to filter
which tuples go to state 1 and which go to state 2. I would then have to write
my own updater that applies a filter for the first persist, so that it doesn't
send everything to the state but still emits everything in the end.
Option 3 would also do, but it wouldn't be that efficient: reading my
spout twice and processing the data the same way in both topologies up until
the persist part.
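
A rough sketch of the updater described for option 2, written in plain Java
rather than against the Trident API. Every name here (FilteredPersistSketch,
BatchSink, FilteringUpdater, runChain) is hypothetical and only models the
idea: the first updater writes just the tuples that pass its filter, yet emits
the whole batch, so the second persist still sees everything. It also models
the intended ordering, where the second state is not touched if the first
write throws. Whether Trident actually gives that ordering guarantee for
chained persists is an assumption here, not something this sketch shows.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model only; none of these types are Storm/Trident classes.
final class FilteredPersistSketch {

    /** Hypothetical stand-in for a state that accepts a batch of writes. */
    interface BatchSink<T> {
        void write(List<T> batch);
    }

    /** Hypothetical updater: persists only matching tuples, emits the full batch. */
    static final class FilteringUpdater<T> {
        private final Predicate<T> shouldPersist;
        private final BatchSink<T> sink;

        FilteringUpdater(Predicate<T> shouldPersist, BatchSink<T> sink) {
            this.shouldPersist = shouldPersist;
            this.sink = sink;
        }

        List<T> update(List<T> batch) {
            List<T> toWrite = new ArrayList<>();
            for (T tuple : batch) {
                if (shouldPersist.test(tuple)) {
                    toWrite.add(tuple);
                }
            }
            // If the write throws, nothing is returned (emitted) downstream.
            sink.write(toWrite);
            // Emit every input tuple, not only the persisted ones.
            return new ArrayList<>(batch);
        }
    }

    /** Models the chain: the second updater only runs on what the first emitted. */
    static <T> void runChain(List<T> batch,
                             FilteringUpdater<T> first,
                             FilteringUpdater<T> second) {
        List<T> emitted = first.update(batch);
        second.update(emitted);
    }
}
```

In this model each updater carries its own filter, so the tuples written to
state 1 and to state 2 can differ even though both see the same batch.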

Any ideas on the best way to handle this?
Thanks 


Regards 
Laurent 
