Hi, What would you say is the best way to persist data to multiple states? Currently I have 3 options in mind:
1- Process the data and use the same stream to send it to both states:

    Stream stream = ...each...filter...bla....
    stream.partitionPersist(state1, ...)
    stream.partitionPersist(state2, ...)

2- Process the data and chain the persists:

    Stream stream = ...each...filter...bla....
    stream.partitionPersist(state1, ...)
          .newValuesStream()
          .partitionPersist(state2, ...)

3- Build one topology per state; they would all do mostly the same thing except for the persist part.

My main concerns here are failure handling and efficiency.

In my use case I actually have 3 states. Two of them can store data in a non-transactional way; the third should be opaque transactional but can't be, as it's just an API call that doesn't recognize duplicates. That's no big deal as long as we can make sure it isn't affected by failures of the other states (meaning that if another state fails, we're sure this one hasn't processed the data yet).

This makes option 1 a bit tricky, as I'm never sure of the order in which the states will be processed. Or is there a way to be sure?

Option 2 would do, I guess, but I have to pass along through the first state all the data needed for the second. I might also want to filter which tuples go to state 1 and which go to state 2. I would then have to write my own updater that applies a filter for the first persist, so that it doesn't send everything to the state but still emits everything at the end.

Option 3 would also work, but it wouldn't be very efficient: I would be reading my spout twice and processing the data the same way in both topologies up until the persist part.

Any ideas on the best way to handle this?

Thanks
Regards
Laurent