I don't know what the "best practice" is... but I actually like a 4th option: creating a composite state.
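Roughly what I mean by a composite state that fans out to every state, as a self-contained sketch rather than real Trident code. SinkState, CompositeState and RecordingState are made-up names for illustration; only the beginCommit/commit pair is modeled on Trident's State interface, and update() stands in for whatever your StateUpdater would call.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Trident state. beginCommit/commit mirror the
// two hooks on Trident's State interface; update() is an assumed write hook.
interface SinkState<T> {
    void beginCommit(Long txid);
    void update(List<T> batch);
    void commit(Long txid);
}

// Forwards every call to all delegates, always in list order.
final class CompositeState<T> implements SinkState<T> {
    private final List<SinkState<T>> delegates;

    CompositeState(List<SinkState<T>> delegates) {
        // Defensive copy so the order cannot change after construction.
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void beginCommit(Long txid) {
        for (SinkState<T> s : delegates) {
            s.beginCommit(txid);
        }
    }

    @Override
    public void update(List<T> batch) {
        // If a delegate throws, the exception propagates and the
        // delegates after it never see this batch.
        for (SinkState<T> s : delegates) {
            s.update(batch);
        }
    }

    @Override
    public void commit(Long txid) {
        for (SinkState<T> s : delegates) {
            s.commit(txid);
        }
    }
}

// In-memory state, only here so the sketch can be exercised.
final class RecordingState<T> implements SinkState<T> {
    final List<T> stored = new ArrayList<>();
    Long lastCommitted = null;

    @Override
    public void beginCommit(Long txid) { }

    @Override
    public void update(List<T> batch) {
        stored.addAll(batch);
    }

    @Override
    public void commit(Long txid) {
        lastCommitted = txid;
    }
}
```

Since the delegates are always called in list order, you could put the state that can't handle duplicates last, so it only sees a batch after the other states have accepted it. That is a property of this sketch, not something Trident guarantees by itself.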
In my case, instead of sending all data to every state, I needed to randomly shard data between an arbitrary number of states. I've thrown this on a gist here: https://gist.github.com/codyaray/d58c1aaf688f27b72fdd

You could probably take a similar approach with a CompositeState that would send the data to all TridentStates instead of randomly choosing a state.

Good luck!

-Cody

On Fri, May 2, 2014 at 3:12 AM, Laurent Thoulon <laurent.thou...@ldmobile.net> wrote:

> Hi,
>
> What would you say is the best way to persist data to multiple states?
> Currently I have 3 options in mind:
>
> 1- Process data and use the stream to send data to both states
> Stream stream = ...each...filter...bla....
> stream.partitionPersist(state1, ...)
> stream.partitionPersist(state2, ...)
>
> 2- Process data and chain the persists
> Stream stream = ...each...filter...bla....
> stream.partitionPersist(state1, ...).newValuesStream().partitionPersist(state2, ...)
>
> 3- Do a topology for each state; they would all do mostly the same thing
> except for the persist part.
>
> My main concerns here are handling failures and efficiency.
>
> In my use case I actually have 3 states. 2 of them can store in a
> non-transactional way and the other should be opaque transactional but
> actually can't be, as it's just an API call that doesn't recognize duplicates.
> That's no big deal if we could just make sure it's not bound to the
> failures of the other states (meaning that if another state fails we're
> sure this one hasn't yet processed data).
>
> This makes option 1 a bit tricky, as I'm never sure of the order in which
> the states will be processed. Or is there a way to be sure?
> Option 2 would do, I guess, but I have to pass along in the first state all
> the data needed for the second. Potentially I would like to filter the
> tuples that go to state 1 or state 2. I would then have to make my own
> updater that uses a filter for the first persist so that it doesn't send
> everything to the state but still emits everything in the end.
> Option 3 would also do, but there I wouldn't be that efficient: reading my
> spout twice, processing data the same way in both topologies up until the
> persist part.
>
> Any ideas on the best way to handle this?
> Thanks
>
> Regards
> Laurent
>

-- 
Cody A. Ray, LEED AP
cody.a....@gmail.com
215.501.7891