A very interesting thread in the JSR-107 group, which appears just as Mircea has looked into XA transactions and cache loaders/stores. Going back to that thread, it wasn't very clear what would happen if Infinispan caches were configured with XA transactions and they had a cache store. What should a user expect in that case? IOW, how does our approach here compare to what's being suggested in the thread below? My feeling is that we're doing a variant of Option 3, where each cache store will run its own transaction (if they support it...)
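To make the "each cache store runs its own transaction" variant a bit more concrete, here is a minimal sketch of what a user could observe. It is not Infinispan code: IndependentStore, storeInOwnTransaction and flush are hypothetical names made up for illustration, and the "transaction" is just an in-memory map update that either happens or does not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PerStoreTransactionSketch {

    /** Hypothetical cache store that commits every batch in its own private transaction. */
    static class IndependentStore {
        final String name;
        final boolean failOnCommit;
        final Map<String, String> durable = new LinkedHashMap<>();

        IndependentStore(String name, boolean failOnCommit) {
            this.name = name;
            this.failOnCommit = failOnCommit;
        }

        /** All-or-nothing for this store only; other stores are not consulted. */
        void storeInOwnTransaction(Map<String, String> batch) {
            if (failOnCommit) {
                throw new IllegalStateException(name + " could not commit");
            }
            durable.putAll(batch);
        }
    }

    /** Flushes already-committed in-memory changes to each store; returns the names of stores that failed. */
    static List<String> flush(List<IndependentStore> stores, Map<String, String> changes) {
        List<String> failed = new ArrayList<>();
        for (IndependentStore store : stores) {
            try {
                store.storeInOwnTransaction(changes);
            } catch (IllegalStateException e) {
                failed.add(store.name); // stores flushed earlier have already committed
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        IndependentStore first = new IndependentStore("store-1", false);
        IndependentStore second = new IndependentStore("store-2", true);
        Map<String, String> changes = new LinkedHashMap<>();
        changes.put("parent", "p1");
        changes.put("child", "c1");

        List<String> failed = flush(Arrays.asList(first, second), changes);
        System.out.println("failed stores: " + failed);
        System.out.println("store-1 holds: " + first.durable.keySet());
        System.out.println("store-2 holds: " + second.durable.keySet());
    }
}
```

In this sketch store-1 ends up holding both entries while store-2 holds none, even though the in-memory commit looked atomic to the caller. That is the kind of divergence the question above is about.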
@Manik, It's also interesting from a data grid perspective since it highlights the boundaries of a cache vs data grid in this area.

Cheers,

Begin forwarded message:

> From: Brian Oliver <brian.oli...@oracle.com>
> Subject: Re: Transaction Semantics when using CacheLoaders and CacheWriters
> Date: August 1, 2013 5:55:14 PM GMT+02:00
> To: jsr...@googlegroups.com
> Reply-To: jsr...@googlegroups.com
>
> Thanks for your feedback. It's much appreciated.
>
> Interestingly Oracle Coherence mostly takes much the same approach.
> Transactional (XA) Multi-Version-Concurrency-Control Caches don't allow Cache
> Loaders or Cache Writers (or Expiry) aka: a stronger form of Option 2.
>
> Personally I don't really classify these Caches as Caches (as eviction and
> expiry aren't supported). In essence they are really a transactional map, but
> leverage the Coherence NamedCache interface. Ultimately it's pure "Data
> Grid" functionality.
>
> While I think developers may like to think Option 1 is possible, when anyone
> explains the "cost" of this, they reluctantly decide to use Option 2, or move
> to using Entry Processors - which provides the atomicity for the most part.
>
> Historically Coherence also supported a form of Option 3 - but that also
> presents some challenges.
>
> I'm trying hard to find an answer to these challenges, but the way forward is
> unclear. What I can tell from our discussions here, in this group and at
> conferences, those that have shown interest in "transactionality" of Caches
> aren't really wanting Caches. They want "fast in-memory" data-stores,
> perhaps like a map or nosql, to transact against, because they don't want to
> transact against a database. Why? They are seen as a bottleneck or they are
> seen as being too "slow" and are trying to solve the architectural problem of
> the layer below their application tier. They like to call these "Caches",
> because they are "in-memory", but technically they aren't Caches. When you
> get down to it, ultimately the features and semantics being requested aren't
> really caches. So perhaps this is where the Data Grid specification can
> come into play?
>
> With my "standardization hat" on, my biggest concern is that anytime a
> developer needs to change their application, say between vendors, especially
> to adopt transactions that are "implementation specific", it leads me to
> believe there's something wrong with the specification. Personally I think we
> should be making it "easier" to adopt not harder.
>
> On Thursday, August 1, 2013 10:55:21 AM UTC-4, Brian Martin wrote:
> Brian,
>
> I think you are spot-on with the problem and this is why we don't currently
> (in WebSphere eXtreme Scale) allow Loaders to be part of a distributed
> transaction that crosses containers [your option 2]. If the transaction is to
> a single container, then we allow the local transaction (I believe this is
> equivalent to a variation of your option 3). As your dialog indicates, the
> scenario is messy and I don't like the state we are in currently with
> different capabilities depending on how many containers are enlisted in your
> transaction. At the moment, I don't have a better suggestion but I think
> your concern is valid and we should hash out a solution the community agrees
> with.
>
> Brian Martin
> IBM
> WebSphere eXtreme Scale
>
>
> On Thu, Aug 1, 2013 at 9:55 AM, Brian Oliver <brian....@oracle.com> wrote:
> Hi All,
>
> I'd like to propose the challenge of how we think vendors should deal with
> transactions in the context of Caches with CacheLoaders/Writers configured,
> especially in the context of a distributed Cache. While this is an
> "implementation concern", it's very important to see how this may be
> implemented as it very much affects the API design.
>
> As part of reviewing the specification with the Java EE team, and in
> particular how multiple-servers will interact, we've found a few challenges.
> In the spirit of openness, I've added some commentary to the following
> issue: https://github.com/jsr107/jsr107spec/issues/153
>
> Currently I feel that the way the API is defined, all CacheLoader and
> CacheWriter operations will need to be performed "locally" which
> fundamentally prevents efficient (or any) implementation in a highly
> concurrent and distributed manner. Furthermore, interaction across multiple
> application processes, Java SE or otherwise may be a problem, simply because
> the API doesn't provide enough fidelity for CacheLoader and CacheWriter
> operations to be part of a larger transaction. eg: there's no "prepare" and
> "commit" for CacheWriters! Just "store".
>
> Even with a few changes, as I've suggested in the issue above, I honestly
> feel we're essentially forcing vendors to implement fully recoverable XA
> Transaction Managers as part of their Caching infrastructure, simply to
> coordinate transactions across the underlying Cache Writers in a distributed
> setting. Why? Because the API basically implies this coordination would
> need to be performed by the Cache implementation itself - even in "local"
> mode!
>
> eg: Say a developer starts a transaction that updates n entries, which
> are partitioned across n servers. As part of the "commit", all n
> servers will need to take care of committing, say to memory. Behind this
> are the Cache Writers, which also need to be coordinated. The entries need
> to be stored as part of the Caching contract.
>
> Unfortunately our current API provides no mechanism to coordinate this, eg:
> share a global transaction to a single database across the said n Cache
> Writers. Without this what essentially happens at the moment is that each
> CacheWriter starts their own individual transaction, not attached to or part
> of the application transaction. That may seem reasonable to some, but
> consider the case where there is a parent-child or some other relationship
> between the cache entries that are being updated (which is why you're using a
> transaction in the first place). If individual transactions are used by the
> Cache Writers and are committed in some non-deterministic order (as there are
> no ordering constraints or ways to control this in the API) database
> integrity constraints are likely to be violated. So while the "commit" to the
> Cache may seem to be atomic, the "stores" to the underlying Cache Writers
> aren't.
>
> Essentially there are a few options (as I've covered in the issue).
>
> 1. Allow a global transaction to be provided to all of the Cache Writers.
> Wow... that would be pretty crazy and horribly slow. Every server would
> need to contact the transaction manager, do a bunch of work, etc, just to set
> things up.
>
> This sort of contradicts the entire reason people would be using a cache in
> the first place. To even achieve this I think we'd need to change the
> CacheLoader/Writer API. Specifically we'd need to add "prepare", "commit"
> and "rollback".
>
> 2. Don't allow CacheLoaders/Writers to be configured with Caches. I think
> this is pretty easy to do, but again, wow... that would force developers to
> change their application code significantly to use Transactional Caches with
> external stores.
>
> 3. Only allow "local" transactions to be performed. This would ultimately
> mean that Caches would be the last-local-resource in XA transactions (not too
> bad, though it's a challenge if there are others as well). Additionally in
> the distributed case, while entries may be distributed, the loading / writing
> would always occur locally. This works, but significantly reduces
> scalability as all "versioning" of data being touched may need to be held
> locally. It's highly likely a huge number of distributed locks would be
> required (if the Cache isn't using MVCC), which we know is horribly slow.
> eg: imagine a transaction with a "putAll" containing a few million entries.
> In pessimistic mode, an implementation may need to do a lot of work locally
> to ensure versioning is held and updated correctly. It may also need to
> perform a few million locks! Saying that a developer shouldn't use "putAll"
> with transactions probably isn't a solution either.
>
> Personally I'm not sure if any of this is desirable? I haven't really seen
> much of this discussed or addressed. Perhaps I'm missing something? I'd
> certainly be happy to do some further research!
>
> The bottom line is that while we're trying to define an API that provides
> developers with a means to improve the performance, through-put and
> scalability of an application through the temporary storage of data, the
> requirements to implement transactions, even optionally, may throw much of
> the benefit away.
>
> It would be great to get your thoughts on this. I don't think we can get
> away with the statement "transactions are implementation specific" in the
> specification, especially if the API doesn't provide enough fidelity to cover
> these simple use-cases.
>
> -- Brian
>
> --
> You received this message because you are subscribed to the Google Groups
> "jsr107" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to jsr107+un...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
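
Option 1 in the forwarded message mentions adding "prepare", "commit" and "rollback" to the CacheLoader/Writer API. A minimal sketch of what that could look like, purely for discussion: TransactionalWriter, InMemoryWriter and commitAll are hypothetical names and are not part of JSR-107 (javax.cache.integration.CacheWriter has no such methods). The sketch also leaves out everything that makes real XA expensive, such as durable transaction logs and recovery after a coordinator crash.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseWriterSketch {

    /** Hypothetical writer contract with the three extra operations named in Option 1. */
    interface TransactionalWriter {
        boolean prepare(String txId, Map<String, String> changes); // vote yes/no
        void commit(String txId);
        void rollback(String txId);
    }

    /** Toy writer: keeps prepared changes aside until commit makes them visible. */
    static class InMemoryWriter implements TransactionalWriter {
        final Map<String, String> durable = new HashMap<>();
        private final Map<String, Map<String, String>> pending = new HashMap<>();
        private final boolean voteYes;

        InMemoryWriter(boolean voteYes) {
            this.voteYes = voteYes;
        }

        public boolean prepare(String txId, Map<String, String> changes) {
            if (!voteYes) {
                return false;
            }
            pending.put(txId, new HashMap<>(changes));
            return true;
        }

        public void commit(String txId) {
            Map<String, String> changes = pending.remove(txId);
            if (changes != null) {
                durable.putAll(changes);
            }
        }

        public void rollback(String txId) {
            pending.remove(txId);
        }
    }

    /** Commits on every writer only if all of them prepared; otherwise rolls back the prepared ones. */
    static boolean commitAll(String txId, List<TransactionalWriter> writers, Map<String, String> changes) {
        List<TransactionalWriter> prepared = new ArrayList<>();
        for (TransactionalWriter writer : writers) {
            if (writer.prepare(txId, changes)) {
                prepared.add(writer);
            } else {
                for (TransactionalWriter p : prepared) {
                    p.rollback(txId);
                }
                return false;
            }
        }
        for (TransactionalWriter writer : prepared) {
            writer.commit(txId);
        }
        return true;
    }

    public static void main(String[] args) {
        InMemoryWriter a = new InMemoryWriter(true);
        InMemoryWriter b = new InMemoryWriter(false); // this writer votes "no"
        Map<String, String> changes = new HashMap<>();
        changes.put("parent", "p1");
        changes.put("child", "c1");

        boolean committed = commitAll("tx-1", Arrays.asList(a, b), changes);
        System.out.println("committed: " + committed);
        System.out.println("writer a holds: " + a.durable.keySet());
    }
}
```

With one writer voting "no", nothing becomes durable on either writer, which is the atomicity Option 1 is after. The cost raised in the message shows up as the extra round of calls to every writer before anything is committed.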

--
Galder Zamarreño
gal...@redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org
_______________________________________________
infinispan-dev mailing list
infinispan-dev@lists.jboss.org
https://lists.jboss.org/mailman/listinfo/infinispan-dev