A very interesting thread in the JSR-107 group, which appears just as Mircea
has been looking into XA transactions and cache loaders/stores. Going back to
that thread, it wasn't very clear what would happen if Infinispan caches were
configured with XA transactions and they had a cache store. What should a
user expect in that case? IOW, how does our approach here compare to what's
being suggested in the thread below? My feeling is that we're doing a variant
of Option 3, where each cache store runs its own transaction (if it supports
transactions at all).
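
To make the question concrete, here's a minimal sketch of the scenario in
plain JCache terms (MyJdbcCacheWriter is a made-up write-through store, and
the Infinispan-specific XA configuration is left out):

    import javax.cache.Cache;
    import javax.cache.CacheManager;
    import javax.cache.Caching;
    import javax.cache.configuration.FactoryBuilder;
    import javax.cache.configuration.MutableConfiguration;
    import javax.naming.InitialContext;
    import javax.transaction.UserTransaction;

    public class XaStoreQuestion {
        public void run() throws Exception {
            CacheManager cacheManager = Caching.getCachingProvider().getCacheManager();

            // Write-through cache backed by an external store.
            // MyJdbcCacheWriter is hypothetical; assume it implements CacheWriter<String, String>.
            MutableConfiguration<String, String> cfg = new MutableConfiguration<String, String>()
                .setWriteThrough(true)
                .setCacheWriterFactory(FactoryBuilder.factoryOf(MyJdbcCacheWriter.class));
            Cache<String, String> cache = cacheManager.createCache("accounts", cfg);

            // XA/JTA transaction wrapping the cache operations.
            UserTransaction tx = (UserTransaction) new InitialContext().lookup("java:comp/UserTransaction");
            tx.begin();
            cache.put("parent", "p1");
            cache.put("child", "c1");
            // Open question: do the CacheWriter.write() calls for these entries run
            // inside this transaction, in a separate transaction per store, or
            // outside any transaction at all?
            tx.commit();
        }
    }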

@Manik, it's also interesting from a data grid perspective, since it highlights
the boundaries of a cache vs a data grid in this area.

Cheers,

Begin forwarded message:

> From: Brian Oliver <brian.oli...@oracle.com>
> Subject: Re: Transaction Semantics when using CacheLoaders and CacheWriters
> Date: August 1, 2013 5:55:14 PM GMT+02:00
> To: jsr...@googlegroups.com
> Reply-To: jsr...@googlegroups.com
> 
> Thanks for your feedback.  It's much appreciated.
> 
> Interestingly, Oracle Coherence takes much the same approach.  Transactional 
> (XA) Multi-Version-Concurrency-Control Caches don't allow Cache Loaders or 
> Cache Writers (or Expiry), i.e. a stronger form of Option 2.  
> 
> Personally I don't really classify these Caches as Caches (as eviction and 
> expiry aren't supported).  In essence they are really transactional maps that 
> leverage the Coherence NamedCache interface.  Ultimately it's pure "Data 
> Grid" functionality.
> 
> While I think developers may like to think Option 1 is possible, when anyone 
> explains the "cost" of this, they reluctantly decide to use Option 2, or move 
> to using Entry Processors - which provide atomicity for the most part.
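> 
> As a rough illustration (assuming a Cache<String, Integer> of account 
> balances), an entry processor gives you per-entry atomicity without any 
> explicit transaction:
> 
>     // javax.cache.processor.EntryProcessor / MutableEntry; the implementation
>     // guarantees exclusive access to the entry while process() runs.
>     Integer remaining = cache.invoke("account-42",
>         new EntryProcessor<String, Integer, Integer>() {
>             @Override
>             public Integer process(MutableEntry<String, Integer> entry, Object... args) {
>                 entry.setValue(entry.getValue() - (Integer) args[0]);
>                 return entry.getValue();
>             }
>         }, 10);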
> 
> Historically Coherence also supported a form of Option 3 - but that also 
> presents some challenges.   
> 
> I'm trying hard to find an answer to these challenges, but the way forward is 
> unclear.  From our discussions here in this group and at conferences, it seems 
> that those who have shown interest in the "transactionality" of Caches don't 
> really want Caches.  They want a fast, in-memory data store, perhaps like a 
> map or a NoSQL store, to transact against, because they don't want to 
> transact against a database.  Why?  Databases are seen as a bottleneck, or as 
> too "slow", and these developers are trying to solve an architectural problem 
> in the layer below their application tier.  They like to call these "Caches", 
> because they are "in-memory", but technically they aren't Caches.  When you 
> get down to it, the features and semantics being requested aren't really 
> those of caches.  So perhaps this is where the Data Grid specification can 
> come into play?
> 
> With my "standardization hat" on, my biggest concern is that any time a 
> developer needs to change their application, say when moving between vendors, 
> especially to adopt transactions that are "implementation specific", it 
> suggests there's something wrong with the specification.  Personally I think 
> we should be making it "easier" to adopt, not harder.
> 
> On Thursday, August 1, 2013 10:55:21 AM UTC-4, Brian Martin wrote:
> Brian,
> 
> I think you are spot on about the problem, and this is why we don't currently 
> (in WebSphere eXtreme Scale) allow Loaders to be part of a distributed 
> transaction that crosses containers [your option 2].  If the transaction is 
> to a single container, then we allow the local transaction (I believe this is 
> equivalent to a variation of your option 3).  As your dialog indicates, the 
> scenario is messy, and I don't like the state we are in currently, with 
> different capabilities depending on how many containers are enlisted in your 
> transaction.  At the moment I don't have a better suggestion, but I think 
> your concern is valid and we should hash out a solution the community agrees 
> with.
> 
> Brian Martin
> IBM
> WebSphere eXtreme Scale
> 
> 
> On Thu, Aug 1, 2013 at 9:55 AM, Brian Oliver <brian....@oracle.com> wrote:
> Hi All,
> 
> I'd like to pose the challenge of how we think vendors should deal with 
> transactions in the context of Caches with CacheLoaders/Writers configured, 
> especially in the context of a distributed Cache.  While this is an 
> "implementation concern", it's very important to see how this may be 
> implemented, as it very much affects the API design.
> 
> As part of reviewing the specification with the Java EE team, and in 
> particular how multiple-servers will interact, we've found a few challenges.  
>  In the spirit of openness, I've added some commentary to the following 
> issue: https://github.com/jsr107/jsr107spec/issues/153
> 
> Currently I feel that, the way the API is defined, all CacheLoader and 
> CacheWriter operations will need to be performed "locally", which 
> fundamentally prevents efficient (or any) implementation in a highly 
> concurrent and distributed manner.  Furthermore, interaction across multiple 
> application processes, Java SE or otherwise, may be a problem, simply because 
> the API doesn't provide enough fidelity for CacheLoader and CacheWriter 
> operations to be part of a larger transaction.  e.g. there's no "prepare" and 
> "commit" for CacheWriters!  Just "store".
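> 
> To make that concrete, the CacheWriter contract is roughly the following 
> (exception declarations elided) - there is no prepare/commit/rollback 
> anywhere:
> 
>     public interface CacheWriter<K, V> {
>         void write(Cache.Entry<? extends K, ? extends V> entry);
>         void writeAll(Collection<Cache.Entry<? extends K, ? extends V>> entries);
>         void delete(Object key);
>         void deleteAll(Collection<?> keys);
>     }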
> 
> Even with a few changes, as I've suggested in the issue above, I honestly 
> feel we're essentially forcing vendors to implement fully recoverable XA 
> Transaction Managers as part of their Caching infrastructure, simply to 
> coordinate transactions across the underlying Cache Writers in a distributed 
> setting.  Why?  Because the API basically implies this coordination would 
> need to be performed by the Cache implementation itself - even in "local" 
> mode!
> 
> e.g. Say a developer starts a transaction that updates n entries, which are 
> partitioned across n servers.  As part of the "commit", all n servers need to 
> take care of committing, say to memory.  Behind this are the Cache Writers, 
> which also need to be coordinated.  The entries need to be stored as part of 
> the Caching contract. 
> 
> Unfortunately our current API provides no mechanism to coordinate this, e.g. 
> to share a global transaction to a single database across the n Cache 
> Writers.  Without this, what essentially happens at the moment is that each 
> CacheWriter starts its own individual transaction, not attached to or part 
> of the application transaction.  That may seem reasonable to some, but 
> consider the case where there is a parent-child or some other relationship 
> between the cache entries being updated (which is why you're using a 
> transaction in the first place).  If individual transactions are used by the 
> Cache Writers and are committed in some non-deterministic order (as there are 
> no ordering constraints or ways to control this in the API), database 
> integrity constraints are likely to be violated.  So while the "commit" to 
> the Cache may seem to be atomic, the "stores" to the underlying Cache Writers 
> aren't.
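> 
> To illustrate the ordering problem, here's a rough sketch of a child-table 
> writer that simply opens its own JDBC transaction per store (the OrderLine 
> type, the table names and the corresponding parent-order writer are made up; 
> writeAll/delete/deleteAll are left abstract for brevity):
> 
>     // javax.cache.integration.CacheWriter / CacheWriterException,
>     // javax.sql.DataSource, java.sql.Connection / PreparedStatement / SQLException
>     public abstract class OrderLineWriter implements CacheWriter<Long, OrderLine> {
>         private final DataSource ds;
> 
>         protected OrderLineWriter(DataSource ds) { this.ds = ds; }
> 
>         @Override
>         public void write(Cache.Entry<? extends Long, ? extends OrderLine> entry) {
>             // Runs in its own local transaction, detached from the application's
>             // transaction.  If this commits before the writer that stores the
>             // parent ORDERS row, the ORDER_ID foreign key constraint is violated.
>             try (Connection c = ds.getConnection()) {
>                 c.setAutoCommit(false);
>                 try (PreparedStatement ps = c.prepareStatement(
>                         "INSERT INTO ORDER_LINES (ID, ORDER_ID, QTY) VALUES (?, ?, ?)")) {
>                     ps.setLong(1, entry.getKey());
>                     ps.setLong(2, entry.getValue().getOrderId());
>                     ps.setInt(3, entry.getValue().getQuantity());
>                     ps.executeUpdate();
>                 }
>                 c.commit();
>             } catch (SQLException e) {
>                 throw new CacheWriterException(e);
>             }
>         }
>     }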
> 
> Essentially there are a few options (as I've covered in the issue).
> 
> 1. Allow a global transaction to be provided to all of the Cache Writers.   
> Wow... that would be pretty crazy and horribly slow.   Every server would 
> need to contact the transaction manager, do a bunch of work, etc, just to set 
> things up.   
> 
> This sort of contradicts the entire reason people would be using a cache in 
> the first place.   To even achieve this I think we'd need to change the 
> CacheLoader/Writer API.  Specifically we'd need to add "prepare", "commit" 
> and "rollback".
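> 
> Such an extension might look something like this - entirely hypothetical, not 
> part of the current API - just to show the shape of the change:
> 
>     // Hypothetical; Xid is javax.transaction.xa.Xid
>     public interface TransactionalCacheWriter<K, V> extends CacheWriter<K, V> {
>         void prepare(Xid xid);    // flush pending writes so they are ready to commit
>         void commit(Xid xid);     // make the prepared writes durable
>         void rollback(Xid xid);   // discard the prepared writes
>     }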
> 
> 2. Don't allow CacheLoaders/Writers to be configured with Caches.   I think 
> this is pretty easy to do, but again, wow... that would force developers to 
> change their application code significantly to use Transactional Caches with 
> external stores.
> 
> 3. Only allow "local" transactions to be performed.   This would ultimately 
> mean that Caches would be the last local resource in XA transactions (not too 
> bad, though it's a challenge if there are others as well).   Additionally, in 
> the distributed case, while entries may be distributed, the loading / writing 
> would always occur locally.   This works, but significantly reduces 
> scalability, as all "versioning" of the data being touched may need to be 
> held locally.  It's highly likely a huge number of distributed locks would be 
> required (if the Cache isn't using MVCC), which we know is horribly slow.  
> e.g. imagine a transaction with a "putAll" containing a few million entries.  
> In pessimistic mode, an implementation may need to do a lot of work locally 
> to ensure versioning is held and updated correctly.  It may also need to 
> acquire a few million locks!   Saying that a developer shouldn't use "putAll" 
> with transactions probably isn't a solution either.
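> 
> To see the cost, something as innocuous as the following (sketched assuming a 
> JTA UserTransaction tx, a Cache<String, byte[]> cache and a hypothetical 
> loadMillionsOfEntries() helper) forces a pessimistic implementation to lock 
> or version every key it touches until the commit:
> 
>     tx.begin();
>     Map<String, byte[]> batch = loadMillionsOfEntries();  // hypothetical helper
>     cache.putAll(batch);  // every key must be locked/versioned until the commit below
>     tx.commit();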
> 
> Personally, I'm not sure any of this is desirable.   I haven't really seen 
> much of this discussed or addressed.   Perhaps I'm missing something?   I'd 
> certainly be happy to do some further research!
> 
> The bottom line is that while we're trying to define an API that provides 
> developers with a means to improve the performance, throughput and 
> scalability of an application through the temporary storage of data, the 
> requirement to implement transactions, even optionally, may throw much of 
> that benefit away.
> 
> It would be great to get your thoughts on this.   I don't think we can get 
> away with the statement "transactions are implementation specific" in the 
> specification, especially if the API doesn't provide enough fidelity to cover 
> these simple use-cases.
> 
> -- Brian


--
Galder Zamarreño
gal...@redhat.com
twitter.com/galderz

Project Lead, Escalante
http://escalante.io

Engineer, Infinispan
http://infinispan.org

