On 30 August 2012 19:19, Shane Hathaway <sh...@hathawaymix.org> wrote: > On 08/30/2012 10:14 AM, Marius Gedminas wrote: >> >> On Wed, Aug 29, 2012 at 06:30:50AM -0400, Jim Fulton wrote: >>> >>> On Wed, Aug 29, 2012 at 2:29 AM, Marius Gedminas <mar...@gedmin.as> >>> wrote: >>>> >>>> On Tue, Aug 28, 2012 at 06:31:05PM +0200, Vincent Pelletier wrote: >>>>> >>>>> On Tue, 28 Aug 2012 16:31:20 +0200, >>>>> Martijn Pieters <m...@zopatista.com> wrote : >>>>>> >>>>>> Anything else different? Did you make any performance comparisons >>>>>> between RelStorage and NEO? >>>>> >>>>> >>>>> I believe the main difference compared to all other ZODB Storage >>>>> implementation is the finer-grained locking scheme: in all storage >>>>> implementations I know, there is a database-level lock during the >>>>> entire second phase of 2PC, whereas in NEO transactions are serialised >>>>> only when they alter a common set of objects. >>>> >>>> >>>> This could be a compelling point. I've seen deadlocks in an app that >>>> tried to use both ZEO and PostgreSQL via the Storm ORM. (The thread >>>> holding the ZEO commit lock was blocked waiting for the PostgreSQL >>>> commit to finish, while the PostgreSQL server was waiting for some other >>>> transaction to either commit or abort -- and that other transaction >>>> couldn't proceed because it was waiting for the ZEO lock.) >>> >>> >>> This sounds like an application/transaction configuration problem. >> >> >> *shrug* >> >> Here's the code to reproduce it: http://pastie.org/4617132 >> >>> To avoid this sort of deadlock, you need to always commit in a >>> a consistent order. You also need to configure ZEO (or NEO) >>> to time-out transactions that take too long to finish the second phase. >> >> >> The deadlock happens in tpc_begin() in both threads, which is the first >> phase, AFAIU. >> >> AFAICS Thread #2 first performs tpc_begin() for ClientStorage and takes >> the ZEO commit lock. Then it enters tpc_begin() for Storm's >> StoreDataManager and blocks waiting for a response from PostgreSQL -- >> which is delayed because the PostgreSQL server is waiting to see if >> the other thread, Thread #1, will commit or abort _its_ transaction, which >> is conflicting with the one from Thread #2. >> >> Meanwhile Thread #1 is blocked in ZODB's tpc_begin(), trying to acquire >> the >> ZEO commit lock held by Thread #2. > > > So thread 1 acquires in this order: > > 1. PostgreSQL > 2. ZEO > > Thread 2 acquires in this order: > > 1. ZEO > 2. PostgreSQL > > SQL databases handle deadlocks by detecting and automatically rolling back > transactions, while the "transaction" package expects all data managers to > completely avoid deadlocks using the sortKey method. > > I haven't looked at the code, but I imagine Storm's StoreDataManager > implements IDataManager. I wonder if StoreDataManager provides a consistent > sortKey. The sortKey method must return a string (not an integer or other > object) that is consistent yet different from all other participating data > managers.
Storm's DataManager defines sortKey as: def sortKey(self): # Stores in TPC mode should be the last to be committed, this makes # it possible to have TPC behavior when there's only a single store # not in TPC mode. if self._store._tpc: prefix = "zz" else: prefix = "aa" return "%s_store_%d" % (prefix, id(self)) http://bazaar.launchpad.net/~storm/storm/trunk/view/head:/storm/zope/zstorm.py#L320 (By default self._store._tpc is set to False.) This is essentially similar to zope.sqlalchemy's, the single phase variant being: 105 def sortKey(self): 106 # Try to sort last, so that we vote last - we may commit in tpc_vote(), 107 # which allows Zope to roll back its transaction if the RDBMS 108 # threw a conflict error. 109 return "~sqlalchemy:%d" % id(self.tx) http://zope3.pov.lt/trac/browser/zope.sqlalchemy/trunk/src/zope/sqlalchemy/datamanager.py#L105 (The TPC variant simply omits the leading tilde as it is not required to sort last - zope.sqlalchemy commits in tpc_vote() rather than tpc_finish() when using single phase commit.) ZEO's sortKey is: 698 def sortKey(self): 699 # If the client isn't connected to anything, it can't have a 700 # valid sortKey(). Raise an error to stop the transaction early. 701 if self._server_addr is None: 702 raise ClientDisconnected 703 else: 704 return '%s:%s' % (self._storage, self._server_addr) http://zope3.pov.lt/trac/browser/ZODB/trunk/src/ZEO/ClientStorage.py#L698 (self._storage defaults to the string '1'.) This should mean that ZEO always gets a sortKey like '1:./zeosock' in the example given whereas Storm gets a sortKey like 'aa_storm_12345' (though the final number will vary per transaction.) Which should mean a consistent sort order and ZEO always committing first. It seems StormDataManager only commits in tpc_finish, doing nothing in either of commit() or tpc_vote() stages when in 1PC mode. As ZEO sorts first a failure to commit by Storm could never abort the ZEO server's transaction. Postgres can raise the equivalent of a ConflictError too. That situation would lead to transaction logging a critical error and raising the exception. The ZEO transaction would already have finished by then Laurence p.s. It would be interesting to try a zope.sqlalchemy equivalent of the storm deadlock maker to see whether it holds up any better. _______________________________________________ For more information about ZODB, see http://zodb.org/ ZODB-Dev mailing list - ZODB-Dev@zope.org https://mail.zope.org/mailman/listinfo/zodb-dev