> From: David F. Skoll [mailto:d...@roaringpenguin.com]
>
> On Wed, 3 Aug 2011 16:46:23 +0200
> giampa...@tomassoni.biz wrote:
>
> > I think your suggestion is an overkill.
>
> Not in my experience.
>
> > Iff I'm right about the problem being on transaction competition, it
> > can be solved by a much easier solution. The way bayes uses and
> > updates tokens data would allow reading in a transaction and writing
> > in a new one: you always increase tokens occurrences by a value
> > which doesn't depend on the value they had at read time.
>
> Concurrent updates of the same row always cause contention. Believe me,
> we know this through bitter experience and not theory. That's why we
> went to all the trouble of ensuring there'd only ever be one writer.
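The quoted point that a Bayes update never depends on the value read earlier can be illustrated with a minimal Python/SQLite sketch. It is not SpamAssassin's code: the `bayes_token` table, its columns and the `read_counts`/`learn_tokens` helpers are hypothetical, SQLite stands in for MySQL (where the same idea is usually written as `INSERT ... ON DUPLICATE KEY UPDATE`), and the two "concurrent" scans are simulated sequentially. It only shows that stale reads still produce correct totals; it says nothing about the row-lock contention David describes.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical table: one row per token with spam/ham occurrence counters.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bayes_token (
    token      TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0
)
"""


def read_counts(conn: sqlite3.Connection, token: str) -> Tuple[int, int]:
    """Return (spam_count, ham_count) for a token, or (0, 0) if unseen."""
    row = conn.execute(
        "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?", (token,)
    ).fetchone()
    return row if row is not None else (0, 0)


def learn_tokens(conn: sqlite3.Connection, tokens: Iterable[str], is_spam: bool) -> None:
    """Add one occurrence per token inside a single short write transaction.

    The UPDATE adds a delta to whatever is stored at write time, so it
    never needs the value an earlier SELECT returned.
    """
    column = "spam_count" if is_spam else "ham_count"  # fixed choice, safe to format
    with conn:  # commits on success, rolls back on exception
        for tok in tokens:
            conn.execute("INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (tok,))
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?",
                (tok,),
            )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Two scans read the same (still empty) counts before either one learns...
    seen_by_a = read_counts(conn, "pills")
    seen_by_b = read_counts(conn, "pills")
    # ...and each then adds its own +1 without using what it read.
    learn_tokens(conn, ["pills"], is_spam=True)
    learn_tokens(conn, ["pills"], is_spam=True)
    print(seen_by_a, seen_by_b, read_counts(conn, "pills"))  # (0, 0) (0, 0) (2, 0)
```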
The problem is possibly this:

/---------------------
| <-- transaction starts here
| Bayes reads some token rows (say ~1ms)
|....
|
| Bayes and other checks
| (say ~6s)
|
|---------------------
| Classification completed.
|....
|
| Bayes updates/inserts token occurrences
| (say ~5ms)
| <-- transaction ends here
|....
|
| cleanup, results reporting and the like
|
\---------------------

If this is a good approximation of the transaction boundaries, you can see that the transaction lasts ~6s, during which all the other running instances of SA may be waiting for the first one to complete.

If, instead, one rewrites the boundaries this way:

/---------------------
| <-- ro transaction starts here
| Bayes reads some token rows (say ~1ms)
| <-- ro transaction ends here
|....
|
| Bayes and other checks
| (say ~6s)
|
|---------------------
| Classification completed.
|....
|
| <-- rw transaction starts here
| Bayes updates/inserts token occurrences
| (say ~5ms)
| <-- rw transaction ends here
|....
|
| cleanup, results reporting and the like
|
\---------------------

you can see that the contention per scan lasts at most ~5ms, roughly 1/1000 of the previous case (5ms / 6s is about 1/1200).

So, if this is the case, I don't really see any need to involve MVCC logic or database cloning. And I don't think this is mere theory.

It is a matter of fact, however, that I don't know whether the transaction boundaries in Mail::SpamAssassin::BayesStore::MySql look like the ones I depicted in the first figure. Nor do I know the exact timing of the various Bayes operations in SA.

Giampaolo

> Regards,
>
> David.
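The boundaries in the second figure can be sketched in Python with SQLite, under loudly stated assumptions: the schema, the `fetch_counts`/`learn`/`scan` helpers and the `classify` callback are hypothetical, and this is not how Mail::SpamAssassin::BayesStore::MySql is actually structured (the post itself says that is unknown). SQLite also allows only one writer per database at a time instead of locking individual rows as InnoDB does, so the sketch shows only where the transactions begin and end, not MySQL's locking behaviour. A 0.2 s sleep stands in for the ~6s of checks.

```python
import sqlite3
import time
from typing import Callable, Dict, Sequence, Tuple

Counts = Dict[str, Tuple[int, int]]  # token -> (spam_count, ham_count)

# Hypothetical table, not SpamAssassin's real Bayes schema.
SCHEMA = """
CREATE TABLE IF NOT EXISTS bayes_token (
    token      TEXT PRIMARY KEY,
    spam_count INTEGER NOT NULL DEFAULT 0,
    ham_count  INTEGER NOT NULL DEFAULT 0
)
"""


def fetch_counts(conn: sqlite3.Connection, tokens: Sequence[str]) -> Counts:
    """Short read-only transaction: copy the counts needed for scoring."""
    conn.execute("BEGIN")
    try:
        counts: Counts = {}
        for tok in tokens:
            row = conn.execute(
                "SELECT spam_count, ham_count FROM bayes_token WHERE token = ?",
                (tok,),
            ).fetchone()
            counts[tok] = row if row is not None else (0, 0)
    finally:
        conn.execute("COMMIT")  # release the read lock before classifying
    return counts


def learn(conn: sqlite3.Connection, tokens: Sequence[str], is_spam: bool) -> float:
    """Short read-write transaction; returns how long it stayed open (s)."""
    column = "spam_count" if is_spam else "ham_count"
    start = time.perf_counter()
    conn.execute("BEGIN IMMEDIATE")  # take the write lock only now
    try:
        for tok in tokens:
            conn.execute("INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (tok,))
            # Increment the stored value; the earlier snapshot is not used.
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1 WHERE token = ?",
                (tok,),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return time.perf_counter() - start


def scan(conn: sqlite3.Connection, tokens: Sequence[str],
         classify: Callable[[Counts], bool]) -> Tuple[bool, float]:
    counts = fetch_counts(conn, tokens)         # ro transaction (~1ms in the figure)
    is_spam = classify(counts)                  # no transaction open (~6s)
    write_time = learn(conn, tokens, is_spam)   # rw transaction (~5ms)
    return is_spam, write_time


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:", isolation_level=None)  # manual BEGIN/COMMIT
    conn.execute(SCHEMA)

    def slow_classify(counts: Counts) -> bool:
        time.sleep(0.2)  # stands in for the long "Bayes and other checks" phase
        return True

    start = time.perf_counter()
    verdict, write_time = scan(conn, ["cheap", "pills"], slow_classify)
    total = time.perf_counter() - start
    print(f"spam={verdict} total={total:.3f}s write transaction={write_time:.4f}s")
```

The trade-off of this layout is that scoring uses a snapshot that may be slightly stale by the time the write transaction runs; because the write only adds deltas, the stored totals still end up correct.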