> From: David F. Skoll [mailto:d...@roaringpenguin.com]
> 
> On Wed, 3 Aug 2011 16:46:23 +0200
> giampa...@tomassoni.biz wrote:
> 
> > I think your suggestion is an overkill.
> 
> Not in my experience.
> 
> > Iff I'm right about the problem being on transaction competition, it
> > can be solved by a much easier solution. The way bayes uses and
> > updates tokens data would allow reading in a transaction and writing
> > in a new one: you always increase tokens occurrences by a value
> > which doesn't depend on the value they had at read time.
> 
> Concurrent updates of the same row always cause contention.  Believe me,
> we know this through bitter experience and not theory.  That's why we
> went to all the trouble of ensuring there'd only ever be one writer.

The problem is possibly this:

/---------------------
| <-- transaction starts here
| Bayes reads some token rows (say ~1ms)
|....
|
|       Bayes and other checks
|       (say ~6s)
|
|---------------------
| Classification completed.
|....
|
| Bayes updates/inserts token occurrences
| (say ~5ms)
| <-- transaction ends here
|....
|
| cleanup, results reporting and the like
|
\---------------------


If this is a good approximation of the transaction boundaries, the
transaction lasts ~6s, during which every other running instance of SA may
be waiting for the first one to complete.

Instead, if one rewrites the boundaries this way:

/---------------------
| <-- ro transaction starts here
| Bayes reads some token rows (say ~1ms)
| <-- ro transaction ends here
|....
|
|       Bayes and other checks
|       (say ~6s)
|
|---------------------
| Classification completed.
|....
|
| <-- rw transaction starts here
| Bayes updates/inserts token occurrences
| (say ~5ms)
| <-- rw transaction ends here
|....
|
| cleanup, results reporting and the like
|
\---------------------


then the contention per scan lasts at most ~5ms, which is roughly 1/1000 of
the ~6s in the first figure.

So, if this is the case, I don't really see any need to involve MVCC logic
or database cloning. And I don't think this is just theory.
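
To make the second figure concrete, here is a minimal sketch of those
boundaries. It is written in Python against the standard sqlite3 module
purely for illustration; it is not SA's code and not the MySQL store. The
table bayes_token, its columns, and the functions read_tokens, classify,
learn and scan are all hypothetical names. The point is only that the slow
step runs with no transaction open, and that the write step uses relative
increments, so it never depends on the values read earlier.

```python
import sqlite3


def open_store(path=":memory:"):
    # isolation_level=None disables sqlite3's implicit transactions, so the
    # only transaction boundaries are the explicit BEGIN/COMMIT below.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bayes_token ("  # hypothetical schema
        " token TEXT PRIMARY KEY,"
        " spam_count INTEGER NOT NULL DEFAULT 0,"
        " ham_count INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def read_tokens(conn, tokens):
    """Short read-only transaction: fetch the counts, then end it at once."""
    if not tokens:
        return {}
    placeholders = ",".join("?" * len(tokens))
    conn.execute("BEGIN")
    try:
        rows = conn.execute(
            "SELECT token, spam_count, ham_count FROM bayes_token"
            f" WHERE token IN ({placeholders})",
            list(tokens),
        ).fetchall()
    finally:
        conn.execute("COMMIT")
    return {token: (spam, ham) for token, spam, ham in rows}


def classify(counts, tokens):
    """Stand-in for the slow checks (~6s in the figure); no transaction open."""
    spam = sum(counts.get(t, (0, 0))[0] for t in tokens)
    ham = sum(counts.get(t, (0, 0))[1] for t in tokens)
    return spam > ham


def learn(conn, tokens, is_spam):
    """Short read-write transaction using relative increments only."""
    column = "spam_count" if is_spam else "ham_count"
    conn.execute("BEGIN IMMEDIATE")
    try:
        for t in tokens:
            conn.execute(
                "INSERT OR IGNORE INTO bayes_token (token) VALUES (?)", (t,)
            )
            # The new value is computed by the database at write time, not
            # from the counts fetched in read_tokens().
            conn.execute(
                f"UPDATE bayes_token SET {column} = {column} + 1"
                " WHERE token = ?",
                (t,),
            )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise


def scan(conn, tokens):
    counts = read_tokens(conn, tokens)   # ro transaction, already closed
    is_spam = classify(counts, tokens)   # slow part, holds no locks
    learn(conn, tokens, is_spam)         # rw transaction, kept short
    return is_spam
```

In this sketch the only window in which two scans can block each other is
the body of learn().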

As a matter of fact, however, I don't know whether the transaction
boundaries in Mail::SpamAssassin::BayesStore::MySQL are like the ones I
depicted in the first figure, nor do I know the exact timing of the
various bayes operations in SA.

Giampaolo

> 
> Regards,
> 
> David.
