> I'm looking to setup SpamBayes for a small network of users,
> about 25, who are all using Outlook 2002. I would ideally like
> the spam database to be shared across by everyone,

Is there a reason for that? The very strength of filters like
SpamBayes is that they are trained to the individual. Individual
databases would also keep installation simple.

[...]
> It looks like I could set up each client to look at a shared
> drive on a file server, and then just make sure that their
> profile name was unique (which is all easy enough). It seems
> to be working ok for the two test machines I have sharing the
> database. I created the initial database from a large
> sample (about 400 spam and 400 good messages), and now each is
> able to add new spam or good messages to the filter as they go.

Basically, you're pointing both instances of the Outlook plug-in at
the same database file, correct?

[...]
> That's all fine, but I want to be sure if I will run into any
> trouble doing this on a larger scale, with all 25 users.

Yes, you will run into a lot of trouble. You will find that the
database gets regularly corrupted (you'll find this with the two
users as well, but it may take a bit longer). SpamBayes doesn't have
any support for concurrent access to the database, which is what you
are after here.

There are various ways you could achieve this, but none are easy:

1. You could leave the plug-in running as it does by default, with
   individual databases, and create a script that synchronises them
   at some point (e.g. overnight) when they are not being used.

2. IIRC, some of the experimental database backends (mysql (1.0+),
   postgresql (1.0+), ZODB/ZEO (1.1a1+)) can manage concurrent
   access. However, you'd have to add additional code to handle this
   (with ZEO, at least; I don't really know what the situation with
   the SQL ones is) - basically, if concurrent access is attempted,
   it will raise an exception, which you can catch, wait, and try
   again later.

3. You could leave the plug-in running as it does by default, with
   individual databases, and have a script that creates a fresh
   database overnight from messages in certain folders (easy with
   spam, hard with ham), and replaces the individual databases with
   it.

4. You could do the filtering server-side somehow (e.g.
   <http://spambayes.org/server_side.html>), although I'm not sure
   how you would work in the bit about the users doing their own
   training.

5. I think (but am not 100% sure) that if you used 1.1a1 (or ran
   from source) and used a pickle for storage, the database wouldn't
   get corrupted, but you'd lose training data. Each time the
   database was saved to disk, it would become the valid copy, but
   the other instances of SpamBayes wouldn't load that copy until
   they were restarted, so if one of those then saved the database,
   the newer information would be overwritten.

Essentially, unless you really do have good reason for wanting a
shared database (or have the resources to implement something like
one of the above), it would be best to leave the plug-in working with
individual databases.

=Tony.Meyer

-- 
Please always include the list (spambayes at python.org) in your
replies (reply-all), and please don't send me personal mail about
SpamBayes. http://www.massey.ac.nz/~tameyer/writing/reply_all.html
explains this.
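To make option 1 (an overnight synchronisation script) a little more concrete, here is a rough sketch of the merging step only. It assumes each user's training data can be exported as a plain dict mapping a token to a (spam count, ham count) pair, and that every user started from the same initial database. The function name and the dict layout are made up for illustration; this does not use the SpamBayes storage classes. The point is that simply adding the databases together would count the shared initial training once per user, so the script adds only each user's changes relative to the common baseline.

```python
def merge_training(baseline, user_dbs):
    """Merge per-user token counts on top of a shared baseline.

    Hypothetical layout: every database is a dict of
    token -> (spam_count, ham_count). Only the difference between a
    user's counts and the baseline is added, so the initial training
    is not counted once per user.
    """
    merged = {tok: list(counts) for tok, counts in baseline.items()}
    for db in user_dbs:
        for tok, (spam, ham) in db.items():
            base_spam, base_ham = baseline.get(tok, (0, 0))
            entry = merged.setdefault(tok, [0, 0])
            entry[0] += spam - base_spam
            entry[1] += ham - base_ham
        # Tokens missing from a user's database are treated as unchanged.
    return {tok: tuple(counts) for tok, counts in merged.items()}


# Illustrative data only.
baseline = {"viagra": (10, 0), "meeting": (0, 8)}
alice = {"viagra": (12, 0), "meeting": (0, 8), "lottery": (1, 0)}
bob = {"viagra": (10, 0), "meeting": (0, 11)}

merged = merge_training(baseline, [alice, bob])
```

The total numbers of spam and ham messages trained would need the same baseline-plus-differences treatment, and the merged result would become the new baseline for the next night's run. The script would also have to run while no Outlook instance has its database open.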

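The "catch, wait, and try again later" handling mentioned in option 2 might look something like the sketch below. `StoreBusyError` is a stand-in defined here purely for illustration; the real exception would be whatever the chosen backend raises on conflicting access, and `with_retry` is likewise a hypothetical helper, not part of SpamBayes.

```python
import time


class StoreBusyError(Exception):
    """Stand-in for the backend's own 'concurrent access' exception."""


def with_retry(operation, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(); on StoreBusyError wait and try again.

    The wait grows with each failed attempt. After the final attempt
    the exception is re-raised so the caller knows the save failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except StoreBusyError:
            if attempt == attempts:
                raise
            sleep(delay * attempt)


# Usage sketch: an operation that is busy twice, then succeeds.
calls = {"n": 0}


def flaky_save():
    calls["n"] += 1
    if calls["n"] < 3:
        raise StoreBusyError("database in use")
    return "saved"


result = with_retry(flaky_save, delay=0, sleep=lambda seconds: None)
```

In a real deployment the operation passed in would be the training or save step, and the retry limit matters: with 25 users training at once, an unbounded loop could leave Outlook waiting indefinitely.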