Feature Requests item #859339, was opened at 2003-12-13 17:45
Message generated for change (Comment added) made by anadelonbrin
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=498106&aid=859339&group_id=61702
Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Vladimir Ulogov (vulogov)
Assigned to: Nobody/Anonymous (nobody)
Summary: Add sqlite storage option

Initial Comment:
In addition to BerkeleyDB, I'd like to be able to use SQLite as well.

----------------------------------------------------------------------

>Comment By: Tony Meyer (anadelonbrin)
Date: 2005-12-06 10:29

Message:
Logged In: YES
user_id=552329

If anyone is, two things to consider would be:

1. Looking at the way the DBClassifier works and copying some ideas from there. IIRC it caches non-hapax tokens and so tries to minimise actually accessing the db.

2. Using sqlite for the token database (hammie.db) and something else for the messageinfo database. The messageinfo db gets written a lot more often (once per message train/classify). pickle also does poorly here at the moment, and dbm is the one that gave us all the trouble, so I'm not sure what to suggest, though.

----------------------------------------------------------------------

Comment By: Kenny Pitt (kpitt)
Date: 2005-12-06 04:04

Message:
Logged In: YES
user_id=859086

I actually had a mostly working SQLite storage class back when SQLite 2.x was current, but the performance was so abysmal that I didn't go any further with it. I never got around to digging it out and testing it with the 3.x version of SQLite, which is supposed to have better performance.

SQLite generally performs pretty well for reads, and for writes that are batched together in a transaction. Unfortunately, the current SpamBayes database access includes a fairly large number of writes (especially when training, although it also tracks statistics on every message received), and the writes are generally committed after each change rather than batched until the end of a large operation.
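[Editor's illustration] The cost of per-change commits that Kenny describes can be seen directly with Python's sqlite3 module. The sketch below is a rough, hypothetical micro-benchmark (the `bayes` table layout and row counts are invented for illustration, not SpamBayes's actual schema): it times the same set of inserts once with a commit after every write, and once batched inside a single transaction.

```python
import os
import sqlite3
import tempfile
import time

def timed_inserts(batch):
    """Insert 2000 rows; return elapsed seconds.

    batch=False commits after every write (the access pattern described
    above); batch=True wraps all writes in one transaction.
    """
    path = os.path.join(tempfile.mkdtemp(), "tokens.db")
    con = sqlite3.connect(path)
    # Hypothetical token-count table, loosely modelled on a word database.
    con.execute(
        "CREATE TABLE bayes (word TEXT PRIMARY KEY, nspam INT, nham INT)")
    rows = [("token%d" % i, i, i) for i in range(2000)]
    start = time.time()
    if batch:
        with con:  # one transaction, one commit at the end
            con.executemany("INSERT INTO bayes VALUES (?, ?, ?)", rows)
    else:
        for row in rows:
            con.execute("INSERT INTO bayes VALUES (?, ?, ?)", row)
            con.commit()  # commit per change: worst case for SQLite
    elapsed = time.time() - start
    con.close()
    return elapsed

per_write = timed_inserts(batch=False)
batched = timed_inserts(batch=True)
print("per-write commits: %.3fs  batched: %.3fs" % (per_write, batched))
```

Each commit forces SQLite to flush its journal to disk, so the batched run is typically faster by a large factor; exact numbers depend on the filesystem.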
This mode of access is pretty much the worst-case scenario for SQLite performance.

If anyone is interested in doing any more work with this, I'll see if I can locate my old code and post it as a patch.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2005-12-05 22:11

Message:
Logged In: YES
user_id=552329

Note also that there have been a few people on the mailing list who have mentioned intending to do this. You could try seeing if any of them have, and whether they'd be willing to contribute the code either to you or back to the project.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-07-16 13:40

Message:
Logged In: YES
user_id=552329

This is a feature request, not a patch, so changing the type.

----------------------------------------------------------------------

Comment By: Tony Meyer (anadelonbrin)
Date: 2004-02-05 21:45

Message:
Logged In: YES
user_id=552329

Note that you can already use MySQL or PostgreSQL. Are either of those good enough? If not, then you could write your own SQLiteClassifier class, based on the other SQL ones in storage.py.

----------------------------------------------------------------------

_______________________________________________
Spambayes-bugs mailing list
Spambayes-bugs@python.org
http://mail.python.org/mailman/listinfo/spambayes-bugs
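[Editor's illustration] Two ideas from the thread above can be combined in one sketch: a SQLite-backed token store in the spirit of Tony's suggested SQLiteClassifier (modelled loosely on the SQL classes in storage.py), with an in-memory cache to minimise database reads as the DBClassifier does, and a single commit batched at the end of a run rather than per token. All class and method names here are hypothetical, not the actual SpamBayes API.

```python
import sqlite3

class SQLiteWordInfoStore:
    """Hypothetical minimal SQLite token store (names are illustrative)."""

    def __init__(self, db_name=":memory:"):
        self.db = sqlite3.connect(db_name)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS bayes"
            " (word TEXT PRIMARY KEY, nspam INTEGER, nham INTEGER)")
        # Cache word info in memory to avoid hitting the db on every
        # lookup, along the lines of the DBClassifier's token cache.
        self.cache = {}

    def put(self, word, nspam, nham):
        self.cache[word] = (nspam, nham)
        self.db.execute(
            "INSERT OR REPLACE INTO bayes VALUES (?, ?, ?)",
            (word, nspam, nham))

    def get(self, word):
        if word not in self.cache:
            row = self.db.execute(
                "SELECT nspam, nham FROM bayes WHERE word = ?",
                (word,)).fetchone()
            self.cache[word] = row  # None if the word is unknown
        return self.cache[word]

    def commit(self):
        # Commit once at the end of a training run, not per token,
        # to avoid the worst-case write pattern discussed above.
        self.db.commit()

store = SQLiteWordInfoStore()
store.put("viagra", 10, 0)
store.put("meeting", 0, 7)
store.commit()
print(store.get("viagra"))  # -> (10, 0)
```

A real implementation would also need the messageinfo database, which, as noted above, is written far more often and may be better served by a different backend.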