Gre7g Luterman <[EMAIL PROTECTED]> writes:

> On 19 Jun 2003 03:31:16 -0500, Tim Legant
> <[EMAIL PROTECTED]> wrote:
>
> > A final note: I made a deliberate decision to pull the list of
> > addresses (black or white or whatever) back from the database into
> > TMDA. This allows us to use the documented wildcard support that TMDA
> > provides, which is not provided by any database. Regular expressions
> > were a possibility, but the problem is that the syntax for using them
> > is different across databases; e.g. MySQL uses "RLIKE", PostgreSQL
> > uses "~", etc.
>
> <advocate type=devil>
<grin>

> Doesn't that defeat the purpose of using a database? One of the biggest
> advantages an SQL has over a flat file in this case is speed.

I have a hard time with this assertion. I find it difficult to believe
that reading a 3K or 4K file from a local disk is slower than reading
the exact same data from a database, given the protocol overhead and
the possibility of going over a wire. Also, I would opine that the
biggest advantage of using a database is centralized management, not
speed.

When I made this decision, I based it on these observations:

1) We already document the email wildcards that TMDA uses. No database
   that I'm aware of supports that particular syntax, since it's based
   on shell wildcarding with a couple of additions.

2) As I mentioned in my previous mail, the syntax for using regular
   expressions differs across databases. This can be addressed by the
   answer to Lloyd's question. See below. Getting that solution into
   1.0 is unlikely, though.

3) Regular expression dialects themselves differ across databases.
   There are two ways I can think of to solve this. First, we can have
   separate from-/to-* rules for each database: from-mysql,
   from-postgres, from-oracle, from-sybase.... Blah. No way do I want
   that kind of mess in the filter language. Second, we can have the
   users write the regular expressions in a standard dialect, say
   Python's, and translate them to the dialect of the database in use.
   It's possible, but it would require an NFA to parse the Python REs
   and a backend per database to translate from the NFA data structure
   into the appropriate dialect. This is so wildly beyond the scope of
   TMDA that I wouldn't even consider it.

Lloyd had asked what my thoughts were going forward. As one wag noted,
the solution to any problem in computer science is another layer of
abstraction. <wink> I'm thinking of having a TMDA database interface
and descendant classes for each supported database.
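For the curious, here is a rough sketch of what matching an address
against TMDA-style shell wildcards looks like in plain Python. This is
an illustration using the standard fnmatch module, not TMDA's actual
matching code; TMDA's real syntax adds a couple of extensions beyond
what fnmatch provides.

```python
import fnmatch

def address_matches(address, patterns):
    """Return True if address matches any shell-style wildcard pattern.

    A rough approximation of TMDA-style matching; the real
    implementation has extensions beyond plain fnmatch.
    """
    address = address.lower()
    for pattern in patterns:
        # Lowercase both sides so matching is case-insensitive
        # regardless of platform.
        if fnmatch.fnmatch(address, pattern.lower()):
            return True
    return False

# Patterns as they might appear in a whitelist file:
patterns = ["*@example.com", "friend@*.org"]
print(address_matches("Alice@Example.com", patterns))  # True
print(address_matches("spammer@junk.net", patterns))   # False
```

The point is that this logic lives in Python, where it's a few lines,
rather than being reimplemented per database dialect.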
The operations in this interface are not so much database operations
as operations that TMDA needs, at a higher level than executing SQL
and fetching rows. Using the DB API in those classes means that most
behavior will be the same across databases and can therefore be
implemented in the base class. Oddities per database can be handled in
the descendant. That makes issue 2, above, go away.

It also allows us to have a file-system "database" class, so that the
base code calls something like database.append_confirm_address(), and
if the file-system class is in play, we write to a local file. If the
MySQL class is being used, we update a table.

The final database problem is configuration information (/etc/tmdarc,
<user>/.tmda/config), which would fully realize the value of a
database in conjunction with TMDA. I thought about this last summer
and got exactly nowhere. I didn't like any of the solutions I came up
with. It needs more thinking. It's another feature that I believe
won't make it into 1.0.

> Pulling the entire database into Python negates this.

This is a bit disingenuous. The whole point of a SELECT statement is
to avoid pulling the whole database in. What comes back is a specific
list, unique to a particular user. My whitelist, with lots of
comments, is less than 4K. Three other lists I use total less than 1K
(including minimal comments). 5K of data coming over the wire is
trivial, especially since, without the comments, it wouldn't even be
5K.

I thought a lot about this, and given the consistency with other email
address processing in the filter and the difficulty of making the
matching code work across databases, I decided to go with the solution
I've outlined rather than trying to save microseconds by doing the
search in the database.
As a side note, I considered writing a stored procedure for each
database that we want to support (or asking volunteers to do so for
their favorite database) that would implement the TMDA style of
wildcards, but the versions of MySQL in common use don't support
stored procedures. That seems to be the common phrase in database
discussions: "We could do X, but MySQL doesn't support it." Oh well,
eventually we'll all be using version 4 or later.

Tim

_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers
