Gre7g Luterman <[EMAIL PROTECTED]> writes:

> On 19 Jun 2003 03:31:16 -0500, Tim Legant
> <[EMAIL PROTECTED]> wrote:
> 
> > A final note: I made a deliberate decision to pull the list of
> > addresses (black or white or whatever) back from the database into
> > TMDA.  This allows us to use the documented wildcard support that TMDA
> > provides, which is not provided by any database.  Regular expressions
> > were a possibility, but the problem is that the syntax for using them
> is different across databases; e.g. MySQL uses "RLIKE", PostgreSQL uses
> > "~", etc.
> 
> <advocate type=devil>

<grin>

> Doesn't that defeat the purpose of using a database?  One of the biggest
> advantages an SQL database has over a flat file in this case is speed.

I have a hard time with this assertion.  I find it difficult to
believe that reading a 3K or 4K file from a local disk is slower than
reading the exact same data from a database, given the protocol
overhead and the possibility of going over a wire.

Also, I would opine that the biggest advantage of using a database is
centralized management, not speed.  When I made this decision, I based
it on these observations:

1) We already document the email wildcards that TMDA uses.  No
   database that I'm aware of supports that particular syntax, since
   it's based on shell wildcarding with a couple of additions.

2) As I mentioned in my previous mail, the syntax for using regular
   expressions is different cross-database.  This can be addressed by
   the answer to Lloyd's question; see below.  Getting that solution
   into 1.0 is unlikely, though.

3) Regular expression dialects themselves are different
   cross-database.  There are two ways I can think of to solve this.

   First, we can have separate from-/to-* rules for each database;
   from-mysql, from-postgres, from-oracle, from-sybase....  Blah.  No
   way do I want that kind of mess in the filter language.

   Second, we can have the users write the regular expressions in a
   standard dialect, say Python's, and translate them to the dialect
   of the database in use.  It's possible, but would require an NFA to
   parse the Python REs and a backend per database to translate from
   the NFA data structure into the appropriate dialect.  This is so
   wildly beyond the scope of TMDA that I wouldn't even consider it.
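For what it's worth, here's a rough sketch of why point 1 is easy once
the list is pulled back into Python: the standard fnmatch module
already handles shell-style wildcards.  (This is not TMDA's actual
code; the function name and patterns are invented for illustration,
and TMDA's additions to shell wildcarding aren't shown.)

```python
# Sketch only: match an address against shell-style wildcard patterns
# after pulling the list out of the database into Python.
import fnmatch

def address_matches(address, patterns):
    """Return True if address matches any shell-style pattern."""
    address = address.lower()
    return any(fnmatch.fnmatch(address, pat.lower()) for pat in patterns)

whitelist = ["*@example.com", "friend@?mail.org"]
print(address_matches("tim@example.com", whitelist))    # True
print(address_matches("spam@elsewhere.net", whitelist)) # False
```

No database I know of will do that particular match for you, which is
the whole point.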

Lloyd had asked what my thoughts were going forward.  As one wag
noted, the solution to any problem in computer science is another
layer of abstraction. <wink>

I'm thinking of having a TMDA database interface and descendant
classes for each supported database.  The operations in this interface
are not so much database operations as operations that TMDA needs, at
a higher level than executing SQL and fetching rows.  Using the DB API
in those classes means that most behavior will be the same
cross-database and therefore implemented in the base class.  Oddities
per database can be handled in the descendant.  That makes issue 2,
above, go away.

It also allows us to have a file-system "database" class, so that the
base code calls something like database.append_confirm_address() and
if the file-system class is in play, we write to a local file.  If the
MySQL class is being used, we update a table.
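To make that concrete, a hypothetical sketch of the interface might
look like the following.  (All class, method, table, and column names
here are invented for illustration; this is not TMDA's actual API.)

```python
# Sketch of a database interface with per-backend descendant classes.
import os
import tempfile

class Database:
    """Operations TMDA needs, at a higher level than SQL and rows."""
    def append_confirm_address(self, address):
        raise NotImplementedError

class FileDatabase(Database):
    """File-system 'database': appends to a local file."""
    def __init__(self, path):
        self.path = path
    def append_confirm_address(self, address):
        with open(self.path, "a") as f:
            f.write(address + "\n")

class MySQLDatabase(Database):
    """Would use a DB API connection to update a table instead."""
    def __init__(self, connection):
        self.conn = connection
    def append_confirm_address(self, address):
        cur = self.conn.cursor()
        cur.execute("INSERT INTO confirm_addresses (address) VALUES (%s)",
                    (address,))
        self.conn.commit()

# Base code doesn't care which backend is in play:
path = os.path.join(tempfile.mkdtemp(), "confirm.txt")
db = FileDatabase(path)
db.append_confirm_address("x-confirm@example.com")
```

The caller just calls database.append_confirm_address(); whether that
appends to a file or updates a table is the descendant's business.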

The final database problem is configuration information (/etc/tmdarc,
<user>/.tmda/config), which would fully realize the value of a
database in conjunction with TMDA.  I thought about this last summer
and got exactly nowhere.  I didn't like any of the solutions I came up
with.  It needs more thinking.  It's another feature that I believe
won't make it into 1.0.

> Pulling the entire database into Python negates this.

This is a bit disingenuous.  The whole point of a SELECT statement is
to avoid pulling the whole database in.  What comes back is a specific
list, unique to a particular user.  My whitelist, with lots of
comments, is less than 4K.  Three other lists I use total less than 1K
(including minimal comments).  5K of data coming over the wire is
trivial, especially since the comments wouldn't be stored in the
database, so the actual payload would be even smaller.
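As an illustration (using sqlite3 as a stand-in for any DB API
module, with invented table and column names), the per-user SELECT
pulls back only one user's handful of patterns, not the whole
database:

```python
# Sketch only: a SELECT returns one user's list, not the whole table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE whitelist (username TEXT, pattern TEXT)")
conn.executemany("INSERT INTO whitelist VALUES (?, ?)",
                 [("tim", "*@example.com"), ("tim", "mom@home.org"),
                  ("lloyd", "*@other.net")])

# Only tim's few hundred bytes come back over the wire:
rows = conn.execute(
    "SELECT pattern FROM whitelist WHERE username = ?", ("tim",))
patterns = [r[0] for r in rows]
print(patterns)
```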

I thought a lot about this.  Given the consistency with other email
address processing in the filter, and the difficulty of making the
matching code work across databases, I decided to go with the
solution I've outlined rather than trying to save microseconds by
doing the search in the database.

As a side note, I considered writing a stored procedure for each
database that we want to support (or asking volunteers to do so for
their favorite database) that would implement the TMDA style of
wildcards, but the versions of MySQL in common use don't support
stored procedures.  That seems to be a common refrain in database
discussions: "We could do X, but MySQL doesn't support it."  Oh well,
eventually we'll all be using version 4 or later.


Tim
_________________________________________________
tmda-workers mailing list ([EMAIL PROTECTED])
http://tmda.net/lists/listinfo/tmda-workers
