Re: Rewritten URIBL plugin

2010-07-29 Thread Robert Spier
  So if you're interested in the parsed mime functionality, your
  plugin can plugin_use util/parsed-mime and the right magic happens.
 
 Oh yeah that's right, someone *did* implement what you're talking about. 
 You can do it with 'plugin inheritance' (which ironically i knew nothing
 about until looking at the async uribl plugin the other night):

You do know who that someone was, right? ;)

snip

 Voila!  A lazy 'plugin model' that is *also* and API, which doesn't 'use
 MIME::Parser' until you *want* to use it; furthermore, you don't have to
 just put the official mime_parser plugin in there, with a little
 modification (or alternatively some care to use the same filename within
 an alternate directory) you could use your home-rolled
 messier-but-more-efficient version that doesn't use MIME::Parser at
 all.

Nice.  That does work out to be a nice bit of code reuse.  The
syntactic sugar is.. well..  lacking, but that's ok.  It's
understandable!

 I could easily modify my own stuff to do this and test it.  I'd even be
 interested in pulling a couple of our own home-grown MIME recursion
 methods into it.  If this is indeed the will of the council ;)

I like it.  But I'm just one person.

 p.s. I'm kind of stoke about plugin inheritance these days.  I'm becoming
 convinced that after some code churn on our side it will allow us to
 finally switch to async without having to do just rewrite all our plugins,
 switch to the async daemon, and see what happens.  And with a little churn
 on the upstream side, if that manages to happen, I also think it could
 allow us to un-fork 90% of the QP plugin code we currently have re-written
 and instead submit patches to QP that have some guarantee to be tested and
 aren't surrounded by completely useless context :)

Nice.

-R


Re: Rewritten URIBL plugin

2010-07-29 Thread Matt Sergeant

Jared Johnson wrote:

sub parse_mime {


That works, only this should be called parsed_mime because you're 
asking for the parsed bit, not telling it to parse (every time).


Matt.


Re: Rewritten URIBL plugin

2010-07-28 Thread Jared Johnson


 Jared Johnson wrote:

  I think we should probably consider putting support for parsed
 messages
  into core, with the parsing done lazily if requested by the API.

 I forgot, we did kinda think of a couple of reasons not to want an API.
 depending on where you put it, you may find QP in general depending on
 MIME::Parser even if it never uses it.  The benefit of the plugin is
 that
 /etc/qpsmtpd/plugins controls whether you wind up having to install, and
 take up your memory with, MIME::Parser

 One thing we've discussed in the past (at least in my imagination)
 although not quite figured out how to implement, is making plugins act
 a little more like normal modules, so that one plugin can use
 another.

 So if you're interested in the parsed mime functionality, your
 plugin can plugin_use util/parsed-mime and the right magic happens.

Oh yeah that's right, someone *did* implement what you're talking about. 
You can do it with 'plugin inheritance' (which ironically i knew nothing
about until looking at the async uribl plugin the other night):

# /usr/share/qpsmtpd/plugins/mime_parser
use MIME::Parser;
sub parse_mime {
my ( $self, $txn ) = @_;
return $txn-notes('mime_body') if $txn-notes('mime_body');
 a bunch of code to create a $msg ...
$txn-notes( mime_body = $msg );
return $txn-notes('mime_body');
}

# /usr/share/qpstmpd/plugins/some_plugin_that_might_want_mime_parsing
sub init {
my ( $self, $qp, %args ) = @_;
return unless $qp-config('parse_mime') or $args{parse_mime};
$self-isa('mime_parser');
$self-{can_parse_mime} = 1;
}

sub hook_data_post {
my ( $self, $txn ) = @_;
if ( $self-{can_parse_mime} ) {
$self-some_recursive_mime_function( $self-parse_mime );
} else {
$self-regular_body_getline_function( $txn );
}
}


Voila!  A lazy 'plugin model' that is *also* and API, which doesn't 'use
MIME::Parser' until you *want* to use it; furthermore, you don't have to
just put the official mime_parser plugin in there, with a little
modification (or alternatively some care to use the same filename within
an alternate directory) you could use your home-rolled
messier-but-more-efficient version that doesn't use MIME::Parser at all.

I could easily modify my own stuff to do this and test it.  I'd even be
interested in pulling a couple of our own home-grown MIME recursion
methods into it.  If this is indeed the will of the council ;)

p.s. I'm kind of stoke about plugin inheritance these days.  I'm becoming
convinced that after some code churn on our side it will allow us to
finally switch to async without having to do just rewrite all our plugins,
switch to the async daemon, and see what happens.  And with a little churn
on the upstream side, if that manages to happen, I also think it could
allow us to un-fork 90% of the QP plugin code we currently have re-written
and instead submit patches to QP that have some guarantee to be tested and
aren't surrounded by completely useless context :)

-Jared



Re: Rewritten URIBL plugin

2010-07-27 Thread Matt Sergeant

On Sun, 25 Jul 2010, Jared Johnson wrote:


The plugin has the following advantages over the original:

- Uses MIME::Parser to unpack message text so that we can look for URI's
in base64-encoded data, etc., and _not_ look for URI's in noise.


I think we should probably consider putting support for parsed messages 
into core, with the parsing done lazily if requested by the API.


Thoughts?

Matt.


Re: Rewritten URIBL plugin

2010-07-27 Thread Jared Johnson
 I think we should probably consider putting support for parsed messages
 into core, with the parsing done lazily if requested by the API.

It's a good idea, it's only that I haven't yet had the time and
justification to write and to test a move from the plugin model (see the
attached plugin in my newer post) to an internal API model.  The plugin I
attached represents a trivially-edited version of a plugin we've been
testing in production for quite some time.  It probably wouldn't be *that*
difficult or risky, though, to take the very same code and put it into the
lazy API of your choice.  And I think it'd be great, even though we
wouldn't actually benefit from the laziness ourselves ;)

If you don't hate storing the resulting parsed mime in
$txn-notes('mime_body'), though, that would rock, since then it wouldn't
break our existing plugin-based forked code ;)

-Jared



Re: Rewritten URIBL plugin

2010-07-27 Thread Jared Johnson
 I think we should probably consider putting support for parsed messages
 into core, with the parsing done lazily if requested by the API.

I forgot, we did kinda think of a couple of reasons not to want an API. 
depending on where you put it, you may find QP in general depending on
MIME::Parser even if it never uses it.  The benefit of the plugin is that
/etc/qpsmtpd/plugins controls whether you wind up having to install, and
take up your memory with, MIME::Parser

You could conditionally 'require', although that does do different things
to memory management that I don't understand

Another thought, there are different reasons people might want mime to be
parsed that benefit from different implementations.  If you only want the
text data, maybe you want to optimize for that somehow and throw away the
binary attachment data, or vice versa (unfortunately there doesn't seem to
be a way to do this with MIME::Parser that i've found though...).  If you
only want the text data *and* you don't mind having it not perfectly
reconstructed (like the URIBL) plugin, then perhaps you could come up with
a way to parse without MIME::Parser that is more efficient and more lazy
(not in the sense you were suggesting earlier :P)

-Jared



Re: Rewritten URIBL plugin

2010-07-27 Thread Robert Spier


Jared Johnson wrote:
 
  I think we should probably consider putting support for parsed messages
  into core, with the parsing done lazily if requested by the API.
 
 I forgot, we did kinda think of a couple of reasons not to want an API. 
 depending on where you put it, you may find QP in general depending on
 MIME::Parser even if it never uses it.  The benefit of the plugin is that
 /etc/qpsmtpd/plugins controls whether you wind up having to install, and
 take up your memory with, MIME::Parser

One thing we've discussed in the past (at least in my imagination)
although not quite figured out how to implement, is making plugins act
a little more like normal modules, so that one plugin can use
another.

So if you're interested in the parsed mime functionality, your
plugin can plugin_use util/parsed-mime and the right magic happens.

-R


Re: Rewritten URIBL plugin

2010-07-26 Thread Jared Johnson
 - Introduces support for URIBL services that may not have worked right,
 at
 least out of the box, before.  Defines the subtle differences between
 various known URIBL services in order to maximize compatibility.

 Is it worth pulling some of this config out of the code and putting it
 into some sort of config file?

Probably would be worth it, and not that difficult to do.  Any new config
scheme should probably be both backward compatible and non-spaghetti.  The
current scheme looks like:

multi.uribl.com 2,4 deny

the new plugin does add one additional feature, you can now do:

multi.uribl.com 2 deny
multi.uribl.com 4 log

but $zones defines a bunch of other things - labels for the zone in
general and specific masks, check_ip, check_host,
host_wildcards_supported, lookup_a_record, lookup_ns_record.  Ignoring the
label issue, all the other directives are related only to the zone;
perhaps we could add a new scheme on top of the mask definition scheme
(which we would continue to read in the old way), getting even further
away from the one-line-per-zone style:

multi.uribl.com action deny
multi.uribl.com check_ip 1
multi.uribl.com check_host 1
multi.uribl.com lookup_a_record 0
multi.uribl.com lookup_ns_record 0
multi.uribl.com host_wildcards_support 0

If nobody hates this scheme it does seem like something I could add pretty
easily send an updated copy

 - Uses Net::DNS::Async to simplify the code, and also to ensure the
 afore-mentioned A and NS lookups will prompt new URIBL lookups in an
 efficient and simple manner via callbacks

 Does the code still work with the async qpsmtpd cores?

Well... I didn't change the existing plugin a whole lot; I did change from
data_handler() registered via register(), to plain old hook_data_post();
this may need changed considering that the comments above each sub (which
I also removed) suggested that hook_data_post() and register() were 'not
used' for async.  These things could easily be changed if needed.  However
we don't run async anywhere, not even in test environments, so I'm not
very qualified to know if they need changed, or give any guarantees.  And
I'm confused by that bit.  From what I can tell, the async/uribl plugin
ignores plugins/uribl entirely and uses
Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS.

Now I noticed the async rhsbl plugin used Q::P::A::D and that this used
ParaDNS before and looked at it briefly to see if I could use it for my
purposes, but at the top of the docs it's mentioned that only A lookups
are supported, so I skipped it and went to Net::DNS::Async.  I don't
really know the strongpoints of N::D::A vs. ParaDNS but it did note that
the latter more expressly claims to be interested in performance and
scalability.  So in conclusion:

(1) I think I'll look into using ParaDNS and maybe even
Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in
async; it might still have advantages, and the interesting logic is in the
callbacks and not the module use anyway
(2) I don't know if it matters whether this particular plugin works with
async, I guess it depends on whether the old one really ever did.  At
least Net::DNS::Async should not make it any more blocking than the old
plugin's collect_results().  At any rate somebody who plays in async
territory should probably comment before any action is taken on the new
plugin :)

 Attached also is tld_lists.pl, a companion file that needs to be dropped
 in lib/Qpsmtpd/ which provides the list of first, second, and third
 level
 TLDs that we care about.  It's derived from our URIBL datafeed as well
 as

 Do the owners of that data care about it being used this way?  You may
 be violating any agreement with them.  Would they be ok if this was
 released as an independent CPAN module?Either way, can we
 structure this as an API instead of just an include file?

(1) I suspect that anyone who runs a URIBL service would be thrilled with
this information being available for this particular use, because it
minimizes unnecessary queries to their services.  Consider the old method
that sent 2 or 3 queries instead of one because it didn't know which one
was right.  The fact that SURBL and URIBL both publish the two and three
level lists and encourage their use is evidence of this :)

(2) First-level TLDs are public information, of course, and with the
two-level and three-level lists being public, the interesting business is
that the first-level list and the SURBL second and third level lists are
pruned based on URIBL's data.  So it comes down to whether anyone can be
considered to have a claim on the privilege of knowing how to *exclude*
results.  I guess maybe they could if it came down to it...

(3) I'm pretty sure I could get URIBL's explicit permission, but it would
take a long time.  The founder and sole owner of URIBL was also the
original author of DoubleCheck, so I feel we would have a bit of an in :) 
However even so he always takes forever to get back to us in

Re: Rewritten URIBL plugin

2010-07-26 Thread Jared Johnson
 Do the owners of that data care about it being used this way?  You may
 be violating any agreement with them.  Would they be ok if this was
 released as an independent CPAN module?Either way, can we
 structure this as an API instead of just an include file?

Forgot to mention, yes, I'd be happy to add the few lines it would take to
make this an API, even if it's just in lib/Qpsmtpd.  It might be nice for
us to get datafeeds with other services (we're looking to at least get one
from DBL, possibly SURBL and imvURI) before publishing to CPAN, mainly so
that we know for sure that we have not over-pruned the first level or
SURBL second  third level lists.  As they are though, I consider the
lists good enough to warrant production use of this method over the
current method in the existing QP uribl plugin.  After all any TLD that
manages to make it onto other services but has garnered *zero* hits on
URIBL, probably has very very few entries on those other services. 
Anyway, if it's a bit of a licensing/copyright problem to use the data
from URIBL, it'll be more of one to use the data from other services as
well, but we can probably work that question in from the get-go

-Jared



Re: Rewritten URIBL plugin

2010-07-26 Thread Devin Carraway
On Sun, Jul 25, 2010 at 04:43:36AM -0500, Jared Johnson wrote:
 The plugin has the following advantages over the original:

Based on a single read-through of the code, I like most of it.  Some
assorted observations, though:

- Parsing out the MIME sections is worthwhile, though it's really something
  that ought to be done in the core libs when plugins indicate they need it,
  or in a utility plugin if that's practical.  This was something I meant
  to bring up a while back -- we have quite a few different content-scanning
  plugins, and if they're all obliged to invoke MIME::Parser repeatedly it'll
  be even more costly than that module already is.  Not that I have time to
  do it myself, though.  :P
- For messages below a few hundred kb or so, consider
  MIME::Parser-output_to_core(), so we can avoid the fs churn.  Annoyingly,
  MIME::Parser doesn't provide a mechanism to only consider particular
  mimetypes, or we could skip decoding non-text bodies.
- It looks like you're scanning inside all parts of multipart MIME, where
  doing so only on text parts might be preferable.
- The config really needs to live in a config file -- changing or expanding 
  the format is fine if needed, but config that lives in code can't really
  be edited.  I'm not bothered by having per-list defaults in the plugin,
  but one should be able to fully add a new list or alter an existing one
  without changing Perl code.
- In the unwinding of HTML entity escaping -- I suggest doing amp; last.
  amp;gt; is gt;, not .

 the SURBL two-level and three-level lists -- but the SURBL lists are
 pruned to exclude stuff that we're pretty sure is never going to actually
 get listed -- when was the last time you saw a spammer use .mil? 
 uribl.com never has, according to our datafeed :)

We'll probably see compromised .gov and .mil sites hosting malware, much as we
see in other domains.  They're not that useful for classical spammer purposes
since they can't be registered in the usual fashion.


Devin

-- 
Devin  \ aqua(at)devin.com, IRC:Requiem; http://www.devin.com
Carraway \ 1024D/E9ABFCD2: 13E7 199E DD1E 65F0 8905 2E43 5395 CA0D E9AB FCD2


Re: Rewritten URIBL plugin

2010-07-26 Thread Jared Johnson
 On Sun, Jul 25, 2010 at 04:43:36AM -0500, Jared Johnson wrote:
 The plugin has the following advantages over the original:

 Based on a single read-through of the code, I like most of it.  Some
 assorted observations, though:

 - Parsing out the MIME sections is worthwhile, though it's really
 something
   that ought to be done in the core libs when plugins indicate they need
 it,
   or in a utility plugin if that's practical.  This was something I meant
   to bring up a while back -- we have quite a few different
 content-scanning
   plugins, and if they're all obliged to invoke MIME::Parser repeatedly
 it'll
   be even more costly than that module already is.  Not that I have time
 to
   do it myself, though.  :P

Our internal plugin actually just uses $txn-notes('mime_body') instead of
creating a new $msg, because we do indeed have such a utility plugin.  A
bit short of being transparently handled by internals, but I could at
least include code in the plugin to use $txn-notes('mime_body') instead
of parsing when that exists.  This would also allow our internal plugin to
be slightly less forked from what winds up in upstream :)  Your idea of
doing this internally sounds cool to me, but we're unlikely to get around
to implementing it ourselves any time soon given the system that already
exists is working OK.  And unfortunately our utility plugin is really just
one aspect of a bigger plugin that wouldn't really be useful to QP, and we
have no method of 'asking' for MIME parsing from said plugin since for our
purposes it's *always* necessary to parse mime.

 - For messages below a few hundred kb or so, consider
   MIME::Parser-output_to_core(), so we can avoid the fs churn.
 Annoyingly,
   MIME::Parser doesn't provide a mechanism to only consider particular
   mimetypes, or we could skip decoding non-text bodies.

The MIME::Parser docs are pretty insistent that using output_to_core is
detrimental to performance in general, and should only be bothered with
when there are actual disk constraints.  I've never tested this theory for
really small messages, but it's kept me away from trying it :)

If it's not very difficult, perhaps it could be optional via a plugin
argument, and later via the handy dandy general
mime-parsing-needed-feature if it's ever written

 - It looks like you're scanning inside all parts of multipart MIME, where
   doing so only on text parts might be preferable.

No, in scan_body() we do:

return unless $part-effective_type ~~ [qw( text/plain text/html )];

I think that's what we want anyway...

 - The config really needs to live in a config file -- changing or
 expanding
   the format is fine if needed, but config that lives in code can't really
   be edited.  I'm not bothered by having per-list defaults in the plugin,
   but one should be able to fully add a new list or alter an existing one
   without changing Perl code.

I agree.  If you don't object to the spec I proposed in my reply to
Robert, I can get that changed soon.

 - In the unwinding of HTML entity escaping -- I suggest doing amp; last.
   amp;gt; is gt;, not .

Good idea, thanks :)

 the SURBL two-level and three-level lists -- but the SURBL lists are
 pruned to exclude stuff that we're pretty sure is never going to
 actually
 get listed -- when was the last time you saw a spammer use .mil?
 uribl.com never has, according to our datafeed :)

 We'll probably see compromised .gov and .mil sites hosting malware, much
 as we
 see in other domains.  They're not that useful for classical spammer
 purposes
 since they can't be registered in the usual fashion.

Ideally, we should only bother querying services for URIs that have some
chance of being listed on those services, regardless of their chance of
actually being used in spam.  Thus, if the datafeeds don't have any hosts
with .mil, there's no point checking them.  However, (1) we're only
deriving our list from URIBL right now, and even as we expand we can never
say that we derived our list from every service that someone might use;
(2) even if I ran my tld_list generating script every day against every
service and sent in an updated list as soon as something got listed on
.mil, there would still be the lag time to get a release out with the
updated list and then have people upgrade, during which time there would
be some misses.

I think that using 'pruned' lists is eventually going to cause some misses
somewhere.  For our own use, since we have control of update timelines,
it's worth the risk, and we also don't have to worry about legal issues
with using the data.  If either of these is a big enough problem for QP in
general, though, you could work around it by using an un-pruned
tld_lists.pl.  This would give %firstlevel_tlds 263 entries instead of
128; %surbl_secondlevel_tlds would have 3,801 instead of 2,426; and
%surbl_thirdlevel_tlds would have 155 instead of 135.  If you're worried
about the legal issue but *not* the timeliness issue, you could still
exclude a few 

Re: Rewritten URIBL plugin

2010-07-26 Thread Robert Spier


Jared Johnson wrote:
 
  - Introduces support for URIBL services that may not have worked right,
  at
  least out of the box, before.  Defines the subtle differences between
  various known URIBL services in order to maximize compatibility.
 
  Is it worth pulling some of this config out of the code and putting it
  into some sort of config file?
 
 Probably would be worth it, and not that difficult to do.  Any new config
 scheme should probably be both backward compatible and non-spaghetti.  The
 current scheme looks like:
 
 multi.uribl.com 2,4 deny
 
 the new plugin does add one additional feature, you can now do:
 
 multi.uribl.com 2 deny
 multi.uribl.com 4 log
 
 but $zones defines a bunch of other things - labels for the zone in
 general and specific masks, check_ip, check_host,
 host_wildcards_supported, lookup_a_record, lookup_ns_record.  Ignoring the
 label issue, all the other directives are related only to the zone;
 perhaps we could add a new scheme on top of the mask definition scheme
 (which we would continue to read in the old way), getting even further
 away from the one-line-per-zone style:
 
 multi.uribl.com action deny
 multi.uribl.com check_ip 1
 multi.uribl.com check_host 1
 multi.uribl.com lookup_a_record 0
 multi.uribl.com lookup_ns_record 0
 multi.uribl.com host_wildcards_support 0
 
 If nobody hates this scheme it does seem like something I could add pretty
 easily send an updated copy

I don't hate it, FWIW.  :)

Sticking with one line per zone is NOT a requirement.  (Heck, I'd even
be willing to go so far as saying we could go to a even more
complicated config scheme.  One of qpsmtpd's unwritten goals is to NOT
pull in half of CPAN to run, but we could expand a little.)

 
  - Uses Net::DNS::Async to simplify the code, and also to ensure the
  afore-mentioned A and NS lookups will prompt new URIBL lookups in an
  efficient and simple manner via callbacks
 
  Does the code still work with the async qpsmtpd cores?
 
 Well... I didn't change the existing plugin a whole lot; I did change from
 data_handler() registered via register(), to plain old hook_data_post();
 this may need changed considering that the comments above each sub (which
 I also removed) suggested that hook_data_post() and register() were 'not
 used' for async.  These things could easily be changed if needed.  However
 we don't run async anywhere, not even in test environments, so I'm not
 very qualified to know if they need changed, or give any guarantees.  And
 I'm confused by that bit.  From what I can tell, the async/uribl plugin
 ignores plugins/uribl entirely and uses
 Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS.

In that case, it's possible (and likely) that it may never have
worked with async.  When time is short, it's easy to just ask questions :)

 
 Now I noticed the async rhsbl plugin used Q::P::A::D and that this used
 ParaDNS before and looked at it briefly to see if I could use it for my
 purposes, but at the top of the docs it's mentioned that only A lookups
 are supported, so I skipped it and went to Net::DNS::Async.  I don't
 really know the strongpoints of N::D::A vs. ParaDNS but it did note that
 the latter more expressly claims to be interested in performance and
 scalability.  So in conclusion:
 
 (1) I think I'll look into using ParaDNS and maybe even
 Q::P::Async::DNSBLBase if appropriate, even though my plugin is not in
 async; it might still have advantages, and the interesting logic is in the
 callbacks and not the module use anyway
 (2) I don't know if it matters whether this particular plugin works with
 async, I guess it depends on whether the old one really ever did.  At
 least Net::DNS::Async should not make it any more blocking than the old
 plugin's collect_results().  At any rate somebody who plays in async
 territory should probably comment before any action is taken on the new
 plugin :)
 
  Attached also is tld_lists.pl, a companion file that needs to be dropped
  in lib/Qpsmtpd/ which provides the list of first, second, and third
  level
  TLDs that we care about.  It's derived from our URIBL datafeed as well
  as
 
  Do the owners of that data care about it being used this way?  You may
  be violating any agreement with them.  Would they be ok if this was
  released as an independent CPAN module?Either way, can we
  structure this as an API instead of just an include file?
 
 (1) I suspect that anyone who runs a URIBL service would be thrilled with
 this information being available for this particular use, because it
 minimizes unnecessary queries to their services.  Consider the old method
 that sent 2 or 3 queries instead of one because it didn't know which one
 was right.  The fact that SURBL and URIBL both publish the two and three
 level lists and encourage their use is evidence of this :)
 
 (2) First-level TLDs are public information, of course, and with the
 two-level and three-level lists being public, the interesting business is
 that the 

Re: Rewritten URIBL plugin

2010-07-26 Thread Jared Johnson
 And
 I'm confused by that bit.  From what I can tell, the async/uribl plugin
 ignores plugins/uribl entirely and uses
 Qpsmtpd::Plugins::Async::DNSBLBase, which in turn uses ParaDNS.

 In that case, it's possible (and likely) that it may never have
 worked with async.  When time is short, it's easy to just ask questions :)

OK, I looked closer and I guess the async plugin actually 'inherits' from
the non-async one, so it uses its methods.  That's a new topic for me but
I think I can come up with the necessary changes to the non-async plugin
and maybe a patch to the async one to use it correctly.  But I won't be
able to test the theory :)

-Jared



Re: Rewritten URIBL plugin

2010-07-26 Thread Jared Johnson
 Attached also is tld_lists.pl, a companion file that needs to be dropped
 in lib/Qpsmtpd/ which provides the list of first, second, and third
 level
 TLDs that we care about.  It's derived from our URIBL datafeed as well
 as

 Would they be ok if this was
 released as an independent CPAN module?Either way, can we
 structure this as an API instead of just an include file?

I've already pretty much done the work to access tld_lists as an API, but
I'm now doubting the use unless it is released to CPAN:  it really makes
the most sense to have Qpsmtpd::TLDLists.pm itself still pull in the
contents of tld_lists.pl as I posted them, in a 'require', because those
contents are created by an automated script.  It seems silly to require
the modification of said automated script if we want it to generate
different logic.  But when Qpsmtpd::TLDLists require's tld_lists.pl, then
comes the question 'if Qpsmtpd::TLDLIsts is the only thing that accesses
this data, and plugins/uribl is the only thing that uses
Qpsmtpd::TLDLists, then why doesn't plugins/uribl just read the data
itself?

You guys can have it whichever way you want it but I say, don't be afraid
to treat a .pl file as if it were basically a static, shipped config file
in a format easily read by perl plugins :)

-Jared



Re: Rewritten URIBL plugin

2010-07-25 Thread Robert Spier

 - Introduces support for URIBL services that may not have worked right, at
 least out of the box, before.  Defines the subtle differences between
 various known URIBL services in order to maximize compatibility. 

Is it worth pulling some of this config out of the code and putting it
into some sort of config file?  

 - Uses Net::DNS::Async to simplify the code, and also to ensure the
 afore-mentioned A and NS lookups will prompt new URIBL lookups in an
 efficient and simple manner via callbacks

Does the code still work with the async qpsmtpd cores?


 Attached also is tld_lists.pl, a companion file that needs to be dropped
 in lib/Qpsmtpd/ which provides the list of first, second, and third level
 TLDs that we care about.  It's derived from our URIBL datafeed as well as

Do the owners of that data care about it being used this way?  You may
be violating any agreement with them.  Would they be ok if this was
released as an independent CPAN module?Either way, can we
structure this as an API instead of just an include file?

 There are some disadvantages to this code.  MIME::Parser is probably heavy
 in terms of resource usage compared to just scanning the message body
 line-by-line.  This would probably not be that difficult to revert again
 if desired, although I think the advantages of M::P outweigh the
 disadvantages.  

It would be cool if it was configurable (i.e. use MIME::Parser or not.)

 code, etc.  At any rate, I've worked hard on this code and am thankful to
 Devin and the rest of the QP contributors for the starting point, so I
 hope that someone manages to find this useful and perhaps massage it into
 upstream :)

I don't have time to test or massage it, but the uribl plugin is quite
important, so if someone is willing to test, benchmark, and massage,
it would be cool to get a more powerful version into core.

-R