On Tue, 20 Jul 2004 17:28:25 -0400, Chris Santerre wrote:

> I bet the file could be half the size it is now. But I don't have
> the script experience to do this. So anyone who can improve the
> logic would be a great help.

How is it generated now?

Have you tried using "Regexp::List", "Regexp::Optimizer" or "Regex::PreSuf"
for this? I'm not sure how good or trustworthy they are, but maybe they are
worth a try.

I made a very small test just to see what they might do. I don't know what one
can change by telling the modules to act differently. And they probably work
better on some input than on others. Some results below.

Regexp::Optimizer (optimize a regexp):
--8<--
/\bhomel(?:oanace\.com|andunited\.com|anddefensejournal\.com|anddefenseradio\.com|andsecurityresearch\.com|ead\.net|essprelates\.com|essteens\.com)\b/i
became
(?-xism:/\bhomel(?:and(?:defense(?:journal|radio)\.com|(?:united|securityresearch)\.com)|e(?:ss(?:prelate|teen)s\.com|ad\.net)|oanace\.com)\b/i)

/\bhomeg(?:ain\.com|ain\.biz|ain\.net|un\.com)\b/i
became
(?-xism:/\bhomeg(?:ain\.(?:com|biz|net)|un\.com)\b/i)
--8<--

Regexp::List (create a regexp from a list):
--8<--
gnyrfalo.nils.com, gnyrfippa.hasse.net, jsfd.hej.com, jasaf.asf.se, 
jfdsjsdf.hsf.com, gnyrffal.hej.net
became
(?-xism:(?=[gj])(?:gnyrf(?:alo\.nils\.com|(?:ippa\.hasse|fal\.hej)\.net)|j(?:(?:sfd\.hej|fdsjsdf\.hsf)\.com|asaf\.asf\.se)))

gnyrfalo.nils.com, gnyrffal.hej.net, gnyrfippa.hasse.net, jasaf.asf.se, 
jfdsjsdf.hsf.com, jsfd.hej.com
became
(?-xism:(?=[gj])(?:gnyrf(?:alo\.nils\.com|(?:fal\.hej|ippa\.hasse)\.net)|j(?:asaf\.asf\.se|(?:fdsjsdf\.hsf|sfd\.hej)\.com)))

gnyrfalo.nils.com, jsfd.hej.com, jfdsjsdf.hsf.com, gnyrfippa.hasse.net, 
gnyrffal.hej.net, jasaf.asf.se
became
(?-xism:(?=[gj])(?:gnyrf(?:alo\.nils\.com|(?:ippa\.hasse|fal\.hej)\.net)|j(?:(?:sfd\.hej|fdsjsdf\.hsf)\.com|asaf\.asf\.se)))
--8<--

Regex::PreSuf (also creates a regexp from a list):
--8<--
gnyrfalo.nils.com, gnyrfippa.hasse.net, jsfd.hej.com, jasaf.asf.se, 
jfdsjsdf.hsf.com, gnyrffal.hej.net
became
(?:gnyrf(?:alo\.nils\.com|fal\.hej\.net|ippa\.hasse\.net)|j(?:asaf\.asf\.se|fdsjsdf\.hsf\.com|sfd\.hej\.com))

gnyrfalo.nils.com, gnyrffal.hej.net, gnyrfippa.hasse.net, jasaf.asf.se, 
jfdsjsdf.hsf.com, jsfd.hej.com
became
(?:gnyrf(?:alo\.nils\.com|fal\.hej\.net|ippa\.hasse\.net)|j(?:asaf\.asf\.se|fdsjsdf\.hsf\.com|sfd\.hej\.com))

gnyrfalo.nils.com, jsfd.hej.com, jfdsjsdf.hsf.com, gnyrfippa.hasse.net, 
gnyrffal.hej.net, jasaf.asf.se
became
(?:gnyrf(?:alo\.nils\.com|fal\.hej\.net|ippa\.hasse\.net)|j(?:asaf\.asf\.se|fdsjsdf\.hsf\.com|sfd\.hej\.com))
--8<--

Just a thought... Might be stupid...

Anyway, here's the little test script:
--8<--
use strict;
use Regexp::List;
use Regexp::Optimizer;
use Regex::PreSuf;

my $l2r = Regexp::List->new;
my $ro = Regexp::Optimizer->new;

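# Note: these strings contain the /.../i delimiters, so Regexp::Optimizer
# treats the slashes and the trailing "i" as literal pattern text (which is
# why they show up inside the optimized results).
my $r1 = 
'/\bhomel(?:oanace\.com|andunited\.com|anddefensejournal\.com|anddefenseradio\.com|andsecurityresearch\.com|ead\.net|essprelates\.com|essteens\.com)\b/i';
my $r2 = '/\bhomeg(?:ain\.com|ain\.biz|ain\.net|un\.com)\b/i';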
my @l1 = (
                'gnyrfalo.nils.com',
                'gnyrfippa.hasse.net',
                'jsfd.hej.com',
                'jasaf.asf.se',
                'jfdsjsdf.hsf.com',
                'gnyrffal.hej.net',
        );
my @l2 = sort @l1;
# Sort by top-level domain (the part after the last dot).
my @l3 = sort {
        my $ax = $a; my $bx = $b;
        $ax =~ s/.*\.([^\.]+)$/$1/;
        $bx =~ s/.*\.([^\.]+)$/$1/;
        $ax cmp $bx;
} @l1;

print "$r1 =>\n" . $ro->optimize($r1) . "\n\n";
print "$r2 =>\n" . $ro->optimize($r2) . "\n\n";

print '('.join(', ',@l1).") =>\n" . $l2r->set(modifiers => 'i')->list2re(@l1) . 
"\n\n";
print '('.join(', ',@l2).") =>\n" . $l2r->set(modifiers => 'i')->list2re(@l2) . 
"\n\n";
print '('.join(', ',@l3).") =>\n" . $l2r->set(modifiers => 'i')->list2re(@l3) . 
"\n\n";

print '('.join(', ',@l1).") =>\n" . presuf(@l1) . "\n\n";
print '('.join(', ',@l2).") =>\n" . presuf(@l2) . "\n\n";
print '('.join(', ',@l3).") =>\n" . presuf(@l3) . "\n\n";
--8<--
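
For illustration only, here is a minimal sketch of the prefix-merging idea that the results above show, written without any CPAN module. It is not how Regexp::List, Regexp::Optimizer or Regex::PreSuf are implemented, and it only merges common prefixes (not suffixes like the shared ".net" that Regexp::List factors out). The names build_trie and trie_to_regex are hypothetical helpers made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: build a character trie from a list of strings.
# Each node is a hash keyed by the next character; the empty-string
# key marks "a word ends here".
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        $node = $node->{$_} ||= {} for split //, $word;
        $node->{''} = 1;
    }
    return \%trie;
}

# Hypothetical helper: turn a trie node into a regexp fragment.
# Shared prefixes are emitted once, followed by a (?:...|...) group
# holding the differing tails. Alternatives are sorted so the output
# does not depend on the order of the input list.
sub trie_to_regex {
    my ($node) = @_;
    my $ends_here = exists $node->{''};
    my @alts;
    for my $char (sort grep { $_ ne '' } keys %$node) {
        push @alts, quotemeta($char) . trie_to_regex($node->{$char});
    }
    return '' unless @alts;                       # leaf: nothing follows
    return $alts[0] if @alts == 1 && !$ends_here; # single path: no group
    my $group = '(?:' . join('|', @alts) . ')';
    $group .= '?' if $ends_here;                  # a shorter word ends here
    return $group;
}

my @domains = qw(homegain.com homegain.biz homegain.net homegun.com);
my $pattern = trie_to_regex(build_trie(@domains));
print "$pattern\n";
```

The resulting string still has to be wrapped by the caller, for example as qr/\b$pattern\b/i, before it is used in a rule.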

Regards (and many thanks)
/Jonas

--
Jonas Eckerman, [EMAIL PROTECTED]
http://www.fsdb.org/
