[Boston.pm] tech meeting tues 2/8/11

2011-02-05 Thread Uri Guttman
hi all, our long time interim fearless leader will not be leading the meeting this coming tuesday. he left the controls in my safe hands. there will be two main talks, a short one by me on new features and planned stuff in File::Slurp and also we will watch a video of schwern on perl5i which is

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Ted Zlatanov
On Sat, 05 Feb 2011 18:27:13 -0500 Charlie wrote:
C> The sample program below runs in 00:09:04 on 1.15GB (1024 copies of
C> Moby Dick). Replacing the hard-coded map with 2 entries with 6000
C> words taken from the text (randomly selected, unique, >5 chars) runs
C> in 00:09:17. I.e. the

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Charlie
Short answer, no, Perl regex will not build an optimal lookup of a token into your set of 6000 names. In general, if speed is the issue, do not use regex. It does not scale. Also, be clear on the 2 problems at hand: 1) tokenizing 1GB of input text and 2) adding a prefix to identified
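A minimal sketch of the two steps named above, tokenizing and then looking each token up in a set, might look like the following. The name list, the `\w+` tokenizer, the `prefix_` string, and the `add_prefix` helper are all assumptions made for illustration; they are not taken from the original post. A small regex is still used to find the tokens, but membership is decided by a hash lookup instead of a 6000-way pattern.

```perl
use strict;
use warnings;

# Hypothetical name set; the thread talks about roughly 6000 names.
my %is_name = map { $_ => 1 } qw(Ahab Ishmael Queequeg);

# Walk the line one word-like token at a time and prefix the tokens
# that are in the set. Each lookup is a single hash probe.
sub add_prefix {
    my ($line, $prefix) = @_;
    $line =~ s/(\w+)/exists $is_name{$1} ? "$prefix$1" : $1/ge;
    return $line;
}

print add_prefix("Call me Ishmael, said Ahab.", "prefix_"), "\n";
```

The cost per line depends on the number of tokens in the line, not on the number of names in the set.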

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Bill Ricker
On Sat, Feb 5, 2011 at 3:43 PM, Alex Vandiver wrote:
> Since we're talking about literals, this hasn't been true since 2007,
> with the release of perl 5.10.  Perl now uses an Aho-Corasick trie
> algorithm internally for literal alternations, which allows for
> matching without backtracking:
Aha,

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Alex Vandiver
At Fri Feb 04 18:53:09 -0500 2011, Uri Guttman wrote:
> that will kill your cpu. alternations are very slow since they have to
> go back and try from the beginning of the list each time.
Since we're talking about literals, this hasn't been true since 2007, with the release of perl 5.10. Perl now
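As a rough illustration of the literal-alternation approach, one compiled pattern can be built from the whole name list and applied in a single pass per line. The names, the `prefix_` string, the word-boundary anchors, and the `prefix_names` helper are assumptions for the sake of the example. Whether the trie optimization actually applies depends on the perl version and on the pattern; `use re 'debug'` can be used to inspect what the regex compiler did.

```perl
use strict;
use warnings;

my @names = qw(Ahab Ishmael Queequeg);   # hypothetical list

# quotemeta keeps every name a plain literal, so the alternation
# contains nothing but fixed strings.
my $alternation = join '|', map { quotemeta($_) } @names;
my $names_re    = qr/\b($alternation)\b/;

# One substitution pass per line, instead of one pass per name.
sub prefix_names {
    my ($line) = @_;
    $line =~ s/$names_re/prefix_$1/g;
    return $line;
}

print prefix_names("Ahab and Ishmael sailed."), "\n";
```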

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Uri Guttman
> "MP" == Martyn Peck writes:
MP> What's wrong with something like this:
MP> while($line=<>){
MP>     foreach my $name (@names){
MP>         $line =~ s/($name)/prefix_$1/g;
MP>     }
MP> }
it is O( N^2 ) which is very slow for large data sets.
MP> I know it seems

Re: [Boston.pm] [Boston.pm-announce] Meeting next Tuesday, Feb 8th ?

2011-02-05 Thread Uri Guttman
> "RJK" == Ronald J Kimball writes:
RJK> On Fri, Feb 04, 2011 at 11:26:05PM -0500, Uri Guttman wrote:
>> > "CW" == Conor Walsh writes:
>> CW> I suppose I have a new valid answer to my favorite "do you really know
CW> Perl or do you just know the syntax" interview question.

Re: [Boston.pm] Q: giant-but-simple regex efficiency

2011-02-05 Thread Martyn Peck
hi

Ok, I've been reading over the responses you've been getting and I just have to ask everyone. What's wrong with something like this:

    while($line=<>){
        foreach my $name (@names){
            $line =~ s/($name)/prefix_$1/g;
        }
    }

I know it seems kind of

Re: [Boston.pm] Question on optimization/memory allocation

2011-02-05 Thread Shlomi Fish
On Saturday 05 Feb 2011 00:23:50 Conor Walsh wrote:
> On 2/4/2011 2:04 PM, Asa Martin wrote:
> > I was told that "predeclaring" the variables outside the loop saved on
> > memory allocation, and that using @rules instead of four named variables
> > was also more efficient. I had never considered
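One way to check a claim like this is to measure it. The sketch below uses the core Benchmark module to time a loop that declares its array inside the body against one that declares it once outside. The data, the sub names, and the iteration count are made up for illustration, and no particular outcome is implied; results will vary by perl version and workload.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

# Made-up input: 1000 comma-separated records.
my @lines = map { "a,b,c,$_" } 1 .. 1000;

# Variable declared fresh on every iteration.
sub declared_inside {
    my $total = 0;
    for my $line (@lines) {
        my @fields = split /,/, $line;
        $total += $fields[3];
    }
    return $total;
}

# Variable declared once, reused on every iteration.
sub declared_outside {
    my $total = 0;
    my @fields;
    for my $line (@lines) {
        @fields = split /,/, $line;
        $total += $fields[3];
    }
    return $total;
}

# Run each variant 200 times and print the timings.
timethese(200, {
    inside  => \&declared_inside,
    outside => \&declared_outside,
});
```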