Re: Bug in Regular Expression Engine?

Shawn McKinley Sun, 08 Jul 2001 14:35:08 -0700
Well, I am not familiar with what you are doing at all... but here is what I
came up with that will parse your source file, line number, error/warning,
and description:

<code>
use strict;
undef $/;
open (F, "<Zach_Turner.dat") or die "Cannot open Zach_Turner.dat: $!"; my $error = <F>; close (F);
$/ = "\n";
my (@error,@errors);
$error =~ /(.*?)([A-Za-z]:\\.*)/ms; my $start = $1; my $errors = $2;
my @eP = split(/([A-Za-z]:\\)/, $errors); s/\n//g for (@eP); my $c = @eP;
for (my $x=1;$x<$c;$x+=2) {
  my $temp ="$eP[$x]$eP[$x+1]";
  $temp =~ /(.*?)\((\d+)\) : ((?:error|warning) .*?:)? ?(.*)?/ms;
  push @error, $1,$2,$3,$4;
}
print "Debug Header:\n$start\n\n";
for (my $x=0;$x<scalar(@error);$x+=4) {
  my $source = substr($error[$x],0,1500);
  my $line = qq~Error in (line): $source($error[$x+1]) \n~;
  $line .= qq~Error/Warning  : $error[$x+2]\n~ if($error[$x+2]);
  my $desc = substr($error[$x+3],0,1500);
  $line .= qq~Description    : $desc\n\n~;
  print $line;
}
exit;
</code>

I am sure it can be condensed some more, but I am not sure you are going to
get a single regex to do what you are wanting...
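
For what it is worth, here is a rough sketch of what a more condensed
version might look like. It assumes each message sits on one line in the
form path(line) : error|warning code: description, which may not hold for
your real file. The sample text and the names $text and @messages are made
up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample of compiler output; the real file layout is an assumption.
my $text = <<'END';
Compiling...
C:\src\my app\main.cpp(12) : warning C4786: identifier was truncated
D:\lib\util.h(7) : error C2065: 'foo' : undeclared identifier
END

# One pass with m//g: /m lets ^ and $ anchor on each line of the buffer.
my @messages;
while ($text =~ /^([A-Za-z]:\\[^(\n]*)\((\d+)\) : (?:(error|warning) (\w+): )?(.*)$/mg) {
    push @messages, { file => $1, line => $2, kind => $3, code => $4, desc => $5 };
}

for my $m (@messages) {
    print "Error in (line): $m->{file}($m->{line})\n";
    print "Error/Warning  : $m->{kind} $m->{code}\n" if defined $m->{kind};
    print "Description    : $m->{desc}\n\n";
}
```

If your messages wrap across several lines, this will not work as is and
the split approach above is probably the safer bet.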

BTW, thanks, I did not know about (?:...), the non-capturing group.  I guess
I will have to see how much processing time this neat little trick will
save :-)
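
To convince myself of the difference, a small sketch along these lines
seems to show it. The path and the variable names are invented for the
example, and the pattern is a cut-down version of the path part of your Rx,
not the full thing.

```perl
use strict;
use warnings;

# Hypothetical path, single-quoted so the backslashes stay literal.
my $path = 'C:\projects\my app\main.cpp';

# Capturing group: the repeated directory group takes $1, the file name is $2.
my @with_capture = $path =~ /^[A-Za-z]:\\([\w ]+\\)*([\w ]+\.\w+)$/;

# Non-capturing group: same match, but only the file name is saved.
my @without_capture = $path =~ /^[A-Za-z]:\\(?:[\w ]+\\)*([\w ]+\.\w+)$/;

print scalar(@with_capture), " captures: @with_capture\n";
print scalar(@without_capture), " capture: @without_capture\n";
```

Both patterns match the same text. The only difference is how many values
come back, so with (?:...) the file name ends up in $1 instead of $2.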

Shawn

----- Original Message -----
From: <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Sunday, July 08, 2001 1:13 PM
Subject: Re: Bug in Regular Expression Engine?


> Oh, and some notes on the Rx itself:
>
> >> ([A-Za-z]:\\(?:[\w ]+\\)*[\w ]+\.[\w ]*?\(\d+\) : warning C)(\d{4})(: .*)
>
> > \\ => match '\'
> > ****(You probably meant to have another one of these in there so your
> > next char '(' would be escaped)
>
> Just two \\ is correct.  The part of the Rx before \(\d+\) basically
> matches a fully qualified path name.  So the first part, ([A-Za-z]:\\,
> matches C:\, D:\, E:\, etc.
>
> > ( => start capture $2
> >
> > ? => no value at all
> >
> > : match ':'
>
> Actually this specifies that the value in parentheses will not be saved
> for backreferences.  I wanted the parentheses only as a means of placing a
> quantifier on a group of characters, and I didn't need to save the actual
> match.  So (?:...) is identical to (...) except it does not save the match
> in $n.
>
> > [\w]+ => match at least one plus any number of word characters (does not
> > include spaces, '(', ')', and many other valid win os file name chars)
>
> It's actually [\w ]+ in the Rx, so it will match spaces.  I did not worry
> about characters like _ and other valid filename characters, however.
>
> > \\ => match '\'
> > ******(You probably meant to have another one of these in there so your
> > next char ')' would be escaped)
>
> Again, since this is part of a path name I am trying to match the \, not
> the (
>
> > \d+ => match at least one plus any number of number characters
>
> Doesn't this match one plus any number of digits?
>
> > .* => match what is left in the string (VERY greedy)
> That's fine.  Even though I slurp the whole file into a buffer, I am using
> a previous m//g to get me each individual message.  So this .* simply
> gives me everything up until the end of the string, which turns out to be
> the rest of the current message.
>


_______________________________________________
ActivePerl mailing list
[EMAIL PROTECTED]
http://listserv.ActiveState.com/mailman/listinfo/activeperl