On Tue, Nov 29, 2016 at 08:45 PM, Don Ward wrote:
I should have been a bit
clearer when I said my program didn’t work. It does, but only for fixed
strings: there is none of the RE special character magic. And, I agree, the
crucial question is how to construct a pattern from a string that treats the
special characters as special characters, rather than just literals.
In passing, write( type( )) writes string, whereas write(type( )) writes
pattern, which isn’t quite what I expected.
I had quite high hopes for Arbno(), but soon realised that it wanted a pattern
for its argument, not a string and, even when I fed it with a variable that had
the type of pattern, it still didn’t work how I might have expected it to. At
that stage, I asked my original question. If Clint’s “option #1” is “write a
library procedure that parses the regex and builds the corresponding pattern”,
I wonder whether Arbno() might be a suitable interface: i.e. if it’s a pattern
already, do what it does now, otherwise turn the string into a pattern and then
do it. Perhaps a separate procedure might be clearer.
The reader with no time for trivia may profitably skip the rest of this message
...
I may have found a use for Succeed: If I modify my program to be as below
(additions in red: the reason for the strange comment at the end will be clear
in a moment)
procedure main(args)
   local f, line, re := pop(args) || Succeed()
   write(type(re))
   every f := open(!args, "r") do {
      every line := !f do {
         if ( line ?? re ) then write(line)
      }
   }
end
#[dne][edn][den]
If I use grep on this program source I get
bash-3.2$ grep "[dne][edn][den]" gerp.icn
local f, line, re := pop(args) || Succeed()
end
#[dne][edn][den]
as expected: grep has found “eed”, “end” and the regular
expression itself in the final comment. Whereas, if I use the program on its
own source code I get
bash-3.2$ ./gerp "[dne][edn][den]" gerp.icn
pattern
#[dne][edn][den]
showing that although I have a pattern, it isn’t interpreting
the special characters.
Don, this isn't run time regex. Regex literals are preprocessed before the
compile.
Look at unicon -E gerp.icn
to see how it is the preprocessor phase that expands the special meanings, by
generating Unicon function expressions.
The program patstr.icn
procedure main(argv)
   local f, line, re := pop(argv)
   a :=
   write(type(a))
   b :=
   write(type(b), " ", string(b))
   p := pattern_concat("", re)
   write(type(p))
   every f := open(!argv, "r") do {
      every line := !f do {
         if line ?? p then write(line)
      }
   }
end
Expands to
prompt$ unicon -E patstr.icn
Parsing patstr.icn: .
/home/btiffin/unicon/bin/icont -c -E -O patstr.icn /tmp/uni13423387
patstr.icn:
#line 0 "/tmp/uni13423387"
#line 0 "patstr.icn"
procedure main(argv );
local f, line, re;
#line 13 "patstr.icn"
re := pop(argv)
a := pattern_concat("abc", (Arbno('abc')));
write(type(a));
b := pattern_concat("abc", Any('abc'));
write(type(b), " ", string(b));
p := pattern_concat("", re);
write(type(p));
every f := open(!argv, "r") do {
every line := !f do {
if( "" ? pattern_match( line, p)) then write(line)
}
};
end
No errors
The Unicon VM never sees the literals; it sees the results of pattern_xxx
constructors and functions, along with the use of Arbno() to handle the Kleene
stars and character set square brackets.
unicon -E will show more clearly what is going on, and the separation of
compile time and runtime behaviour, and what is going to be allowed with regex
literals.
Cheers,
Brian
If I miss off the “|| Succeed()” from the initialisation of re and try again I
get
string
#[dne][edn][den]
I still get a pattern match, even though it’s a string not a pattern, but it’s
the literal string that is matching.
Therefore Succeed() may be used to turn a string into a pattern! Unfortunately
not in a useful way.
On 29 Nov 2016, at 22:27, Jeffery, Clint wrote:
My thanks to Don, Jay, and anyone else who is trying out stuff related to
patterns. I am on the road ATM but will work on improving the diagnostics
related to Jay's experiments. Regarding Don's original request and Jay's
comments on it: backquotes in patterns is not a full "eval" interpreter that
will take arbitrary Icon strings and turn them into code. Maybe we need that,
and maybe someone will build it some day. In the meantime, after figuring out
the best workarounds that may be available, you can judge for yourself whether
the patterns are still useful, or whether they remain unfinished business.
The basic question is: given a regular expression supplied as string data s,
how best should we construct a corresponding pattern. The answer sadly is not .
The Unicon translator has a parser for regular expressions and emits pattern
function calls for them, but we want to do it from the Unicon VM. Options
include: write a library procedure that parses the regex and builds the
corresponding pattern; write a library procedure that invokes the translator to
do the work and use dynamic loading to get the code loaded; extend the language
with a new built-in that does the same or similar; extend the backquotes
operator to do what we want here; or use another idea that you think up.
Don: great minds think alike. When I started to update the Unicon book to talk
about patterns, I immediately figured we needed to update the "grep" example to
use patterns, and came up against the same issue you're asking about. I
haven't implemented a solution yet, but perhaps we should do option #1 and see
what that looks like.
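As a stop-gap for the grep example, here is a minimal sketch that sidesteps the
string-to-pattern question rather than answering it. It does not build a Unicon
pattern value at all; it interprets a small regex subset (literal characters,
".", [abc] sets without ranges or negation, and a postfix *) directly with csets
in plain Icon. The names atom, parse_re, match_at and gerp2 are hypothetical, not
library procedures, and this is an illustration under those assumptions, not the
option #1 library procedure itself.

```unicon
# One parsed regex element: a cset of acceptable characters, plus a
# non-null star field when the element may repeat zero or more times.
record atom(cs, star)

# Parse a regex subset (literals, ".", "[...]", postfix "*") into a list of atoms.
procedure parse_re(re)
   local atoms, cs, star
   atoms := []
   re ? {
      while not pos(0) do {
         if ="[" then {
            cs := cset(tab(find("]"))) | stop("unterminated [ in regex")
            move(1)                      # skip the closing "]"
            }
         else if ="." then cs := &cset   # "." accepts any character
         else cs := cset(move(1))        # a literal character
         star := if ="*" then 1 else &null
         put(atoms, atom(cs, star))
         }
      }
   return atoms
end

# Succeed if atoms[k], atoms[k+1], ... match s starting at position i.
procedure match_at(atoms, k, s, i)
   local a, j
   if k > *atoms then return            # ran out of regex: matched
   a := atoms[k]
   if /a.star then {
      if any(a.cs, s, i) then return match_at(atoms, k + 1, s, i + 1)
      fail
      }
   # Starred element: try zero occurrences first, then one more each time.
   j := i
   repeat {
      if match_at(atoms, k + 1, s, j) then return
      if not any(a.cs, s, j) then fail
      j +:= 1
      }
end

procedure main(args)
   local atoms, f, line
   atoms := parse_re(pop(args)) | stop("usage: gerp2 regex file...")
   every f := open(!args, "r") do {
      while line := read(f) do
         # goal-directed evaluation retries each start position in turn
         if match_at(atoms, 1, line, 1 to *line + 1) then write(line)
      close(f)
      }
end
```

Compiled as, say, gerp2, it would be run the same way as the earlier program:
./gerp2 "[dne][edn][den]" gerp.icn. The design choice is to keep everything in
ordinary csets and recursion so that nothing depends on how the pattern type
behaves at run time; the cost is that none of the pattern operators are
available to the caller.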
Cheers,
Clint
------------------------------------
From: Jay Hammond
Sent: Tuesday, November 29, 2016 1:55:07 PM
To: Don Ward; [email protected]
Subject: Re: [Unicon-group] Converting strings to patterns
Hi Don,
I tried running your program. To get it to do anything I had to change
line 2, separate the local declaration and the assignment. To clarify, repat is
a (new) variable that I intend to hold a pattern
procedure main(args)
   local f, line, re
   re := pop(args)
   write(re)
   repat := re
   every f := open(!args, "r") do {
      while ( line := read(f) ) do {
         if line ?? repat then write(line)
      }
   }
end
I created qqq.txt with the lines
QQQ
qqq
cqcqcq
and ran testpat QQQ qqq.txt after compiling testpat.icn
Output was
QQQ
then the contents of qqq.txt
as if repat always matches. (it has the null value??)
I tried forcing repat to be a pattern (utr18 says that patterns are composed of
strings concatenated or alternated) so I tried
repat := re .| fail()
repat := re .| re
but the pattern building process gave me node errors at compile time.
dopat2.icn:6: # "re": syntax error (237;349)
File dopat2.icn; Line 16 # traverse: undefined node type
line 16 is the line after end in main, i.e. program source end.
I tried using the -f s option at the compile step, so as to use unevaluated
expressions in patterns, like
repat := < `re` >  # that syntax ought to force a pattern!
Node traversal errors again. And the backquotes were not recognised. Perhaps I
put the -f s option in the wrong place?
I also tried
repat := < Q > || < Q > || < Q >
dopat2.icn:6: # "repat": invalid argument in augmented assignment
File dopat2.icn; Line 16 # traverse: undefined node type
so it is not considering || to be pattern concatenation.
repat := < Q || Q || Q >
gave the same error!
So although UTR18 seems to give options for converting strings to patterns I
have not had any luck so far.
Jay
On 29/11/2016 14:33, Don Ward wrote:
Here is a very simple (and simple minded) grep program. The idea is to apply
a Unicon regexp pattern to a series of files, just like grep:
procedure main(args)
   local f, line, re := pop(args)
   every f := open(!args, "r") do {
      every line := !f do {
         if line ?? re then write(line)
      }
   }
end
Of course, it doesn’t work because in line 6 I have a string instead of a
pattern.
Is there any way to convert the contents of re from a string to a pattern?
After reading UTR18 again (and again), I’ve come to the conclusion that there
isn’t any way to do it.
The pertinent extract from UTR18 is in section 4.5.3 “Limitations due to lack
of eval()”.
But before I give up on the idea entirely, I thought I’d check to see if my
understanding is correct.
Don
------------------------------------------------------------------------------
_______________________________________________ Unicon-group mailing
list [email protected]
(mailto:[email protected])
https://lists.sourceforge.net/lists/listinfo/unicon-group
(https://lists.sourceforge.net/lists/listinfo/unicon-group)