David Gamey wrote:
> Beyond the acknowledgment that part of the code is a kludge and the
> desire to better integrate with the rest of the language. Has the
> problem/challenge been defined in a bit more detail?
David,
Here's something I wrote to Clint back in 2006 that outlines my thinking
on integrating some of the PM ideas into Unicon. There are some references
to Sudarshan's thesis that may require some review to make sense, and a
few typos [when have I *ever* not had a few typo's?]. It's excerpted from a
long message but I don't think anything's lost by doing so:
====================================================================
One of the complaints against PM in S4 (though some people like this)
is that PM is really a separate language grafted onto S4.
This has the same feel to me. (In S4, this is mitigated somewhat
by the fact that both languages are simple; that certainly
isn't the case in Unicon!) PM as described does not integrate
well into Unicon and I find it confusing to have to constantly
shift gears between the Unicon language and the PM language.
I found, in thinking about patterns, that I'd want to write
something like:
PArbno(x) && write(PAny(y))
but I imagine that's not possible! This is a step backward.
Also, too many of the operators are too close in
meaning to existing Unicon operators. The conditional
assignment x -> y is nearly equivalent to reversible
assignment y <- x. I think I understand why one can't just
use y <- x, but I believe that actually illustrates part
of my concern - something isn't right if you can't. In fact,
if you could, then there would be no need for *both* immediate
assignment and conditional assignment, := and <- would suffice.
Virtually all of the operations in PM align similarly with
existing operations.
The examples shown in section 4 are also slightly misleading -
the 'pure' Unicon versions do not take advantage of existing
features enough and so present Unicon in a worse light than
necessary. I'd rather first see a better use of existing
facilities than introducing an entirely new mechanism.
The phone number parser can be written, for example as:
----------------------------------------------
procedure main(args)
    in := open("phoneIn.txt")
    out := open("phoneOut.html","w")
    write(out, "<html>")
    write(out, "<body>")
    while line := read(in) do {
        write(out, "<div>",line,"</div>")
        line ? {
            if (areacode := digits(3)) &
               (trunk := digits(3)) &
               (rest := digits(4)) &
               (ext := arbDigits()) &
               pos(0) then {
                write(out,"<div style=\"color:red\">","AreaCode = ",
                      areacode,"</div>")
                write(out,"<div style=\"color:red\">","Trunk = ",
                      trunk,"</div>")
                write(out,"<div style=\"color:red\">","Rest = ",
                      rest,"</div>")
            }
        }
    }
    write(out, "</body>")
    write(out, "</html>")
end

procedure digits(N)
    return 2(p1 := &pos := upto(&digits)\1, tab(many(&digits))\1, pos(p1+N))
end

procedure arbDigits()
    return (tab(upto(&digits))\1, tab(many(&digits))) | ""
end
--------------------------------------------------------------
which also happens to be 'more correct' than any of the examples in the
paper, all of which think that:
111111-22222222222223333333333333333333333333333
represents a valid phone number: (111) 111-2222
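A minimal harness for checking that claim might look like the following. It assumes the digits(N) and arbDigits() procedures above are reused unchanged (they are repeated so the sketch is self-contained); the accepts() helper is hypothetical and exists only for this illustration.

```unicon
procedure main()
    every s := "(111) 111-2222" |
               "111111-22222222222223333333333333333333333333333" do
        write(s, ": ", if accepts(s) then "accepted" else "rejected")
end

# Hypothetical helper: succeeds only if s consists of digit runs of
# exactly 3, 3 and 4 digits, an optional further run, and nothing else.
procedure accepts(s)
    return s ? (digits(3) & digits(3) & digits(4) & arbDigits() & pos(0))
end

# Copied from the program above: skip to the next run of digits and
# succeed only if that run is exactly N digits long.
procedure digits(N)
    return 2(p1 := &pos := upto(&digits)\1, tab(many(&digits))\1, pos(p1+N))
end

# Copied from the program above: an optional further run of digits.
procedure arbDigits()
    return (tab(upto(&digits))\1, tab(many(&digits))) | ""
end
```

The six-digit run at the start of the second string is consumed whole by the first digits(3), so pos(p1+N) fails there and the string should be reported as rejected.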
The program to detect words with double letters can be written *much* more
succinctly [if that's important] as:
--------------------------------------------------------------
procedure main()
    in := open("mtent12.txt", "r") | stop("open failed")
    out := open("mtentpatternOut.txt", "w")
    every line := !in do {
        line ? while word := (tab(upto(&letters)), tab(many(&letters))) do
            word ? if |(move(1)\1) == move(1) then write(out, word)
    }
end
--------------------------------------------------------------
and the (A^n)(B^n)(C^n) can be done as (the ABC(s) function is overkill for the
task and quite inefficient):
---------------------------------------------------------------
procedure main(args)
    every line := !&input do {
        if line ? (ABC() & pos(0)) then write("accepted")
        else write("rejected")
    }
end

procedure ABC()
    return (*tab(many('a')) = *tab(many('b')) = *tab(many('c')))
end
---------------------------------------------------------------
Note that this is shorter and clearer than the S4 equivalent! It would
be even better if Unicon had span(c) as a synonym for tab(many(c)):
procedure ABC()
    return *span('a') = *span('b') = *span('c')
end
[As an aside, I'm also opposed to making &input, &output, and &errout
variables. I think one would be better served by having initialization
of global parameters and allowing (e.g.):
    global out:&output
Redefining &input etc. dynamically is a problem for large programs
consisting of many packages - for one thing, there's no way to go back.
This idea is (to me) the equivalent of the way FORTRAN2 let you
redefine integer constants! &input, &output, and &errout should
remain constants also.]
Now, having said that I'm not happy with PM implemented as an add on,
I do see some *very* useful ideas here. One thing that is apparent is
that PM is more efficient than string scanning (even the 'improved'
examples shown above aren't likely to be as fast as PM). So, in my
mind, the question becomes more 'what can be done to improve string
scanning?'. I see two approaches:
(1) It's clear that PM has a richer set of operations than SS.
The fact that only tab() and move() advance the scan position
is elegant, but inefficient because it (a) increases the number
of function calls and (b) always constructs a string, even
if that string isn't needed. The procedure digits(N), for
example, can be written more 'cleanly' as:
procedure digits(N)
    return 2(tab(p1 := upto(&digits)\1), tab(many(&digits))\1, pos(p1+N))
end
I'd like to see improved SS operations, such as:
pos() [as a synonym for &pos, instead of a runtime error]
span(c) [tab(many(c))]
skipto(n) [like tab, but returns "" instead of the matched string]
skip(c) [skipto(many(c))]
skip(s) [skipto(match(s))]
substring(p1,p2) [&subject[p1:p2]]
etc...
then the above could be:
return 3(skip(~&digits),p1 := pos(), span(&digits), pos(p1+N))
(the calculation of ~&digits could be moved out of the expression if speed
were important, of course) or
return 3(skipto(&digits),p1 := pos(), span(&digits), pos(p1+N))
In fact, virtually all of the proposed pattern matching functions would
be *very* good candidates for new string scanning operations. (The only
ones that wouldn't are those that simply duplicate existing operations.)
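The intended behavior of the proposed operations can be approximated today with ordinary procedures. This is only a sketch: the names come from the list above, but the bodies are assumptions about the intended semantics, the type test in skip() is one guess at how the cset and string forms would be told apart, and procedure versions add call overhead rather than removing it. pos() is left out because pos with no argument is currently a runtime error; &pos is used instead.

```unicon
# Procedure-level approximations of the proposed scanning operations.
# Each uses suspend so that a change to &pos is undone on backtracking.

procedure span(c)                 # proposed span(c): tab(many(c))
    suspend tab(many(c))
end

procedure skipto(n)               # like tab(n), but produces "" instead
    suspend (tab(n), "")          # of the matched string
end

procedure skip(x)                 # cset argument: skipto(many(x))
    if type(x) == "cset" then     # anything else: skipto(match(x))
        suspend skipto(many(x))
    else
        suspend skipto(match(x))
end

procedure substring(p1, p2)       # proposed substring: &subject[p1:p2]
    return &subject[p1:p2]
end

procedure main()
    "   abc123" ? {
        skip(' ')                         # step over the leading blanks
        write(span(&letters))             # the run of letters that follows
        write(substring(1, &pos))         # everything scanned so far
    }
end
```

A built-in skipto() would not need to construct the matched substring at all, which is the saving argued for above; this sketch still pays for the tab() call and only mimics the result.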
(2) It's also clear that precompiling the pattern is a significant win (I'd like
to see timing tests that included building the pattern *inside* the match as
well as the prebuilt pattern. [At least, I'm assuming that the pattern is
prebuilt, and that doing such nasties as:
double := PArbno(&letters) && PAny(&letters) $$ x && `x` &&
(PSpan(&letters) .| "")
PAny := write
line ?? double
wouldn't cause problems, but that:
PAny := write
double := PArbno(&letters) && PAny(&letters) $$ x && `x` &&
(PSpan(&letters) .| "")
line ?? double
would.])
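The concern comes from the fact that function names in Unicon are ordinary global variables, so what a name refers to depends on when it is evaluated. A minimal sketch using built-in scanning functions (not the PM functions, and with no claim about how PM actually binds names when a pattern is built):

```unicon
procedure main()
    f := tab                    # capture the function value currently bound to tab
    tab := move                 # rebind the global name; f is unaffected
    "hello" ? write(f(3))       # still the original tab: text up to position 3
    "hello" ? write(tab(3))     # the name now refers to move: next 3 characters
end
```

A prebuilt pattern would play the role of f here: whatever was bound to PAny at build time is what gets used, regardless of later assignments to the name.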
Would it be possible to 'prebuild' scanning expressions in a similar way?
Note that this is *much harder* than building patterns because of the (nice)
integration of scanning with the rest of Unicon. But I believe it's the right
way to go. For example, consider the following (valid!) Unicon program [I had
been thinking about this as a possible Generator article, it's not just off
the top of my head...]:
------------------------
procedure main(args)
    aC := create (*tab(many('a')) = *tab(many('b')) = *tab(many('c')))
    every line := !&input do {
        if line ? (@^aC & pos(0)) then write("accepted")
        else write("rejected")
    }
end
------------------------
This is beautifully succinct, integrates perfectly with existing Unicon
(since it *is* existing Unicon), and cleanly separates the scanning expression
from its use, just as prebuilding patterns does. Its drawbacks are:
a. creating a coexpression is overkill for this task (it's slightly slower
   than the original, recursive solution on short input strings!)
b. the need to constantly 'refresh' the coexpression is an added expense,
   since refreshing a coexpression is also an expensive operation
c. combining patterns into bigger ones isn't as clean as it could be
So, what about considering something like:
aC := pattern (*span('a') = *span('b') = *span('c'))
(see, those new scanning operations are already helping!).
Here pattern 'captures' an expression a la create, but in a much more
lightweight fashion so that this pattern could be applied within string
scanning with (say) @aC. This would decouple the efficiency issue from the
syntax. An initial version could be implemented quickly by layering on top of
the coexpression mechanism while work proceeds on how to improve the internal
representation to make it more efficient. (Since you know you're working on a
'pattern' (a string scanning pattern, *not* an S4 one) the compiler could
perform some sort of transformation internally to improve performance.)
===========================================================
--
Steve Wampler -- [email protected]
The gods that smiled on your birth are now laughing out loud.