There have been some very nice solutions posted. Most have
an advantage over the one given here: they produce
strings that can be processed further. (The solution
here could be rewritten in a similar fashion, of course.)
This solution assumes the program itself is a filter,
reading from standard input and writing to standard output.
It's pure Unicon and uses some procedures and classes
from the 'Unilib' library found at:
http://tapestry.tucson.az.us/unicon
A short explanation of some of the features follows the code:
----------------------------------------------------------------
#<p>
#  Flatten a tag by removing nested instances.
#</p>
import Utils    # for StringUtils, ScanUtils, and BlockRead

procedure main(args)
    local tag
    tag := zapPrefix(!args, "--tag=") | "x"
    flatten("<"||tag||">", "</"||tag||">")
end

procedure flatten(startTag, endTag)
    local bRead, ff, inside
    bRead := BlockRead()
    ff := FindFirst([startTag, endTag])
    inside := 0
    while bRead.readBlock() ? {
        while writes(tab(ff.locate())) do {
            case ff.moveMatch() of {
                startTag: if (inside +:= 1) = 1 then writes(startTag)
                endTag:   if (inside -:= 1) = 0 then writes(endTag)
                }
            inside <:= 0    # malformed input! reset and continue
            }
        writes(tab(0))
        }
    if inside > 0 then write(endTag)    # malformed input! repair
    return
end
-------------------------------------------------------------------
(1) The procedure zapPrefix(string, prefix) succeeds only if prefix
is a prefix of string and produces string with prefix removed.
(I like having named arguments on command-line interfaces.)
(2) The class BlockRead() reads ASCII text files in large
blocks, but guarantees that each block ends with a newline
(the newline is part of the block). By default, a BlockRead
reads up to 500,000 characters at a time. On large files,
this can be a significant performance gain, depending on
the processing done on the resulting input string.
(3) FindFirst is a blend of find() and upto(). Like find(), it
finds substrings. Like upto(), it finds alternatives
in the order they appear in the subject. The locate()
method finds the next matching substring, and moveMatch()
moves over the last found substring (returning that substring
in the process). The class avoids rescanning the subject
in places where it already knows the result (that is,
moveMatch() does not rematch the substring already
discovered by locate()).
(4) The flatten procedure itself just counts start and end tags
(as the other solutions also all do) in the same way that
bal() works. As written, it outputs the results immediately
instead of building up a new string internally. That could
be considered as either an advantage or a disadvantage.
(5) Strictly speaking, the lines that attempt repairs on
malformed input aren't needed (and may even be the
wrong thing to do!). They seem like reasonable
repair actions, though, and are easily removed.
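
For readers without Unilib at hand, notes (1), (3), and (4) can
be approximated with built-in string scanning alone. This is
only a sketch under stated assumptions: zapPrefixSketch,
locateFirst, and flattenString are made-up names, not Unilib
procedures, and nothing here claims to show how Unilib actually
implements zapPrefix or FindFirst. Unlike the filter above, it
works on a string already in memory and returns a new string.

```unicon
# Sketch only: built-in scanning, no Unilib. All three procedure
# names below are hypothetical and are not part of Unilib.

# Succeeds only if s starts with prefix; produces the rest of s.
procedure zapPrefixSketch(s, prefix)
    s ? if =prefix then return tab(0)
end

# Within the current scanning environment, produces the nearest
# position at or after &pos where any string in tags begins.
# Fails if none of them occurs.
procedure locateFirst(tags)
    local best, p, t
    every t := !tags do
        if p := find(t) then
            if /best | (p < best) then best := p
    return \best
end

# Keeps all text, but only the outermost startTag/endTag pair of
# each nested group, counting depth much as bal() does.
procedure flattenString(s, startTag, endTag)
    local out, depth, p
    out := ""
    depth := 0
    s ? {
        while p := locateFirst([startTag, endTag]) do {
            out ||:= tab(p)              # copy text before the tag
            if =startTag then {
                if (depth +:= 1) = 1 then out ||:= startTag
                }
            else {
                =endTag                  # the located tag was endTag
                if (depth -:= 1) = 0 then out ||:= endTag
                }
            depth <:= 0                  # stray end tag: reset
            }
        out ||:= tab(0)                  # copy the remaining text
        }
    if depth > 0 then out ||:= endTag    # unclosed tag: repair
    return out
end

procedure main()
    write(zapPrefixSketch("--tag=b", "--tag="))   # writes: b
    # writes: <x>abc</x>d
    write(flattenString("<x>a<x>b</x>c</x>d", "<x>", "</x>"))
end
```

locateFirst calls find() once per tag on every step, so it
rescans more of the subject than the FindFirst class described
in (3) is said to.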
In the next message I'll post a solution to the secondary
problem of handling multiple independent tags. (William
Mitchell's solution already handles this case...)
--
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.