Re: [Unicon-group] A new 'programming challenge'...

Steve Wampler Tue, 01 Jun 2004 07:16:14 -0700

There have been some very nice solutions posted.  Most have
the advantage over the one given here that they produce
strings that can be processed further.  (The solution
here could be rewritten in a similar fashion, of course.)


This solution assumes the program itself is a filter,
reading from standard input and writing to standard output.
It's pure Unicon and uses some procedures and classes
from the 'Unilib' library found at:

    http://tapestry.tucson.az.us/unicon

A short explanation of some of the features follows the code:
----------------------------------------------------------------
#<p>
#  Flatten a tag by removing nested instances.
#</p>

import Utils        # for StringUtils, ScanUtils, and BlockRead

procedure main(args)
    local tag

    tag := zapPrefix(!args, "--tag=") | "x"
    flatten("<"||tag||">", "</"||tag||">")
end

procedure flatten(startTag, endTag)
    local bRead, ff, inside

    bRead := BlockRead()
    ff := FindFirst([startTag, endTag])
    inside := 0

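    while bRead.readBlock() ? {
        # copy text up to the next start or end tag, then step over the tag
        while writes(tab(ff.locate())) do {
            case ff.moveMatch() of {
                # emit only the outermost start tag and its matching end tag
                startTag: if (inside +:= 1) = 1 then writes(startTag)
                endTag:   if (inside -:= 1) = 0 then writes(endTag)
                }
            inside <:= 0  # malformed input! reset and continue
            }
        writes(tab(0))    # copy the rest of the block
        }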
    if inside > 0 then write(endTag)    # malformed input! repair

    return
end
-------------------------------------------------------------------

(1) The procedure zapPrefix(string, prefix) succeeds only if prefix
     is a prefix of string and produces string with prefix removed.
     (I like having named arguments on command-line interfaces.)

(2) The class BlockRead() reads in ASCII text files in large
     blocks, but guarantees that each block ends with a newline
     (the newline is included in the block).  By default, a
     BlockRead reads up to 500,000 characters at a time.  On
     large files, this can be a significant performance gain,
     depending on the processing done on the resulting input
     string.

(3) FindFirst is a blend of find() and upto().  Like find(), it
     finds substrings.  Like upto(), it finds alternatives
     in the order they appear in the subject.  The locate()
     method finds the next matching substring, and moveMatch()
     moves over the last found substring (returning that substring
     in the process).  The class avoids rescanning the subject
     in places where it knows the result (that is, moveMatch()
     does not rematch the substring already discovered by
     locate()).

(4) The flatten procedure itself just counts start and end tags
     (as all the other solutions do) in the same way that
     bal() works.  As written, it outputs the results immediately
     instead of building up a new string internally.  That could
     be considered either an advantage or a disadvantage.

(5) Strictly speaking, the lines that attempt repairs on
     malformed input aren't needed (and may even be the
     wrong thing to do!).  They seem like reasonable
     repair actions, though, and are easily removed.
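
For readers without Unilib at hand, here is a rough Python sketch of
the behavior described in notes (1) through (3).  These are not the
Unilib implementations: the names zap_prefix, find_first, and
read_blocks are made up for illustration, and read_blocks assumes a
block is extended forward to the next newline, which may differ from
what BlockRead really does.

```python
import io


def zap_prefix(s, prefix):
    """Return s with prefix removed, or None if s does not start with prefix."""
    return s[len(prefix):] if s.startswith(prefix) else None


def find_first(subject, alternatives, start=0):
    """Return (position, match) for whichever alternative occurs earliest
    in subject at or after start, or None if none of them occurs."""
    best = None
    for alt in alternatives:
        pos = subject.find(alt, start)
        if pos != -1 and (best is None or pos < best[0]):
            best = (pos, alt)
    return best


def read_blocks(stream, block_size=500_000):
    """Yield large chunks of text, each ending with a newline
    (except possibly the last chunk of the stream)."""
    while True:
        block = stream.read(block_size)
        if not block:
            return
        if not block.endswith("\n"):
            # assumption: extend the block through the end of the current line
            block += stream.readline()
        yield block
```

For example, zap_prefix("--tag=b", "--tag=") gives "b", and
find_first("a</x>b<x>", ["<x>", "</x>"]) reports the "</x>" at
position 1 because it appears first in the subject, regardless of
the order of the alternatives.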

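The counting scheme in notes (4) and (5) can also be sketched in
Python.  This is only an illustration under a few assumptions: it
takes the whole input as one string and returns a new string (the
Unicon version above writes as it goes, block by block), it uses
plain str.find() in place of FindFirst, and the repair step appends
the end tag without a trailing newline.

```python
def flatten(text, start_tag, end_tag):
    """Keep only the outermost start/end tag pairs, dropping nested ones."""
    out = []
    inside = 0      # current nesting depth
    pos = 0
    while True:
        # find whichever tag occurs first at or after pos
        hit = None
        for tag in (start_tag, end_tag):
            i = text.find(tag, pos)
            if i != -1 and (hit is None or i < hit[0]):
                hit = (i, tag)
        if hit is None:
            break
        i, tag = hit
        out.append(text[pos:i])     # copy the text before the tag
        pos = i + len(tag)          # step over the tag
        if tag == start_tag:
            inside += 1
            if inside == 1:
                out.append(start_tag)
        else:
            inside -= 1
            if inside == 0:
                out.append(end_tag)
        inside = max(inside, 0)     # malformed input: reset and continue
    out.append(text[pos:])          # copy whatever follows the last tag
    if inside > 0:
        out.append(end_tag)         # malformed input: repair
    return "".join(out)


print(flatten("a<x>b<x>c</x>d</x>e", "<x>", "</x>"))   # a<x>bcd</x>e
```

A stray end tag is simply dropped, and an unclosed start tag gets a
closing tag appended at the end, matching the two repair lines in
the Unicon code.
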
In the next message I'll post a solution to the secondary
problem of handling multiple independent tags.  (William
Mitchell's solution already handles this case...)


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

