On Thu, 2003-11-06 at 14:27, Tom Foster wrote:
> Hi Everybody,
>
> I've lurked here for a while, but am not a programmer, much less a
> unicon programmer. Still, I have a task at hand. I'd like to
> strip all the <img src="*"> tags from some html pages. The tags are
> sometimes broken across lines.
>
> can someone give an example of how this might be done with unicon?
If the files aren't *too* big (and I've used a similar
technique for files up to 10MB), just join all the
lines (it's ok to leave the line separators in) and
use bal() in string scanning. Something like
(danger! untested and incomplete code follows!!!):
# produce a single string holding an entire file...
#
procedure readdata(f)
   local s
   s := ""
   while s ||:= reads(f, 500000)
   return s
end
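# Balance image tags...
# (assumes '<' and '>' can't appear inside quoted strings,
# which I think is true of html data; otherwise, you
# should make a pass to hide embedded '<' and '>')
# Behavior is similar to bal(): it assumes it is
# called during string scanning when immediately
# in front of a '<', and produces the position just
# after the matching '>'.
#
# Matches "<img" rather than "<img src=" because the
# tag may be broken across lines between the two.
# bal(, '<', '>') first generates &pos itself (the empty
# prefix is trivially balanced), so that result is skipped.
# bal() never generates the end-of-subject position, so an
# img tag at the very end of the data is left alone.
#
# This could be improved (made more general, e.g. to
# accept <IMG ...>) and is probably buggy!!
#
procedure ibal()
   match("<img") | fail             # not an img tag
   suspend &pos < bal(, '<', '>')   # first position past the '>'
end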
So, the entire process would be something like (buggy, I'm sure):
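procedure main()   # read from stdin, write to stdout
   # Assumes properly formatted html...
   readdata(&input) ? {
      # writes() rather than write(): the data already contains
      # its own line separators, so adding newlines would double them.
      while writes(tab(upto('<'))) do {
         # Strip out the img tag, if there is one; otherwise copy
         # the '<' so the next upto('<') makes progress.
         tab(ibal()) | writes(move(1))
      }
      writes(tab(0))
   }
end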
You might see a few lines become longer (where a stripped-out
tag crosses a line boundary, the text on either side of it will
be joined into a single line), but since this is HTML, it shouldn't
matter.
I hope this helps! You should check bal() and make sure it
positions *after* the matching '>', as I've forgotten and am
not near my books...
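A quick way to check where bal() positions, without the books, is a throwaway test program like this one. It is only a sketch: the sample string is made up, and it relies on bal()'s documented defaults (c1 = &cset, with the subject and starting position taken from the current scan).

```unicon
# Throwaway check (not part of the filter above): print every
# position bal(, '<', '>') generates when scanning starts just
# in front of a tag that is broken across two lines.
procedure main()
   "<img\nsrc=\"a.gif\">after" ? {
      every write(bal(, '<', '>'))
   }
end
```

Run on that sample, it should print 1 and then 18 through 22, one per line. The first result is the starting position itself (the empty prefix is already balanced), so code that wants to skip the tag has to discard it. 18 is the position just after the '>'. Note also that bal() only generates positions that precede a character, so a tag that ends the subject yields no position after its '>'.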
-Steve
--
Steve Wampler <[EMAIL PROTECTED]>
_______________________________________________
Unicon-group mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/unicon-group