On 9 Sep 2006, at 03:57, Richard Gaskin wrote:
> So I have two questions about the sort of variable-based methods
> for filtering SGML-style tags and using a field object to do the same:
> 1. Which is more forgiving of html which may not be well-formed?
> 2. Which is faster?
Here is a quick initial test. The html I used was what was returned from
get URL "http://google.com"
I amended a tag that was <b>Web</b>
first to <b>Web 5<10 </b>
and then to
<b>Web 10>5 </b>
I ran 100 iterations of each of the following handlers:
1)
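function stripHtmlTagsUsingField tHtml
   -- assumes a (hidden) field named "hiddenFld" exists on the current card
   set the htmlText of fld "hiddenFld" to tHtml -- let the field parse the markup
   return the text of fld "hiddenFld" -- plain text, tags removed, entities translated
end stripHtmlTagsUsingField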
This took 370 ms
It failed on 5<10 (the tag content returned was "Web 5", though the
following content was OK).
It succeeded with 10>5.
2)
function stsStripHTML what
   replace cr with empty in what -- my addition to Ken's handler, to handle tags containing cr
   put replaceText(what,"<.*?>","") into noHTML -- delete each <...> run (non-greedy match)
   return noHTML
end stsStripHTML
This took 920 ms
Same results as 1)
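To see why the regex version loses "10 " after 5<10 but handles 10>5, here is a small Python sketch that applies the same non-greedy pattern <.*?> to the two amended fragments. It is written in Python only so it can run standalone, and the name strip_tags_regex is hypothetical. Regex engines differ in details, so this illustrates the pattern's logic rather than replaceText itself.

```python
import re

# Non-greedy pattern, the same text as the one passed to replaceText above.
TAG_PATTERN = re.compile(r"<.*?>")

def strip_tags_regex(markup: str) -> str:
    """Drop line breaks (as stsStripHTML does), then delete every <...> run."""
    markup = markup.replace("\n", "")
    return TAG_PATTERN.sub("", markup)

# The two amended fragments from the test, each followed by more text.
print(repr(strip_tags_regex("<b>Web 5<10 </b>next")))
print(repr(strip_tags_regex("<b>Web 10>5 </b>next")))
```

A match that starts at the stray "<" runs to the next ">", which is the end of "</b>". That is why "10 " and the closing tag disappear together while the following text survives. A lone ">" never starts a match, so 10>5 comes through intact.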
3)
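function stripHtmlTags tHtml
   replace cr with empty in tHtml -- in case of multi-line tags
   replace "<" with cr & "<" in tHtml -- start a new line at every "<"
   replace ">" with ">" & cr in tHtml -- end the line after every ">"
   filter tHtml without "<*>" -- drop lines that consist of one complete tag
   repeat for each line LNN in tHtml
      put word 1 to -1 of LNN & cr after newHtml -- trim surrounding whitespace
   end repeat
   filter newHtml without empty -- drop blank lines
   replace cr with space in newHtml -- rejoin the remaining text runs
   return newHtml
end stripHtmlTags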
This took 45 ms
It semi-succeeded with the amended tag content: 10>5 became "10> 5"
(an extra space) and 5<10 became "5 <10".
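The extra spaces come from the line splitting itself. Here is a rough Python port of stripHtmlTags that reproduces both results. The name strip_html_tags is hypothetical, and str.strip() stands in for "word 1 to -1". That is only an approximation, because Transcript words also treat quoted strings specially.

```python
def strip_html_tags(markup: str) -> str:
    """Approximate Python port of the line-splitting stripHtmlTags handler."""
    markup = markup.replace("\n", "")      # join multi-line tags
    markup = markup.replace("<", "\n<")    # every "<" starts a new line
    markup = markup.replace(">", ">\n")    # every ">" ends a line
    # Equivalent of: filter ... without "<*>" (drop lines that are one whole tag)
    kept = [ln for ln in markup.split("\n")
            if not (ln.startswith("<") and ln.endswith(">"))]
    trimmed = [ln.strip() for ln in kept]  # roughly "word 1 to -1"
    # Drop empty lines, then join the survivors with spaces
    return " ".join(ln for ln in trimmed if ln)

print(repr(strip_html_tags("<b>Web 10>5 </b>")))
print(repr(strip_html_tags("<b>Web 5<10 </b>")))
```

With 10>5, the ">" still gets a line break after it, so "10>" and "5" land on separate lines that are later rejoined with a space. With 5<10, the "<10 " line is kept because it does not end in ">", but the break inserted before it also turns into a space.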
The hidden field approach was the only one that translated html
entities.
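A variable-based approach needs a separate pass to decode entities. In Python, for example, the standard library's html.unescape does this. The sketch below uses a hypothetical helper, strip_then_unescape, that reuses the regex approach. It decodes only after stripping, because decoding first would turn text such as "&lt;b&gt;" into "<b>", which the tag pattern would then delete.

```python
import html
import re

TAG_PATTERN = re.compile(r"<.*?>")

def strip_then_unescape(markup: str) -> str:
    """Strip tags first, then translate entities such as &amp; and &lt;."""
    # Order matters: unescaping first would create new "<...>" runs
    # that the tag pattern would wrongly remove.
    text = TAG_PATTERN.sub("", markup.replace("\n", ""))
    return html.unescape(text)

print(strip_then_unescape("<p>Fish &amp; Chips</p>"))
print(strip_then_unescape("<b>&lt;b&gt; is bold</b>"))
```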
Best,
Mark
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com