On 9 Sep 2006, at 03:57, Richard Gaskin wrote:

So I have two questions about the sort of variable-based methods for filtering SGML-style tags and using a field object to do the same:

1. Which is more forgiving of html which may not be well-formed?

2. Which is faster?


A quick initial test. The HTML used was what was returned from

get URL "http://google.com";

I amended a tag that was <b>Web</b>
to  <b>Web 5<10 </b>
and then
<b>Web 10>5 </b>


I ran 100 iterations of each of the following:

1)
function stripHtmlTagsUsingField tHtml
  set the htmlText of fld "hiddenFld" to tHtml
  return the text of fld "hiddenFld"
end stripHtmlTagsUsingField

This took 370 ms.
It failed on 5<10 (the text returned for that tag was "Web 5", though the content after it was OK).
It succeeded with 10>5.

2)
function stsStripHTML what
  replace cr with empty in what -- my addition to Ken's handler, to handle tags containing cr
  put replaceText(what,"<.*?>","") into noHTML
  return noHTML
end stsStripHTML

This took 920 ms
Same results as 1)
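
An untested sketch of a possible variant: instead of deleting every cr first, the pattern could carry the (?s) option so that "." also matches line breaks. That would keep the line breaks in the remaining text while still catching tags that span lines. This assumes replaceText passes inline PCRE options through to the regex engine; the handler name is made up for illustration, and I have not timed it.

```
function stripHtmlTagsDotAll pHtml
  -- (?s) lets "." match cr, so a tag split across lines is still removed
  -- the non-greedy .*? stops at the first ">" after each "<"
  return replaceText(pHtml, "(?s)<.*?>", empty)
end stripHtmlTagsDotAll
```

It should behave like 2) on the amended tags, since the pattern still treats the "<" in 5<10 as the start of a tag.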

3)
function stripHtmlTags tHtml
  replace cr with empty in tHtml -- in case of multi-line tags
  replace "<" with cr & "<" in tHtml
  replace ">" with ">" & cr in tHtml
  filter tHtml without "<*>"
  repeat for each line LNN in tHtml
    put word 1 to -1 of LNN  & cr after newHtml
  end repeat
  filter newHtml without empty
  replace cr with space in newHtml
  return newHtml
end stripHtmlTags

This took 45 ms.
It semi-succeeded with the amended tag content: 10>5 became 10> 5 (an additional space) and 5<10 became 5 <10.
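
For anyone wanting to repeat the comparison, timings like the ones above can be gathered with a loop around the milliseconds. This is a minimal sketch rather than the exact script used: the mouseUp handler and the variable names are assumptions, and it expects the stripHtmlTags function from 3) to be in the same script.

```
on mouseUp
  -- fetch the page once so that network time is not included in the timing
  put URL "http://google.com" into tHtml
  put the milliseconds into tStart
  repeat 100 times
    get stripHtmlTags(tHtml) -- swap in the handler being measured
  end repeat
  -- report the elapsed time for all 100 iterations in the message box
  put "stripHtmlTags:" && (the milliseconds - tStart) && "ms"
end mouseUp
```

Results will vary with the machine and with whatever the page returns on the day.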

The hidden field approach was the only one that translated html entities.
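
If entity translation matters for the variable-based methods, a few of the common entities could be handled with plain replace commands after the tags are stripped. This is only a sketch covering five entities, not a full decoder; the handler name is hypothetical and it was not part of the timings above.

```
function decodeBasicEntities pText
  replace "&lt;" with "<" in pText
  replace "&gt;" with ">" in pText
  replace "&quot;" with quote in pText
  replace "&nbsp;" with space in pText
  -- do "&amp;" last so that "&amp;lt;" ends up as "&lt;" and is not decoded twice
  replace "&amp;" with "&" in pText
  return pText
end decodeBasicEntities
```

It would be called on the result of the stripping function, for example decodeBasicEntities(stripHtmlTags(tHtml)). Numeric entities such as &#169; are not handled.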

Best,

Mark




_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
