Re: remove html tags from text

Mark Smith Sat, 09 Sep 2006 00:58:09 -0700


On 9 Sep 2006, at 03:57, Richard Gaskin wrote:

So I have two questions about the sort of variable-based methods for filtering SGML-style tags and using a field object to do the same:
1. Which is more forgiving of html which may not be well-formed?

2. Which is faster?


A quick initial test. The html used was what was returned from

get URL "http://google.com"

I amended a tag that was <b>Web</b>
to  <b>Web 5<10 </b>
and then
<b>Web 10>5 </b>


I ran 100 iterations of each of the following

1)
function stripHtmlTagsUsingField tHtml
  set the htmlText of fld "hiddenFld" to tHtml
  return the text of fld "hiddenFld"
end stripHtmlTagsUsingField

This took 370 ms

it failed on 5<10 (the tag content returned was "Web 5", though the following content was ok)

it succeeded with 10>5

2)
function stsStripHTML what

  replace cr with empty in what -- my addition to Ken's handler - to handle tags containing cr

  put replaceText(what,"<.*?>","") into noHTML
  return noHTML
end stsStripHTML

This took 920 ms
Same results as 1)

3)
function stripHtmlTags tHtml
  replace cr with empty in tHtml -- in case of multi-line tags
  replace "<" with cr & "<" in tHtml
  replace ">" with ">" & cr in tHtml
  filter tHtml without "<*>"
  repeat for each line LNN in tHtml
    put word 1 to -1 of LNN  & cr after newHtml
  end repeat
  filter newHtml without empty
  replace cr with space in newHtml
  return newHtml
end stripHtmlTags

This took 45 ms

Semi-succeeded with the amended tag content, in that 10>5 became 10> 5 (additional space) and 5<10 became 5 <10.
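
For anyone wanting to repeat the comparison, timings like those above could be gathered with something along these lines. This is only a sketch: the handler name timeStripper and the local variable names are made up for illustration, and it is an assumption that the original test was measured with the milliseconds in this way.

```livecode
on timeStripper
  -- fetch the test page once, outside the timed loop
  put URL "http://google.com" into tHtml
  put the milliseconds into tStart
  repeat 100 times
    -- swap in stsStripHTML or stripHtmlTagsUsingField to compare
    get stripHtmlTags(tHtml)
  end repeat
  put the milliseconds - tStart into tElapsed
  -- report the elapsed time in the message box
  put "stripHtmlTags:" && tElapsed && "ms"
end timeStripper
```

Fetching the URL before starting the clock keeps network time out of the measurement, so only the tag-stripping work is counted.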

The hidden field approach was the only one that translated html entities.
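
If entity translation is wanted from one of the variable-based handlers, a few of the most common entities could be handled with plain replace commands after the tags are stripped. The function below is a hypothetical helper, not part of any of the handlers tested, and it covers only a handful of named entities; numeric entities and the rest of the named set are left untouched.

```livecode
function decodeBasicEntities tText
  -- hypothetical helper: translates a few common named entities only
  replace "&lt;" with "<" in tText
  replace "&gt;" with ">" in tText
  replace "&quot;" with quote in tText
  replace "&nbsp;" with space in tText
  -- do "&amp;" last, so that "&amp;lt;" ends up as "&lt;" rather than "<"
  replace "&amp;" with "&" in tText
  return tText
end decodeBasicEntities
```

It would be called on the result of the stripper, for example: put decodeBasicEntities(stripHtmlTags(tHtml)) into tPlain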


Best,

Mark



