Jim Ault wrote:

The problem with this may be that it only looks for alpha chars,
not spaces, numbers, quotes or equal signs,
so it finds fewer matches, depending on the HTML.

oops, these don't match and won't be replaced with empty  --------------
<img src="somebody.jpg" width="160">
<img src="somebody.jpg" width="160" />
<div class="mainFormat">
<table cellpadding="" width=100%">
<b />
<hr />

works on this tag  -------------
<B>Making this bold</B>

put "" into newString
put "(?U)<.*>" into regEx
put replaceText(myText,regEx,newString) into myText

I put that into this function:

function RegexMethod pHtml
  put "" into newString
  put "(?U)<.*>" into regEx
  return replaceText(pHtml,regEx,newString)
end RegexMethod

...and then ran it on the HTML source for this page:

<http://mail.runrev.com/pipermail/use-revolution/2008-August/113074.html>

It catches just about everything except for the mailto near the top:

<A HREF="mailto:use-revolution%40lists.runrev.com?Subject=Getting%20the%20text%20content%20of%20a%20HTML%20page&In-Reply-To=f99b52860808031334l44f6cd1by6ed2444fb32560ac%40mail.gmail.com";
       TITLE="Getting the text content of a HTML page">

Presumably this is because that tag is broken onto two lines.

This function takes care of that, and thus far benchmarks about an order of magnitude faster:


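function HtmlTextMethod pHtml
  -- save the templateField's properties so they can be restored afterwards
  put the properties of the templateField into tSaveProps
  -- let the engine parse the HTML, then read back the plain text
  set the htmlText of the templateField to pHtml
  get the text of the templateField
  -- restore the templateField before returning the text held in "it"
  set the properties of the templateField to tSaveProps
  return it
end HtmlTextMethod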
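
For comparison, a regex-only variant that tolerates tags broken across lines might look like the sketch below. It has not been benchmarked here. The function name RegexMethodMultiline is made up for illustration, and the sketch assumes replaceText follows PCRE semantics, where "." does not match a line break by default but a negated character class such as [^>] does.

```livecode
function RegexMethodMultiline pHtml
  -- hypothetical variant of RegexMethod, for illustration only
  -- [^>]* consumes everything up to the next ">", line breaks included,
  -- so a tag split over two lines is still matched as one unit
  put "<[^>]*>" into regEx
  return replaceText(pHtml,regEx,"")
end RegexMethodMultiline
```

Like the original pattern, this stops at the first ">" it meets, so a ">" inside a quoted attribute value would still end the match early.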


--
 Richard Gaskin
 Managing Editor, revJournal
 _______________________________________________________
 Rev tips, tutorials and more: http://www.revJournal.com