On 8/5/08 10:18 AM, "Richard Gaskin" <[EMAIL PROTECTED]> wrote:
> But given its blindingly fast performance and the scope of things it
> handles in well-optimized machine-compiled code in the engine, it seems
> a good starting point for a more complete function which would have
> relatively little other cleanup work to do after using it.

Agreed, the htmlText is the fastest method for removing balanced tags. It is good to see all the benchmarking results you produce. I save them for reference, and it is better when the same techniques are used so the results can be compared. Thanks for the good info.

What I have needed in my apps is the ability to parse the raw HTML, pick out certain tags and user-visible text, and then extract the data. In other words, data mining.

One example is a page that shows several charts of stock data. The column headers are text that is repeated many times on the page, so that text alone is no good for isolating a particular table, but in almost every case the HTML tags do allow that specificity. After isolating a table by its tags, I use the text of the column headers to make sure I extract the correct data, even if the publisher of the web page moves the columns or tables. Now I have the correct values to add to my database. Of course, I do error checking on the data values before assuming the page is accurate.

Another case where the tags are needed is testing whether the web server has sent back a special condition, such as "interrupted, not available, maintenance".

Another is looking for the absence of tags, which can mean missing data or incomplete server delivery.

Jim Ault
Las Vegas
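
The approach described above (isolate one table by its tags, then locate columns by their header text, and treat missing tags as an error) can be sketched roughly as follows. This is an illustration in Python using the standard html.parser module, not Revolution code and not the code from the apps mentioned. The names TableGrabber and extract_columns are hypothetical, and it assumes the target table carries a distinguishing id attribute and that its first row holds the column headers.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect the cell text of the one <table> whose id matches table_id."""

    def __init__(self, table_id):
        super().__init__()
        self.table_id = table_id
        self.depth = 0      # > 0 while inside the target table (counts nesting)
        self.rows = []      # finished rows, each a list of cell strings
        self.row = None     # row currently being built
        self.cell = None    # text fragments of the cell currently being built

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.depth:
                self.depth += 1                     # nested table: skip its cells
            elif dict(attrs).get("id") == self.table_id:
                self.depth = 1                      # found the table we want
        elif self.depth == 1:
            if tag == "tr":
                self.row = []
            elif tag in ("td", "th") and self.row is not None:
                self.cell = []

    def handle_endtag(self, tag):
        if not self.depth:
            return
        if tag == "table":
            self.depth -= 1
        elif self.depth == 1:
            if tag in ("td", "th") and self.cell is not None:
                self.row.append("".join(self.cell).strip())
                self.cell = None
            elif tag == "tr" and self.row is not None:
                self.rows.append(self.row)
                self.row = None
                self.cell = None

    def handle_data(self, data):
        if self.depth == 1 and self.cell is not None:
            self.cell.append(data)


def extract_columns(html, table_id, wanted):
    """Return one dict per data row, keyed by the wanted header names.

    Raises ValueError if the table, a header, or part of a row is missing,
    so an incomplete page is never silently accepted.
    """
    parser = TableGrabber(table_id)
    parser.feed(html)
    parser.close()
    if not parser.rows:
        raise ValueError("table %r not found or empty" % table_id)
    header, body = parser.rows[0], parser.rows[1:]
    missing = [name for name in wanted if name not in header]
    if missing:
        raise ValueError("missing column headers: " + ", ".join(missing))
    # Look columns up by header text so a reordered table still works.
    positions = [header.index(name) for name in wanted]
    records = []
    for row in body:
        if len(row) != len(header):
            raise ValueError("incomplete row: %r" % (row,))
        records.append({name: row[i] for name, i in zip(wanted, positions)})
    return records
```

Calling extract_columns(page, "nasdaq", ["Symbol", "Close"]) on a page with several tables that share the same header text would return rows only from the table whose id is "nasdaq" (a made-up id), whatever order its columns are in. The values come back as strings, so range and format checks on the numbers still have to be done before anything goes into a database.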