Hi Stephen,

Thanks for your input.
At the moment, I have two working functions that take two different approaches to extracting all URLs from the text of a framed page. The first, from Ken Ray, uses a regex with matchText; the other, which I wrote, uses items with quote as the item delimiter. Ken's solution is a bit slower but more reliable than mine, which is much faster but a little bit silly ;-) I'll keep digging and will share the solutions on this list once they are solid enough.
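
To make the contrast between the two approaches concrete, here is a minimal sketch in Python rather than Revolution. It is neither of the actual functions mentioned above: the names urls_by_regex and urls_by_quote_items, the regex pattern, and the "looks like a URL" filter are all assumptions made for illustration.

```python
import re

# Assumption: the URLs of interest are quoted src/href attribute values.
ATTR_URL = re.compile(r'''(?:src|href)\s*=\s*["']([^"']+)["']''', re.IGNORECASE)

def urls_by_regex(html):
    """Regex approach: return the values of quoted src/href attributes."""
    return ATTR_URL.findall(html)

def urls_by_quote_items(html):
    """Item approach: split on double quotes, keep items that look like URLs."""
    items = html.split('"')
    # With balanced quotes, the odd-indexed items are the quoted strings.
    quoted = items[1::2]
    # Hypothetical filter: absolute http(s) URLs or names ending in .htm/.html.
    return [s for s in quoted
            if s.startswith(("http://", "https://"))
            or s.endswith((".html", ".htm"))]

sample = ('<frameset cols="20%,80%">'
          '<frame src="menu.html" name="nav">'
          '<frame src="http://example.com/main.html">'
          '</frameset>')
print(urls_by_regex(sample))
print(urls_by_quote_items(sample))
```

The split-based version does no pattern matching at all, which is why it tends to be fast, but it accepts any quoted string that passes the filter, wherever it appears in the page. That is the sort of trade-off described above.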

On 24 May 05 at 20:02, Stephen Barncard wrote:

I would look for the word frameset in a tag inside the page, then get all the valid URLs inside the frameset. Then I would check the size of each URL's file (or its number of lines) and pick the largest. That will be where the main content is.
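
The heuristic suggested here could be sketched as follows, again in Python rather than Revolution. FrameCollector, fetch_bytes and main_frame_url are hypothetical names, and size is measured as the byte length of each fetched frame; counting lines instead would only change the key function.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class FrameCollector(HTMLParser):
    """Collect src attributes of <frame> tags found inside a <frameset>."""
    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level of open <frameset> tags
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "frameset":
            self.depth += 1
        elif tag == "frame" and self.depth > 0:
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

    def handle_endtag(self, tag):
        if tag == "frameset" and self.depth > 0:
            self.depth -= 1

def fetch_bytes(url):
    """Default fetcher: download the raw bytes at url."""
    with urlopen(url) as response:
        return response.read()

def main_frame_url(page_url, html, fetch=fetch_bytes):
    """Return the frame URL with the largest content, or None if no frames."""
    collector = FrameCollector()
    collector.feed(html)
    # Resolve relative src values against the URL of the framing page.
    urls = [urljoin(page_url, src) for src in collector.sources]
    if not urls:
        return None
    return max(urls, key=lambda u: len(fetch(u)))
```

Passing the fetcher as a parameter is a design choice that lets the selection logic be tried on canned pages without touching the network.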

Best regards from Paris,

Eric Chatonet.
----------------------------------------------------------------
So Smart Software

For institutions, companies and associations
Built-to-order applications: management, multimedia, internet, etc.
Windows, Mac OS and Linux... With the French touch

Plugins, tutorials and more on our website
----------------------------------------------------------------
Web site        http://www.sosmartsoftware.com/
Email        [EMAIL PROTECTED]/
Phone        33 (0)1 43 31 77 62
Mobile        33 (0)6 20 74 50 86
----------------------------------------------------------------

_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
http://lists.runrev.com/mailman/listinfo/use-revolution
