Hi Heather,
So simple ;-)
I don't think there will ever be a built-in function for retrieving plain text from HTML, because HTML evolves every day and, as you pointed out, the code snippet I sent does not take CSS into account: it was written three years ago :-(
If you enhance it, please share ;-)
The approach Jacque described is interesting when you want to parse the text 'invisibly', but it is not satisfactory for displaying it.
Unfortunately, since it would be so simple :-)
Good luck, and prefer knitting code to knitting sweaters :-)
On 2 Aug 08, at 17:19, H Baric wrote:
Well, you know I could have thought of that!
So simple and obvious, really, isn't it!
I mean, I could have just asked my two-year-old instead!
:-o
:-|
Well, I was going to just take myself to bed when I saw all that code, but at least I could understand it, and so decided to just try it out...
And it works, except that all the CSS remains! (Has anyone ever heard of linked stylesheets? Sheesh!)
So rather than add a million more lines to the script (would it ever be complete!), I'm thinking I shall give up for now, at least until tomorrow, when I am well slept and can think up nice little uncomplicated things to create for the purpose of keeping the old brain cells alive.
Thanks for your help again Eric.
Heather, who is determined to be a programmer when she grows up.
At 36yrs though, she is wondering if she should just stick to knitting.
on knitOne ; select chunk of wool ; tie it in a knot ; create noose ; end knitOne
----- Original Message -----
From: "Eric Chatonet" <[EMAIL PROTECTED]>
To: "How to use Revolution" <use-revolution@lists.runrev.com>
Sent: Sunday, August 03, 2008 12:33 AM
Subject: Re: Getting the text content of a HTML page
Hi again,
On 2 Aug 08, at 16:31, H Baric wrote:
* Get the text only from a web page - no html tags, no formatting etc.
LOL
This case needs an additional code snippet, as I said in a previous email :-)
put StripTags(thePage) into field "The Page"
---------------------------------------------------------
function StripTags pHtml -- returns the meaningful text from a web page
local tRegex,tPrevText
constant kHtml = "&eacute;,&agrave;,&ccedil;,&gt;,&lt;,&ecirc;,&egrave;,&copy;,&bull;,&#39;,&middot;,&amp;"
constant kConvertedHtml = "é,à,ç,>,<,ê,è,©,•,',·,&"
-----
-- flatten the layout: line breaks are rebuilt later from <BR> and <p>
replace return with space in pHtml
replace numtochar(13) with empty in pHtml
replace tab with empty in pHtml
-----
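-- remove <SCRIPT> blocks, bare <STYLE> blocks and <?...?> instructions
-- (?U) ungreedy, (?s) "." also matches line breaks, (?i) ignore case
put replacetext(pHtml,"(?Usi)<SCRIPT.*</SCRIPT>","") into pHtml
put replacetext(pHtml,"(?Usi)<STYLE>.*</STYLE>","") into pHtml
put replacetext(pHtml,"(?Usi)<\?.*\?>","") into pHtml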
-----
replace "&nbsp;" with space in pHtml
replace "<BR>" with return in pHtml
replace "<p>" with return in pHtml
-----
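-- strip tags: the pattern only matches tags with no < or > inside them,
-- so a second pass removes any tag that enclosed another one
put "<[^><]*>" into tRegex
put replacetext(pHtml,tRegex,"") into pHtml
put replacetext(pHtml,tRegex,"") into pHtml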
-----
repeat until tPrevText is pHtml
put pHtml into tPrevText
put replacetext(pHtml," +",space) into pHtml
put replacetext(pHtml,"^ ","") into pHtml
end repeat
-----
replace (space & return) with return in pHtml
replace (return & space) with return in pHtml
filter pHtml without empty
-----
-- decode entities; &amp; is the last item, so text such as &amp;lt; is decoded only once
replace "&quot;" with quote in pHtml
repeat with i = 1 to the number of items of kHtml
replace item i of kHtml with item i of kConvertedHtml in pHtml
end repeat
-----
return pHtml
end StripTags
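Heather's leftover CSS is most likely explained by the STYLE pattern in StripTags: <STYLE> only matches a bare opening tag, so a block opened as <style type="text/css"> is never removed and its rules end up in the text. (Linked stylesheets are not the issue, since a <link> tag is dropped by the generic tag pattern.) A minimal sketch of a fix, assuming the CSS is embedded in such blocks; StripStyleBlocks is a hypothetical helper name, not part of Eric's snippet:

```livecode
function StripStyleBlocks pHtml -- hypothetical helper, not part of the original snippet
  -- (?U) ungreedy, (?s) "." also matches line breaks, (?i) ignore case
  -- <STYLE[^>]*> also accepts opening tags with attributes, e.g. <style type="text/css">
  put replacetext(pHtml,"(?Usi)<STYLE[^>]*>.*</STYLE>","") into pHtml
  -- HTML comments may wrap old-style CSS or contain ">" characters
  put replacetext(pHtml,"(?Us)<!--.*-->","") into pHtml
  return pHtml
end StripStyleBlocks
```

It would be applied to the raw HTML before StripTags, for example: put StripTags(StripStyleBlocks(thePage)) into field "The Page"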
Best regards from Paris,
Eric Chatonet.
----------------------------------------------------------------
Plugins and tutorials for Revolution: http://www.sosmartsoftware.com/
Email: [EMAIL PROTECTED]/
----------------------------------------------------------------
_______________________________________________
use-revolution mailing list
use-revolution@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution