On Jan 26, 2007, at 10:56 AM, Chipp Walters wrote:

function stripAllTagsBut pHtml,pTagsList
 --> pTagsList IS A LIST OF TAGS NOT TO EXCLUDE FROM PARSING
 --> EX. LINE 1 OF pTagsList CAN BE "img" AND LINE 2 CAN BE "b", etc..


It's used to strip all tags from HTML but those in the pTagsList parameter.

IOW, it can be used to grab the HTML of a page, and strip everything but the
img tags.

I'm starting to write it, but thought I'd ask-- just in case.

Well, I have one I've been working on that takes a list of things to strip. Maybe you could modify it to fit your needs. The first version was much more compact and used matchText. Then I stress tested it and it was slooooow, and I had to call it quite often and with large amounts of text. So I came up with the attached version.
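
For the keep-list behaviour asked about above, here is a minimal sketch of how stripAllTagsBut could be written with the same offset-based scanning. It is not part of the attached handler. It assumes pTagsList holds one tag name per line, that every tag ends at the first ">" after its "<", and that the HTML has no comments or script blocks containing ">". The local variable names are made up for illustration.

```livecode
function stripAllTagsBut pHtml, pTagsList
    -- pTagsList: one tag name per line, e.g. "img" on line 1 and "b" on line 2
    local tResult, tSkip, tOpen, tClose, tTag, tName

    put 0 into tSkip
    repeat forever
        -- offset() returns a position relative to the skipped chars
        put offset("<", pHtml, tSkip) into tOpen
        if tOpen = 0 then exit repeat
        put offset(">", pHtml, tSkip + tOpen) into tClose
        if tClose = 0 then exit repeat -- unterminated tag: keep the rest as text

        -- copy the text that precedes the tag
        put char (tSkip + 1) to (tSkip + tOpen - 1) of pHtml after tResult

        -- the whole tag, from "<" through ">"
        put char (tSkip + tOpen) to (tSkip + tOpen + tClose) of pHtml into tTag

        -- tag name: drop "<" and ">", then any leading or trailing "/"
        put char 2 to -2 of tTag into tName
        if char 1 of tName is "/" then delete char 1 of tName
        put word 1 of tName into tName
        if char -1 of tName is "/" then delete char -1 of tName

        -- keep the tag only if its name is in the keep list
        if tName is among the lines of pTagsList then put tTag after tResult

        add tOpen + tClose to tSkip
    end repeat

    -- whatever follows the last tag
    put char (tSkip + 1) to -1 of pHtml after tResult
    return tResult
end stripAllTagsBut
```

Called with "<p>Hi <b>there</b><img src="a.png"></p>" as pHtml and "img" as pTagsList, this should return Hi there<img src="a.png">. It makes a single pass through the text, so it should avoid the repeated regex matching that made the matchText version slow.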


--
Trevor DeVore
Blue Mango Learning Systems - www.bluemangolearning.com



/**
 * Cleanses a string of the specified Revolution HTML tags.
 *
 * @param  pHTML               HTML to act on.
 * @param  pStripFilter        List of tags to strip: p,size,face,lang,color,bgcolor,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed,img,a.
 * @param  pStripTrailingCR    Pass true to strip any trailing CR from the end of pHTML.
 *
 * @return The cleaned HTML.
 */
FUNCTION str_stripHTML pHTML, pStripFilter, pStripTrailingCR
    local tProp,tFontFilter,tInlineFilter,tAttributeFilter,tStart,tEnd
    local tSkip,tOffset1,tOffset2,tDeleteChars,i

    set the wholematches to true

    --> PROCESS pStripFilter
    REPEAT for each item tProp in pStripFilter
        IF tProp is among the items of "p,b,i,u,strike,sub,sup,box,threedbox,expanded,condensed" THEN
            put tProp & comma after tAttributeFilter
        ELSE IF tProp is among the items of "face,size,color,bgcolor,lang" THEN
            put tProp & comma after tFontFilter
        ELSE IF tProp is among the items of "img,a" THEN
            put tProp & comma after tInlineFilter
        END IF
    END REPEAT

    --> PROCESS
    REPEAT forever --> OK, I TRIED USING MATCHCHUNK WITH THIS BUT IT WAS A GAZILLION TIMES SLOWER
        put offset("<font", pHTML, tSkip) into tOffset1
        IF tOffset1 > 0 THEN
            put offset(">", pHTML, tSkip + tOffset1) into tOffset2 --> GET CLOSING > OF THIS TAG

            --> LOOP THROUGH PROPS AND ERASE
            REPEAT for each item tProp in tFontFilter
                put offset(space & tProp & "=" & quote, pHTML, tSkip + tOffset1) into tStart
                --> ONLY LOOK FOR PROPS IN CURRENT FONT TAG
                IF tStart > 0 AND tSkip + tOffset1 + tStart < tSkip + tOffset1 + tOffset2 THEN
                    get tSkip + tOffset1 + tStart + length(tProp) + 2
                    put offset(quote, pHTML, it) into tEnd
                    IF tEnd > 0 THEN
                        put tSkip + tStart + tOffset1 & comma & it + tEnd & cr after tDeleteChars
                    END IF
                END IF
            END REPEAT

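            --> RANGES WERE RECORDED IN FILTER ORDER, SO SORT THEM BY START POSITION
            --> THEN MOVE BACKWARDS THROUGH LIST AND DELETE
            sort lines of tDeleteChars ascending numeric by item 1 of each
            REPEAT with i = the number of lines of tDeleteChars down to 1
                delete char (item 1 of line i of tDeleteChars) to (item 2 of line i of tDeleteChars) of pHTML
            END REPEAT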
            put empty into tDeleteChars
        ELSE
            exit REPEAT
        END IF
        add tOffset1 + 4 to tSkip
    END REPEAT

    REPEAT for each item tProp in tAttributeFilter
        replace "<"&tProp&">" with empty in pHTML
        replace "</"&tProp&">" with empty in pHTML
    END REPEAT

    REPEAT for each item tProp in tInlineFilter
        REPEAT forever
            put offset("<"&tProp, pHTML) into tStart
            IF tStart > 0 THEN
                put offset(">", pHTML, tStart) into tEnd
                IF tEnd > 0 THEN
                    delete char tStart to (tStart+tEnd) of pHTML
                ELSE
                    exit REPEAT
                END IF
            ELSE
                exit REPEAT
            END IF
        END REPEAT
    END REPEAT

    IF "a" is among the items of tInlineFilter THEN
        replace "</a>" with empty in pHTML
    END IF

    --> REMOVE ANY LONELY <FONT> TAGS
    REPEAT forever
        put offset("<font>", pHTML) into tStart
        IF tStart > 0 THEN
            put offset("</font>", pHTML, tStart) into tEnd
            IF tEnd > 0 THEN
                delete char tStart+tEnd to tStart+tEnd+6 of pHTML
                delete char tStart to tStart+5 of pHTML
            ELSE
                exit REPEAT
            END IF
        ELSE
            exit REPEAT
        END IF
    END REPEAT

    --> REMOVE TRAILING RETURNS
    IF pStripTrailingCR THEN
        REPEAT forever
            IF char -8 to -1 of pHTML is cr&"<p></p>" THEN
                delete char -8 to -1 of pHTML
            ELSE
                exit REPEAT
            END IF
        END REPEAT
    END IF

    return pHTML
END str_stripHTML
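
A short usage sketch, assuming the handler above is somewhere in the message path. The field name "Notes" is hypothetical. It strips bold, italic, and font color from a field's htmlText and trims trailing empty paragraphs.

```livecode
on mouseUp
    local tHTML
    -- "Notes" is a hypothetical field name
    put the htmlText of field "Notes" into tHTML
    -- drop bold, italic, and font color; also trim trailing empty paragraphs
    put str_stripHTML(tHTML, "b,i,color", true) into tHTML
    set the htmlText of field "Notes" to tHTML
end mouseUp
```

Note that pStripFilter is a comma-separated list of items, not a line-delimited list.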

