I think I am close to the source of the problem, which at this point I take to be the spurious introduction of char(13) into what should be line-delimited lists. Where this char(13) is present in a GET request URL, libURL fails (of course).

REPEAT for each line tOneGenus in tTestList
        --> extract the genus name first
        set the itemdel to "/"
        put item 5 of tOneGenus into tGenus
        put tGenus after tGeni
        delete item -1 of tOneGenus
        put tOneGenus & cr after tURLs
        put url (tOneGenus) into tOneGenusPage
        --> from each page we have to extract the species URLs
        REPEAT for each line x in tOneGenusPage
            IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
        END REPEAT

    END REPEAT

set the clipboarddata["text"] to tGeni

This is generating a string on Mac OS X like this; it appears in the message box as one long line with no spaces and no breaks:


AcanthophoenixAcoelorrhapheAcrocomiaActinokentiaAdonidiaAiphanesAllagopteraAlloschmidiaAlsmithiaI

If I paste this here we see:

Acanthophoenix
Acoelorrhaphe
Acrocomia
Actinokentia
Actinorhytis
Adonidia
Aiphanes
Allagoptera
Alloschmidia
Alsmithia

If I do a byte-by-byte examination I get something interesting: char(13) is present after each name:

65,99,97,110,116,104,111,112,104,111,101,110,105,120,13,65,99,111,101,108,111,114,114,104,97,112,104,101,13,65,99,114,111,99,111,109,105,97,13,65,99,116,105,110,111,107,101,110,116,105,97,13,65,99,116,105,110,111,114,104,121,116,105,115,13,65,100,111,110,105,100,105,97,13,65,105,112,104,97,110,101,115,13,65,108,108,97,103,111,112,116,101,114,97,13,65,108,108,111,115,99,104,109,105,100,105,97,13,65,108,115,109,105,116,104,105,97,13,
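
For anyone who wants to repeat the byte-by-byte check, here is a minimal sketch of one way to produce such a list. The function name charCodes is hypothetical (it is not in my script), and it assumes plain single-byte text:

```livecode
-- hypothetical helper: returns the character codes of pText
-- as a comma-separated list, e.g. to spot a stray 13
function charCodes pText
    local tCodes
    repeat for each char c in pText
        put charToNum(c) & comma after tCodes
    end repeat
    -- drop the trailing comma left by the loop
    delete the last char of tCodes
    return tCodes
end charCodes
```

Calling "put charCodes(tGeni)" in the message box after the crawl should then show whether a 13 follows each genus name.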

So I'm not sure where or how this is being introduced.

But where the variable watcher is showing me

http://www.pacsoa.org.au/palms/Areca/index.html

in fact that string is:

http://www.pacsoa.org.au/palms/Areca(char[13])/index.html

and this is what is causing the URL GET requests to break. If you paste it into the URL field in Firefox, the char(13) is not passed (my assumption).

I have an odd feeling that Rev is introducing this... I could be wrong...

Here again is my script. This is easy to simulate for those who may be interested: make a new stack, create two fields named "Previewer" and "Logfield", and add one button with the following script:

--> all handlers

ON mouseup

    set the cursor to busy
    getPalms

    # previous crawlers

    --getGaneshas

END mouseup

ON getPalms
    --> site is: http://www.pacsoa.org.au/palms/index.html

    # we need to dig every */palms/*.html  file on this page
    # so first is to extract all the URL's

   -- put fld "MainURL" into tStartURL

    put "http://www.pacsoa.org.au/palms/index.html" into tStartURL

    put URL tStartURL into tMainListing
    REPEAT for each line x in tMainListing
        IF x contains "/palms/"  THEN # we got one for sure
            put x & cr after tPalmList
        END IF

    END REPEAT

    --check it out
    delete line 1 to 2 of tPalmList
    delete line -1 of tPalmList

    put  "<[^><]*>" into tRex
    put replacetext(tPalmList, tRex, "") into tPalmsList
    replace " " with "" in tPalmsList

    REPEAT for each line x in tPalmslist
        put "http://www.pacsoa.org.au/palms/" before x
        put "/index.html" after x
        put x & cr after tGenusListing
    END REPEAT


    --> Step through Genus listing

    put line 1 to 10 of tGenusListing into tTestList

    liburlSetLogField the long id of field "logField"
    --repeat for each line tOneGenus in tGenusListing
    REPEAT for each line tOneGenus in tTestList
        --> extract the genus name first
        set the itemdel to "/"
        put item 5 of tOneGenus into tGenus
        put tGenus after tGeni
        delete item -1 of tOneGenus
        put tOneGenus & cr after tURLs


        put url (tOneGenus) into tOneGenusPage

        --> from each page we have to extract the species URLs
        REPEAT for each line x in tOneGenusPage
            IF x contains ("/" & tGenus & "/") THEN put x & cr after tSpeciesPages
        END REPEAT

    END REPEAT
    set the clipboarddata["text"] to tGeni


    --> Load the Species URL's and then save any .jpg file therein


END getPalms
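
One thing worth trying is sketched below. It rests on an assumption I have not confirmed: that the server sends its pages with CRLF line endings. If so, Rev's line chunks, which split on linefeed (ASCII 10), would leave a char(13) at the end of every line, and that would carry through into tGenus and the URLs built from it. The function name stripCR is hypothetical and not part of the script above:

```livecode
-- hypothetical helper: removes every carriage return (ASCII 13)
-- from pText; linefeeds (ASCII 10) are left alone, so lines that
-- were CRLF-terminated still split correctly afterwards
function stripCR pText
    replace numToChar(13) with empty in pText
    return pText
end stripCR
```

It would be applied right after each download, for example "put stripCR(tMainListing) into tMainListing" directly after the line that fills tMainListing. If the page turned out to use bare CR line endings instead, this would join all the lines into one, so it only suits the CRLF case.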






