Have pondered the question, and come up with some code which may or may not 
solve the problem at hand, but which may at least prove helpful in looking for 
a real solution:

==========================

Assumption: You’ve got a text document (not HTML, not RTF, just plain TXT) 
which contains, among other things, however-many place names.
Assumption: You have a return-delimited list of place names, which may or may 
not be single words
Assumption: The text document is in the variable SourceDoc
Assumption: The list of place names is in the variable NamesList

Assumption: You want a document which contains a complete census of exactly 
which of the place-names in NamesList occur in SourceDoc
Assumption: For each place-name which does occur within SourceDoc, you want a 
list of which word-numbers each such occurrence begins at

put "" into PlaceNamesCensus
repeat for each line DisName in NamesList
  put the number of words in DisName into DisNameWords
  put 0 into SearchOffset
  put "" into FoundLocs
  repeat
    -- offset() skips the first SearchOffset chars; its result is relative to that skip
    put offset (DisName, SourceDoc, SearchOffset) into DisLoc
    if DisLoc = 0 then
      -- there is no (further) character string which matches the place name in question
      exit repeat
    else
      -- there is a character string which matches the place name in question
      -- is it the actual placename, and not finding "chester" in "colchester"?
      add DisLoc to SearchOffset -- now the absolute position of this match in SourceDoc
      put the number of words in (char 1 to SearchOffset of SourceDoc) into StartWord
      if DisName = (word StartWord to (StartWord + DisNameWords - 1) of SourceDoc) then
        -- it's a match, yay!
        put StartWord into item (1 + the number of items in FoundLocs) of FoundLocs
      end if
    end if
  end repeat
  if FoundLocs = "" then
    -- nope, DisName wasn't in SourceDoc
    put "[nil]" into DeseLocs
  else
    -- yay! DisName *was* in SourceDoc! at least once!
    put FoundLocs into DeseLocs
  end if
  put DisName & comma & DeseLocs into line (1 + the number of lines in PlaceNamesCensus) of PlaceNamesCensus
end repeat

==========================

Known issue: The above code does not pretend to locate possessive instances of 
place names (e.g., California's, the United Kingdom's, etc). Am thinking that 
pre-processing of SourceDoc will be helpful-to-necessary. This pre-processing 
may need to accommodate more issues than just possessives.
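
A minimal, untested sketch of what one such pre-processing step might look 
like. StripPossessives and CleanDoc are made-up names for illustration, and 
this assumes straight apostrophes only; curly apostrophes and plural 
possessives (the United States') are not handled. It should leave word 
numbering unchanged, as long as no "'s" stands alone as its own word.

==========================

function StripPossessives pText
  -- replaceText takes a regular expression: drop any "'s" that ends a word,
  -- so that "California's" becomes "California"
  return replaceText(pText, "'s\b", "")
end function

-- hypothetical usage: clean a copy, then search the copy
put StripPossessives(SourceDoc) into CleanDoc

==========================

The search would then run against CleanDoc in place of SourceDoc. Note that 
this also turns "it's" into "it", which shouldn't matter for a place-name 
search.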