Wow, this is awesome, thank you all!!

Sorry, I'm on the road taking my daughter to college, but I would love to try some of 
this out. 

One thing to keep in mind is that, as I’m checking for names against the 
town list, I may not know what town I’m actually looking for. Usually I do, but 
not always. 

Therefore I’ve been counting how many of each name I’ve come across and doing some 
calculations at the end to make a best guess. 
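
A minimal sketch of that tally-and-guess step, assuming the names found so far are collected one per line and that the "best guess" is simply the most frequent name. The handler name bestGuessTown and its parameter pFoundNames are hypothetical, and the real calculations may well differ:

```livecode
-- Hypothetical helper: tally each town name and return the most frequent one.
-- pFoundNames is assumed to hold one found name per line.
function bestGuessTown pFoundNames
   local tCounts, tBestName, tBestCount
   put 0 into tBestCount

   -- Count how many times each name was seen
   repeat for each line tName in pFoundNames
      if tName is empty then next repeat
      add 1 to tCounts[tName]
   end repeat

   -- Pick the name with the highest count; on a tie, whichever key
   -- happens to be visited first wins (key order is not guaranteed)
   repeat for each key tName in tCounts
      if tCounts[tName] > tBestCount then
         put tCounts[tName] into tBestCount
         put tName into tBestName
      end if
   end repeat

   return tBestName
end function
```

For example, passing a list such as "Ruyton" & return & "Shrewsbury" & return & "Ruyton" would be expected to return "Ruyton", since it appears twice.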

Really appreciate the responses!!

Thank you,

Steve

> On Sep 1, 2018, at 7:53 AM, Richmond Mathewson via use-livecode 
> <use-livecode@lists.runrev.com> wrote:
> 
> 
> 
>> On 1/9/2018 2:50 pm, Mark Waddingham via use-livecode wrote:
>>> On 2018-09-01 13:15, Richmond Mathewson via use-livecode wrote:
>>> I've already shovelled Ruyton of the Eleven Towns quite effectively:
>>> 
>>> https://www.dropbox.com/s/n7r7u0c2m9ny3eb/Text%20analyzer%20X.livecode.zip?dl=0
>>>  
>>> 
>>> No tokenising, in fact very basic stuff indeed.
>>> 
>>> Not wishing to bang on about over-complicating things . . . . .
>> 
>> There is actually a 'correct' more shovelistic approach (at least I *think* 
>> this is correct):
>> 
>> -- Ensure all punctuation is surrounded by space
>> repeat for each char tPuncChar in ",.';:()[]{}<>!@£$%^&*-_+=~`?/\|#€" & quote
>>  replace tPuncChar with space & tPuncChar & space in tText
>> end repeat
> 
> That's a "point" (pun intended), as I just fell foul of a full stop.
>> 
>> -- Ensure all whitespace is space
>> replace return with space in tText
>> replace tab with space in tText
>> 
>> -- Ensure there are never two spaces next to each other in tText
>> repeat while tText contains "  "
>>  replace "  " with " " in tText
>> end repeat
>> 
>> -- Ensure there is only ever one space between words in phrases
>> repeat while tPhrases contains "  "
>>  replace "  " with " " in tPhrases
>> end repeat
>> 
>> -- We can now use an itemDelimiter of space
>> set the itemDelimiter to space
>> 
>> -- Sort the phrases by descending number of words.
>> sort lines of tPhrases descending numeric by the number of items in each
>> 
>> -- Now check for, and remove each phrase from the source text in turn
>> set the wholeMatches to true
>> repeat for each line tPhrase in tPhrases
>>  -- If the phrase is not present then skip to the next
>>  if itemOffset(tPhrase, tText) is 0 then
>>    next repeat
>>  end if
>> 
>>  -- Accumulate the phrase on the output list
>>  put tPhrase & return after tFoundPhrases
>> 
>>  -- Remove the phrase from the input text (we assume here that * does not
>>  -- appear in any phrase)
>>  replace tPhrase with "*" in tText
>> end repeat
>> 
>> Warmest Regards,
>> 
>> Mark.
>> 
>> P.S. The above will be reasonably quick for small sets of phrases / small 
>> source texts - but I think as the size of either increases it will get very 
>> slow, very quickly!
>> 
> 
> 
> _______________________________________________
> use-livecode mailing list
> use-livecode@lists.runrev.com
> Please visit this url to subscribe, unsubscribe and manage your subscription 
> preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode



