Can someone explain how the “sentence” chunk would work? How are decimal points and the points in abbreviations distinguished from the “period” that delineates the end of a “sentence”? Does it presume that the existing text has special embedded “periods”?
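For comparison, the default sentence-boundary rules in Unicode Standard Annex #29 are rule-based and need no special embedded periods. For example, a full stop directly followed by a digit (as in 3.14) is never a break, and a full stop followed by a lowercase word is not one either. Those default rules do not recognize abbreviations before a capitalized word (“Mr. Smith”), which is why hand-rolled splitters usually add an abbreviation list. The Python sketch below shows that kind of heuristic. It is not LiveCode 7’s implementation and not the full UAX #29 algorithm. The abbreviation list and the helper names `split_sentences` and `sentences_with_all` are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical abbreviation list, for illustration only; real manuscripts
# would need a much larger, language-specific list.
ABBREVIATIONS = {"mr", "mrs", "ms", "dr", "st", "etc", "e.g", "i.e"}

# Candidate boundary: one or more terminators, optional closing quotes or
# brackets, then whitespace. A decimal point such as "3.14" never matches,
# because its "." is followed by a digit rather than whitespace.
BOUNDARY = re.compile(r"[.!?]+[\"'\u201d\u2019)\]]*(?=\s)")


def split_sentences(text):
    """Split text into sentences using a few simple heuristics."""
    sentences = []
    start = 0
    for m in BOUNDARY.finditer(text):
        if m.group()[0] == ".":
            # Heuristic 1: a period right after a known abbreviation is not a boundary.
            before = text[start:m.start()].split()
            word = before[-1].lstrip("\"'\u201c\u2018([").lower() if before else ""
            if word in ABBREVIATIONS:
                continue
        # Heuristic 2: if the following text starts with a lowercase letter,
        # treat the terminator as sentence-internal (e.g. "p.m. after").
        rest = text[m.end():].lstrip()
        if rest and rest[0].islower():
            continue
        sentences.append(text[start:m.end()].strip())
        start = m.end()
    # Whatever follows the last accepted boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def sentences_with_all(text, words):
    """Return the sentences that contain every word in `words` (case-insensitive)."""
    wanted = {w.lower() for w in words}
    return [s for s in split_sentences(text)
            if wanted <= {w.lower() for w in re.findall(r"\w+", s)}]


if __name__ == "__main__":
    sample = ("Dr. Smith left at 3.15 p.m. after the party. "
              "It was time to go. The party lasted a long time!")
    print(sentences_with_all(sample, ["time", "party"]))
    # prints ['The party lasted a long time!']
```

Here “Dr.” is skipped by the abbreviation list, “3.15” never becomes a candidate, and “p.m.” is skipped because a lowercase word follows it. A capitalized word after an unlisted abbreviation will still cause a false break, which is the usual weak point of any heuristic splitter.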
I’ve written my own, but it is very cumbersome and not flawless. I use it for manuscript analysis, for example: find all sentences in which “time” and “party” both occur anywhere in the same sentence. My ignorance of Unicode is profound.

Jim C

> Message: 15
> Date: Tue, 11 Mar 2014 18:15:18 +0000
> From: Benjamin Beaumont <b...@runrev.com>
> To: LiveCode Developer List <livecode-...@lists.runrev.com>, How to use LiveCode <use-livecode@lists.runrev.com>
> Subject: New chunks
> Message-ID: <CADd0_Txbhdem4PbKXifXUsujqPLs9HROME6vKhF=sk1znp2...@mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi All,
>
> We're in the process of adding some new chunk types in LiveCode 7 and we
> would appreciate suggestions for a particular chunk name.
>
> The new chunk types are:
>
> naturalword (breaks on unicode word boundaries)
> sentence (breaks on unicode sentence boundaries)
> paragraph (Same behaviour as current 'line' chunk)
>
> The first chunk is called 'naturalword' because 'word' is already in use.
> Renaming the current 'word' chunk to 'token' to free up 'word' is not an
> option for backward compatibility. We are also limited by the current
> parser which doesn't allow us to use the form:
>
> put natural word 1 of "this is a string of words"
>
> 'naturalword' is the clearest internal suggestion at the moment and we'd
> love to get the input from community members if there is an even clearer
> option.
>
> Warm regards and thank you for your input.
>
> Ben