On 11/03/14 20:15, Benjamin Beaumont wrote:
Hi All,

We're in the process of adding some new chunk types in LiveCode 7 and we
would appreciate suggestions for a particular chunk name.

The new chunk types are:

naturalword (breaks on unicode word boundaries)

Well, in theory that looks good, until you start to think about languages such as Sanskrit, which are written with no obvious word boundaries
and which show both vowel mutation (Sandhi) at what would be word boundaries and consonant fusion.
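
To make the boundary problem concrete, here is a minimal Python sketch (not LiveCode, and not the Unicode word-break algorithm itself). It compares a whitespace split with a rough "runs of word characters" rule. The function names and the sample strings are my own illustrative assumptions; the Chinese sample simply stands in for any script written without spaces between words.

```python
import re

def whitespace_words(text):
    # Split on runs of whitespace only, similar in spirit to a
    # space-delimited 'word' chunk.
    return text.split()

def regex_words(text):
    # Runs of Unicode word characters. A rough approximation only;
    # it is NOT the Unicode word-boundary algorithm.
    return re.findall(r"\w+", text)

english = "Hello, world!"
unspaced = "我喜欢编程"  # illustrative sample with no spaces between words

print(whitespace_words(english))   # punctuation stays attached to the words
print(regex_words(english))        # punctuation is dropped
print(whitespace_words(unspaced))  # the whole string comes back as one item
print(regex_words(unspaced))       # still one item: no boundary is found
```

Both rules return the unspaced string as a single "word", which is the difficulty in miniature: where the script does not mark boundaries, a simple rule has nothing to break on.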

Languages such as Inuit and Hungarian are agglutinative, and in some cases what we (speakers of West European languages) would term a sentence consists of a single word with loads of affixes, some of them at
the front (prefixes).

Many Austronesian languages use infixes (i.e. twiddly bits shoved into the middle of 'words').

These also crop up in Afro-Asiatic languages such as Arabic.

There are also some examples in English such as "fan-f*cking-tabulous".

We could also get sweaty about circumfixes, where a bit gets put on the front and a bit gets put on the back as
a sort of split morpheme (not to be confused with split-pea bara).

sentence (breaks on unicode sentence boundaries)

That looks a bit fishy.

How are you going to work out what marks a sentence boundary in every language that one can write with Unicode? And there are languages where the idea of a 'sentence' is absent.

paragraph (Same behaviour as current 'line' chunk)

The first chunk is called 'naturalword' because 'word' is already in use.
Renaming the current 'word' chunk to 'token' to free up 'word' is not an
option for backward compatibility. We are also limited by the current
parser which doesn't allow us to use the form:

put natural word 1 of "this is a string of words"

'naturalword' is the clearest internal suggestion at the moment and we'd
love to get the input from community members if there is an even clearer
option.

I'm sorry to be such a "pill", but word and sentence boundaries are such culture-bound concepts that they will only be any good for languages that mark word and sentence boundaries.

This is about the same as stating dogmatically that "all bananas are yellow", when they are not.

Warm regards and thank you for your input.

You may not thank me.

Richmond.


Ben

_____________________________________________

Benjamin Beaumont . RunRev Ltd




_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription 
preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode
