Re: First 1000 characters without loop?

Mike Kerner via use-livecode Fri, 23 Jun 2017 06:15:41 -0700

Oh.
Now I know why I kept getting beaten up during class as a kid - because I'd
ask some question and then the teacher would do a Mark - and then ALL of it
would end up on the test.


On Fri, Jun 23, 2017 at 5:09 AM, Mark Waddingham via use-livecode <
[email protected]> wrote:

> On 2017-06-23 03:07, Peter W A Wood via use-livecode wrote:
>
>> Some Unicode characters, such as emojis, have to be represented by two
>> codepoints in UTF-16 (known as surrogates) so they take four bytes not
>> two. Additionally, characters with accents will take either one
>> codepoint or two depending on whether they have been coded in
>> precomposed or decomposed form (e.g. ç can be either U+0063 U+0327
>> (decomposed) or U+00E7 (precomposed)).
>>
>> So it isn’t easy to estimate the number of bytes in a UTF-16 string.
>>
>
> The number of bytes used by a string when encoded as UTF-16 is '2 * the
> number of codeunits in tString'.
>
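A rough sketch of that arithmetic, in Python rather than LiveCode script. The helper names are made up for illustration and are not engine APIs; the only assumption is that a code point above U+FFFF takes a surrogate pair, i.e. two code units.

```python
def utf16_code_units(s: str) -> int:
    """Count UTF-16 code units: 2 for code points above U+FFFF, else 1."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


def utf16_byte_length(s: str) -> int:
    """UTF-16 size in bytes (no BOM): two bytes per code unit."""
    return 2 * utf16_code_units(s)


precomposed = "\u00e7"    # c-cedilla as a single code point
decomposed = "c\u0327"    # 'c' followed by a combining cedilla
emoji = "\U0001F600"      # outside the BMP, so it needs a surrogate pair

for text in (precomposed, decomposed, emoji):
    # Cross-check against Python's own encoder (little-endian, no BOM).
    assert utf16_byte_length(text) == len(text.encode("utf-16-le"))
```

The byte count follows directly from the code unit count, even though the same visible character can come out as 2 or 4 bytes depending on how it was composed.
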
> The number of codeunits in a string in LiveCode is a stored property of
> the string, so doesn't require any computation. (We took the decision that
> regardless of how a string is stored internally, it should always be
> possible to ask for the number of codeunits in constant time, and to be
> able to look up a codeunit in constant time).
>
> Note: codeunit is not the same as codepoint and codepoint is not the same
> as character. Both codepoint and character require scanning the string (in
> the general case) to both compute the i'th one, and to compute the length.
>
> In contrast (to UTF-16), if you want the number of bytes a string takes up
> in UTF-8 encoding then you also have to scan the string as a codepoint in
> UTF-8 can be 1-4 bytes in length.
>
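To illustrate why the UTF-8 size needs a scan, here is a minimal Python sketch (the function name is hypothetical, not a LiveCode call). It walks every code point and adds 1 to 4 bytes according to the standard UTF-8 ranges, so the cost grows with the length of the string. It assumes well-formed text with no lone surrogates.

```python
def utf8_byte_length(s: str) -> int:
    """Sum the UTF-8 size of each code point; this has to visit every one."""
    total = 0
    for ch in s:
        cp = ord(ch)
        if cp < 0x80:
            total += 1      # ASCII
        elif cp < 0x800:
            total += 2      # e.g. Latin letters with accents
        elif cp < 0x10000:
            total += 3      # rest of the Basic Multilingual Plane
        else:
            total += 4      # supplementary planes, e.g. most emoji
    return total


sample = "a\u00e7\u20ac\U0001F600"   # 1 + 2 + 3 + 4 bytes
# Cross-check against Python's own UTF-8 encoder.
assert utf8_byte_length(sample) == len(sample.encode("utf-8"))
```
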
>> I would guess that LiveCode will store the characters of a string in
>> single bytes if all the letters of the string conform to ISO-8859-1.
>> So if you can be certain that your text is all ISO-8859-1 encoded, you
>> can estimate at 1 byte per character. (The guess is based on the fact
>> that the first 256 Unicode code points replicate ISO-8859-1).
>>
>
> Almost true - the engine stores strings which fit into the running
> platform's 'legacy' (in terms of pre 7.0) encoding (ISO8859-1, Latin-1,
> MacRoman) in that encoding in memory. This means that stacks written
> pre-Unicode will use the same amount of memory and the same amount of
> processing time as they did before.
>
> The reason this works is because all three of those encodings have the
> property that when they are converted to Unicode, the number of codeunits
> in the Unicode version is the same as the number of codes (indeed, bytes in
> this case) in the original string.
>
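A small Python check of that one-to-one property, using Python's own `latin-1` and `mac_roman` codecs as stand-ins for the engine's legacy encodings (an assumption; nothing here is LiveCode code, and the helper name is made up). Each legacy byte decodes to a single code point in the BMP, so bytes in equals code units out.

```python
def utf16_code_units(s: str) -> int:
    """Count UTF-16 code units: 2 for code points above U+FFFF, else 1."""
    return sum(2 if ord(ch) > 0xFFFF else 1 for ch in s)


samples = {
    "latin-1": b"caf\xe9",     # e-acute is byte 0xE9 in ISO 8859-1
    "mac_roman": b"caf\x8e",   # e-acute is byte 0x8E in MacRoman
}

for encoding, raw in samples.items():
    text = raw.decode(encoding)
    assert text == "caf\u00e9"
    # One legacy byte maps to exactly one UTF-16 code unit.
    assert utf16_code_units(text) == len(raw)
```
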
> Warmest Regards,
>
> Mark.
>
> --
> Mark Waddingham ~ [email protected] ~ http://www.livecode.com/
> LiveCode: Everyone can create apps
>
> _______________________________________________
> use-livecode mailing list
> [email protected]
> Please visit this url to subscribe, unsubscribe and manage your
> subscription preferences:
> http://lists.runrev.com/mailman/listinfo/use-livecode
>



-- 
On the first day, God created the heavens and the Earth
On the second day, God created the oceans.
On the third day, God put the animals on hold for a few hours,
   and did a little diving.
And God said, "This is good."