Re: Nicest UTF

Philippe Verdy Tue, 07 Dec 2004 05:36:22 -0800

From: "D. Starner" <[EMAIL PROTECTED]>

If you're talking about a language that hides the structure of strings
and has no problem with variable length data, then it wouldn't matter
what the internal processing of the string looks like. You'd need to
use iterators and discourage the use of arbitrary indexing, but arbitrary
indexing is rarely important.

I fully concur with this point of view. Almost all (if not all) string processing can be performed in terms of sequential enumerators instead of random indexing (which also has the big disadvantage of not allowing rich context-dependent processing behaviors, something you can't ignore when handling international text).
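To illustrate the point, here is a minimal sketch in Python of a string object that hides its storage and exposes only a sequential enumerator. The class name OpaqueText and the helper count_words are hypothetical names chosen for this example, and UTF-8 is only one possible internal form; the caller never sees it.

```python
import codecs


class OpaqueText:
    """Hypothetical wrapper: the internal form (UTF-8 here) is hidden."""

    def __init__(self, text: str):
        # Assumption for this sketch: store UTF-8 bytes internally.
        self._data = text.encode("utf-8")

    def __iter__(self):
        # Decode one code point at a time; no random indexing is offered.
        decoder = codecs.getincrementaldecoder("utf-8")()
        for i in range(len(self._data)):
            ch = decoder.decode(self._data[i:i + 1])
            if ch:  # empty until a full multi-byte sequence has arrived
                yield ch


def count_words(text) -> int:
    """Count whitespace-separated words using only sequential access."""
    count = 0
    in_word = False
    for ch in text:
        if ch.isspace():
            in_word = False
        elif not in_word:
            in_word = True
            count += 1
    return count
```

count_words works the same on an OpaqueText as on a built-in str, since it relies only on iteration; swapping the internal form for another encoding would not change any caller.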

So the internal storage of strings does not matter for the programming interface of parsable string objects. In terms of efficiency and global application performance, using compressed encoding schemes is highly recommended for large databases of text, because the negative impact of the decompression overhead is extremely small compared to the huge benefits you get from reducing the load on system resources: better data locality and memory cache use, less pressure on the system memory allocator, less memory fragmentation, fewer VM swaps, and less file or database I/O (which will be the only effective limitation for large databases).
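As a rough sketch of how compressed storage can sit behind the same sequential interface, the Python below keeps text compressed and decompresses it chunk by chunk while iterating. zlib is used here only as a readily available stand-in, since no particular compression scheme is named above; the function names compress_text and iter_chars and the chunk_size parameter are assumptions made for this example.

```python
import codecs
import zlib


def compress_text(text: str) -> bytes:
    """Store text as zlib-compressed UTF-8 (stand-in compression scheme)."""
    return zlib.compress(text.encode("utf-8"))


def iter_chars(blob: bytes, chunk_size: int = 4096):
    """Yield code points one by one, inflating the blob incrementally."""
    inflater = zlib.decompressobj()
    decoder = codecs.getincrementaldecoder("utf-8")()
    for start in range(0, len(blob), chunk_size):
        raw = inflater.decompress(blob[start:start + chunk_size])
        # The decoder buffers any multi-byte sequence split across chunks.
        yield from decoder.decode(raw)
    # Flush whatever the inflater and decoder still hold.
    yield from decoder.decode(inflater.flush(), final=True)
```

The consumer only ever sees an enumerator of characters, so the full uncompressed text never has to be materialized in memory at once.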
