The internal string API

2001-06-19 Thread Dan Sugalski
Since we're going to try and take a shot at being encoding-neutral in the core, we're going to need some form of string API so the core can actually manipulate string data. I'm thinking we'll need to be able to at least do this with strings: * Convert from and to UTF-32 * lengths in bytes,

RE: The internal string API

2001-06-19 Thread Hong Zhang
* Convert from and to UTF-32 * lengths in bytes, characters, and possibly glyphs * character size (with the variable length ones reporting in negative numbers) What do you mean by character size if it does not support variable length? * get and set the locale (This might not be the spot
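
As a rough illustration of the operations listed so far (conversion to and from UTF-32, lengths in bytes and characters, and a character size that goes negative for variable-length encodings), here is a minimal Python sketch. Every name in it is hypothetical and not part of any proposed API. It assumes strings are held as raw bytes plus an encoding name, and it assumes that a "negative" size means the negated maximum byte width, which the thread does not specify. Glyph counts and locale handling are left out.

```python
# Hypothetical sketch only: none of these names come from the thread.
# Assumption: a string is raw bytes plus the name of its encoding.

# Assumption: fixed-width encodings report their width; variable-length
# ones report the negated maximum number of bytes per character.
CHAR_SIZES = {
    "ascii": 1,
    "latin-1": 1,
    "utf-32-le": 4,
    "utf-8": -4,
    "utf-16-le": -4,
}


def to_utf32(data: bytes, encoding: str) -> bytes:
    """Convert encoded bytes to UTF-32 (little-endian, no BOM)."""
    return data.decode(encoding).encode("utf-32-le")


def from_utf32(data: bytes, encoding: str) -> bytes:
    """Convert UTF-32 (little-endian, no BOM) bytes to the target encoding."""
    return data.decode("utf-32-le").encode(encoding)


def byte_length(data: bytes) -> int:
    """Length of the string in bytes."""
    return len(data)


def char_length(data: bytes, encoding: str) -> int:
    """Length of the string in characters (code points, not glyphs)."""
    return len(data.decode(encoding))


def char_size(encoding: str) -> int:
    """Bytes per character; negative for variable-length encodings."""
    try:
        return CHAR_SIZES[encoding]
    except KeyError:
        raise ValueError(f"unknown encoding: {encoding}") from None


sample = "naïve €".encode("utf-8")
print(byte_length(sample), char_length(sample, "utf-8"), char_size("utf-8"))
```

The byte and character lengths differ for the sample because "ï" and "€" each occupy more than one byte in UTF-8, which is the case the negative character size is meant to flag.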

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:31 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: I think you misunderstand my point. It is not a property of the code region, but a property of the context in which the code is running. For example, Taiwanese read traditional Chinese characters, but PRC people read simplified Chinese.

Re: The internal string API

2001-06-19 Thread Dan Sugalski
At 02:51 PM 6/19/2001 -0500, Jarkko Hietaniemi wrote: Gah. I thought (and I use the word loosely here) that locales generally specified how a particular character should be interpreted when there's some ambiguity--the high bit ASCII characters spring to mind, given there's a dozen or

Re: The internal string API

2001-06-19 Thread Jarkko Hietaniemi
Taiwanese read traditional Chinese characters, but PRC people read simplified Chinese. Even if we take the same data and the same program (code), people just read differently. As an end user, I want to make the decision. It will drive me crazy if Perl renders/displays the text file using traditional

RE: The internal string API

2001-06-19 Thread Hong Zhang
This is the common approach to complicated text representation; the implementations I have seen include IBM IText and SGI rope. For rope, each rope is represented by one of: a simple immutable string, a simple mutable string, a simple immutable substring of another rope, or a binary node of
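
The rope representation described above can be sketched roughly as follows. This is a minimal Python illustration, not the SGI or IBM implementation, and all class names are hypothetical. The excerpt is cut off at the binary node, so the sketch assumes such a node joins two child ropes, as in the usual rope design. The mutable-string leaf is omitted, and no rebalancing is attempted.

```python
# Hypothetical sketch of a rope; class names are illustrative only.

class Leaf:
    """A simple immutable string."""

    def __init__(self, text: str):
        self.text = text

    def __len__(self) -> int:
        return len(self.text)

    def char_at(self, i: int) -> str:
        if not 0 <= i < len(self.text):
            raise IndexError(i)
        return self.text[i]

    def flatten(self) -> str:
        return self.text


class Sub:
    """An immutable substring of another rope; shares the base, copies nothing."""

    def __init__(self, base, start: int, length: int):
        if start < 0 or length < 0 or start + length > len(base):
            raise ValueError("substring out of range")
        self.base, self.start, self.length = base, start, length

    def __len__(self) -> int:
        return self.length

    def char_at(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        return self.base.char_at(self.start + i)

    def flatten(self) -> str:
        return self.base.flatten()[self.start:self.start + self.length]


class Concat:
    """Binary node (assumed here to join a left and a right rope)."""

    def __init__(self, left, right):
        self.left, self.right = left, right
        self.length = len(left) + len(right)  # cached so len() is O(1)

    def __len__(self) -> int:
        return self.length

    def char_at(self, i: int) -> str:
        # Descend into whichever child holds index i.
        if i < len(self.left):
            return self.left.char_at(i)
        return self.right.char_at(i - len(self.left))

    def flatten(self) -> str:
        return self.left.flatten() + self.right.flatten()


rope = Concat(Leaf("Hello, "), Sub(Leaf("big wide world"), 9, 5))
print(rope.flatten(), len(rope), rope.char_at(7))
```

Concatenation and substring here only build new nodes over existing data, which is the main attraction of the design for large strings; the cost is that indexing has to walk the tree.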