Re: The Arrogants and the sillies (RE: Euros and cents)

2002-03-26 Thread Tom Lord
Intended just as friendly fuel for this cheery and interesting fire: Well, English has become 'Chinese-like' (i.e. more like isolating languages and less like inflecting languages) recently(?) with less and less inflection. I'm not an expert, but I don't think that's

Re: The Arrogants and the sillies (RE: Euros and cents)

2002-03-26 Thread Tom Lord
It (English) seems to me like a kind of ultimate, permanent creole, Sorry to reply to my own message but this reminds me of another question: have there been any attempts (apart from taggers and one art project that I know of) to design/propagate ideograph-based writing systems for English?

Re: bijective (was re: An Absurdly Brief Introduction to Unicode (was Re:

2001-02-24 Thread Tom Lord
I think I'd like bijective too, if I knew what it meant. Someone? It would be a lot more fun to answer this question in plain-text Unicode (using math notation) than in ASCII. Informally: "Bijective" describes a mapping between two sets. Every element of the source set ("the
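
The excerpt above is cut off before the definition finishes. As a rough illustration of the standard mathematical meaning (a mapping is bijective when no two source elements share an image and every target element is reached), here is a minimal sketch. The function name `is_bijective` and the sample dictionaries are hypothetical and do not come from the original message.

```python
def is_bijective(mapping, target):
    """Return True if the dict `mapping` is a bijection from its keys onto `target`."""
    images = list(mapping.values())
    # Injective: no two source elements map to the same target element.
    injective = len(set(images)) == len(images)
    # Surjective: every element of the target set is the image of some source element.
    surjective = set(images) == set(target)
    return injective and surjective


# Hypothetical sample mappings, for illustration only.
one_to_one = is_bijective({1: "one", 2: "two", 3: "three"}, {"one", "two", "three"})
collides = is_bijective({1: "a", 2: "a"}, {"a"})        # two sources share an image
misses_b = is_bijective({1: "a"}, {"a", "b"})           # "b" is never reached
```

With these inputs only the first mapping qualifies: the second fails injectivity and the third fails surjectivity.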

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-22 Thread Tom Lord
[EMAIL PROTECTED] wrote: "Unicode is a character set encoding standard which currently provides for its entire character repertoire to be represented using 8-bit, 16-bit or 32-bit encodings." Please say "encoding forms". There are three distinct terms, that sound similar, and
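
To make the 8-bit, 16-bit and 32-bit distinction concrete, the sketch below splits one supplementary character into code units under UTF-8, UTF-16 and UTF-32, using only Python's built-in codecs. The helper name `code_units` and the choice of sample character (U+1D11E) are assumptions made for illustration, not part of the quoted message.

```python
def code_units(text, encoding, unit_bytes):
    """Encode `text` and split the result into big-endian code units of `unit_bytes` bytes."""
    data = text.encode(encoding)
    return [
        int.from_bytes(data[i:i + unit_bytes], "big")
        for i in range(0, len(data), unit_bytes)
    ]


sample = "\U0001D11E"  # one code point outside the first 65,536

utf8_units = code_units(sample, "utf-8", 1)       # 8-bit code units
utf16_units = code_units(sample, "utf-16-be", 2)  # 16-bit code units
utf32_units = code_units(sample, "utf-32-be", 4)  # 32-bit code units

for name, units in [("UTF-8", utf8_units), ("UTF-16", utf16_units), ("UTF-32", utf32_units)]:
    print(name, [hex(u) for u in units])
```

The same single code point takes four code units in UTF-8, two in UTF-16 and one in UTF-32, which is why the width of the code unit says nothing about the size of the character repertoire.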

Re: Perception that Unicode is 16-bit (was: Re: Surrogate space i

2001-02-21 Thread Tom Lord
What exactly _would_ be wrong with calling UNICODE a thirty-two bit encoding? If I have a 32 bit integer type, holding a Unicode code point, I have 11 bits left over to hold other data. That's worth knowing. Btw, saying approximately 20.087 bits (Am I calculating that

An Absurdly Brief Introduction to Unicode (was Re: Perception ...)

2001-02-21 Thread Tom Lord
We've seen several posts about the perception that Unicode is a 16 bit character set encoding. Among those, we've heard anecdotes about the problems people have introducing newcomers to Unicode. Here is a chapter of a reference manual I've been working on. The original manual can be found at

Re: Surrogate space in Unicode

2001-02-16 Thread Tom Lord
Because of the widespread belief that Unicode stops at U+, many fonts and applications that claim to support Unicode can only handle basic characters, not supplementary characters. Right. (Is it really a widespread belief? That's something I've been wondering.) So
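
For readers unsure what handling supplementary characters involves in a 16-bit representation, here is a minimal sketch of the standard UTF-16 surrogate-pair arithmetic. The function names and the sample code point U+10400 are assumptions chosen for illustration; nothing here is taken from the thread itself.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Recombine a high/low surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


pair = to_surrogate_pair(0x10400)
print([hex(unit) for unit in pair])
print(hex(from_surrogate_pair(*pair)))
```

Software that treats every 16-bit unit as a complete character sees two unrelated values here, which is one way the failure described above shows up in practice.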

Re: Surrogate space in Unicode

2001-02-15 Thread Tom Lord
It has proven difficult to come up with convenient terms for the Unicode characters encoded at U+1 and beyond. [] 2. A 'basic' code point, which may represent a 'basic character', can range from U+ through U+. For what purpose is such a

Unicode regular expression matcher

2000-12-14 Thread Tom Lord
We are distributing an efficient, open source regular expression pattern matcher in a C library. It implements the regular expression language specified by W3C, in "XML Schema Part 2: Datatypes". Our software can be retrieved from: http://www.regexps.com Hackerlab Rx-XML

Unicode database (C library)

2000-12-14 Thread Tom Lord
We are distributing an open source C library that contains a programming interface for accessing information taken from "unidata.txt" and other Unicode databases. It provides space and time efficient access to various character properties. Our library does not contain all of the information