Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-06 Thread Doug Ewell
Asmus Freytag wrote: > Unicode 4.0 will be quite specific: P14 tags are "reserved for > use with particular protocols requiring their use" is what the > text will say more or less. I didn't know the question of what to do about Plane 14 language tags had already been resolved. If that is the ca
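A Plane 14 language tag is written as U+E0001 LANGUAGE TAG followed by tag characters from U+E0020..U+E007E. Each tag character is the printable ASCII character plus 0xE0000. The sketch below builds that sequence for an ASCII language identifier. `make_language_tag` is a hypothetical helper, not part of any library, and it checks only that each byte is printable ASCII.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANGUAGE_TAG 0xE0001u  /* U+E0001 LANGUAGE TAG */
#define TAG_BASE     0xE0000u  /* U+E0020..U+E007E mirror ASCII 0x20..0x7E */

/* Hypothetical helper: writes U+E0001 followed by the tag-character
   form of an ASCII language identifier such as "en-US" into out.
   Returns the number of code points written, or 0 if out is too small
   or the identifier contains a byte outside printable ASCII. */
size_t make_language_tag(const char *lang, uint32_t *out, size_t cap)
{
    size_t n = 0;
    assert(lang != NULL && out != NULL);
    if (cap == 0)
        return 0;
    out[n++] = LANGUAGE_TAG;
    for (const unsigned char *p = (const unsigned char *)lang; *p; ++p) {
        if (*p < 0x20 || *p > 0x7E || n == cap)
            return 0;
        out[n++] = TAG_BASE + *p;  /* e.g. 'e' (0x65) -> U+E0065 */
    }
    return n;
}
```

For "en", this produces U+E0001 U+E0065 U+E006E.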

Re: list etiquette (was Re: Tailoring of normalization

2003-02-06 Thread Lars Marius Garshol
* Tex Texin | | There probably isn't a one-size fits all solution, short of those | not wanting a response changing their reply-to address to | "[EMAIL PROTECTED]". That's dangerous. Quite a few email clients will then create replies that go only to that address, so nobody will see them at all..

RE: discovering code points with embedded nulls

2003-02-06 Thread Kent Karlsson
> From what I'm hearing from you all is that a null in UTF-8 is > for termination and termination only. > Is this correct? No, NULL is a character (actually a control character) among many others. However, many C/C++ APIs (mis)use NULL as a string terminator since NULL isn't very useful for othe
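Kent's point is that NULL is an ordinary character that C string APIs happen to treat as a terminator. This matters as soon as a buffer contains U+0000: `strlen` stops at the first 0x00 byte, but code that carries an explicit length sees the whole text. UTF-8 never uses the byte 0x00 inside a multibyte sequence, so every 0x00 byte in valid UTF-8 is a real U+0000. A minimal sketch follows. `count_nuls` is a hypothetical helper name.

```c
#include <assert.h>
#include <string.h>

/* Count the U+0000 characters in a UTF-8 buffer of known length.
   Because 0x00 never appears inside a UTF-8 multibyte sequence,
   each 0x00 byte is counted as one U+0000 character. */
size_t count_nuls(const char *buf, size_t len)
{
    size_t count = 0;
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; ++i)
        if (buf[i] == '\0')
            ++count;
    return count;
}
```

Given `const char s[] = "ab\0cd";`, `strlen(s)` returns 2. In contrast, `count_nuls(s, sizeof s - 1)` examines all five bytes and reports one NUL.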

VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-06 Thread Andrew C. West
James Kass wrote, > (What happens if someone discovers a 257th variant? Do they > get a prize? Or, would they be forever banished from polite > society?) I was thinking about that. 256 variants of a single character may seem a tad excessive, but there is a common Chinese decorative motif (frequen
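The figure of 256 is the total number of variation selectors. VS1..VS16 occupy U+FE00..U+FE0F, and VS17..VS256 occupy U+E0100..U+E01EF, the supplementary block that arrives with Unicode 4.0. The following sketch maps between a selector number and its code point. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Map a variation selector number (1..256) to its code point:
   VS1..VS16 are U+FE00..U+FE0F, VS17..VS256 are U+E0100..U+E01EF. */
uint32_t variation_selector(unsigned n)
{
    assert(n >= 1 && n <= 256);
    return n <= 16 ? 0xFE00u + (n - 1) : 0xE0100u + (n - 17);
}

/* Inverse mapping: the selector number for cp, or 0 if cp is not a VS. */
unsigned vs_number(uint32_t cp)
{
    if (cp >= 0xFE00u && cp <= 0xFE0Fu)
        return cp - 0xFE00u + 1;
    if (cp >= 0xE0100u && cp <= 0xE01EFu)
        return cp - 0xE0100u + 17;
    return 0;
}
```

A 257th variant would therefore have no selector available.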

Re: list etiquette (was Re: Tailoring of normalization

2003-02-06 Thread Curtis Clark
Lars Marius Garshol wrote: * Tex Texin | | There probably isn't a one-size fits all solution, short of those | not wanting a response changing their reply-to address to | "[EMAIL PROTECTED]". That's dangerous. Quite a few email clients will then create replies that go only to that address, so no

Re: discovering code points with embedded nulls

2003-02-06 Thread Doug Ewell
Kent Karlsson wrote: >> From what I'm hearing from you all is that a null in UTF-8 is >> for termination and termination only. >> Is this correct? > > No, NULL is a character (actually a control character) among many > others. However, many C/C++ APIs (mis)use NULL as a string terminator > since
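To find code points in data that may contain U+0000, stop relying on the terminator and decode against an explicit length. This sketch assumes the caller knows the buffer length. `utf8_next` and `find_nul_code_points` are hypothetical names. The decoder is deliberately small: it rejects overlong forms, surrogates, and values above U+10FFFF, and it treats each bad byte as one invalid code point.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from s[*i..len) and advance *i past it.
   Returns UINT32_MAX and advances one byte on a malformed sequence. */
static uint32_t utf8_next(const unsigned char *s, size_t len, size_t *i)
{
    unsigned char b = s[*i];
    uint32_t cp, min;
    size_t need;

    if (b < 0x80) { *i += 1; return b; }  /* ASCII, including 0x00 */
    else if (b >= 0xC2 && b <= 0xDF) { need = 1; cp = b & 0x1F; min = 0x80; }
    else if (b >= 0xE0 && b <= 0xEF) { need = 2; cp = b & 0x0F; min = 0x800; }
    else if (b >= 0xF0 && b <= 0xF4) { need = 3; cp = b & 0x07; min = 0x10000; }
    else { *i += 1; return UINT32_MAX; }

    if (len - *i <= need) { *i += 1; return UINT32_MAX; }  /* truncated */
    for (size_t k = 1; k <= need; ++k) {
        unsigned char c = s[*i + k];
        if ((c & 0xC0) != 0x80) { *i += 1; return UINT32_MAX; }
        cp = (cp << 6) | (c & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        *i += 1;
        return UINT32_MAX;
    }
    *i += need + 1;
    return cp;
}

/* Hypothetical helper: record the code-point indices where U+0000
   occurs in a length-delimited UTF-8 buffer. Stores at most cap
   positions and returns the total number found. */
size_t find_nul_code_points(const char *buf, size_t len,
                            size_t *pos, size_t cap)
{
    const unsigned char *s = (const unsigned char *)buf;
    size_t i = 0, index = 0, found = 0;
    assert(buf != NULL || len == 0);
    while (i < len) {
        if (utf8_next(s, len, &i) == 0) {
            if (found < cap)
                pos[found] = index;
            ++found;
        }
        ++index;
    }
    return found;
}
```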

Re: VS vs. P14 (was Re: Indic Devanagari Query)

2003-02-06 Thread John H. Jenkins
On Thursday, February 6, 2003, at 08:47 AM, Andrew C. West wrote: There are also a number of other auspicious characters, such as fu2 (U+798F) "good fortune" that may be found written in a hundred variant forms as a decorative motif. Ah, but decorative motifs are not plain text. == Jo

Re: discovering code points with embedded nulls

2003-02-06 Thread Stefan Persson
What is that strange file (winmail.dat) attached to your mail? I really hope that it isn't a virus. Stefan Kent Karlsson wrote: From what I'm hearing from you all is that a null in UTF-8 is for termination and termination only. Is this correct? No, NULL is a character (actually a contro

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Doug Ewell wrote: > Kent Karlsson wrote: > > >> From what I'm hearing from you all is that a null in UTF-8 is > >> for termination and termination only. > >> Is this correct? > > > > No, NULL is a character (actually a control character) among many > > others. However, many C/C++ APIs (mis)use NU

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Stefan Persson wrote: > What is that strange file (winmail.dat) attached to your > mail? I really hope that it isn't a virus. http://support.microsoft.com/default.aspx?scid=KB;en-us;q241538 (Whether MS Outlook is a virus or not, is still a debated issue. :-) _ Marco

CJK test data

2003-02-06 Thread Erik.Ostermueller
I'm starting to put together some CJK test data as described below. Before I dive in, I was curious if any of this work is already available on the web. If not, would others be interested seeing this, once complete? ### CJK Test data. Th

RE: discovering code points with embedded nulls

2003-02-06 Thread Erik.Ostermueller
I didn't get any attachments. Hmm. --Erik O. > -----Original Message----- > From: Stefan Persson [mailto:[EMAIL PROTECTED]] > Sent: Thursday, February 06, 2003 11:12 AM > To: Kent Karlsson > Cc: Ostermueller, Erik; [EMAIL PROTECTED] > Subject: Re: discovering

Re: CJK test data

2003-02-06 Thread Michael (michka) Kaplan
From: <[EMAIL PROTECTED]> > 1) Sorting Test > a) include a list of un-ordered strings. > b) follow that with the same list, ordered properly. GB18030 does not define a specific standard for sorting (as far as I know, neither does GB13000). It is an encoding standard. Since
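As Michael notes, GB18030 defines an encoding, not a sort order. Sorting test data therefore has to state which ordering it expects. This sketch assumes binary code-point order as a reproducible baseline. For UTF-8 strings, `strcmp` already produces that order, because it compares bytes as `unsigned char` and UTF-8 preserves code-point order under byte comparison. This is not a linguistic collation: pinyin or radical/stroke order would need real collation data. `sort_codepoint_order` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for UTF-8 C strings: byte order == code-point order. */
static int cmp_codepoint(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcmp(*x, *y);
}

/* Sort n UTF-8 strings into binary code-point order (not a collation). */
void sort_codepoint_order(const char **strings, size_t n)
{
    assert(strings != NULL || n == 0);
    qsort(strings, n, sizeof *strings, cmp_codepoint);
}
```

For example, given U+4E2D, U+4E00 and "A", the result is "A", U+4E00, U+4E2D.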

Plane 14 Tag Deprecation Issue (was Re: VS vs. P14 (was Re: Indic Devanagari Query))

2003-02-06 Thread Kenneth Whistler
Doug wrote: > Asmus Freytag wrote: > > > Unicode 4.0 will be quite specific: P14 tags are "reserved for > > use with particular protocols requiring their use" is what the > > text will say more or less. > > I didn't know the question of what to do about Plane 14 language tags > had already been
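If tags are reserved for protocols that require them, any process that does not implement such a protocol still has to cope with tag characters it does not interpret. This minimal sketch assumes the policy is simply to drop every character in the Tags block U+E0000..U+E007F before further processing. `strip_tags` is a hypothetical helper, and it operates on code points that have already been decoded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* True for the Tags block U+E0000..U+E007F: LANGUAGE TAG (U+E0001),
   the ASCII-mirroring tag characters and CANCEL TAG (U+E007F). */
static int is_tag_char(uint32_t cp)
{
    return cp >= 0xE0000u && cp <= 0xE007Fu;
}

/* Remove tag characters in place; returns the new length. */
size_t strip_tags(uint32_t *cps, size_t n)
{
    size_t out = 0;
    assert(cps != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        if (!is_tag_char(cps[i]))
            cps[out++] = cps[i];
    return out;
}
```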

Re: list etiquette (was Re: Tailoring of normalization

2003-02-06 Thread Tex Texin
Guys, fair enough. I thought the semi-serious name would have been indication enough for someone to hit the delete key and remove the name, but I guess not. I'll probably hear from a Mr. Max Delete from the Exchange next wanting to know why I instigated this. Reminds me of the story of the suppor

Re: discovering code points with embedded nulls

2003-02-06 Thread Jim Allan
Doug Ewell posted: The use of NULL to terminate strings is a basic part of the Standard C library, not just certain APIs. As such, it doesn't seem right to call this a "misuse" of the character. But ISO 646, in defining ASCII, states as the definition of the control character NULL: "A control

Re: Arabic Presentation Forms

2003-02-06 Thread Markus Scherer
ICU has a function u_shapeArabic(): http://oss.software.ibm.com/icu/apiref/ushape_8h.html#a24 markus Mete Kural wrote: I need to figure out a method to convert Arabic Unicode text encoded in its normal form to Arabic Unicode text encoded in Arabic presentation forms. ...
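Here is a toy illustration of what Mete's conversion involves. It performs contextual shaping for just three letters (ALEF, BEH and TEH), choosing the isolated, final, initial or medial presentation form depending on whether the neighbouring letters join. It is a hypothetical sketch only. It ignores lam-alef ligatures, combining marks, tatweel and every other letter, all of which a real shaper such as ICU's u_shapeArabic() is meant to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Presentation forms per letter; 0 means the form does not exist. */
struct forms { uint32_t base, isol, fina, init, medi; };

static const struct forms TABLE[] = {
    { 0x0627, 0xFE8D, 0xFE8E, 0,      0      },  /* ALEF: right-joining */
    { 0x0628, 0xFE8F, 0xFE90, 0xFE91, 0xFE92 },  /* BEH: dual-joining  */
    { 0x062A, 0xFE95, 0xFE96, 0xFE97, 0xFE98 },  /* TEH: dual-joining  */
};

static const struct forms *lookup(uint32_t cp)
{
    for (size_t k = 0; k < sizeof TABLE / sizeof TABLE[0]; ++k)
        if (TABLE[k].base == cp)
            return &TABLE[k];
    return NULL;
}

/* A letter can connect to the following one only if it is dual-joining. */
static int joins_next(const struct forms *f) { return f != NULL && f->init != 0; }

/* Toy shaper: logical-order input, one output code point per input.
   Code points outside the table are copied and break the joining. */
void toy_shape(const uint32_t *in, uint32_t *out, size_t n)
{
    assert((in != NULL && out != NULL) || n == 0);
    for (size_t i = 0; i < n; ++i) {
        const struct forms *f = lookup(in[i]);
        if (f == NULL) { out[i] = in[i]; continue; }
        const struct forms *prev = i > 0 ? lookup(in[i - 1]) : NULL;
        const struct forms *next = i + 1 < n ? lookup(in[i + 1]) : NULL;
        int jp = joins_next(prev);               /* previous letter reaches us */
        int jn = joins_next(f) && next != NULL;  /* we reach the next letter   */
        out[i] = jp && jn ? f->medi : jp ? f->fina : jn ? f->init : f->isol;
    }
}
```

For BEH ALEF BEH (U+0628 U+0627 U+0628), it yields U+FE91 U+FE8E U+FE8F. Because ALEF joins only to the preceding letter, the last BEH stands alone.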

Re: compatibility between unicode 2.0 and 3.0

2003-02-06 Thread Markus Scherer
Doug Ewell wrote: That said, there are certain conventions for certain ranges of code points. For example, the range from U+0590 through U+08FF is marked in the Roadmap as being reserved for right-to-left scripts, and IIRC there are ranges reserved for invisible formatting and control characters
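A check for the range Doug mentions is a one-liner. This sketch assumes only the Roadmap's U+0590..U+08FF reservation, and the function name is hypothetical. The code comment explains why it is no substitute for the Bidi_Class property.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cp lies in U+0590..U+08FF, the range the Roadmap reserves
   for right-to-left scripts. This is an allocation convention, not a
   character property: some code points in the range (combining marks,
   Arabic-Indic digits) are not strong RTL, so layout code should use
   Bidi_Class instead. */
bool in_roadmap_rtl_range(uint32_t cp)
{
    return cp >= 0x0590u && cp <= 0x08FFu;
}
```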