Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

2004-01-20 Thread Christian Wittern
John Jenkins [EMAIL PROTECTED] writes: Actually, TC/SC variation *is* one of the cases it's intended to cover (where the two forms are related in a one-one fashion and the regular simplification rules are being applied). It's aimed, however, more at things like the ever-multiplying

Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-20 Thread Asmus Freytag
Just a few comments on Andrew's note: At 06:43 AM 1/19/2004, Andrew C. West wrote: An analogy for those not familiar with the Mongolian script is the much beloved long s, which is a positional glyph variant of the ordinary letter s for some languages at some periods of time. The long s does not

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Jon Hanna
It may not be magic but I was basically told it was taboo in Unicode. If it was a taboo that would mean that it was something which is often thought of as a law being imposed by someone, but is in fact merely something that would have severely negative consequences and the lawgivers tell you

Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-20 Thread Peter Kirk
On 20/01/2004 00:36, Asmus Freytag wrote: ... Chinese ideographs don't quite fit either Andrew's example or my reply - the nature of the problem is different due to both the large set of base characters and the (potentially) large number of (non-deterministic) variations -- together with the

Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-20 Thread Andrew C. West
On Tue, 20 Jan 2004 00:36:54 -0800, Asmus Freytag wrote: Currently, Variation Selectors work only one way. You could 'force' one particular shape. Leaving the VS off gives you no restriction, leaving the software free to give you either shape. Without defining the use of two VSs you cannot
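The one-way behaviour described in this message can be illustrated with a minimal Python sketch. In plain text a variation sequence is just a base character followed by a selector, and removing the selector returns the unrestricted base character. The helper names are hypothetical, and the ranges cover only Mongolian FVS1-3 (U+180B..U+180D) and the generic selectors VS1-256; the pairing of U+1820 with U+180B is used purely as an illustration and is not checked against the list of registered sequences.

```python
import unicodedata


def is_variation_selector(ch: str) -> bool:
    """True for Mongolian FVS1-3 and the generic selectors VS1-VS256."""
    cp = ord(ch)
    return (
        0x180B <= cp <= 0x180D        # Mongolian free variation selectors
        or 0xFE00 <= cp <= 0xFE0F     # VS1..VS16
        or 0xE0100 <= cp <= 0xE01EF   # VS17..VS256
    )


def strip_variation_selectors(text: str) -> str:
    """Drop every selector, leaving the renderer free to pick a shape."""
    return "".join(ch for ch in text if not is_variation_selector(ch))


base = "\u1820"            # a Mongolian letter, with no shape restriction
forced = base + "\u180B"   # the same letter followed by FVS1

print(unicodedata.name(forced[1]))
print(strip_variation_selectors(forced) == base)
```

Nothing in this representation can say "use the default shape and no other"; that would need a second selector, which is the gap the message points at.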

how to download code pages in win2k/ nt

2004-01-20 Thread Deepak Chand Rathore
From where can I install different code pages in Windows (2k/NT)? (I want access in a VC++ program.) (Code pages mentioned for Windows: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp ) I want to set the console code page (OEM) to Unicode (1201), so that I can

Re: how to download code pages in win2k/ nt

2004-01-20 Thread Doug Ewell
Deepak Chand Rathore deepakr at aztec dot soft dot net wrote: From where can I install different code pages in Windows (2k/NT)? (I want access in a VC++ program.) (Code pages mentioned for Windows: http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp ) I

Re: how to download code pages in win2k/ nt

2004-01-20 Thread Philippe Verdy
From: Deepak Chand Rathore [EMAIL PROTECTED] From where can I install different code pages in Windows (2k/NT)? (I want access in a VC++ program.) Look in the Regional Settings control panel; it has the options necessary to add support for more encodings, located on the Windows

RE: Unicode forms for internal storage

2004-01-20 Thread Francois Yergeau
Look at SCSU (http://www.unicode.org/reports/tr6/) and BOCU-1 (http://www.unicode.org/notes/tn6/). -- François -Original Message- From: Elliotte Rusty Harold [mailto:[EMAIL PROTECTED] Sent: January 20, 2004 11:59 To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Subject: Unicode forms

Unicode forms for internal storage

2004-01-20 Thread Elliotte Rusty Harold
I'm currently working on a project (XOM, http://www.cafeconleche.org/XOM/) in which the Unicode text data is a significant portion of memory usage in many important use cases. Currently, for the major class where this is an issue in practice (as proved by profiling), I store the data as UTF-8.

RE: Unicode forms for internal storage

2004-01-20 Thread Mike Ayers
Last night it occurred to me it might be possible to design an internal storage format for this class which had better memory usage characteristics. In particular I'd like ASCII data to occupy only a single byte, and all other BMP

Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

2004-01-20 Thread John Jenkins
On Jan 19, 2004, at 11:22 PM, Christian Wittern wrote: Hmm. Are you saying this can also be used for cases where both (or all necessary) forms are already encoded? No. I'm just using U+8AAA and U+8AAC as an example of the kind of glyphic difference this is intended to cover. Since they're

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Kenneth Whistler
Dean Snyder asserted: No, we do not need to rehearse the pros and cons of the dynamic model for Cuneiform already. Abundant evidence for why it has not been chosen has already been presented. But NO ONE mentioned free variation selectors in the discussion until yesterday. This is not

Re: Unicode forms for internal storage

2004-01-20 Thread Markus Scherer
You need not invent something new: Just use a simplified SCSU encoder, and either a regular SCSU decoder or one that only supports the features which your custom encoder uses. For a tiny SCSU encoder (main function 75 lines of commented C) that also compresses a little better than what you

Pseudo-IPA characters for Russian

2004-01-20 Thread Peter Kirk
In A Comprehensive Russian Grammar by Terence Wade (2nd edition, Blackwell 2000), one of the best respected descriptions of Russian, there is a list of symbols from the IPA... used... for the phonetic transcription of Russian words (p.2). I was surprised to find that many of these symbols are

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Dean Snyder
Kenneth Whistler wrote at 10:35 AM on Tuesday, January 20, 2004: Dean Snyder asserted: No, we do not need to rehearse the pros and cons of the dynamic model for Cuneiform already. Abundant evidence for why it has not been chosen has already been presented. But NO ONE mentioned free

RE: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

2004-01-20 Thread Kenneth Whistler
John Jenkins tried to present some usage cases for Han FVS combinations, and Mike Ayers responded with a bunch more questions: Ummm - if this simplified form were used at all, wouldn't it already be encoded? Isn't there a process for getting such encoded? Has this process broken down,

Re: Unicode forms for internal storage

2004-01-20 Thread Elliotte Rusty Harold
At 9:52 AM -0800 1/20/04, Markus Scherer wrote: You need not invent something new: Just use a simplified SCSU encoder, and either a regular SCSU decoder or one that only supports the features which your custom encoder uses. Thanks. It looks like exactly what I need. For a tiny SCSU encoder

RE: Unicode forms for internal storage

2004-01-20 Thread Elliotte Rusty Harold
At 10:26 AM -0800 1/20/04, Mike Ayers wrote: BZZZT! Sorry, thanks for playing. You can't get the advantages of both with no drawbacks. Given the octets 0x5B5B, how would you know if you had [[ or a Chinese character? Actually, it looks like SCSU may do exactly that. If I'm
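The objection about the octets 0x5B5B can be made concrete with a minimal Python sketch. It uses only the standard codecs, and the variable names are illustrative rather than taken from the thread. The same two bytes decode either as two ASCII brackets or as a single UTF-16BE code unit, so a format that mixes one-byte and two-byte forms needs some signalling mechanism to say which reading applies.

```python
# The two octets from the example in the message above.
raw = bytes([0x5B, 0x5B])

# Read one byte per character (ASCII): two left square brackets.
as_ascii = raw.decode("ascii")

# Read as one big-endian 16-bit unit: the single BMP character U+5B5B.
as_utf16 = raw.decode("utf-16-be")

print(repr(as_ascii), len(as_ascii))
print(f"U+{ord(as_utf16):04X}", len(as_utf16))
```

SCSU, as suggested elsewhere in the thread, resolves this kind of ambiguity with explicit mode-switching tag bytes rather than by guessing from the data.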

Re: Unicode forms for internal storage

2004-01-20 Thread Philippe Verdy
From: Elliotte Rusty Harold [EMAIL PROTECTED] Has anyone done any work on Unicode formats for this use-case? Does anyone have any references or ideas to share? If you want something very simple to convert between UTF-8 and UTF-16, why not using them directly, by requiring a leading BOM and
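The idea of requiring a leading BOM to tell UTF-8 from UTF-16 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, UTF-32 is deliberately ignored, and data without a BOM is rejected instead of guessed at.

```python
import codecs

# Byte order marks to look for, paired with the codec to use after them.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no known BOM leads."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_with_bom(data: bytes) -> str:
    """Decode data whose encoding form is announced by its leading BOM."""
    encoding, skip = sniff_bom(data)
    if encoding is None:
        raise ValueError("no BOM found")
    return data[skip:].decode(encoding)


stored = codecs.BOM_UTF16_BE + "h\u00e9llo".encode("utf-16-be")
print(decode_with_bom(stored))
```

The cost of this approach is two or three extra bytes per stored string, which matters when many short strings are kept in memory.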

recent meeting of ISO 639/RA JAC

2004-01-20 Thread Peter Constable
The ISO 639/RA Joint Advisory Committee met last week in Washington DC. I've prepared a brief report from that meeting that can be obtained from http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=PCUnicodeDocs&highlight=#367db883 (the file MtgRpt_ISO639RA-JAC.pdf near the bottom of

Re: Cuneiform Free Variation Selectors

2004-01-20 Thread Kenneth Whistler
Dean Snyder continued: But NO ONE mentioned free variation selectors in the discussion until yesterday. This is not the case. *I* mentioned free variation selectors during both of the ICE meetings. They weren't discussed at any great length, precisely because I and the other encoding

Re: Mongolian Unicoding (was Re: Cuneiform Free Variation Selectors)

2004-01-20 Thread jcowan
Andrew C. West scripsit: These are glyph variants of Phags-pa letters that are used with semantic distinctiveness in a single (but very important) text, _Menggu Ziyun_ , a 14th century rhyming dictionary of Chinese in which Chinese ideographs are listed by their Phags-pa spellings. In this

Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)

2004-01-20 Thread jameskass
- Original Message - From: John Jenkins [EMAIL PROTECTED] To: Unicode List [EMAIL PROTECTED] Sent: Tuesday, January 20, 2004 9:32 AM Subject: Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors) . John Jenkins wrote, 1) U+9CE6 is a traditional Chinese character (a kind of

RE: Cuneiform Free Variation Selectors

2004-01-20 Thread Kenneth Whistler
Peter Kirk suggested: Presumably the same principles can be applied when you run into a newly discovered (probably archaic) cuneiform character. Except that for some reason, Ken, you classified dynamic cuneiform as Type VI: Glyph Description Language. Why can't it be seen as Type V:

RE: Cuneiform Free Variation Selectors

2004-01-20 Thread Peter Kirk
On 20/01/2004 11:27, Kenneth Whistler wrote: ... If you are representing Han data as Unicode plain text, and you run into a newly discovered character, you are stuck. Your options are: 1. Use a geta (U+3013), i.e. throw up your hands and punt. 2. Use an Ideographic Description Sequence to
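The first two options quoted above can be illustrated with a minimal Python sketch. The helper name and the choice of components are hypothetical: the example describes an unencoded character only as "U+6728 on the left, U+6728 on the right", using the Ideographic Description Character U+2FF0, and shows the geta mark U+3013 as the fallback placeholder.

```python
import unicodedata

GETA = "\u3013"               # placeholder for a character that cannot be given
IDC_LEFT_TO_RIGHT = "\u2FF0"  # operator: two components placed side by side


def describe_left_right(left: str, right: str) -> str:
    """Build an Ideographic Description Sequence: operator, then components."""
    return IDC_LEFT_TO_RIGHT + left + right


ids = describe_left_right("\u6728", "\u6728")

print([f"U+{ord(c):04X}" for c in ids])
print(unicodedata.name(GETA))
```

An Ideographic Description Sequence only describes a shape as a run of existing code points; it does not encode a new character, and software is not required to render it as a single glyph.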