John Jenkins [EMAIL PROTECTED] writes:
Actually, TC/SC variation *is* one of the cases it's intended to cover
(where the two forms are related in a one-one fashion and the regular
simplification rules are being applied). It's aimed, however, more at
things like the ever-multiplying
Just a few comments on Andrew's note:
At 06:43 AM 1/19/2004, Andrew C. West wrote:
An analogy for those not familiar with the Mongolian script is the much
beloved long s, which is a positional glyph variant of the ordinary letter s
for some languages at some periods of time. The long s does not
It may not be magic but I was basically told it was taboo in Unicode.
If it was a taboo that would mean that it was something which is often thought
of as a law being imposed by someone, but is in fact merely something that
would have severely negative consequences and the lawgivers tell you
On 20/01/2004 00:36, Asmus Freytag wrote:
...
Chinese ideographs don't quite fit either Andrew's example or my reply
- the nature of the problem is different due to both the large set of base
characters and the (potentially) large number of (non-deterministic)
variations -- together with the
On Tue, 20 Jan 2004 00:36:54 -0800, Asmus Freytag wrote:
Currently, Variation Selectors work only one way. You could 'force' one
particular shape. Leaving the VS off gives you no restriction, leaving the
software free to give you either shape. W/o defining the use of two VSs you cannot
from where can i install different code pages in windows (2k/NT) (i want
access in vc++ program)??
(code pages mentioned for windows
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp )
i want to set console code page(OEM) unicode (1201), so that i can
Deepak Chand Rathore deepakr at aztec dot soft dot net wrote:
from where can i install different code pages in windows (2k/NT) (i
want access in vc++ program)??
(code pages mentioned for windows
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/intl/unicode_81rn.asp )
i
From: Deepak Chand Rathore [EMAIL PROTECTED]
from where can i install different code pages in windows (2k/NT) (i want
access in vc++ program)??
Look into the regional settings configuration panel; there are the options
necessary to add support for more encodings, located on the Windows
Look at SCSU (http://www.unicode.org/reports/tr6/) and BOCU-1
(http://www.unicode.org/notes/tn6/).
--
François
-Original Message-
From: Elliotte Rusty Harold [mailto:[EMAIL PROTECTED]
Sent: January 20, 2004 11:59
To: [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Subject: Unicode forms
I'm currently working on a project (XOM,
http://www.cafeconleche.org/XOM/) in which the Unicode text data is
a significant portion of memory usage in many important use cases.
Currently, for the major class where this is an issue in practice (as
proved by profiling), I store the data as UTF-8.
Title: RE: Unicode forms for internal storage
Last night it occurred to me it might be possible to design an
internal storage format for this class which had better memory usage
characteristics. In particular I'd like ASCII data to occupy only a
single byte, and all other BMP
On Jan 19, 2004, at 11:22 PM, Christian Wittern wrote:
Hmm. Are you saying this can also be used for cases where both (or all
necessary) forms are already encoded?
No. I'm just using U+8AAA and U+8AAC as an example of the kind of
glyphic difference this is intended to cover. Since they're
Dean Snyder asserted:
No, we do not need to rehearse the pros and cons of the dynamic
model for Cuneiform already. Abundant evidence for why it has not
been chosen has already been presented.
But NO ONE mentioned free variation selectors in the discussion until
yesterday.
This is not
You need not invent something new: Just use a simplified SCSU encoder, and either a regular SCSU
decoder or one that only supports the features which your custom encoder uses.
For a tiny SCSU encoder (main function 75 lines of commented C) that also compresses a little better
than what you
In A Comprehensive Russian Grammar by Terence Wade (2nd edition,
Blackwell 2000), one of the best respected descriptions of Russian,
there is a list of symbols from the IPA... used... for the phonetic
transcription of Russian words (p.2). I was surprised to find that many
of these symbols are
Kenneth Whistler wrote at 10:35 AM on Tuesday, January 20, 2004:
Dean Snyder asserted:
No, we do not need to rehearse the pros and cons of the dynamic
model for Cuneiform already. Abundant evidence for why it has not
been chosen has already been presented.
But NO ONE mentioned free
John Jenkins tried to present some usage cases for Han FVS
combinations, and Mike Ayers responded with a bunch more questions:
Ummm - if this simplified form were used at all, wouldn't it already
be encoded? Isn't there a process for getting such encoded? Has this
process broken down,
At 9:52 AM -0800 1/20/04, Markus Scherer wrote:
You need not invent something new: Just use a simplified SCSU
encoder, and either a regular SCSU decoder or one that only supports
the features which your custom encoder uses.
Thanks. It looks like exactly what I need.
For a tiny SCSU encoder
At 10:26 AM -0800 1/20/04, Mike Ayers wrote:
BZZZT! Sorry, thanks for playing. You can't get the
advantages of both with no drawbacks. Given the octets 0x5B5B, how
would you know if you had [[ or a Chinese character?
Actually, it looks like SCSU may do exactly that. If I'm
From: Elliotte Rusty Harold [EMAIL PROTECTED]
Has anyone done any work on Unicode formats for this use-case? Does
anyone have any references or ideas to share?
If you want something very simple to convert between UTF-8 and UTF-16, why
not use them directly, by requiring a leading BOM and
The ISO 639/RA Joint Advisory Committee met last week in Washington DC. I've prepared
a brief report from that meeting that can be obtained from
http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=PCUnicodeDocs&highlight=#367db883
(the file MtgRpt_ISO639RA-JAC.pdf near the bottom of
Dean Snyder continued:
But NO ONE mentioned free variation selectors in the discussion until
yesterday.
This is not the case. *I* mentioned free variation selectors
during both of the ICE meetings. They weren't discussed at any
great length, precisely because I and the other encoding
Andrew C. West scripsit:
These are glyph variants of Phags-pa letters that are used with semantic
distinctiveness in a single (but very important) text, _Menggu Ziyun_, a 14th
century rhyming dictionary of Chinese in which Chinese ideographs are listed by
their Phags-pa spellings. In this
- Original Message -
From: John Jenkins [EMAIL PROTECTED]
To: Unicode List [EMAIL PROTECTED]
Sent: Tuesday, January 20, 2004 9:32 AM
Subject: Re: Chinese FVS? (was: RE: Cuneiform Free Variation Selectors)
.
John Jenkins wrote,
1) U+9CE6 is a traditional Chinese character (a kind of
Peter Kirk suggested:
Presumably the same principles can be applied when you run into a newly
discovered (probably archaic) cuneiform character. Except that for some
reason, Ken, you classified dynamic cuneiform as Type VI: Glyph
Description Language. Why can't it be seen as Type V:
On 20/01/2004 11:27, Kenneth Whistler wrote:
...
If you are representing Han data as Unicode plain text, and you
run into a newly discovered character, you are stuck. Your options
are:
1. Use a geta (U+3013), i.e. throw up your hands and punt.
2. Use an Ideographic Description Sequence to
26 matches