At 10:44 AM 4/22/2004, Frank Yung-Fong Tang wrote:
I saw the announcement of publishing
ISO/IEC 10646: 2003, Information technology --
Universal Multiple-Octet Coded Character Set (UCS)
From
http://anubis.dkuug.dk/jtc1/sc2/open/02n3729.htm
I expect there is no difference from Unicode 4.0,
At 03:49 PM 4/19/2004, Kenneth Whistler wrote:
The Unicode Standard is not prescriptive about rendering, beyond the
basics required to simply ensure correct mapping of textual content
into streams of characters. If one font vendor wants to have a raised
glyph for the MIDDLE DOT and another wants
At 08:42 AM 4/19/2004, Theo Veenker wrote:
Hi,
Until now I always downloaded the latest version of the UCD
and worked with that. Now I want to download the UCD files for
4.0.0 again. I know it is all in http://www.unicode.org/Public/4.0-Update/,
but in http://www.unicode.org/ucd/ I read this:
At 06:16 PM 4/15/2004, Philippe Verdy wrote:
The other reason is that the middle-dot, being a punctuation, would be
likely to
have extra spacing on both sides, which would make it inappropriate for
rendering Catalan words. Also such punctuation would probably forbid
kerning of
the middle-dot
At 01:54 PM 4/17/2004, Michael Everson wrote:
The samples Asmus sent suggest to me that a school of typographers made a
set of bad decisions, even if they were really famous and got paid lots of
money and their fonts are widely shipped!
In all charity, Michael, your opinion seems to be mainly
At 12:26 AM 4/16/2004, Alexandros Diamantidis wrote:
* Philippe Verdy [EMAIL PROTECTED] [2004-04-16 01:22]:
U+0387 GREEK ANO TELEIA
wrong form? it's a small square, and is the greek semicolon, and is then
separating words.
U+0387 is canonically equivalent to U+00B7. About its shape, whether
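The canonical equivalence mentioned above can be checked against the Unicode Character Database. The following is only a minimal sketch using Python's standard `unicodedata` module; the helper name `canonically_equivalent` is made up for this illustration and is not taken from the thread.

```python
import unicodedata

ANO_TELEIA = "\u0387"  # GREEK ANO TELEIA
MIDDLE_DOT = "\u00b7"  # MIDDLE DOT


def canonically_equivalent(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings by their NFD forms."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# U+0387 has a singleton canonical decomposition to U+00B7, so any
# normalization form replaces it with MIDDLE DOT.
print(canonically_equivalent(ANO_TELEIA, MIDDLE_DOT))
print(unicodedata.normalize("NFC", ANO_TELEIA) == MIDDLE_DOT)
```

One consequence is that normalized text cannot preserve a distinction between the two characters, so any difference in shape would have to come from the font or the language context.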
At 03:31 PM 4/15/2004, Peter Kirk wrote:
[PA] Isn't this the one that should be used in dictionaries ?
See http://www.unicode.org/unicode/standard/reports/tr14/tr14-6.html
Why are you guys citing the 1999 (!) version of this TR?
It's 2004, Unicode 4.0.1 has been published and we are up to
At 10:49 PM 4/7/2004, Peter Constable wrote:
, and the length it reports
is the number of code units, not the number of characters or graphemes
in
the string.
True; that is documented.
However, that's very common; many APIs relating to UTF-8 would report
the number of bytes, not the number of
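To make the distinction between code units, bytes, and characters concrete, here is a small sketch in Python. It is not the API discussed in the thread; the function name `length_report` and the sample string are assumptions chosen for illustration.

```python
def length_report(text: str) -> dict:
    """Hypothetical helper: report three different 'lengths' of a string."""
    return {
        # Python's len() counts code points.
        "code_points": len(text),
        # Each UTF-16 code unit is two bytes.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # UTF-8 code units are single bytes.
        "utf8_bytes": len(text.encode("utf-8")),
    }


# 'a', U+00E9 (e with acute), and the supplementary code point U+10186.
sample = "a\u00e9\U00010186"
print(length_report(sample))
```

For this sample the three counts differ (3 code points, 4 UTF-16 code units, 7 UTF-8 bytes), and none of them is a grapheme count, which would need segmentation rules on top.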
James,
this is the kind of thing that you should report via
our error reporting form. Here on the open list, it's
liable to get lost (no-one owns excerpting issues from
this forum).
The contact form can be found on our home page under
contact us.
A./
At 12:03 PM 4/8/2004, [EMAIL PROTECTED]
At 01:29 PM 4/7/2004, Richard Cook wrote:
On Wed, 7 Apr 2004, Peter Constable wrote:
They were encoded that way some while before they were accepted in
Unicode. Also, until Unicode 4.1 is published, there is a possibility
that codepoints may change.
I see. I assumed the codepoint assignments
At 09:11 PM 4/7/2004, Tobias Stamm wrote:
Greetings to all standardisers!
I'm new here so forgive me my stupidity.
I just have one little question to which I didn't find the answer in the
whole homepage:
What is the standard of the characters names?
You are looking for the character naming
At 09:37 AM 4/1/2004, you wrote:
[EMAIL PROTECTED] wrote:
The cedi sign should be of the size of the dollar sign ($) or the euro
sign
(EUR). The site you provided is using the cent sign. The Ghana web site
uses a
better version of the cent sign for the cedi. See
At 12:34 PM 4/2/2004, Kenneth Whistler wrote:
But by all means, make the proposal to the UTC if fixing this
inconsistency seems important and there is some argument to
be made for it.
I might add that 'merely' fixing an apparent inconsistency
cannot be enough of a rationale for making this change.
At 11:44 AM 4/2/2004, Kenneth Whistler wrote:
Rick said:
We also learn from the bird stamps web site cited later that the
government of Ghana is extremely inconsistent about their images and
usage
of their own currency sign. I.e., they apparently don't have a standard
for
it.
So, I don't
Somebody wrote:
non-breaking and non-stretching are presentational properties, not
semantic ones. They don't change the meaning of the space: it's still
just a space, not a hyphen or the letter g. They don't affect
non-visual media; we don't break lines in spoken speech. Louis XVI
is
At 04:28 PM 3/29/2004, Kenneth Whistler wrote:
I will say again as I have said before - but the above (and what I
snipped) is extra evidence for it - that what is broke ... is
the rule that the isolated (generally spacing) form of a combining mark
should be formed by SPACE or NBSP followed by
At 12:19 PM 3/29/2004, Ernest Cline wrote:
[Original Message]
From: Peter Kirk [EMAIL PROTECTED]
On 29/03/2004 06:56, John Cowan wrote:
Peter Kirk scripsit:
Using NBSP rather than SPACE has several advantages, and has long
been specified in Unicode, although not widely implemented. It
At 09:46 AM 3/28/2004, Philippe Verdy wrote:
It was like the US telecommunications act which set fines for transmitting
its set of proscribed words including in programs that were designed to
filter the words out of text.
Does this list really exist? Seriously, there's no word that can be
At 07:53 PM 3/27/2004, [EMAIL PROTECTED] wrote:
What does the collation standard say to do with unassigned codepoints
anyhow?
Variation selectors are not unassigned characters.
But, they might be regarded as such by any application predating VSs. And,
likewise for any VS sequences approved
Date: Sun, 28 Mar 2004 15:26:12 -0800
To: Philippe Verdy [EMAIL PROTECTED]
From: Asmus Freytag [EMAIL PROTECTED]
Subject: Re: [OT] proscribed words... (was:What is the principle?)
At 02:46 PM 3/28/2004, Philippe Verdy wrote:
From: Asmus Freytag [EMAIL PROTECTED]
Does this list really exist
At 05:32 PM 3/26/2004, John Cowan wrote:
Asmus Freytag scripsit:
Another drawback is the fact that
too few systems handle any variation selectors gracefully.
Well, at least they should be easy to handle in fonts: add the selectors
to the font as invisible characters, and then create mandatory
John,
Look at UTR#20 and at UAX#9 (the 4.0.1 version is due out shortly).
Taken together they suggest that the non-plain text way is to keep such
text direction overrides out of band (i.e. in markup) and to apply the
bidi algorithm segment by segment in a marked up file.
If you export to plain
At 05:47 PM 3/27/2004, John Cowan wrote:
Asmus Freytag scripsit:
This can be tricky, esp. when the user doesn't know a VS is present
and the font used to view the data doesn't have an alternate glyph.
Well, surely it'll turn into the black blob, or the reversed question
mark, or whatever
At 01:33 PM 3/26/2004, Jim Allan wrote:
Arcane Jill posted:
(A) A proposed character will be rejected if its glyph is identical in
appearance to that of an extant glyph, regardless of its semantic
meaning,
Obviously not.
Unicode encodes characters not glyphs. That particular glyphs of one
At 02:03 PM 3/26/2004, Ernest Cline wrote:
[Original Message]
From: Asmus Freytag [EMAIL PROTECTED]
There are millions of fonts out there with variations of the zodiac. Font
shifting would seem to be the correct answer to implement glyph
variations
there. (A wrong font will ruin the mood
At 12:14 PM 3/24/2004, Mike Ayers wrote:
Does anyone know of a good program for examining fonts? What I
am looking for is some way to, given a font, find out both the glyphs
contained and the code points (bad term?) at which those glyphs are
situated. Ability to read hinting/shaping
At 02:58 PM 3/24/2004, Thomas Kuehne wrote:
On 2004-03-23 20:23, Asmus Freytag wrote:
I don't think I know of a scenario where it is critical for a
resource limited device to display the kinds of texts you list
below.
Reading the font data and processing it into a display representation
poses
At 02:55 PM 3/23/2004, Thomas Kuehne wrote:
Is somebody already using a PUA assignment for vertical text direction
controls?
from http://www.unicode.org/faq/bidi.html#1
[...] the choice of vertical layout is usually treated as a
formatting style; therefore, the Unicode Standard does not define
At 06:09 PM 3/23/2004, Thomas Kuehne wrote:
On Wednesday, 24 March 2004 00:09, Asmus Freytag wrote:
Is somebody already using a PUA assignment for vertical text
direction controls?
I think the idea was that these don't belong in plain text.
Markup languages have had vertical layout controls
At 02:26 AM 3/21/2004, Philippe Verdy wrote:
Look into Wingdings and Dingbats code blocks,
**
Philippe, this is a new low in sloppy inaccuracy even for you.
WingDings is a name of a series of fonts shipped by MS.
They contain many symbols not found in Unicode. There is no
At 09:48 AM 3/19/2004, Mike Ayers wrote:
In less than half an hour of looking at printed samples, I've
been able to
locate two instances of the symbol replacing the letter A in
a word. If
that's not use in text, I don't know what is.
That is use in text as a glyph variant, which is,
At 07:13 AM 3/19/2004, Marion Gunn wrote:
At 15:33 + 2004/03/18, Arcane Jill wrote:
This probably is going to sound like a really dumb question, but ... Is
the BMP being saved for something?
...
Arcane Jill
There are never any dumb questions, Jill, only dumb answers.
And some of the latter
At 10:34 AM 3/18/2004, Michael Everson wrote:
I think the ANARCHY SIGN is perfectly good, but I think it is a glyph
variant of an existing character.
Just as 2117 and 24C5 are similar, but unrelated the *ANARCHY SIGN is not
the same as 24B6.
A./
At 08:27 AM 3/18/2004, Jon Wilson wrote:
Hi folks,
I believe there is a character missing from the standard. I would like to
apply to have it included, but I am a typography and Unicode novice, so I
require some assistance with the application process.
The character in question is a variant of
At 04:18 PM 3/18/2004, Mike Ayers wrote:
Note that in *that* rendition of the anarchy symbol, the
crossbar on the A does *not* touch the circle on either
edge, but it may just be that the renderer was a little
short of black paint.
I find
At 12:07 PM 3/16/2004, Antoine Leca wrote:
(For example, old German in Fraktur typeface has been decided to be
just a different font, but the same Latin letters as we know today)
Like U+017F? ;-)
A little known fact is that the long s cannot be implemented as your typical
context-based glyph
At 12:11 PM 2/24/2004, Kenneth Whistler wrote:
Think of variation selection as being more appropriate when
what we are talking about are for most purposes simply
*free variants* for presentation -- either is equally correct
to most people under most circumstances -- but where for
particular
At 01:20 PM 2/7/2004, Laurentiu Iancu wrote:
I noticed that a new combining character, U+1DC2 Combining Snake Below, has
been added. Just out of curiosity, what were the reasons why this character
was allocated at this code point rather than, for instance, U+0358, the last
free position in the
At 04:12 PM 2/9/2004, Kenneth Whistler wrote:
That leaves item A. And it is mostly a matter of determining
what is the best mechanism for getting people to know how
they should spell the metegs with the minimum of confusion.
Putting something in the Unicode Standard might be appropriate,
or there
Just a few comments on Andrew's note:
At 06:43 AM 1/19/2004, Andrew C. West wrote:
An analogy for those not familiar with the Mongolian script is the much
beloved
long s, which is a positional glyph variant of the ordinary letter s for some
languages at some periods of time. The long s does not
At 09:23 PM 1/18/2004, [EMAIL PROTECTED] wrote:
Seriously, it's my understanding that implementation guidelines
for Mongolian script and Unicode are still being worked out.
You are correct. A group of experts is currently working out a definite
description of how Mongolian should work.
All the
At 04:08 PM 1/8/2004, D. Starner wrote:
Otto Stolz [EMAIL PROTECTED] wrote:
Gerd Schumacher wrote:
The long s [...] has been abandoned from the Roman alphabet in Germany
in the mid of the 19th century.
You mean the 20th century, don't you?
I have a facsimile reprint of the 1914 issue of
Another rule which isn't written into Unicode but I like (don't know if
Everson
and Whistler and others will), is the font clarity rule. Given a font
minus one
character, I should be able to predict what that character will look like.
If I
have a Sütterlin font or a Fraktur font, I know what
- Original Message -
From: Frank Yung-Fong Tang [EMAIL PROTECTED]
UTF-16               6,634,430 bytes
UTF-8                7,637,601 bytes
SCSU                 6,414,319 bytes
BOCU-1               5,897,258 bytes
Legacy encoding (*)  5,477,432 bytes
(*) KS C 5601, KS X 1001, or EUC-KR
What is the size
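A comparison of this kind can be reproduced in outline with Python's standard codecs. This is only a sketch: the two-syllable sample string is an assumption and not the corpus measured above, the helper name `encoded_sizes` is invented here, and SCSU and BOCU-1 are left out because the Python standard library has no codecs for them.

```python
# Hypothetical sample: two Hangul syllables (U+D55C, U+AE00).
SAMPLE = "\ud55c\uae00"

# Codec names as registered in the Python standard library.
CODECS = {
    "UTF-16": "utf-16-le",  # little-endian, no byte order mark
    "UTF-8": "utf-8",
    "EUC-KR": "euc_kr",
}


def encoded_sizes(text: str) -> dict:
    """Return the encoded size in bytes of `text` for each codec."""
    return {name: len(text.encode(codec)) for name, codec in CODECS.items()}


for name, size in encoded_sizes(SAMPLE).items():
    print(f"{name:8} {size} bytes")
```

Hangul syllables take three bytes each in UTF-8 but two in UTF-16 and EUC-KR, which is consistent with UTF-8 being the largest entry in the figures quoted above.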
At 05:52 AM 11/20/2003, Philippe Verdy wrote:
We need a comprehensive new technical report that lists all the exceptions
to the general category system, as these line-breaking or word-breaking or
grapheme cluster breaking properties are orthogonal to the basic GC system
and to the combining class
At 05:44 AM 11/19/2003, Philippe Verdy wrote:
However, a couple of paragraphs up, the definition for No-Break
Space says:
U+00A0 [No-Break Space] behaves like the following coded
character sequence: U+FEFF [Zero Width No-Break Space] +
U+0020 [Space] + U+FEFF [Zero Width No-Break Space].
At 09:35 PM 10/27/03 -0800, Doug Ewell wrote:
That said, I can try to improve my use of real Unicode punctuation on
these lists, if I have time to paste it in (since my keyboard doesn't
support it).
Please don't.
I remember being told by someone a few years back that I
should limit my use of
At 09:30 PM 10/26/03 -0800, Doug Ewell wrote:
I can't speak for the whole of the last two centuries, but certainly
current American bills and coins do not use either symbol. The bills
in common use say ONE DOLLAR, FIVE DOLLARS, TEN DOLLARS, and TWENTY
DOLLARS; the coins say ONE CENT, FIVE
At 02:08 PM 10/25/03 -0700, Doug Ewell wrote:
So, in effect the UNICODE character names attempt to be
a unified transliteration scheme for all languages? Are these
principles laid down somewhere or is this more informal?
The Unicode character names attempt to be (a) unique and (b) reasonably
At 03:36 AM 10/26/03 +1100, Simon Butcher wrote:
Just a quick question.. The description for U+0024 (DOLLAR SIGN) states
that the glyph may contain one or two vertical bars. Is there a codepoint
specifically for the traditional double-bar form, or any plan to include
one in the future?
No.
I
At 05:51 PM 10/25/03 +0100, Raymond Mercier wrote:
Among the new characters in N2676 there is
10186 G GREEK ARTABE SIGN
This is one of the many signs found in papyri, such as those edited by
Kenyon. This symbol represents apparently a measure of volume used for
grain. It appears as a small
At 11:02 AM 10/26/03 +1100, Simon Butcher wrote:
Hi!
snip
I was taught at school that the double-bar form was used
when Australia
switched to decimal currency in 1966, and that it was
incorrect to write
the single-bar form when referring to Australian dollars.
It would be interesting if
At 02:05 PM 10/24/03 +0100, Jill Ramonsky wrote:
Here's a better idea.
Let's just stick with the idea that ANY C0 or C1 control has no place
being anywhere in a line of text, and so any sequence of one or more of
them will be interpreted as a line-break!
Sorted once and for all!
I'm not sure
Why does this have to be in 'plain text'??
Plain text can be streams or strings. For streams, such a mechanism might
make sense, if you could identify a compelling case that's not better
handled by HTML, XML etc.
For strings, embedding font names in front of characters just violates some
I noticed that this message had not gotten a reply.
At 05:07 PM 10/7/03 +0200, Kent Karlsson wrote:
A question about the issues already open: What is the justification
for
proposing to make Braille Lo?
Shortly before this came up as a Public Review Issue, I suggested that
Braille characters
At 02:26 AM 10/16/03 -0700, Peter Kirk wrote:
You can never tell whether something is going to be a performance
issue -- not just measurably slower, but actually affecting
usability -- until you do some profiling. Guessing does no good.
Well, did the people who wrote this in the standard do some
At 08:03 AM 10/16/03 -0700, Peter Kirk wrote:
Or perhaps a way can be found to graciously retire UTF-16 in some distant
future version of Unicode. That is likely to become viable long before the
extra planes are needed.
This discussion is a pure numbers game. Since no-one can define a hard
At 10:16 PM 10/16/03 +0200, Philippe Verdy wrote:
Standards should always be designed with the idea of integrating well
with other standards, without introducing contradictory objectives.
This is what Americans call motherhood and apple pie - feel-good statements
that are lofty but do nothing to
At 09:59 PM 10/16/03 +0200, Philippe Verdy wrote:
We're not discussing about addition of characters standardized by joint
efforts
of Unicode's UTC and ISO's WG2, and I'm not expecting a lot of changes in
this
area. But about a more general scheme in which the Unicode/ISO10646 would
become a part
I'm going to answer some of Peter's points, leaving aside the interesting
digressions into Java subclassing etc. that have developed later in the
discussion.
At 04:19 AM 10/15/03 -0700, Peter Kirk wrote:
I note the following text from section 5.13, p.127, of the Unicode
standard v.4:
At 01:44 PM 10/15/03 -0700, Peter Kirk wrote:
The guidelines are concerned with the average case: displaying the
characters as *text*.
[The use of the word 'must' in a guideline is always awkward, since that
word has such a strong meaning in the normative part of the standard.]
So, are you
At 03:07 PM 10/12/03 -0400, Laurentiu Iancu wrote:
Hello,
I was wondering if it would be a good idea to include variation sequences in
the code charts, as notes below the base characters that have standardized
variants. To me it would seem as a convenient place to reference them, but I
realize
At 10:32 AM 10/7/03 +0530, [EMAIL PROTECTED] wrote:
The only justification mentioned so far for changing Braille from So to Lo
is to be able to use Braille in identifiers. I'm not sure why someone
would want to use Braille in this way, for a start how would these
identifiers be translated into
At 10:29 AM 10/6/03 +0530, [EMAIL PROTECTED] wrote:
The Unicode Technical Committee has posted some new issues for public
review and comment. Details are on the following web page:
http://www.unicode.org/review/
A question about the issues already open: What is the justification for
At 04:24 PM 10/3/03 -0700, Peter Constable wrote:
HEBREW BABYLONIAN (SIMPLE) ATNACH
I don't know that parens in names are acceptable. Also, might it make
sense to hyphenate the first two words (the first word in the name of
characters in the Hebrew block doesn't need to be HEBREW). Hence,
At 10:50 AM 10/1/03 -0700, Magda Danish \(Unicode\) wrote:
Our problem is the representation of the £ sign (British
pound sign - U+00A3). When we type this character into our
pages and then set the character encoding in our pages to
Unicode (UTF-8) (either by setting it directly in the HTTP
At 11:15 AM 9/30/03 -0400, John Cowan wrote:
Isaac Newton spent an unconscionable amount
of time, by our standards, messing about with astrology and numerology
One of the aspects of character encoding and standardization that seems to
have an unholy fascination for people is its numerical aspect.
At 10:34 PM 9/30/03 +0200, Stefan Persson wrote:
The code charts tell that U+0308 Combining diæresis may be used for
indicating the “double derivate.” I have only heard people
calling this
the “second derivate”; is “double derivate” a valid name
for this?
The
At 05:41 PM 9/25/03 +0100, Richard Ishida wrote:
Aha. Maybe, next time I try to explain it on the plane, I'll say
something like:
Unicode is a standard for enabling your computer to represent all the
letters of all the alphabets of the world.
Still not terribly accurate and deliberately vague
At 08:36 PM 9/18/03 -0400, Noah Levitt wrote:
On Mon, Aug 11, 2003 at 12:57:11 -0700, Kenneth Whistler wrote:
Kent asked:
How should a freestanding double diacritic be encoded (for purposes of
meta-discussions, and the like): SPACE, dbl diacritic or SPACE, dbl
diacritic, SPACE?
It
At 08:26 PM 9/1/03 -0700, Doug Ewell wrote:
Tex Texin tex at i18nguy dot com wrote:
In most industry usages, MBCS refers to variable width encodings, not
fixed width.
Well, if variable-width encodings are referred to as both DBCS (see, for
example, http://czyborra.com/charsets/cjk.html#dbcs)
At 10:40 AM 8/31/03 -0400, Jim Allan wrote:
The code chart menu page at http://www.unicode.org/charts/ does not
contain a link to the Ugaritic characters
However the Ugaritic chart exists and can be obtained by using the direct
url http://www.unicode.org/charts/PDF/U10380.pdf.
I've just
Compatibility characters:
The recommendations for compatibility characters are necessarily vague,
since their use in legacy data (and legacy environments) is strongly
dependent on what is (or was) customary in a given environment.
If a process merely warehouses text data (or parses only a very
At 04:50 AM 7/22/03 +0200, Chris Jacobs wrote:
Where am I going with this? Basically what I'm after is a clean/clear
way to tell if quotation marks and parentheses (plus the other
bracketing characters such as '[' or '{' are opening or closing
punctuation. That's the real question here!
At 05:09 PM 7/8/03 -0400, you wrote:
Even if this were done, I wonder if most software would understand U+2007
or other non-breaking spaces as spaces for the purpose of
full-justification or right-justification and hide them when they would
otherwise appear at column right position.
Such usage
Unicode assigns the general category value, Sc, or Symbol, [c]urrency
to all characters whose *primary* function is to act as a currency symbol.
That excludes all characters that have other, unrelated uses, as long as
those are not more specialized than the use as currency sign. That's an
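The general category of a character can be looked up with Python's standard `unicodedata` module. The sketch below is illustrative only; the helper name `is_currency_symbol` and the choice of sample characters are assumptions. In the Unicode Character Database the value for currency symbols is `Sc`.

```python
import unicodedata


def is_currency_symbol(ch: str) -> bool:
    """Hypothetical helper: True if `ch` has general category Sc."""
    return unicodedata.category(ch) == "Sc"


# DOLLAR SIGN, CENT SIGN, POUND SIGN, EURO SIGN, then a plain letter.
for ch in ["$", "\u00a2", "\u00a3", "\u20ac", "p"]:
    print(f"U+{ord(ch):04X} {unicodedata.category(ch)} {is_currency_symbol(ch)}")
```

A letter that merely doubles as a currency abbreviation keeps its letter category, which matches the "primary function" criterion described above.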
Consortium.
The location on the Unicode website is
http://www.unicode.org/reports/tr20/
Asmus Freytag
Technical Vice President
The Unicode Consortium
At 10:00 AM 6/9/03 +0300, [EMAIL PROTECTED] wrote:
It also appears along with other symbols used in the OED at
http://dictionary.oed.com/public/help/Advanced/symbols.htm#mod1letter.
(Again, not all these symbols are currently part of Unicode.)
To state the obvious (and random email does not
I keep coming across a letterlike symbol based on the letter p. In going
through my collections, I found it listed in a table of symbols in an
excerpt from the US Government Printing office style manual from
1984.
That symbol is named 'per' and looks like
To me, the symbol looks like something
At 01:34 AM 6/7/03 +0200, Philippe Verdy wrote:
- Original Message -
From: Asmus Freytag [EMAIL PROTECTED]
Can anyone shed further light on this character? I assume this is a lower
case form, does anyone care to confirm that?
Isn't your per symbol similar to the form variant
At 05:21 PM 6/2/03 -0400, Jim Allan wrote:
Rick McGowan asked:
Can someone point more specifically to where it says anything about
variation selectors? This pointer is to the table of contents/overview...
Well, at
http://www.usefulcontent.org/docs/manuals/REC-MathML2-20010221/chapter6.htm
At 02:56 PM 5/29/03 -0700, Kenneth Whistler wrote:
António asked:
I've just downloaded the PDF files with 4.0 additions (U40-*.pdf). One
question: How is one supposed to tell apart the glyphs for U+1D29 and
U+1D18?... Or one isn't?... (OK, this question is probably more suited
to be posed to
At 09:10 PM 3/26/03 +, Michael Everson wrote:
At 10:48 -0800 26/03/2003, Kenneth Whistler wrote:
And the reason why U+2030 PER MILLE SIGN is the right answer is
that salinity is measured in grams per 1 kg of solution.
The question :-)
Yes, what is the question?
Shall Ken add salinity
At 11:13 AM 3/22/03 +0100, Pim Blokland wrote:
David Starner wrote:
Criss-cross-lain. s. The alphabet; so called in consequence
of its being formerly preceded in the horn-book by a â to
remind us of the cross of Christ; hence the term. Christ-Cross-line
came at last to mean
At 12:15 PM 3/21/03 -0800, Kenneth Whistler wrote:
Let's try this one on for size:
==
However, if you load the list of ISO/IEC 10646 character codes in a commercial
product, thus giving an added value to your product, we
At 11:55 AM 3/13/03 +0900, you wrote:
Dear Unicoders,
The unicode beta page mentions that a new concept of provisional
properties has been introduced to 4.0. Unfortunately, no text is
available that elaborates this. Is there any way to learn more about that
prior to publication of TUS 4.0?
At 06:47 PM 3/10/03 -0800, Kenneth Whistler wrote:
Sorry. I mean such an invisible character that would keep those letters
together, even when the inter-character space is expanded, like as if
they were in the same lead type. (The same thing I'd use decompose
U+0133 into i+THING+j.)
What
At 04:57 PM 3/5/03 +0100, Pim Blokland wrote:
I apologize if this question has been asked before, but I'm relatively new
at this.
My question is: where can I find formal definitions of the terms used in
the Character Name field of the UnicodeData.txt file? Most specifically,
precise
delete illegal sequences, but substitute
a replacement character for missing characters.
Mark
[EMAIL PROTECTED]
IBM, MS 50-2/B11, 5600 Cottle Rd, SJ CA 95193
(408) 256-3148
fax: (408) 256-0799
- Original Message -
From: Asmus Freytag [EMAIL PROTECTED]
To: Mark Davis [EMAIL PROTECTED
At 11:52 AM 3/3/03 -0800, Mark Davis wrote:
Perhaps I wasn't clear; I agree with you on that.
1) It is conformant to skip or substitute text, with just a code at the end
indicating that something of that sort was done.
It's a subtle point, but can be put into your formulation:
What I was after
At 01:07 PM 3/3/03 -0800, Mark Davis wrote:
If your converter purports to produce any one of the Unicode encoding forms,
then it cannot conformantly produce malformed Unicode as a result.
If, of course, it does not purport to do that, it can do anything it wants
to.
Then, as long as the
At 07:21 AM 3/2/03 -0800, Mark Davis wrote:
C12a When a process interprets a code unit sequence which
purports to be in a Unicode character encoding form, it
shall treat ill-formed code unit sequences as an error
condition, and shall not interpret such sequences as
Can we retitle this thread?
I'm getting actual replies to my posting of the BETA that I need to keep
track of, and the run-on discussion of UTF-8 under this title is distracting.
Thanks for your help,
A./
At 04:56 PM 2/26/03 -0800, you wrote:
Yung-Fong Tang wrote:
I see a hole here. How about
At 12:55 PM 2/25/03 +, António Martins-Tuválkin wrote:
Most (all?) of them are composable, either by
means of letter + slash (OSLI) or by ZWJ (for things like Pta or
Pts, if anything),
Using ZWJ for such things is frowned upon. The ZWJ may be used to request a
ligature between two
At 07:26 AM 2/21/03 +0100, Werner LEMBERG wrote:
Show me a widely used font which contains both U+03C6 and U+03D5.
That was not the issue. The issue is that when fonts wanted to add 03D5, they
would not just put the opposite glyph into 03D5. Or just end up having a
duplicate glyph. Fonts that have
At 12:08 AM 2/21/03 +0100, Werner LEMBERG wrote:
Virtually all fonts I know of use the pre-3.0 glyph representations.
Sigh. Any suggestion how to fix this mess? [...]
To give just one very widely available example Times New Roman has always
used the post 3.0 glyph.
A./
.]
Asmus Freytag
Technical Vice President
The Unicode Consortium
At 08:13 AM 2/12/03 -0800, Doug Ewell wrote:
Even then, you may be behind a time lag of more than one month because
the UTC meetings minutes are posted a little late. So, to be fully
aware, apart from becoming a member, you should also attend UTC
meetings.
I would imagine that issues like
At 11:54 AM 2/6/03 -0800, Kenneth Whistler wrote:
My personal opinion? The whole debate about deprecation of
language tag characters is a frivolous distraction from
other technical matters of greater import, and things would
be just fine with the current state of the documentation.
But, if formal
At 01:52 AM 2/7/03 -0800, Andrew C. West wrote:
Ah, but decorative motifs are not plain text.
Ah, but it could be.
Ah, but it wouldn't be Unicode.
A(h)./