David Hopwood wrote:
> We can *guess* what the column two glyphs look like from the descriptions,
> I suppose, but isn't it kind of important to have images of them?
Heh... Well, yeah, theoretically. We just don't have any glyphs for some
of the things in column 2. The items in column 1 will e
Doug Ewell reported:
> Many of the embedded images in the Standardized Variants
> document are missing.
The missing images have been fixed.
Rick
David Starner wrote:
> If the symbols in Unicode make a political
> statement by being there, then Unicode supports Christianity (U+2626 and
> others), anti-Christianity (U+FB29), Islam (U+262a), Hippies (U+262e),
> Communism (U+262d), and Dharma (U+2638).
Ahem... Not to mention Turtles. ;-)
R. Palais wrote...
> Which seems to make Unicode a defender of the status quo. Inaction is
> as political as action. "We are holders of the standards
> for the technology for encoding symbols, and we won't admit new symbols
> until they are widely used..." not necessarily the intent, but possibly
> For those that have not heard about the Unicode standard, you may want to
> download the pdf file that describes it at
> http://www.unicode.org/charts/PDF/U1D100.pdf
That isn't the first place I would say "describes" the encoding. That is
just the final code chart and name list. (Which is, of
Robert Palais wrote:
> Nelson Beebe recommended it since he figured unicode 3.2 would be
> the make or break for "getting it in use".
Speaking not officially, but as someone who has been lurking around here
awhile, the Unicode Technical Committee does not generally float trial
balloons. In o
> At least two of the links from
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE-old/
> showed empty pages (or nothing) in my browser a while ago.
Oops. The entire directory "APPLE-old" is old and obsolete. It was just
moved aside when the new files came in. It has now been removed.
Tex, et al --
> ... have added a Tengwar entry [...] to the Plane 1 Demo page.
Whoa! Hang on there!
I would like to voice a shout of vehement discouragement about this sort
of thing. Tex wrote "it's not officially in Unicode yet" -- which still
means "it isn't in Unicode".
Making an entry
Isn't "accretion disk" something that forms around a black hole?
Please see:
http://www.unicode.org/unicode/consortium/distlist.html
All the details.
Rick
There is code for doing UTF8/16/32 conversions:
ftp://www.unicode.org/Public/PROGRAMS/CVTUTF
Rick
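
For readers who only want to see the three encoding forms side by side, here is a minimal sketch in Python. It uses Python's built-in codecs rather than the CVTUTF sources mentioned above, and the sample string is an assumption chosen for illustration.

```python
# Encode one string as UTF-8, UTF-16 and UTF-32 (big-endian, no BOM)
# using Python's built-in codecs. The sample text is illustrative only.
text = "A\u00e9\u20ac\U0001D11E"  # A, e-acute, euro sign, musical G clef (Plane 1)

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-be")
utf32 = text.encode("utf-32-be")

for name, data in (("UTF-8", utf8), ("UTF-16BE", utf16), ("UTF-32BE", utf32)):
    print(f"{name:9} {len(data):2} bytes  {data.hex(' ')}")

# Converting between forms is decode-then-encode; every form round-trips.
assert utf8.decode("utf-8") == utf16.decode("utf-16-be") == utf32.decode("utf-32-be")
```

The Plane 1 character takes four bytes in all three forms: a four-byte sequence in UTF-8, a surrogate pair in UTF-16, and a single unit in UTF-32.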
Dhrubajyoti Banerjee wrote:
> On Thu, 08 Nov 2001 Gaspar Sinai wrote :
> >I think that the Indian scripts deserve better character
> >assignment -
[...]
> However the idea you present, of pushing half characters,
> does not sound correct.
Actually, Gaspar's idea of encoding half letters (and fo
> The correct site for the Shusha font for Devnagari is
> http://www.bharatbhasha.com/
And by the way, that site wants Visual Basic Scripting support, so you
can't view it in Netscape at all...
Rick
James Naughton wrote...
>The most authoritative-sounding page on the web which I could find when I
>was investigating this was an article on diacritics by J. C. Wells,
>University College, London:
>http://www.phon.ucl.ac.uk/home/wells/dia/diacritics-revised.htm
>
>He writes:
>
>"The term 'caron',
Doug Ewell wrote...
> Cyrillic was created as a better way to write Slavic languages, Russian in
> particular. Shavian and Deseret were created as better ways to write
> English. The former met with overwhelming success, the latter did not
It's usual to bind "former" and "latter" to the close
Nick, et al -- You mentioned:
> In Classical scholarship (and I suspect, beyond it), all
> four possible corner brackets are routinely used as punctuation
> to delimit text in some way ---
I saw your examples of these the other day in Greek text. The upper corners also occur widely. For example
> Anyone knows where I could find an online chart of the International
> Phonetic Alphabet encoded in Unicode (plain text or HTML)?
> Thanks in advance.
> _ Marco
Try the charts!
http://www.unicode.org/charts/
Rick
Some brief and not complete answers follow.
> I'm trying to get a grasp on exactly how many planes
> are defined in Unicode
> [...]
> How many planes are defined in Unicode 3.1?
There are 17 planes, and everything will be re-written to reflect that,
eventually. Most of the planes are empty (e
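
The arithmetic behind the "17 planes" figure can be sketched as follows. This is an illustration, not text from the standard, and the name `plane_of` is made up for the example.

```python
# 17 planes of 65,536 code points each cover U+0000..U+10FFFF.
PLANE_SIZE = 0x10000
PLANE_COUNT = 17
MAX_CODE_POINT = PLANE_COUNT * PLANE_SIZE - 1  # 0x10FFFF


def plane_of(code_point: int) -> int:
    """Return the plane number (0..16) that contains a code point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16  # the plane is everything above the low 16 bits


for cp in (0x0041, 0x1D100, 0x20000, 0xE0001, 0x10FFFF):
    print(f"U+{cp:04X} is in plane {plane_of(cp)}")
```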
Here we go again... Before everyone goes off and starts blaming Unicode
for bad rendering...
When you render a combining character sequence and it "doesn't look right"
that is not the fault of the Unicode Standard, it is the fault of your
font and/or rendering software (and the people who d
For some reason, the following note from Mark Davis appears to have been
lost in space.
Rick
--
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Date: 09-19-2001 15:51
From: Mark Davis/Cupertino/IBM@IBMUS
Subject: DerivedAge.txt
At the request of someone working with ICU, I regenerat
> If ISCII is still being developed does this suggest that Unicode and its ISO
> equivalent move too slowly?
ISCII dates back to 1988 with a revision in 1990. It's not "still being
developed" -- as far as I know, it's a stable standard that is under
routine maintenance.
I wonder if anyone
[EMAIL PROTECTED] wrote:
> The existence of the byte sucks.
Well, I suggest therefore that you do Civilization a favor and incidentally
leave your indelible Mark on History by devoting every waking moment of the
rest of your life to stamping out the accursed byte.
Rick
Check the code in the latest Beta.
Rick
Jaipal K asked:
> 1) How exactly do I use the Unicode standard?
That depends on what you want to do with it. In Java, characters are
Unicode characters anyway, so you do nothing special. Java is a pretty good
choice for an underlying language.
But for basic questions about Unicode itself
> On 07/31/2001 05:58:57 AM Kairat A. Rakhim wrote:
>
> Cherkessian, Crimean Tatar, Kumyk, Nivkh are not yet presented in the
> list.
Peter C responded:
> It's my understanding that the Nivkh Cyrillic writing system requires a
> couple of characters that are not yet in Unicode.
Can someone pr
One of the questions asked most frequently is whether Unicode encodes some
particular language. As most of you know, Unicode doesn't encode
"languages", it encodes scripts. But the thing people often most want to
know is whether their language, or some other language, can be represented.
Jungshik Shin <[EMAIL PROTECTED]> wrote:
> I put up a screenshot of glyphs for GooGyeol
> characters included in one of fonts mentioned above at
>http://jshin.net/~jungshik/i18n/googyeol.png
Looks to me like everything, or nearly everything, in that list is just a
brush-style rendering of
> This table was generated by the Unicode group for use with TrueType
> and Unicode
What an unfortunately worded comment. They probably mean the group of
people inside MS who worked out the list of codes used by the RTF
implementors. They certainly don't mean Unicode, Inc.
> (Does the cons
P. Andries wrote:
> I'm still interested by a definition of "in(-)line software"
> (http://www.unicode.org/unicode/reports/tr27/)". I know what inline code
> or processing could be but I can't quite understand the relationship
> with the inline software mentioned here and processing music text.
> Was such encoding done due to some historical reasons in the past?
Yes. The rules for future allocation were formulated many years after
thousands of "questionable" codepoints were already encoded in the early
days. Usually, these things (presentation forms, ligatures, compatibility
cod
Caveat: This will only be of interest to Japanese speakers, so you can hit
delete now if you're not interested.
TOYOSHIMA Masayuki pointed at:
> http://www.asahi-net.or.jp/~lf4a-okjm/genkan61.htm
Oh! How interesting! Exactly what I needed.
The link to:
http://www.asahi-net.or.jp/~l
Carl Brown suggested:
> If you convert to iso-8859-1 you lose characters that is just as bad as
> sending Windows-1252 out as iso-8859-1.
Well... If the author converts to ISO 8859-1 on the way out, the author
might "lose" characters. If you send 1252 labelled as 8859 to the world,
everyone
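
The difference between the two cases can be sketched in Python. The byte string below is an assumed sample; the codec names `cp1252` and `iso-8859-1` are Python's built-in ones.

```python
# Bytes 0x80-0x9F are where Windows-1252 and ISO 8859-1 disagree.
raw = b"\x93quoted\x94"  # curly quotes as Windows-1252 writes them

as_1252 = raw.decode("cp1252")      # what the author meant
as_8859 = raw.decode("iso-8859-1")  # what a receiver trusting the label gets

print(repr(as_1252))  # curly quotation marks U+201C and U+201D
print(repr(as_8859))  # C1 control characters U+0093 and U+0094

# The other direction: converting to ISO 8859-1 on the way out loses
# whatever that character set cannot represent.
lossy = "\u201cquoted\u201d \u20ac5".encode("iso-8859-1", errors="replace")
print(lossy)
```

In the first case only the author's own text is damaged; in the second, every receiver that believes the label misreads the bytes.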
Thanks to a few people who gave me the answer. I keep forgetting that there
are so many multiple romanizations; I didn't try the "other" romanization,
but was trying to type "dzu" (voiced "tsu"), and just about everything else.
Thanks.
By the way, in case anyone is curious... Why does any
Speaking of all this UTF-8 & mojibake etc.,
Here's a question for the Japanese speakers & users of Word 2000... I'm
using Word2k on Win98. How do you input the syllables U+3065 and U+30C5 with
the Japanese Global IME? I.e., I want the "zu" syllables obtained by adding
dakuten to "tsu" rat
> Watashi wa loco en la cabeza
Duh, well, use katakana as appropriate, use middle-dots between your foreign
words, and people might get it.
Rick
Doug Ewell wrote...
> > @š‚¶‚イ‚¢‚Á‚¿‚á‚ñš
> > @Ž„‚͂낱‚¦‚ñ‚ç‚©‚ׂ³B
>
> Robert, please stop this. It doesn't seem to be UTF-8 (that is, I can't copy
> and paste it into UniPad or Windows 2000 Notepad and see anything
> reasonable)
Eeek.. What's that? 11's comment shows up fine in m
> Unfortunately, you don't hear much about SCSU, and in particular the Unicode
> Consortium doesn't really seem to promote it much (although they may be
> trying to avoid the "too many UTF's" syndrome).
Probably that's one point. But also, SCSU is something that's a little more
complicated to
Thank you Kay Genenz. This web page is helpful. I was not aware of any
of this info. I'm not surprised they disappeared from the roadmap.
> Should one consider the Chinese oracle bone
> inscriptions (1200 BC) for entry to the unicode list?
> They really did exist.
They are "unified" with th
Thomas Chan wrote...
> I'd like to ask about the encoding status of the Japanese "Jindai"
> scripts, which are mentioned in older documents[1], and until a certain
> point in time, versions of the Roadmap.
Do you have a paper on the topic? You say "over a dozen 'Jindai'
scripts". What does t
I don't think there's any point in encoding 64 hexagrams; especially when
we have the pieces already. Use the pieces of three and position them with
a drawing program. We don't have combining thingies for putting chess
pieces on board squares, either.
Rick
"Martin v. Loewis" <[EMAIL PROTECTED]> wrote:
> It seems to be unclear to many, including myself, what exactly was
> clarified with Unicode 3.1. Where exactly does it say that processing
> a six-byte two-surrogates sequence as a single character is
> non-conforming?
It's not non-conforming, it's
Marc-Andre Lemburg wrote:
> Do you have references which we could look at
> to determine which of these boundary kinds would actually be
> useful in daily programming ?
There are two things utterly useful in daily programming... One is to get
a "character", whether it's a surrogate or not; an
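
One way to read "get a character, whether it's a surrogate or not" is a routine that walks UTF-16 code units and hands back whole code points. The following is a minimal sketch under that assumption; the function name `code_points` is hypothetical, and passing unpaired surrogates through unchanged is a choice made for the example, not something the message prescribes.

```python
from typing import Iterable, Iterator


def code_points(units: Iterable[int]) -> Iterator[int]:
    """Yield code points from UTF-16 code units, joining surrogate pairs."""
    pending = None  # a high surrogate waiting for its low partner
    for unit in units:
        if pending is not None:
            if 0xDC00 <= unit <= 0xDFFF:
                # High + low surrogate: combine into one supplementary code point.
                yield 0x10000 + ((pending - 0xD800) << 10) + (unit - 0xDC00)
                pending = None
                continue
            yield pending  # unpaired high surrogate, passed through as-is
            pending = None
        if 0xD800 <= unit <= 0xDBFF:
            pending = unit
        else:
            yield unit
    if pending is not None:
        yield pending  # input ended on an unpaired high surrogate


units = [0x0041, 0xD834, 0xDD1E, 0x0042]  # "A", U+1D11E as a pair, "B"
print([f"U+{cp:04X}" for cp in code_points(units)])
```

The caller sees three characters here, not four code units, which is the point of the first "utterly useful" operation.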
Gaute B Strokkenes wrote...
> [I'm cc:-ing the unicode list to make sure that I've gotten my
> terminology right, and to solicit comments
Interesting... I just started looking at Python the other day, once I
discovered it has such nice built-in Unicode support.
If Python is explicitly storing
I only have one question. What do blueberries have to do with XML?
Rick
Hello Geetika,
> I have to parse the UTF8 characters , so that they can accepted by
> the C++ code.
I wonder a little what you mean exactly by "parse" them so they can be
"accepted" by the code. In any case, if you want sample code for UTF
conversions, please try:
http://www.unicode
Carl,
> On Tue, 12 Jun 2001, [EMAIL PROTECTED] wrote:
>
> I am not sure exactly what you mean by this. Do you mean adding a
> '>' at the beginning of each line and replying in place? If so then
> replying to a 300+ line missive is a lot of hand editing.
Just remove the 300 lines!!! That's pre
Lisa asked...
> Shouldn't a war about UTF-8 be discussed on Unicore?
Well, theoretically perhaps, but personally speaking I believe that this
UTF-8 business is so choice and has such far-reaching implications for
every user and so many other standards that, like presidential private
lives,
Everson wrote:
> Lots of people with names like McGowan like to have the "c",
> ostensibly an abbreviation for "ac" superscripted and underlined. ;-)
(Sound of retching...) Uh, no. I like it just fine as-is. If I
actually spelled my name with a small superscripted underlined "c", even
mo
Michael Kaplan <[EMAIL PROTECTED]> wrote:
> ... asking for a lavicious license to be lecherously lazy
Parse error at "lavicious". No such word appears in any English
dictionary I own, not even the OED.
Rick
Bev, Ken already answered my questions: it uses the Latin script.
Rick
> The Lushootseed language is ordinarily written in the Latin
> script. And the encoding of the Latin script includes, as far
> as I know, all the additional letters and diacritics needed
> for the standard Americanist
Hi Bev --
> Does anyone know if the Lushootseed language is included in Unicode?
> I searched but could not find it if you have an URL could you
> please send. Thanks in advance.
As others will no doubt tell you, this standard encodes _characters_ used
for writing, it doesn't encode _lan
Toby, I think you forgot to comment on these objections that have also
been coming up from time to time:
* Introduction of UTF-8S would merely add to the myriad forms people would
already have to support, and it is insufficiently distinguishable from
UTF-8.
* encoding ambiguities in the s
> The main difference from SCSU is that this method preserves binary order.
Ah. And which binary order does it preserve?
The right one, or the other one? ;-)
Rick
> So I suggest to correct the problem before it came out.
> And I would like to propose UTF-32s.
I think this has been anticipated by some of the people who proposed UTF-8S.
My opinion, for what it's worth, is that there should be no new formats.
We have too many of them already, and making
Some people have pointed out to me privately that my previous wording of a
message implied something that I didn't intend at all, so let me rephrase
this and retract my previous note.
I am not writing in any official capacity at all. I'm just another
ordinary reader of the Unicode list, and
Some people said things like...
>There was another abomination proposed.
>I was choosing not to mention the abominable.
The abominable steam-rollers of history squish those who don't scream and
run; and the few weak survivors are forever cleaning up the resulting
messes.
If you think someth
$B}*$8$e$&$$$C$A$c$s}*(B wrote:
> $B
Marco Cimarosti wrote:
> East Asian Width is a property that tells whether or not each Unicode
> character should have the same typographical width as a CJK ideograph. The
> property may be "yes", "no", or a few different kinds of "maybe".
Whoa, wait... Whether or not you care at all about the E
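
For anyone who wants to see what the property values in Marco's description look like in practice, Python's standard `unicodedata` module exposes them. This is a small sketch; the sample characters are chosen for illustration only.

```python
import unicodedata

# One sample per East Asian Width value: W (wide), Na (narrow), F (fullwidth),
# H (halfwidth), A (ambiguous), N (neutral).
samples = {
    "\u4e2d": "CJK ideograph",
    "A": "ASCII letter",
    "\uff21": "fullwidth Latin A",
    "\uff71": "halfwidth katakana A",
    "\u03b1": "Greek small alpha",
    "\u05d0": "Hebrew alef",
}

for ch, label in samples.items():
    print(f"U+{ord(ch):04X}  {unicodedata.east_asian_width(ch):2}  {label}")
```

The "yes" and "no" of the description correspond to W/F and Na/H; A and N are the "maybe" cases whose width depends on context.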
Peter said:
>2. How do I get software X to know how to process my PUA characters, or how
>do I document my characters for others to understand my data?
Michael replied...
> In principle it would work, if the OSes are being written to handle user
> editing of such things. Ten euros sez they ain'
William Overington wrote, responding to Ken:
> As I have not claimed that any such case actually exists at the
> present time, then the challenge is null and void and I have no need to
> answer it.
Hmmm... I still think, personally speaking, that you're going through a
lot of effort that appear
William Overington wrote...
> So, when Ken states the sentence above, is that Ken writing as a private
> individual ... or Ken writing as a Technical Director
> ...
> ... there exists scope for considerable confusion as to the
> provenance of a statement made on this list where members of the uni
Eric Muller quoted from a Seybold Report, but... I think it's out of date.
Actually, I'm not talking about the "Gaiji Problem". It's a well-known
special case of needing things that aren't in the standard one is using;
but it's a private need.
As long as the system you're using lets you m
David Starner <[EMAIL PROTECTED]> wrote:
> Most of the PUA usages seem to be stuff the UTC refuses to encode
> (Apple's logo, Klingon
First a correction: UTC has not yet, unfortunately, actually *refused* to
encode Klingon. It still sits on the books. (I think they should formally
refuse t
There has been a lot of recent discussion about various uses of the PUA.
Can someone point to widespread instances of confusion and chaos right now
over PUA usage? I don't think there is any.
It seems to me there's a lot of effort being expended to engineer the
regulation of something that
> 1. The first one is an "Arabic Subscript Alef",
I thought we had that one... but I don't find it among the Koranic annotations.
> 3. The most weird of all, was that after finding all the dingbats and
> weird shapes, one was missing: a "White Square Containing White Small
> Square" (compare wit
Doug Ewell quoted:
>"By convention, the Private Use Area is divided into a Corporate Use subarea,
>starting at U+F8FF and extending downward in values, and an End User subarea,
>starting at U+E000 and extending upward."
Then Michael Everson wrote:
> This has nothing to do with ISO/IEC 10646.
>
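
The range named in Doug's quotation can be checked mechanically. The sketch below covers only the BMP Private Use Area, U+E000 through U+F8FF; the helper name `is_bmp_private_use` is made up for the example, and the Corporate Use / End User split is a convention that no code can detect.

```python
import unicodedata

BMP_PUA = range(0xE000, 0xF900)  # U+E000..U+F8FF inclusive


def is_bmp_private_use(code_point: int) -> bool:
    """True if the code point lies in the BMP Private Use Area."""
    return code_point in BMP_PUA


# The character database agrees: every code point in the range has
# general category Co (private use).
assert all(unicodedata.category(chr(cp)) == "Co" for cp in BMP_PUA)

for cp in (0xDFFF, 0xE000, 0xF8FF, 0xF900):
    print(f"U+{cp:04X} private use: {is_bmp_private_use(cp)}")
```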
> users who have the most interest vested in
> the encoding are the scholars themselves (and they are saying the state of
> the art prevents a useable encoding at the time)
I don't think it's all scholars who have objected to the Egyptian
proposal. But this is a case where there appears to be
Roozbeh wrote...
> Oh, I just found it! It's also encoded as a character in the national
> standard ISIRI 2900, dated 1989 (which is a 7-bit character set standard).
> I will update the proposal. So you can be sure that you have not disobeyed
> the rules ;)
Oh good! Nice bit of research...! T
Roozbeh asked...
> Would you please give me the reference? I once heard this, but after
> seeing a new proposal for "Arabic Tail Fragment" approved by UTC to be
> encoded in "Arabic Presentation Forms-B" block (SC2/WG2 document N2322), I
> thought I was wrong.
That proposal and this follow-on pr
Mike wrote:
> In particular step 5 should be made required instead optional.
Eh? And deprive the committees of the pleasure of endlessly debating the
one true shape of the unspecified glyph???
Rick
application/www-form-urlencoded
David Starner wrote:
> I have a copy of Shellbear's Practical Malay Grammar that I'm preparing
> to transcribe for Project Gutenberg. Unfortunately, he represents the
> Malaysian alphabet in a Latin transliteration that includes ng as a
> single ligatured form, and I don't know how to transcribe
Markus complained:
> Thai is not stored/used in logical order in Unicode.
Here's my contribution to the FAQ about Thai:
Q. Why isn't Thai stored/used in logical order in Unicode?
A. Once upon a time, the Unicode fore-parents inherited the Thai
industrial standard, which is an 8-bit standard
> Why are the punctum and semi-brevis unified with U+1D147 and U+1D1BA
> since, unless I err, they do not share the same value but only a
> visual similarity
Well... the rationale for that would be the same thing that unifies the
"." in "3.14" and "Mr. Fung".
However, in this case, it's true,
> indigenous Albanian alphabet? I read about it in von Hahn's 1854
I believe it's on the Roadmap as "Büthakukye" slated for Plane 1.
See Faulmann, page 182.
Rick
Lukas P said:
> I'd be interested to learn the rationale behind these choices. Is the
> original proposal available anywhere?
Try:
http://viva.lib.virginia.edu/dmmc/Music/UnicodeMusic/
That's Perry Roland's original proposal, with a lot of examples. I'm not
sure you'll get much ratio
P. Andries asked:
> 1) Where is the Gregorian punctum (square dot) ? Is it unified with another
> dot, another shaped note (U+1D147) ? If so, why ?
I am double-checking, but I believe it's unified. I'll have more info later.
> 2) How would a triplet (a group of three notes to be performed in
It has always been my impression that the dz and other digraphs were
included ONLY because they existed in standards that were used as source
material by the Unicode designers. Such digraphs would not have been
encoded otherwise.
Rick
> Adam mentions the Latin digraphs encoded for
"G. Adam Stanislav" <[EMAIL PROTECTED]> wrote...
> I believe there are other *human* scripts that need to be encoded
> in Unicode before Klingon (is Mayan encoded yet, for example?).
I sympathise with the general sense here, but Mayan isn't a great example.
Mayan is dead, as are many other sc
thejokrishna wrote:
> Hi all,
> Can you please point out web applications which entirely support
> UTF-8? i.e an application which takes Unicode characters as input,
> stores them in a database and retrieves them.
Please see the Unicode web pages. There is a lot of information about
applicati
Let me throw my light weight in with John O'Conner...
It's silly to even consider Klingon for Unicode or 10646. Many members of
both committees know this, and that's why it hasn't moved anywhere in
several years. The question keeps cropping up because that silly proposal
is still "on the
> One chapter in Chiang's book is a glossary, which gives brief English
> glosses (one or two words) and transcription in Han characters.
Does this claim to be complete (more or less)? Or is it just an
illustrative subset?
In any case, it would be nice if someone were able to make a complete
> it's the "woman's writing" used in Jiangyong county, Hunan province,
> China, of some ~1000-1500 glyphs
Ah, yes... This has been proposed and debated before. It certainly is
a candidate for encoding, if a thorough enough description of it is
available. Last time I went looking for infor
The question that I keep asking is who wrote this missive, and if Alain
didn't write it, where did he get it? That's the most basic question I
had.
Rick
Elaine wrote:
> Chinese and Japanese newspapers are still mostly written in a vertical,
> frequently right-to-left, boustrophedon.
No, not exactly. They don't go "as the ox plows", and it is entirely
improper to utilize the term "boustrophedon" to refer to them. They are
written in columns,
Elaine... Quick reply, sorry. I should be more verbose, but I hope
others can chime in.
> Is Unicode's so-called "bidi algorithm" really bidirectional, that is,
> does it govern horizontal text layout in right-to-left and left-to-
>right languages?
Yes.
> Or is "bidi" a metaphor here, fo
Everson opined:
> But I suspect he didn't write it.
> It looks very much like the kind of thing an enthusiastic
> second-year university student would write as a term paper.
If Alain wrote that diatribe, he should have said so to avoid any such
questions. Otherwise, it should not have been
The Venerable Dr Whistler wrote:
> I'm sure there is, but I can't lay hands on it right at the moment.
> It's sitting in a box in the basement somewhere.
Uh... He probably meant to write:
"Yes, it's right here as you can see from Diagram 7,
it's part of the thin banded layer right above th
For what it's worth, in this oh-so-important discussion... I have seen this length
mark used with both Katakana and Hiragana (I suppose that puts me in the good company
of 'Leven Digit Boy, only he can prove it and I can't). Call the usage nonce or
whatever... So what? It would be fair to say
Mike Ayers wrote:
> The last I knew,
> computer-savvy Taiwan and Hong Kong were continuing to invent new
> characters. In the end, the onus is on the computer to support the user.
Yes, the computer should support the user, but... The invention of new characters to
serve multitudes is OK, and i
[EMAIL PROTECTED] wrote:
> Unfortunately, there's no corresponding LATIN CAPITAL LETTER N WITH LONG
> RIGHT LEG, which Lakota needs.
To my knowledge, the discussion in September between John Cowan and Curtis Clark
didn't terminate with any actual proposal, and I'm not clear on whether the above
> 1. Is a halant/virama ever valid following other than a consonant (or
> consonant and nukta)?
Legal? In the sense of "any string is legal", yes; as is anything else. The
implementation question to answer is whether it's useful or renderable, and if so, how.
The independent vowel followed by
Mike Ayers wrote:
> I am aware that there are European languages (swiss and italian?)
> that group four digits, and am reasonably sure that japanese does.
Japanese? I don't think so.
Rick
> The only exception I am aware of to this rule is
> the OmniWeb application which runs only on Mac OS X.
One disadvantage of OmniWeb, by the way, for international use, is that it requires
that you set (in a preference panel) the encoding it uses for pages that it renders;
it doesn't know abo
Yup, I think Otto is right... Just nodding my agreement with the trend...
>
> 2311 Square Lozenge
>
> Best wishes,
>Otto Stolz
>
> David Starner <[EMAIL PROTECTED]>
> There is a document at http://www.linuxdevices.com/news/NS3271194620.html
After reading this article, I'm quite certain that (to quote the article): "CGS
coverage of each language is much more comprehensive than Unicode's, and is more
efficient to use". I'
Where have you erred? That page isn't encoded in UTF-8! Setting your browser to
interpret the page as UTF-8 won't work if the page isn't in UTF-8. The page appears
to be in 8859-1, but doesn't actually say... I would have figured that Yahoo would
have charset headers, but I don't see one in
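
A quick way to test the "is this page really UTF-8?" question is to try a strict decode. The sketch below assumes a made-up byte string standing in for the page; `looks_like_utf8` is a hypothetical helper name, not a library function.

```python
def looks_like_utf8(data: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Stand-in for a page served as ISO 8859-1 without a charset header.
page = "caf\u00e9".encode("iso-8859-1")  # the bytes b"caf\xe9"

print(looks_like_utf8(page))        # the lone 0xE9 byte is not valid UTF-8
print(page.decode("iso-8859-1"))    # decodes fine with the right label
```

A failed strict decode proves the page is not UTF-8; a successful one does not prove that it is, since plain ASCII passes either way.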
> My point is that for some languages there is no single
> orthography that can ever be nailed down.
Yes of course. But there's nothing to prevent the development of a system of
orthographic tags, and nothing to prevent combining orthographic tags with language
tags for complete mix-and-match
Tom Emerson wrote:
> One (well, the only) problem I have with explicit orthographic tagging
> is that it makes assumptions that a consistent orthography is being
> used throughout a document, which isn't necessarily the case. This is
> particularly prevalent in East Asian languages:
Well, the ta
Otto Stolz wrote:
> I think, the ethnologue lacks information about variant orthographies.
Yes, it does. But that's OK, because we can make a composite tagging system that tags
orthography separately from language.
So... does anyone have a comprehensive list of orthographies?
Rick