RE: Encoding: Unicode Quarterly Newsletter

2003-03-11 Thread Marco Cimarosti
Otto Stolz wrote: Beware: When the book is thrown at high speed, relativistic effects must be taken into account. I hope that the editors took pains to find a wording that will not upset anybody to the extent that he would throw the book away at a considerable fraction of the speed of

Khmer encoding model (had no subject)

2003-03-04 Thread Marco Cimarosti
Mijan wrote: [...] 3. There are no other cases of a Vowel+Virama combination in the Unicode encoding model. Yes, there are. Khmer. I do not understand Khmer but I see that it does not use the same 'encoding model'. Please look, you will see that you were wrong to use Khmer as an

RE: Need program to convert UTF-8 - Hex sequences

2003-03-04 Thread Marco Cimarosti
David Oftedal wrote: [...] I need a program to convert UTF-8 to hex sequences. [...] For example, a file with the content æøå would yield the output 0x00E6 0x00F8 0x00E5, and the Japanese expression あの人 would yield 0x3042 0x306E 0x4EBA. SC Unipad, a Unicode editor, can do this for you:

RE: Need program to convert UTF-8 - Hex sequences

2003-03-04 Thread Marco Cimarosti
Doug Ewell wrote: David Oftedal david at start dot no wrote: Hm, yes, so I see, but I should have been more specific: I actually need an app that can do this automatically, either in ANSI C, Perl, or as a Linux binary. I need to call it from a script, so it's got to happen automatically.
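For the scripted case, here is a minimal sketch in Python rather than the ANSI C or Perl David mentioned. The file name utf8hex.py and the function name to_hex_sequence are hypothetical. It decodes UTF-8 and prints one 0xXXXX token per code point, in the format of his example:

```python
import sys

def to_hex_sequence(data: bytes) -> str:
    """Decode UTF-8 bytes and return each code point as a 0xXXXX token."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
    # %04X pads BMP code points to four hex digits; larger ones use five or six
    return " ".join("0x%04X" % ord(ch) for ch in text)

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 utf8hex.py input.txt  (the script name is hypothetical)
    with open(sys.argv[1], "rb") as f:
        # Drop a trailing newline so it does not show up as 0x000A
        print(to_hex_sequence(f.read().rstrip(b"\r\n")))
```

Code points above U+FFFF come out as single five- or six-digit values rather than as surrogate pairs. That is an assumption about the desired output.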

RE: accented cyrillic characters

2003-02-24 Thread Marco Cimarosti
Barnie De Los Angeles wrote: Even after studying the Unicode web site for a while, I am not able to find a solution for this issue. The task is to include accented Cyrillic characters (vowels only) in Russian HTML. (Vowels are accented or stress-marked in Russian for educational

RE: Unicode 4.0 beta characters.

2003-02-24 Thread Marco Cimarosti
William Overington wrote: [...] In the same U40-2600.pdf document are six Yijing monogram and digram symbols. I wonder if someone could please say something about these characters as to their meaning. [...] The Yijing (also spelled Yi Jing, I Ching, I-Ching, etc.) is a very famous book of

[REPOST, LONG] XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)

2003-02-21 Thread Marco Cimarosti
I sent this message yesterday, but I didn't see it on the Unicode list. Possibly this was because the ZIP contained two executable programs. I have now removed them, and the ZIP still contains the source code. BTW, I took the opportunity to correct a few grammatical errors... _ Marco

RE: XML and tags (LONG) (derives from Re: Plane 14 Tag Deprecation Issue)

2003-02-21 Thread Marco Cimarosti
William Overington wrote: [... PageDown, Delete ... PageDown, Delete ... PageDown, Delete ... PageDown, Delete ... PageDown, Delete ... PageDown, Delete ...] 4. The text files being transmitted MUST be small (bandwidth is limited!). Yes, keep the text file size down, bandwidth is

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Paul Hastings wrote: I suppose this is a really simple-minded question, but is there any way of telling whether an incoming chunk of text (say, from a browser form) is traditional or simplified Chinese? Please note that the classification you want is not always meaningful. E.g., what if the

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Zhang Weiwu wrote: Take it easy: if you find one U+500B 個 (the measure word), it is usually enough to say it is traditional Chinese; one U+4E2A 个 (measure word) means simplified Chinese. They never occur together in a logically correct document. A few examples of perfectly logically correct

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Edward H Trager wrote: [...] If I were going to write such an algorithm, I would: * First, ensure that the incoming text stream to be classified was sufficiently long to be probabilistically classifiable. In other words, what's the shortest stream of Hanzi characters needed, on
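Edward's approach could be sketched as a simple marker count. This is only an illustration. The five character pairs below and the threshold min_hits are assumptions: U+500B/U+4E2A 個/个 from Zhang's example, plus 們/们, 這/这, 國/国 and 說/说. The "mixed" result reflects Marco's point that both forms can legitimately appear in one text.

```python
# Tiny, illustrative marker sets; a usable classifier would need far more data.
TRADITIONAL_ONLY = set("\u500B\u5011\u9019\u570B\u8AAA")  # 個 們 這 國 說
SIMPLIFIED_ONLY = set("\u4E2A\u4EEC\u8FD9\u56FD\u8BF4")   # 个 们 这 国 说

def classify(text: str, min_hits: int = 3) -> str:
    """Guess 'traditional', 'simplified', 'mixed' or 'unknown' from marker counts."""
    trad = sum(ch in TRADITIONAL_ONLY for ch in text)
    simp = sum(ch in SIMPLIFIED_ONLY for ch in text)
    if trad + simp < min_hits:
        return "unknown"  # too little evidence: the text is too short to classify
    if trad and simp:
        return "mixed"    # both kinds occur, so a single label would be misleading
    return "traditional" if trad else "simplified"
```

For a search system like the one Paul describes, "mixed" and "unknown" results would probably need to be searched under both forms.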

RE: Indic Vowel/Consonant combinations

2003-02-13 Thread Marco Cimarosti
Andy White wrote: The Unicode Standard disagrees. TUS3.0, Chapter 9, page 214, Figure 9-3 (Conjunct Formations), example (4) [...] In the light of Jim Agenbroad's information and references, I think this sentence is wrong. Yes, in *that* light it is, of course... :-) It's just that I think that

RE: traditional vs simplified chinese

2003-02-13 Thread Marco Cimarosti
Paul wrote: To: Edward H Trager Marco Cimarosti has asked why you need to classify text as being simplified or traditional. If I understand their needs correctly, it's to implement a search system with search phrases in either type of Chinese; content would be in both types

RE: Never say never

2003-02-12 Thread Marco Cimarosti
Kenneth Whistler wrote: Marco Cimarosti wrote: It has been repeated many times that no more precomposed characters will ever ever ever ever be added. ... I trust the clarification from John Cowan helped on this: there is no prohibition against adding characters

Never say never

2003-02-11 Thread Marco Cimarosti
Unicode's (n)evers can sometimes be puzzling. It has been repeated many times that no more precomposed characters will ever ever ever ever be added. But now I see from http://www.unicode.org/charts/PDF/U40-2100.pdf that the following new character will be added in 4.0: - code: U+213B -

RE: Handwritten EURO sign

2003-02-07 Thread Marco Cimarosti
Marion Gunn wrote: I wonder if any Unicoders have seen the handwritten EURO sign which differs substantially from the usual computer-generated kind? In Italy, it is becoming common to see a sort of left parenthesis crossed by a small Z. Notice that this is very similar to a common

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Doug Ewell wrote: Kent Karlsson kentk at md dot chalmers dot se wrote: What I'm hearing from you all is that a null in UTF-8 is for termination and termination only. Is this correct? No, NULL is a character (actually a control character) among many others. However, many C/C++

RE: discovering code points with embedded nulls

2003-02-06 Thread Marco Cimarosti
Stefan Persson wrote: What is that strange file (winmail.dat) attached to your mail? I really hope that it isn't a virus. http://support.microsoft.com/default.aspx?scid=KB;en-us;q241538 (Whether MS Outlook is a virus or not, is still a debated issue. :-) _ Marco

RE: discovering code points with embedded nulls

2003-02-05 Thread Marco Cimarosti
Erik Ostermueller wrote: I'm dealing with an API that claims it doesn't support unicode characters with embedded nulls. I'm trying to figure out how much of a liability this is. If by embedded nulls they mean bytes of value zero, that library can *only* work with UTF-8. The other two UTF's
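You can see the difference between the encoding forms by checking which ones produce a 0x00 byte for ordinary text. In UTF-8, every byte of a multi-byte sequence is 0x80 or higher, so a zero byte appears only for U+0000 itself. In UTF-16 and UTF-32, any ASCII character already contains zero bytes. Here is a small Python check, using an arbitrary sample string:

```python
def has_zero_byte(text: str, encoding: str) -> bool:
    """True if the encoded form of text contains any 0x00 byte."""
    return b"\x00" in text.encode(encoding)

# ASCII, Latin-1 letters and a Japanese kana, but no U+0000
SAMPLE = "Unicode \u00e6\u00f8\u00e5 \u3042"

for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    print(enc, has_zero_byte(SAMPLE, enc))
```

Only UTF-8 survives an API that stops at the first zero byte, and then only as long as the text never contains U+0000.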

RE: Suggestions in Unicode Indic FAQ

2003-01-30 Thread Marco Cimarosti
Keyur Shroff wrote: However, I totally agree with Kent that this funny rendering is *not* a requirement of the Unicode standard, as Keyur Shroff seems to suggest. It is just one example of the several methods [that] are available to deal with strange sequences. A sequence should

RE: Indic Devanagari Query

2003-01-29 Thread Marco Cimarosti
Aditya Gokhale wrote: Hello Everybody, I had a few queries regarding the representation of Devanagari script in Unicode. All your questions are FAQs, so I'll just reference the entries that answer them. (Code page - 0x0900 - 0x097F). Devanagari is a writing script used for Hindi, Marathi

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Marco Cimarosti
Keyur Shroff wrote: In the FAQ http://www.unicode.org/faq/indic.html#16 it is mentioned that the following are equivalent (ISCII = Unicode): KA halant INV = KA virama ZWJ; RA halant INV = RAsup (i.e., repha). The last line is really bizarre! I would agree that it is

RE: Suggestions in Unicode Indic FAQ

2003-01-29 Thread Marco Cimarosti
Keyur Shroff wrote: But sometimes a user may want visual representation of these symbols in two different ways: with dotted circle and without dotted circle. Why not use a dotted circle character explicitly, when you want to see one? Example of this could be RAsup on top of dotted circle

RE: Indic Devanagari Query

2003-01-29 Thread Marco Cimarosti
Christopher John Fynn wrote: I had thought that the argument for including KSSA as a separate character in the Tibetan block (rather than only having U+0F40 and U+0FB5) was originally for compatibility / cross mapping with Devanagari and other Indic scripts. Which is not a valid reason

RE: unicode for Japanese/Chinese web sites w/forms

2003-01-23 Thread Marco Cimarosti
Eric wrote: [...] The sites utilize forms and my current programmers cannot code these two sites for form uploads with Japanese and Chinese text. This is a bit generic, and I can't imagine how Unicode could possibly conflict with HTML forms. Can't you put online a sample Chinese or Japanese

[Very OT] You're with me always...

2003-01-08 Thread Marco Cimarosti
Sorry for this small OT. Anyone wishing to contribute a translation to the next I-can-eat-glass-like project? Te tengo conmigo siempre... con tu nombre en mil idiomas. En cada nota, en cada arpegio, en cada aroma. [I have you with me always... with your name in a thousand languages. In every note, in every arpeggio, in every scent.] http://groups.google.com/groups?threadm=avg1e5%24eql6d%241%40ID-155044.news

RE: Unicode Standards for Indic Scripts

2003-01-08 Thread Marco Cimarosti
Michael Everson wrote: At 06:43 + 2003-01-08, Manoj Jain wrote: Dear Friends, The existing Unicode Standards for Indic scripts have some discrepancies. What does that mean? What are discrepancies? Can you summarize? I will download the very large files, which will take some time,

Status of Unihan Mandarin readings?

2002-12-19 Thread Marco Cimarosti
I have tried to follow the discussion about the errors in field kMandarin of file Unihan.txt but, after a while, I lost my way with all those dictionary references... Could someone kindly make a short summary of the situation? Here are my biggest ???'s: - Are the errors really there? - Any

RE: Precomposed Ethiopic (Was: Precomposed Tibetan)

2002-12-18 Thread Marco Cimarosti
John Hudson wrote: The Ethiopic script is *not* made up of sub-syllabic units: the syllable is the minimum unit of writing. The same is true of Yi and the Canadian Aboriginal Syllabics. The fact that Ethiopic has recently been input phonetically should not lead to confusion about the

RE: Precomposed Tibetan

2002-12-18 Thread Marco Cimarosti
Andrew C. West wrote: If anyone thinks that a mapping table would be useful as a weapon in the fight against the Chinese proposal, I would be happy to provide one. Do you have the relevant data? As I said, so far I found little or nothing about BrdaRten or about the Founders System mentioned

RE: Precomposed Tibetan

2002-12-18 Thread Marco Cimarosti
Andrew C. West wrote: On Wed, 18 Dec 2002 04:59:08 -0800 (PST), Marco Cimarosti wrote: Do you have the relevant data? As I said, so far I found little or nothing about BrdaRten or about the Founders System mentioned by Ken Whistler. Don't need anything more than the code charts

RE: converting devanagari to mangal unicode

2002-12-17 Thread Marco Cimarosti
John Hudson wrote: At 03:09 PM 12/16/2002, Eric Muller wrote: In order to convert any Devanagari font to be rendered in the same way, Maybe Sunil is just asking for a conversion of data, presumably from ISCII to Unicode. Ah, yes, this is possible. I'm so used to people asking the

RE: converting devanagari to mangal unicode

2002-12-17 Thread Marco Cimarosti
Bob Hallissy wrote: NB: One of the complexities you may run into, and which will limit your options, is that your encoding may store text in a different order than Unicode requires. If this is the case, TECkit can do the rearrangement for you but I'm not sure ICU will easily do that. Certainly

RE: Precomposed Tibetan

2002-12-17 Thread Marco Cimarosti
Jungshik Shin wrote: [...] http://std.dkuug.dk/jtc1/sc2/WG2/docs/n2558.pdf [...] Is there any OpenType/AAT font for Tibetan? Do Uniscribe, Pango, ATSUI, and Graphite support them if there are OpenType Tibetan fonts? In addition to the principle of character encoding, the best practical

RE: Precomposed Tibetan

2002-12-17 Thread Marco Cimarosti
Carl W. Brown wrote: Marco, I was disappointed that Unicode used precomposed encoding for Ethiopic. Was that my fault? I'm not even a member of Unicode! _ Marco :-)

RE: Precomposed Tibetan

2002-12-17 Thread Marco Cimarosti
Michael Everson wrote: What the encoding of a set of brDa rTen precomposed syllables would do would be to restrict the Tibetans to this set, to which they have been restricted by the proprietary Founder software used in China. These 950 syllables are insufficient to express anything but

RE: Farsi Keheh +06A9 vs. Arabic Kaf +0643 ??

2002-12-12 Thread Marco Cimarosti
Houman Pournasseh wrote: The difference between the Arabic Kaf (U+0643) and the Persian Kaf (U+06A9) is in its final form. The Arabic Kaf has a Hamza and is missing the diagonal line above the glyph. BTW, the Persian final form is also common in some Arabic countries. I am attending a course

RE: Farsi Keheh +06A9 vs. Arabic Kaf +0643 ??

2002-12-12 Thread Marco Cimarosti
Miikka-Markus Alhonen wrote: Quoting Marco Cimarosti [EMAIL PROTECTED]: These made me wonder about a couple of Unicode disunifications: - U+0643 (ARABIC LETTER KAF) vs. U+06A9 (ARABIC LETTER KEHEH) vs. U+06AA (ARABIC LETTER SWASH KAF); Keheh vs. swash Kaf seems to be contrastive

RE: [OT] HAIKU computer talk

2002-12-05 Thread Marco Cimarosti
Peter Constable wrote: On 12/05/2002 02:14:21 AM Joe Becker wrote: Poetry in motion: text elements rendered on sheep http://www.ananova.com/news/story/sm_719935.html http://www.freerepublic.com/focus/news/800101/posts/ Hmmm... I wonder if I could get money to experiment

RE: Localized names of character ranges

2002-12-03 Thread Marco Cimarosti
Mark Davis wrote: While not a trivial task (about 400 terms), it is many, many times easier than translating all the significant character names. That might someday be worth considering for the Common XML Locale Repository (http://oss.software.ibm.com/icu/locale/). The problem is not the

RE: Devanagari

2002-12-03 Thread Marco Cimarosti
Vipul Garg wrote: I have downloaded your font chart for Devanagari, which is in the range from 0900 to 097F. I have also installed the Arial Unicode font supplied with the Microsoft Office XP suite. I found that not all characters are available for Devanagari. For example letters such as Aadha

RE: Proposal to add Bengali Khanda Ta

2002-11-29 Thread Marco Cimarosti
Andy White wrote: Please see and comment on my Proposal to add 'Bengali Letter Khanda Ta' to the Bengali Block (initial version): http://www.exnet.btinternet.co.uk/KhandaWeb/khandaproposal.htm | [...] | This example shows that in order to display the correct | spelling of the word 'satmaa',

RE: Proposal to add Bengali Khanda Ta

2002-11-29 Thread Marco Cimarosti
Andy White wrote: Marco wrote I have a few questions: - What is the meaning of satmaa and sadaatmaa? 'satmaa' means stepmother. 'sadaatmaa' means 'good soul' / 'virtuous' Bingo! Well, nearly... My guess was that satmaa was the Bengali for Wachstube. :-) German has two different

XTF-Morse (was RE: UTF-Morse)

2002-11-22 Thread Marco Cimarosti
Doug Ewell wrote: Yes, it's true. Marco had sent me his UTF-Morse proposal just yesterday, along with a suggestion that I put together an implementation for April Fool's Day. And darned if I wasn't really going to do it. As a JOKE. But Marco, you need to check your invented sequences

RE: UTF-Morse

2002-11-22 Thread Marco Cimarosti
Otto Stolz wrote: Marco, you shall be called Marcone, or even (granting a Pluralis majestatis): Marconi ;-) Hey! I have a little bit of a belly, but not yet enough to justify calling me Marcone. :-) BTW, your careful analysis of Morse needing four code units made me think that there could be a

RE: Morse code

2002-11-19 Thread Marco Cimarosti
Andrew C. West wrote: On Tue, 19 Nov 2002 04:41:58 -0800 (PST), Radovan Garabik wrote: Moreover, Morse characters are distinct logical entities, primary representation of them is audible Precisely. So for example ..- is pronounced dot dot dash (three distinct logical entities) not u.

RE: Morse code

2002-11-18 Thread Marco Cimarosti
Otto Stolz wrote: Radovan Garabik wrote: Recently I got a crazy idea: why not include Morse code characters in unicode? (Yes, I know it is crazy, but when Braille is already included...) I was under the impression that all three Morse code elements are already in Unicode: U+00B7

RE: The result of the plane 14 tag characters review.

2002-11-18 Thread Marco Cimarosti
Dominikus Scherkl wrote: A good example is the production of multilingual manuals, which seem to be more and more common these days. This is indeed a very good example. ... of something which is not very appropriate for plain text. I agree that in this example, higher-level markup would

RE: Errors in the Indic FAQ

2002-11-18 Thread Marco Cimarosti
Andy White wrote: A graphical version of this message is available here: http://www.exnet.btinternet.co.uk/KhandaWeb/khanda.htm It is proposed by the Indic Unicode FAQ that Bengali Khanda_Ta should be encoded as Ta Virama ZWJ ... and that an explicit Ta_Virama can be encoded as Ta Virama

RE: The result of the plane 14 tag characters review.

2002-11-13 Thread Marco Cimarosti
Kenneth Whistler wrote: Ahem... The Unicode Technical Committee would like to announce that no formal decision has been taken regarding the deprecation of Plane 14 language tag characters. The period for public review of this issue will be extended until February 14, 2003. Out of

RE: The result of the plane 14 tag characters review.

2002-11-13 Thread Marco Cimarosti
Doug Ewell wrote: 1. What extra processing is necessary to interpret Plane 14 tags that wouldn't be necessary to interpret any other form of tags? In order for the question to make sense, we should compare plain text with plain text and rich text with rich text. 1.a) Take plain text: however

RE: The result of the plane 14 tag characters review.

2002-11-13 Thread Marco Cimarosti
I wrote: [...] A lighter-weight method is not having language tagging at all in plain text. This is appropriate in two cases: 3.a) When you don't language tagging. [...] ^ Sorry: I meant: When you don't need _ Marco

RE: Is long s a presentation form?

2002-11-11 Thread Marco Cimarosti
Michael Everson wrote: I like to think of the long s as similar to the final sigma. Nobody thinks that final sigma should be a presentation form of sigma. Never say nobody: I *do* think that Greek final sigma, final Hebrew letters, and Latin long s should all be presentation forms. I think
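For what it is worth, the current character data already treats these two cases differently. Long s carries a compatibility decomposition to s, while final sigma has no decomposition at all. Here is a quick check with Python's unicodedata module:

```python
import unicodedata

# U+017F LATIN SMALL LETTER LONG S has a <compat> decomposition to "s"
print(unicodedata.decomposition("\u017f"))     # <compat> 0073
print(unicodedata.normalize("NFKC", "\u017f"))  # s

# U+03C2 GREEK SMALL LETTER FINAL SIGMA has no decomposition at all
print(repr(unicodedata.decomposition("\u03c2")))             # ''
print(unicodedata.normalize("NFKC", "\u03c2") == "\u03c2")  # True
```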

RE: A .notdef glyph

2002-11-08 Thread Marco Cimarosti
. Italians, for instance, pull a different part of the body, which is located at the top of the back side of the legs. Marco Cimarosti 8 November 2002 attachment: pulleg.gif

RE: A .notdef glyph

2002-11-08 Thread Marco Cimarosti
Michael Everson wrote: The, ah, tail? Hem, slightly closer to the legs. _ Marco

RE: ct, fj and blackletter ligatures

2002-11-07 Thread Marco Cimarosti
Kent Karlsson wrote: (Subword boundaries are likely hyphenation points, whereas occurrences of ff, fi etc. elsewhere are unlikely hyphenation points.) I am sorry to always contradict you but, in Italian, there is always a hyphenation point between two identical consonant letters.

RE: Unicodes For Devanagari: Magic The Gathering Card

2002-11-06 Thread Marco Cimarosti
Victor Campbell wrote: I'm looking for help with converting the text of a Sanskrit trading card to Unicode. I am not connected with the publisher of the card, just a programmer who helps support a site for collectors. I have set up a test page for experimenting with the Devanagari

RE: Names for UTF-8 with and without BOM - pragmatic

2002-11-06 Thread Marco Cimarosti
Lars Kristan wrote: .txt | UTF-8 | require | We want plain text files to have BOM to distinguish from legacy codepage files. Hmm, what does plain mean?! Perhaps files with a BOM should be called text files (or .txt files;) as opposed to plain
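The practical difference a BOM makes can be sketched as a simple file sniffer. This is an illustrative heuristic, not anything specified by Unicode, and the result labels are hypothetical. The BOM test gives a definite answer. The UTF-8 validity test is only a guess, because pure ASCII is valid both as UTF-8 and in most legacy code pages.

```python
import codecs

def sniff_text_file(data: bytes) -> str:
    """Rough guess at how a .txt file is encoded (heuristic, not authoritative)."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"  # EF BB BF: unambiguous signature
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy code page"  # not valid UTF-8, so probably a legacy encoding
    # Valid UTF-8 without a BOM: could be UTF-8, or ASCII in a legacy code page
    return "utf-8 without BOM (or plain ASCII)"
```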

RE: In defense of Plane 14 language tags (long)

2002-11-05 Thread Marco Cimarosti
Doug Ewell wrote: [...] Readers are asked to consider the following arguments individually, so that any particular argument that seems untenable or contrary to consensus does not affect the validity of other arguments. [...] Here are my three pence *pro* the deprecation: 1. Language tags

RE: Special characters

2002-11-05 Thread Marco Cimarosti
Johan Marais wrote: Could someone tell me whether it is possible to produce the following characters please? k with a small line underneath K with a small line underneath ḵ/Ḵ (U+1E35/U+1E34, LATIN SMALL/CAPITAL LETTER K WITH LINE BELOW) H with a dot underneath h with a dot underneath Ḥ/ḥ
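Both letters can also be typed as a base letter plus a combining mark, and NFC normalization then yields the precomposed characters. Here is a small Python check (the variable names are arbitrary):

```python
import unicodedata

# Base letter + combining mark, composed to the precomposed form by NFC
k_line = unicodedata.normalize("NFC", "k\u0331")  # U+0331 COMBINING MACRON BELOW
h_dot = unicodedata.normalize("NFC", "h\u0323")   # U+0323 COMBINING DOT BELOW

print(hex(ord(k_line)), unicodedata.name(k_line))  # 0x1e35 LATIN SMALL LETTER K WITH LINE BELOW
print(hex(ord(h_dot)), unicodedata.name(h_dot))    # 0x1e25 LATIN SMALL LETTER H WITH DOT BELOW
```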

RE: In defense of Plane 14 language tags (long)

2002-11-05 Thread Marco Cimarosti
John Cowan wrote: Marco Cimarosti scripsit: { As a side note, the idea that a language may use foreign words seems terribly naive to me. It is true that, in Italian, we use loanwords such as hardware, punk, or footing, but it would be silly to consider or tag them as English words

RE: Header Reply-To

2002-11-04 Thread Marco Cimarosti
Stefan Persson wrote: Why doesn't that page follow the ASCII standard and/or any ASCII-based standard? What? As far as I can tell, it's 100% ASCII. It doesn't follow the ASCII standard as far as quotation marks are concerned. Using ` and ' as quotation marks is a long-standing

RE: Character identities

2002-10-30 Thread Marco Cimarosti
Keld Jørn Simonsen wrote: On Tue, Oct 29, 2002 at 09:07:16PM +0100, Marco Cimarosti wrote: Kent Karlsson wrote: Marco, Keld, please allow me to begin with the end of your post: I really have not contributed much to this thread, I think you mean Kent. Oh No! Again! Apologies

RE: Character identities

2002-10-30 Thread Marco Cimarosti
Alain LaBonté wrote: [Alain] However I agree with Kent. Let's say a text identified as German quotes a French word with an U DIAERESIS *in the German text* (a word like capharnaüm). A Fraktur font designed solely for German should not be used for typesetting French words. (And, BTW, that is

RE: RE: Character identities

2002-10-30 Thread Marco Cimarosti
I said: Ah! I never realized that the Sütterlin zig-zag-shaped e was the missing with the ¨ glyph! ^ Sorry: ... the missing LINK with _ Marco

RE: RE: Character identities

2002-10-30 Thread Marco Cimarosti
Doug Ewell wrote: Actually, the Sütterlin umlaut-mark is a small italicized e, which is very similar to an n. What it really ends up looking like, from a distance, is a double acute. Ah! I never realized that the Sütterlin zig-zag-shaped e was the missing with the ¨ glyph! Thanks! After all,

RE: Character identities

2002-10-30 Thread Marco Cimarosti
Kent Karlsson wrote: I insist that you can talk about character-to-character mappings only when the so-called backing store is affected in some way. No, why? It is perfectly permissible to do the equivalent of print(to_upper(mystring)) without changing the backing store (mystring in

RE: Character identities

2002-10-29 Thread Marco Cimarosti
Kent Karlsson wrote: The claim was that dieresis and overscript e are the same in *modern* *standard* German. Or, better stated, that overscript e is just a glyph variant of dieresis, in *modern* *standard* German typeset in Fraktur. Well, we strongly disagree about that then. Marc

RE: Character identities

2002-10-29 Thread Marco Cimarosti
Kent Karlsson wrote: Marco, Keld, please allow me to begin with the end of your post: Marco, please calm down and reread every sentence of my previous message. You seem to have misread quite a few things, but it is better you reread calmly before I try to clear up any remaining

RE: Character identities

2002-10-28 Thread Marco Cimarosti
Kent Karlsson wrote: For this reason it is quite impermissible to render the combining letter small e as a diaeresis So far so good. There would be no reason for doing such a thing. ... or, for that matter, the diaeresis as a combining letter small e (however, you see the latter

RE: Character identities

2002-10-25 Thread Marco Cimarosti
Peter Constable wrote: then *any* font having a unicode cmap is a Unicode font. No, not if the glyphs (for the supported characters) are inappropriate for the characters given. Kent is quite right here. There are a *lot* of fonts out there with Unicode cmaps that do not at all conform

RE: need open source tools to convert indic font encoding into ISCII or Unicode

2002-10-25 Thread Marco Cimarosti
Frank Tang wrote: I am looking for an open source tool (C / C++ / Perl or Java) to convert between (UTF-8/UTF-16 or ISCII) and different Indic font encodings. Please let me know if you know of anything available. Language: C, [...] Convert from A to / from B where A means UTF-8

RE: Character identities

2002-10-25 Thread Marco Cimarosti
Marc Wilhelm Küster wrote: At 14:04 25.10.2002 +0200, Kent Karlsson wrote: Font makers, please do not meddle with the authors intent (as reflected in the text of the document!). Just as it is inappropriate for font makers to use an ø glyph for ö (they are the same, just slightly different

RE: Character identities

2002-10-25 Thread Marco Cimarosti
Kent Karlsson wrote: ... Like it or not, superscript e *is* the same diacritic that later became ¨, so there is absolutely no violation of the Unicode standard. Of course, this only applies to German. Font makers, please do not meddle with the author's intent (as reflected in the text of

RE: Character identities

2002-10-24 Thread Marco Cimarosti
Kent Karlsson wrote: And it is easy for Joe User to make a simple (visual...) substitution cipher by just switching to a font with the glyphs for letters (etc.) permuted. Sure! I think it would be a bad idea to call it a Unicode font though... (That it technically may have a unicode cmap is

RE: Sorting on number of strokes for Traditional Chinese

2002-10-16 Thread Marco Cimarosti
John H. Jenkins wrote: I wonder whether Unicode provides a way to do sorting on number of strokes for Traditional Chinese characters. The Unihan database has total stroke count for many (but not all) characters. It may provide an adequate first-order set of data for a pure stroke-based
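Here is a sketch of the first-order approach John describes. It parses kTotalStrokes entries from Unihan's tab-separated line format and sorts by stroke count, breaking ties by code point. The four sample lines are illustrative and the tie-break rule is an arbitrary choice. Characters with no stroke data are simply sorted last.

```python
# A few lines in Unihan's tab-separated format (illustrative sample only)
SAMPLE = """\
U+570B\tkTotalStrokes\t11
U+4E00\tkTotalStrokes\t1
U+5927\tkTotalStrokes\t3
U+4EBA\tkTotalStrokes\t2
"""

def load_strokes(lines):
    """Map each character to its total stroke count."""
    strokes = {}
    for line in lines:
        if not line or line.startswith("#"):
            continue
        code, field, value = line.split("\t")
        if field == "kTotalStrokes":
            # Newer Unihan releases may give two values (zh-Hans, then zh-Hant);
            # for Traditional Chinese this sketch takes the last one
            strokes[chr(int(code[2:], 16))] = int(value.split()[-1])
    return strokes

def stroke_key(ch, strokes):
    # Characters without data sort after all known ones, then by code point
    return (strokes.get(ch, float("inf")), ord(ch))

strokes = load_strokes(SAMPLE.splitlines())
chars = ["\u570B", "\u4E2D", "\u4E00", "\u5927", "\u4EBA"]  # 國 中 一 大 人
print(sorted(chars, key=lambda ch: stroke_key(ch, strokes)))
```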

RE: is this a symbol of anything? CJK?

2002-10-11 Thread Marco Cimarosti
John Delacour wrote: At 3:48 pm -0600 10/10/02, John H. Jenkins wrote: I think it's a variant turtle ideograph. :-) (Nothing bad, so far as I know.) Hmm. Even without looking at the character it sounds very risky to me and is likely to be extremely offensive. Turtles eggs etc.? I

FW: Indic language fonts released under GPL by Akruti

2002-10-10 Thread Marco Cimarosti
For everybody's info. The fonts are designed for hack encoding, not for Unicode. But the glyphs look nice, and they are free and GPL-licensed! Hopefully, some good soul will add all the OpenType stuff to them, sooner or later. _ Marco -Original Message- Date: Tue, 08 Oct 2002

RE: Historians- what is origin of i18n, l10n, etc.?

2002-10-10 Thread Marco Cimarosti
Radovan Garabik wrote: Google is your friend :-) i18n is first mentioned in USENET on 30 nov 1989, Cute, I didn't imagine Google archives went all that way back! BTW, the first mention of Unicode on Usenet predates it by eight days: Subject: Re: ASCII for national characters Newsgroups:

RE: ISO 8859-11 (Thai) cross-mapping table

2002-10-08 Thread Marco Cimarosti
Kenneth Whistler wrote: Elliotte Harold asked: The Unicode data files at http://www.unicode.org/Public/MAPPINGS/ISO8859/ do not include a mapping for ISO-8859-11, Thai. Is there any particular reason for this? Just that nobody got around to submitting and posting one. Since

RE: ISO 8859-11 (Thai) cross-mapping table

2002-10-08 Thread Marco Cimarosti
John Aurelio Cowan wrote: Marco Cimarosti scripsit: Talking about the format of mapping tables, I always wondered why ranges are not used. In the case of ISO 8859-11, the table would become as compact as three lines: Well, that wins for 8859-1 and 8859-11 and ISCII-88, where Unicode
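Marco's range idea could look something like this for ISO 8859-11. The three ranges are a reading of the ISO 8859-11 repertoire: the Thai letters sit at a constant offset of 0x0D60 from their byte values, and 0xDB-0xDE and 0xFC-0xFF are unassigned. The tuple layout is a hypothetical format, not the one used by the files under Public/MAPPINGS.

```python
# Each entry: (first byte, last byte, Unicode code point of the first byte)
ISO_8859_11_RANGES = [
    (0x00, 0xA0, 0x0000),  # ASCII, C1 controls and NO-BREAK SPACE map one-to-one
    (0xA1, 0xDA, 0x0E01),  # THAI CHARACTER KO KAI .. THAI CHARACTER PHINTHU
    (0xDF, 0xFB, 0x0E3F),  # THAI CURRENCY SYMBOL BAHT .. THAI CHARACTER KHOMUT
]

def byte_to_code_point(b: int):
    """Return the Unicode code point for byte b, or None if b is unassigned."""
    for first, last, start in ISO_8859_11_RANGES:
        if first <= b <= last:
            return start + (b - first)
    return None  # 0xDB-0xDE and 0xFC-0xFF fall outside every range

def decode_iso_8859_11(data: bytes) -> str:
    """Decode a byte string using only the range table above."""
    out = []
    for b in data:
        cp = byte_to_code_point(b)
        if cp is None:
            raise ValueError("unassigned byte 0x%02X" % b)
        out.append(chr(cp))
    return "".join(out)
```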

RE: [ANN] World Address Project starts and relies on Unicode heavily

2002-10-07 Thread Marco Cimarosti
Dear all, World Address Project promotes an idea of utilizing Unicode on online shopping websites for solving the international shipping address problem. This will greatly benefit both customers and online businesses. Please take a look at http://www.bytecool.com/wap/ and feel free

RE: [ANN] World Address Project starts and relies on Unicode heavily

2002-10-07 Thread Marco Cimarosti
Carl W. Brown wrote: Marco, Things are a bit more complicated. The address should be in the format language of the recipient but the country should be in the language and positioned according to the sending country. Er... Have I denied this? Unicode is not a complete solution. Yao

Omega + upsilon ligature? [2nd attempt]

2002-10-02 Thread Marco Cimarosti
[Sorry for my previous message: I forgot to set the encoding.] I am trying to identify a Greek glyph found in an ancient Latin text. I have not seen what it looks like, but it has been described to me as an 8 with the top circle opened. The sign was in a word looking like 8ρων (8rôn) and which,

Omega + upsilon ligature?

2002-10-02 Thread Marco Cimarosti
I am trying to identify a Greek glyph found in an ancient Latin text. I have not seen what it looks like, but it has been described to me as an 8 with the top circle opened. The sign was in a word looking like 8ρων (8rôn) and which, according to the text, corresponds to Latin urina. If I

RE: The Currency Symbol of China

2002-10-01 Thread Marco Cimarosti
Stefan Persson wrote: Similarly, yen is just the Japanese (kun) pronunciation of Chinese yuan. IMHO, the preferred symbol for both currencies should be U+00A5. Wrong: Yen (円) is U+5186, while yuan (元) is U+5143. Yen is an ancient on pronunciation for U+5186; today it's pronounced

RE: Pound and Lira (was: Re: The Currency Symbol of China)

2002-10-01 Thread Marco Cimarosti
Kenneth Whistler wrote: [...] So it is possible that the lira sign simply derives from a draft list that was standardized without anyone ever spending time to debate the pound/lira symbol unification first. [...] If it proves true that the lira sign was a unification fault, why not state

RE: Keys. (derives from Re: Sequences of combining characters.)

2002-09-30 Thread Marco Cimarosti
Doug Ewell wrote: Marco Cimarosti marco dot cimarosti at essetre dot it wrote: He said that he didn't understand how this detail could help us but, anyway, he obtained the child's name and address from the parent: Daniel Zubeispiel Hauptkirchestrasse, 26 Zürich, Switzerland

My German blunders (was Keys. (derives from Re: Sequences of combining characters.))

2002-09-30 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: Of course. AFAIK, Zu Beispiel means e.g., for example. Hauptkirchestrasse is a made-up road name meaning cathedral street. Zurich is the only real piece of the address. But a native German speaker patiently explained, in a private message: | If it's an example

RE: The Currency Symbol of China

2002-09-30 Thread Marco Cimarosti
John Cowan wrote: My suspicion is that the one-bar-vs.-two is normal glyphic variation, the same as with the $ sign. The same should be true for the £ sign. But unluckily, for some obscure reason, Unicode thinks that currencies called pound should have one bar and be encoded with U+00A3,

RE: Keys. (derives from Re: Sequences of combining characters.)

2002-09-27 Thread Marco Cimarosti
Tex Texin wrote: What's funny to me about this message, is a product message catalog I was responsible for localizing had messages created by software developers, such as (paraphrasing from memory): The client is dead. The client has been killed. You killed the client. Some of the

RE: Keys. (derives from Re: Sequences of combining characters.)

2002-09-25 Thread Marco Cimarosti
William Overington wrote: The recent discussion on sequences has led me to have a look through the various combining characters and I have found the following. U+20E3 COMBINING ENCLOSING KEYCAP It has occurred to me that the use of a sequence of a base character, then one or more

RE: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)

2002-09-18 Thread Marco Cimarosti
William Overington wrote: Regarding Ken's response to the Byzantine legal codes matter, it would appear possible that the way that the ts ligature with a dot above for romanization of Cyrillic could be represented in Unicode would be by the following sequence. t U+FE20 s U+FE21 U+0307 I

RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1

2002-09-12 Thread Marco Cimarosti
Philippe de Rochambeau wrote: On the other hand, if I store the previous go character plus an unusual CJK ideogram whose Unicode equivalent is \u5439 (E5 90 B9 in UTF-8) in the DB and retrieve the data, JRun 3.1 will only display the first character in my form's textarea, plus a few

RE: Problems converting from UTF-8 to UCS-2 and vice-versa using JRun 3.1, SQL Server 2000, Windows 2000 and Java 3.1

2002-09-12 Thread Marco Cimarosti
I (Marco Cimarosti) wrote: [...] doesn't (newQfLibelleArray[i]) have a method to return a String object directly? Perhaps I have been clumsy. By returning a String object directly I meant, can't you do something like this: String tempUtf16 = new String( (newQfLibelleArray[i
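The thread is truncated here. Assuming the symptom comes from UTF-8 bytes being read as ISO 8859-1 somewhere between the database and the form, the byte-level effect looks like the sketch below. In Java, the usual remedy is to build the String with an explicit charset, e.g. new String(bytes, "UTF-8"). The Python sketch only illustrates the bytes involved for U+5439:

```python
original = "\u5439"              # the CJK ideograph from Philippe's example
utf8 = original.encode("utf-8")  # b'\xe5\x90\xb9'

# Misreading the UTF-8 bytes as ISO 8859-1 yields three unrelated characters
garbled = utf8.decode("latin-1")
print(len(garbled), [hex(ord(c)) for c in garbled])  # 3 ['0xe5', '0x90', '0xb9']

# Reversing the mistake: re-encode as ISO 8859-1, then decode correctly as UTF-8
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired == original)  # True
```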

RE: Latin vowels?

2002-09-10 Thread Marco Cimarosti
Peter Constable wrote: On 09/09/2002 02:43:52 AM Marco Cimarosti wrote: 1. List Vowels - probably not vowels: U+212B # (Å) ANGSTROM SIGN Given that this is canonically equivalent with a-ring, does it make sense to consider one a vowel but the other not? I stand corrected. Somehow, I

RE: Latin vowels?

2002-09-10 Thread Marco Cimarosti
I wrote: Peter Constable wrote: On 09/09/2002 02:43:52 AM Marco Cimarosti wrote: 1. List Vowels - probably not vowels: U+212B # (Å) ANGSTROM SIGN Given that this is canonically equivalent with a-ring, does it make sense to consider one a vowel but the other not? I stand
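Peter's point can be checked directly. U+212B has a canonical singleton decomposition to U+00C5, so normalization never preserves it. In the small Python sketch below, base_letter is a hypothetical helper. It shows that classifying vowels by the first character of the NFD form treats both characters alike. It does not help with letters such as ø, which have no decomposition.

```python
import unicodedata

ANGSTROM_SIGN = "\u212B"
A_RING = "\u00C5"  # LATIN CAPITAL LETTER A WITH RING ABOVE

# U+212B has a canonical (singleton) decomposition to U+00C5
print(unicodedata.decomposition(ANGSTROM_SIGN))  # 00C5
# Every normalization form replaces it, so it never survives NFC or NFD
print(unicodedata.normalize("NFC", ANGSTROM_SIGN) == A_RING)     # True
print(unicodedata.normalize("NFD", ANGSTROM_SIGN) == "A\u030A")  # True

def base_letter(ch: str) -> str:
    """First character of the canonical decomposition (combining marks dropped)."""
    return unicodedata.normalize("NFD", ch)[0]
```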

RE: Latin vowels?

2002-09-09 Thread Marco Cimarosti
Mark Davis wrote: I need to get a list of Latin characters that are generally considered vowels. I partitioned the characters as in the list below, but there are lots of oddball ones for which I can only guess (LATIN CAPITAL LETTER OU? LATIN LETTER WYNN?...).

RE: [OT] Spanish grammar (was Re: [q] Typesetting rules in Spanish)

2002-09-09 Thread Marco Cimarosti
Martin Kochanski wrote: To expand: su can mean his her their as well as the polite your. In this context, el marino, el hermano de su madre risks being felt as a complete phrase in itself (the sailor, the brother of his mother), so you need de usted to anchor it firmly to the second

RE: Latin vowels?

2002-09-09 Thread Marco Cimarosti
Radovan Garabik wrote: Originally, of course, Latin had only capital letters Well... This reminds me of people who say that language XYZ only has one gender. :-) I mean: if there was just one set of letters, how do you say they were capitals or not? Are Arabic letters capitals? Seriously
