RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Peter Constable
From: [EMAIL PROTECTED] on behalf of Patrick Andries >I'm interested in knowing whether the following features would soon be found >in Windows : fonts for scripts covered by Unicode 4.0, This is certainly growing, and the next version of Windows ("Longhorn") will have significant improvement in

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Michael (michka) Kaplan
I know I'll end up regretting this From: "Philippe Verdy" <[EMAIL PROTECTED]> > That far? So why isn't there correct support of UTF-16 on Windows > 95OSR2, 98, 98SE and ME (notably for their FAT32 filesystem)? I can > understand it for Windows 95 and 95OSR1 as they were designed before, > and

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Philippe Verdy
Kenneth Whistler wrote: > > Oh God... Surrogates were standardized long before they started > > being used in Unicode 3.2 for new codepoint assignments out of > > the BMP... > > Actually, the first supplementary graphic characters were assigned for > Unicode 3.1. Unicode 3.2 added only BMP charact

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Patrick Andries
- Original Message - From: "Michael (michka) Kaplan" <[EMAIL PROTECTED]> > To answer the original question, support of Unicode in *any* version of > Windows (or indeed any operating system) is between 1.1 and 4.0, depending > on what feature you are looking at. To answer such a question,

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
On 12/01/03 11:46, Mark Davis wrote: It is useful to read the standard before asserting something about it. If you don't have a hard-copy of the standard, you can always consult the online version. In this case, see "3.13 Default Case Operations" in http://www.unicode.org/versions/Unicode4.0.0/ch0

Oriya: unusual conjuncts

2003-12-01 Thread Peter Constable
On pages 56-57 of the TDIL newsletter from April 2002 (http://tdil.mit.gov.in/ori-guru-telu.pdf), there are various conjuncts listed that are unusual in that the shapes that make up the conjunct are quite different from the nominal characters that supposedly underlie the conjunct and from the pron

RE: Complex Combining

2003-12-01 Thread jameskass
. Jonathan Coxhead wrote, > .... Quoting from the page, "... the longest word you can write upside-down in Unicode is `aftereffect'). " In UTF-8: zʎxʍʌnʇsɹbdouɯլʞſ̣ı̣ɥɓɟəpɔqɐ Best regards, James Kass .

RE: Complex Combining

2003-12-01 Thread Jonathan Coxhead
My take on "Cleanicode", the "Atomic Theory of Unicode", can be found at . It is very much a software engineer's view of character coding. The characters START GROUP and POP DIRECTIONAL FORMATTING are used as brackets. Yes, it could involve arbitr

is ISO-2022-CN actually used?

2003-12-01 Thread Markus Scherer
Question: Is the ISO-2022-CN or ISO-2022-CN-EXT charset for Chinese actually used significantly? I am aware that there is a significant user base for ISO-2022-JP and its subvariants (for Japanese). I am aware that there are numerous implementations of ISO-2022-CN converters. However, I would lik

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Kenneth Whistler
Philippe wrote: > Oh God... Surrogates were standardized long before they started > being used in Unicode 3.2 for new codepoint assignments out of > the BMP... Actually, the first supplementary graphic characters were assigned for Unicode 3.1. Unicode 3.2 added only BMP characters. > It was clea
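
For readers following the surrogate discussion, here is a minimal Python sketch of how a supplementary code point maps to a UTF-16 surrogate pair. The helper name `to_surrogates` is an illustrative assumption and does not come from any message in the thread; the arithmetic is the standard UTF-16 encoding form.

```python
def to_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits select the low surrogate
    return high, low

# U+10400 is in the Deseret block, which was added in Unicode 3.1.
high, low = to_surrogates(0x10400)

# The built-in codec produces the same two 16-bit units.
utf16 = "\U00010400".encode("utf-16-be")
```

An implementation that treats text as a sequence of 16-bit units stores these two units without needing to know that they form one character, which is why storage support and full supplementary-character support can differ.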

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
Michael (michka) Kaplan wrote: > To answer the original question, support of Unicode in *any* version of > Windows (or indeed any operating system) is between 1.1 and 4.0, > depending > on what feature you are looking at. To answer such a question, the > specific > feature about which the

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Philippe, Win2000 was released to manufacturing in 1999 and was frozen about 6 months before. If I remember correctly Unicode 3.0 came out after the freeze date. It implemented surrogate support but disabled it in the registry. I think it was a bad decision. With all the last min bug fixes it

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Frank Yung-Fong Tang
Carl W. Brown wrote: > Jill, > > > I know that Unicode does have some > > locale-sensitive case mappings (Turkish > > uppercase I to dotless lowercase > > I for example), I was under the impression > > that "ss" to "ß" was not one of them. > > You are correct that "SS" and "ß" are the s

Re: How can I have OTF for MacOS

2003-12-01 Thread John Jenkins
On Dec 1, 2003, at 4:24 PM, Frank Yung-Fong Tang wrote: John, what 'cmap' format does Apple use in the MacOS X Devanagari and Bangla fonts? The formats are irrelevant; the Mac supports all the 'cmap' subtable formats for all subtables. For rendering complex scripts, however, the font can only be rend

Re: How can I have OTF for MacOS

2003-12-01 Thread Frank Yung-Fong Tang
John Jenkins wrote: > > On Nov 26, 2003, at 7:26 AM, [EMAIL PROTECTED] wrote: > > > > > But what about devnagri or Bangla. > > > > Devanagari and Bangla cannot be supported on Mac OS X through QuickDraw > text rendering. Since Office on the Mac is currently restricted to > QuickDraw t

RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 22:10 + 2003-12-01, [EMAIL PROTECTED] wrote: We should rejoice that these TDIL reports exist and urge the various authors to contribute to discussions on any edge-case issues. Yes. Rather than revising history or revising encoding practices, maybe the TDIL reports could be revised where ap

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Philippe Verdy
Carl W. Brown wrote: > Doug writes: > > You might remember that I chided Microsoft for > > its definition of "Unicode" in > > Windows 2000 Help, where Unicode was described > > as a "16-bit standard" that was "developed between > > 1988 and 1991," implying that the work was > > finished. Even a

Re: creating a test font w/ CJKV Extension B characters.

2003-12-01 Thread Frank Yung-Fong Tang
As far as I remember, even though IE could render GB18030, it still handled multibyte characters split across TCP blocks poorly. For example, if a 4-byte GB18030 character straddles a TCP block boundary (4k? 8k?), it will be trashed. Andrew C. West wrote: > On Mon, 24 Nov 2003 10:12:52 +, [EMAIL PROTECTED] wrote:
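
The failure mode described here, a multibyte character split across two network reads, can be sketched in Python. This is only an illustration of the general technique (a stateful incremental decoder), not a description of how any browser works; the helper name `decode_chunks` and the two-byte split point are assumptions chosen for the example.

```python
import codecs

def decode_chunks(chunks, encoding="gb18030"):
    """Decode a stream of byte chunks, carrying partial characters across chunk boundaries."""
    decoder = codecs.getincrementaldecoder(encoding)()
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))  # flush; raises if the stream ends mid-character
    return "".join(parts)

text = "\U00020000"                  # a CJK Extension B character, four bytes in GB18030
data = text.encode("gb18030")
chunks = [data[:2], data[2:]]        # simulate a block boundary in the middle of the character

streamed = decode_chunks(chunks)     # stateful decoding recovers the character

# Decoding each chunk on its own cannot reassemble the split character.
naive = "".join(c.decode("gb18030", errors="replace") for c in chunks)
```

The design point is that the decoder, not the caller, remembers the leftover bytes from the previous block.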

no more precomposed characters for 1:1 conversion

2003-12-01 Thread Markus Scherer
I would like to point out one of the new features of ICU 2.8, which is currently available as an alpha release: http://oss.software.ibm.com/icu/download/2.8/ ICU 2.8 has the ability to handle m:n character conversion mappings driven by simple lines in Unicode conversion tables (text files). I s

RE: Oriya: mba / mwa ?

2003-12-01 Thread jameskass
. Michael Everson wrote, > You should implement according to what is on page 238 of the Unicode > Standard, and if there are people in India who think otherwise they > had better argue their case convincingly to the UTC. > > >I don't personally care which character is used. > > I *do*. Someone

Re: How can I have OTF for MacOS

2003-12-01 Thread Deborah Goldsmith
On Nov 24, 2003, at 6:47 PM, John Jenkins wrote: Keyboards are just XML files. If you're on Mac OS X 10.3, you can find samples inside /System/Library/Fonts/Unicode.bundle/Contents/Resources/*.keylayout. Documentation on the XML keyboard file format for Mac OS X can be found in Apple Tech Note

RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 11:52 -0800 2003-12-01, Peter Constable wrote: > Well, Peter, it's right there on the page. What page? Page 18 of Learn Oriya in 30 Days, what I have been quoting from. > KA with Virama + BA = KWA, in Oriya and with Latin transliterations. It's a BA. I swear. And how do you know it's BA an

RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On > Well, Peter, it's right there on the page. What page? > KA with Virama + BA = KWA, > in Oriya and with Latin transliterations. It's a BA. I swear. And how do you know it's BA and not a distinct character that

RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 10:24 -0800 2003-12-01, Peter Constable wrote: > Your suggestion that NYA could be involved is less plausible. I didn't actually suggest it was nya; I merely pointed out that the same shape is used for more than /o/. But many WAs have differently shaped O-parts. I think your observation was

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Philippe Verdy
Michael (michka) Kaplan writes: > I would not expect Windows (whose most recent shipping version shipped > before Unicode 4.0 was released) to support 4.0 properties and > such. But at > the same time, if you have fonts and build a keyboard you can support any > number of 4.0-only scripts. Isn't

RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
> -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf > Your suggestion that NYA could be involved is less plausible. I didn't actually suggest it was nya; I merely pointed out that the same shape is used for more than /o/. > >I still haven't seen clear evi

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread jon
> > But there's no "official" Unicode standard that > > I know of (and that isn't saying much) that says that ss and ß have to > > compare as equals. > > http://www.unicode.org/Public/UNIDATA/CaseFolding.txt And case mappings are defined as a normative property in section 4.2 of the standard. -

Re: Exclamation mark comma

2003-12-01 Thread Anto'nio Martins-Tuva'lkin
On 2003.11.27, 01:37, Rick McGowan <[EMAIL PROTECTED]> wrote: > Well, here's an exclomma for you. Looks nice (though .GIF or .PNG would have been a wiser choice). > Theodore H. Smith asked: > >> I've often wanted to type a symbol, that's like an exclamation mark, >> and a comma at the same time.

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Doug, > You might remember that I chided Microsoft for > its definition of "Unicode" in > Windows 2000 Help, where Unicode was described > as a "16-bit standard" that was "developed between > 1988 and 1991," implying that the work was > finished. Even at the time Windows 2000 was being > deve

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark Davis
It is useful to read the standard before asserting something about it. If you don't have a hard-copy of the standard, you can always consult the online version. In this case, see "3.13 Default Case Operations" in http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf and "4.2 Case—Normative" in http

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Mark, > But there's no "official" Unicode standard that > I know of (and that isn't saying much) that says that ss and ß have to > compare as equals. http://www.unicode.org/Public/UNIDATA/CaseFolding.txt Carl

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Doug Ewell
On a somewhat related topic, the other day I got onto a Windows XP system and took a look at the Help item for "Unicode." You might remember that I chided Microsoft for its definition of "Unicode" in Windows 2000 Help, where Unicode was described as a "16-bit standard" that was "developed between

PUA selecting font for browser

2003-12-01 Thread Giles, Suzanne
Hello, perhaps someone can throw some light on this for me, I've read the display problems page on the Unicode site but haven't spotted anything that can help, I've also searched the Microsoft site but again drew a blank. I am using Internet Explorer 5 and trying to display characters that are in

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Carl W. Brown
Jill, > I know that Unicode does have some > locale-sensitive case mappings (Turkish > uppercase I to dotless lowercase > I for example), I was under the impression > that "ss" to "ß" was not one of them. You are correct that "SS" and "ß" are the same in case insensitive compares regardless of lo
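
The locale-independent behaviour discussed here can be checked with a short Python sketch. Python's `str.casefold()` applies Unicode full case folding, the kind of data published in CaseFolding.txt; the helper name `caseless_equal` is an assumption made for this example, and the sketch says nothing about what any particular file system does.

```python
def caseless_equal(a: str, b: str) -> bool:
    """Compare two strings using Unicode full case folding (no locale involved)."""
    return a.casefold() == b.casefold()

same = caseless_equal("aßa", "ASSA")             # ß folds to "ss"
upper = "ß".upper()                              # full uppercase mapping of ß
lower_only = "aßa".lower() == "assa".lower()     # simple lowercasing does not equate them
```

Here `same` is `True`, `upper` is `"SS"`, and `lower_only` is `False`, which shows why a comparison built on lowercasing alone treats the two names as distinct while one built on case folding does not.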

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
On 12/01/03 09:57, Arcane Jill wrote: I believe that "A" is not canonically equivalent to "a", but you still can't have filenames "A" and "a" coexisting in the same Windows folder. This is a consequence of having a case-insensitive filesystem. As to whether or not the case-equivalence of "ss" a

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Stefan Persson
Arcane Jill wrote: As to whether or not the case-equivalence of "ss" and "ß" should be expressed (a) only in Germany, Don't forget 150 years old Swedish computers! I don't think it would make a great deal of sense to enforce it only in Germany, however. If you did that, then a directory tree

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread jon
> Shouldn't it permit "assa" and "aßa" to co-exist? It isn't like ß is > canonically equivalent to ss (if I read the file aright, it isn't even > compatibility equivalent). It is a case-insensitive system. If it is a case-insensitive system then one should be able to safely treat Uppercase(x)

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Michael (michka) Kaplan
You are correct, Mark. I could probably intrigue people with tales of attempts at file systems that change their rules based on locale settings, but mostly it would just cause nightmares for anyone who understood what a bad idea that would be. Suffice to say that Windows will not boot if "I" != "i"

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Arcane Jill
I believe that "A" is not canonically equivalent to "a", but you still can't have filenames "A" and "a" coexisting in the same Windows folder. This is a consequence of having a case-insensitive filesystem. As to whether or not the case-equivalence of "ss" and "ß" should be expressed (a) only in

Re: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Mark E. Shoulson
Shouldn't it permit "assa" and "aßa" to co-exist? It isn't like ß is canonically equivalent to ss (if I read the file aright, it isn't even compatibility equivalent). It's a language-dependent choice to regard them as equivalent. I'd guess that should be the responsibility of the de_DE local
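
The claim that ß is neither canonically nor compatibility equivalent to "ss" can be verified against the Unicode Character Database with Python's standard `unicodedata` module. This is a minimal check, not part of the original message.

```python
import unicodedata

forms = ("NFC", "NFD", "NFKC", "NFKD")

# ß is left untouched by every normalization form, canonical or compatibility.
unchanged = all(unicodedata.normalize(form, "ß") == "ß" for form in forms)

# An empty string means the character has no decomposition mapping at all.
decomposition = unicodedata.decomposition("ß")
```

Any equivalence between "ss" and ß therefore comes from case mapping and case folding, not from normalization.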

Re: Compression through normalization

2003-12-01 Thread Peter Kirk
On 01/12/2003 04:25, Philippe Verdy wrote: ... And what about a compressor that would identify the source as being Unicode, and would convert it first to NFC, but including composed forms for the compositions normally excluded from NFC? This seems marginal but some languages would have better

RE: Compression through normalization

2003-12-01 Thread jon
Quoting Philippe Verdy <[EMAIL PROTECTED]>: > [EMAIL PROTECTED] wrote: > > Further, a Unicode-aware algorithm would expect a choseong character to > > be followed by a jungseong and a jongseong to follow a jungseong, and > > could essentially perform the same benefits to compression that > > nor

RE: Compression through normalization

2003-12-01 Thread Philippe Verdy
[EMAIL PROTECTED] wrote: > Further, a Unicode-aware algorithm would expect a choseong character to > be followed by a jungseong and a jongseong to follow a jungseong, and > could essentially perform the same benefits to compression that > normalising to NFC performs but without making an irrevers
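
To make the Hangul case concrete, the following Python sketch shows what NFC does to a choseong + jungseong + jongseong sequence and how the UTF-8 size changes. The sample jamo are an arbitrary choice for illustration; no compression measurement is claimed here.

```python
import unicodedata

# choseong HIEUH + jungseong A + jongseong NIEUN
jamo = "\u1112\u1161\u11ab"

syllable = unicodedata.normalize("NFC", jamo)       # composes to one precomposed syllable
restored = unicodedata.normalize("NFD", syllable)   # decomposes back to the same jamo

# UTF-8 byte counts before and after composition
sizes = (len(jamo.encode("utf-8")), len(syllable.encode("utf-8")))
```

The three jamo compose to the single syllable U+D55C, and the UTF-8 size drops from 9 bytes to 3. NFD recovers the jamo in this case, but a normalized file no longer records which form the author originally used, which is the sense in which the change cannot be undone.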

RE: Oriya: mba / mwa ?

2003-12-01 Thread Michael Everson
At 22:12 -0800 2003-11-30, Peter Constable wrote: From: [EMAIL PROTECTED] on behalf of Michael Everson What I haven't seen is clear evidence that the wa-phallaa is considered to be related to nominal BA and not a distinct character falling after LA. WA has been added as a new independent letter, w

Re: Compression through normalization

2003-12-01 Thread jon
Quoting Doug Ewell <[EMAIL PROTECTED]>: > Someone, I forgot who, questioned whether converting Unicode text to NFC > would actually improve its compressibility, and asked if any actual data > was available. I was pretty sure converting to NFC would help compression (at least some of the time), I

RE: numeric properties of Nl characters in the UCD

2003-12-01 Thread Arcane Jill
No probs, Doug. I was actually ill over the weekend, and I think I was probably way too sensitive on Friday when it was coming on. I guess I didn't really notice at the time and blamed everyone else for having a go at me when I should have been blaming a bunch of nasty microbes for making me fe

RE: MS Windows and Unicode 4.0 ?

2003-12-01 Thread Arcane Jill
Indeed. The current Windows OS still stores filenames as strings of sixteen-bit wide words (not code points; not characters). It allows filenames "assa" and "aßa" to coexist in the same folder, despite its claim to being case-insensitive, and I have even managed to create filenames containing un

RE: Complex Combining

2003-12-01 Thread Arcane Jill
Of course, one really important point is that Unicode text should remain stateless. It would be foolish indeed if, starting from an arbitrary point in the string, one had to parse backwards and forwards to see if there were any invisible brackets. In the extreme, one would have to scan the ent

Re: help about Convert gb2312 to utf8 in Perl!

2003-12-01 Thread Hu Guoxin
It really works well! Thanks, JD! What a nice guy you are! - Original Message - From: John Delacour To: Hu Guoxin ; [EMAIL PROTECTED] Sent: Monday, December 01, 2003 4:44 PM Subject: Re: help about Convert gb2312 to utf8 in Perl! At 3:09 pm +0800 1/12/03, Hu Guoxin wrote: >how to

Re: help about Convert gb2312 to utf8 in Perl!

2003-12-01 Thread John Delacour
At 3:09 pm +0800 1/12/03, Hu Guoxin wrote: how to convert a string variable from gb2312 into utf8 ? With Perl 5.8.2 I would do this: use Encode; $string = "text in gb2312"; Encode::from_to($string, "gb2312", "utf8"); print $string; JD
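
For comparison, the same transcoding step can be sketched in Python (not the Perl asked about in the thread). The function name `transcode` and the sample string are assumptions for illustration; the idea mirrors `Encode::from_to`: decode the bytes from the source charset, then encode them in the target charset.

```python
def transcode(data: bytes, src: str = "gb2312", dst: str = "utf-8") -> bytes:
    """Re-encode a byte string from one charset to another."""
    return data.decode(src).encode(dst)

gb = "北京".encode("gb2312")   # sample input bytes in GB2312
utf8 = transcode(gb)           # the same text as UTF-8 bytes
```

As with the Perl version, the input must really be in the declared source encoding; bytes that are not valid GB2312 raise an error rather than being converted silently.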

help about Convert gb2312 to utf8 in Perl!

2003-12-01 Thread Hu Guoxin
How do I convert a string variable from gb2312 into utf8? Details: 1. I want this Perl script to run on Solaris. 2. piconv.bat can convert encodings, but it's only for Windows. 3. I have tried some methods, such as: method1: use Encode; $gb2312="北京";

RE: Oriya: mba / mwa ?

2003-12-01 Thread Peter Constable
From: [EMAIL PROTECTED] on behalf of Michael Everson >>What I haven't seen is clear evidence that the wa-phallaa is >>considered to be related to nominal BA and not a distinct character >>falling after LA. > >WA has been added as a new independent letter, without a >decomposition to O+BA, although