RE: ISO 8859-1 question

2000-06-14 Thread Michael Kaplan (Trigeminal Inc.)
See http://www.microsoft.com/globaldev/reference/iso.asp or more specifically http://www.microsoft.com/globaldev/reference/iso/28591.htm http://www.microsoft.com/globaldev/reference/iso/28592.htm http://www.microsoft.com/globaldev/reference/iso/28594.htm http://www.microsoft.com/globaldev/refe

RE: ISO 8859-1 question

2000-06-14 Thread Michael Kaplan (Trigeminal Inc.)
> >See > > > >http://www.microsoft.com/globaldev/reference/iso.asp > > > >or more specifically > > > >http://www.microsoft.com/globaldev/reference/iso/28591.htm > >http://www.microsoft.com/globaldev/reference/iso/28592.htm > >http://www.microsoft.com/globaldev/reference/iso/28594.htm >htt

RE: Linguistic precedence [was: (TC304.2313) AND/OR: antediluvian

2000-06-15 Thread Michael Kaplan (Trigeminal Inc.)
> >> On the cover of my French driver's license, it says ``Driving > >> license'' in 10 languages (all the EU languages at the time it was > >> printed). The titles are ordered alphabetically by the name of the > >> language in the language itself. The Portuguese don't seem to mind. > >> > >>

RE: Linguistic precedence [was: (TC304.2313) AND/OR: antediluvian

2000-06-15 Thread Michael Kaplan (Trigeminal Inc.)
>I admit to nitpicking because in this particular case, the language names, >we may be just lucky so that there are no collation conflicts. I believe this is an accurate statement... we ARE lucky, so far. >But believing that there is a collation order that works across a

RE: Linguistic precedence [was: (TC304.2313) AND/OR: antediluvian

2000-06-15 Thread Michael Kaplan (Trigeminal Inc.)
> > >(Has somebody written a comprehensive collection of all these collation > > >problems?) > Ok, here is the full list of ones I know about, and the VB code that would demonstrate them, as needed: (Note: All of this is coming from the book I am working on that discussed i18N for Visual Basic,

RE: Collation curiosities (was: RE: Linguistic precedence [was: (

2000-06-15 Thread Michael Kaplan (Trigeminal Inc.)
> I think somebody just mentioned that many Italians like "i" and "j" to be > "equal". > Ah, since I am very "Windows" based I always bow to the built-in sorts in the NLS database, and never recognize other ones until I have a customer clamoring for support of that sort in an application for whic

RE: The mother of all collation schemes

2000-06-15 Thread Michael Kaplan (Trigeminal Inc.)
Well, along with posts made to the list earlier, there is the problem of languages that may have native speakers who are unhappy with your collation scheme. Period. In my experience the fastest way to piss off a user is to refuse them the right to see things sorted as they would prefer. But the g

RE: Linguistic precedence [was: (TC304.2313) AND/OR:

2000-06-16 Thread Michael Kaplan (Trigeminal Inc.)
: Robert A. Rosenberg[SMTP:[EMAIL PROTECTED]] > Sent: Thursday, June 15, 2000 1:27 PM > To: Unicode List > Cc: Unicode List > Subject: RE: Linguistic precedence [was: (TC304.2313) AND/OR: > > At 07:53 AM 06/15/2000 -0800, Michael Kaplan (Trigeminal Inc.) wr

RE: Linguistic precedence

2000-06-16 Thread Michael Kaplan (Trigeminal Inc.)
One of the things I like about Windows: it's so easy to look at different date formats. See http://www.trigeminal.com/samples/setlocalesample.asp It's a US NT4 server so I could do everything I wanted to like Japan, Korea, TamilNadu, etc. But I tried for a little variety, and stuck a few RTL langs

RE: UTF-8 vs UTF-16 as processing code

2000-06-16 Thread Michael Kaplan (Trigeminal Inc.)
To Windows 2000 (and Windows NT circa SP4 as well), UTF-8 is another multibyte encoding, which you can get to via "code page 65001" and MultiByteToWideChar and get from via WideCharToMultiByte. So the only difference between it and any other code page, be it iso-8859-1 or windows-1252 is that happ
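
A rough illustration of the "just another code page" idea, sketched in Python rather than against the Win32 API. The functions multi_byte_to_wide and wide_to_multi_byte and the CODE_PAGES table are hypothetical stand-ins named after the Windows calls mentioned above; they only map a code page number to a Python codec and are not the real MultiByteToWideChar or WideCharToMultiByte.

```python
# Hypothetical table: Windows code page number -> Python codec name.
# Only the code pages named in the message are listed.
CODE_PAGES = {
    65001: "utf-8",      # UTF-8
    28591: "iso8859_1",  # ISO 8859-1
    1252: "cp1252",      # windows-1252
}


def multi_byte_to_wide(data: bytes, code_page: int) -> str:
    """Decode bytes in the given code page to a Unicode string."""
    return data.decode(CODE_PAGES[code_page])


def wide_to_multi_byte(text: str, code_page: int) -> bytes:
    """Encode a Unicode string into the given code page."""
    return text.encode(CODE_PAGES[code_page])


if __name__ == "__main__":
    sample = "caf\u00e9"
    for code_page in CODE_PAGES:
        encoded = wide_to_multi_byte(sample, code_page)
        # The round trip is the same call pattern for every code page.
        assert multi_byte_to_wide(encoded, code_page) == sample
        print(code_page, encoded)
```

The caller's code is identical for all three code pages; only the number passed in changes.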

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-18 Thread Michael Kaplan (Trigeminal Inc.)
> if it is xml, then have a look at the xml spec (with the errata list!!). > it is very clearly specified how to figure that all out there. > ... > Actually, the XML spec is quite clear that neither UTF-16 nor UTF-8 require the encoding tag XML is defined by one of the following: 1) Starts w
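
For readers who want to see the detection order in code, here is a minimal Python sketch loosely following the autodetection appendix of the XML 1.0 spec. It is an assumption-laden simplification, not the procedure from the message above (which is cut off): the function name sniff_xml_encoding is made up, and UTF-32 and EBCDIC are ignored.

```python
import re

# Matches an encoding declaration in an ASCII-compatible XML prolog.
_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"          # UTF-8 signature (BOM)
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"      # big-endian BOM
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"      # little-endian BOM (UTF-32 not checked here)
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"      # "<?" as UTF-16BE, no BOM
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"      # "<?" as UTF-16LE, no BOM
    match = _DECL.match(data)
    if match:
        # ASCII-compatible prolog: trust the declared encoding.
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: XML defaults to UTF-8.
    return "utf-8"
```

A Latin-1 document therefore has to say so in its declaration; without one, it would be treated as UTF-8.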

RE: UTF-8N?

2000-06-20 Thread Michael Kaplan (Trigeminal Inc.)
Danger is a relative term, I think. Windows 2000 Notepad includes one so that it can easily recognize a file you saved as UTF-8 actually being UTF-8 the next time you load it. If you remove it, then obviously Notepad may not be able to recognize the file as UTF-8. You should obviously never dis
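
As a small sketch of what "recognizing" such a file can look like, the Python below checks for the three-byte UTF-8 signature and strips it on read. This is only an illustration of the general technique; it makes no claim about how Notepad itself is implemented, and the helper names are invented for the example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def decode_utf8(data: bytes) -> str:
    """Decode UTF-8, dropping a leading signature if one is present."""
    # The "utf-8-sig" codec accepts input with or without the signature.
    return data.decode("utf-8-sig")


def encode_utf8_with_bom(text: str) -> bytes:
    """Encode as UTF-8 and prepend the signature."""
    return codecs.BOM_UTF8 + text.encode("utf-8")


if __name__ == "__main__":
    saved = encode_utf8_with_bom("na\u00efve")
    print(has_utf8_bom(saved), decode_utf8(saved))
```

Without the signature, a reader has to fall back on guessing from the byte patterns, which is where recognition can fail.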

RE: Bengali: variants of same conjunct

2000-06-21 Thread Michael Kaplan (Trigeminal Inc.)
Thus far it is something that has been implemented in the fonts, rather than anywhere else. For example, there are several ligatures in Tamil that will display one way with the Latha font and the other way with Monotype Tamil Arial (the way set out in Unicode 3.0 is done in the latter). Thus s

RE: Bengali: variants of same conjunct

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
> > Thus since people who write the language sent both, > > > Do you mean that Tamil writers *purposely* use both the "ancient" and the > "modern" forms in the same document? > What is the intent? > Yes, that is what I am saying. If you go to several of the Tamil resource sites on the web, you

RE: UTF-8 BOM Nonsense

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
I agree, Gary. Windows 2000 Notepad, however, does not agree and writes one. Since Notepad in prior versions of Windows was in fact the de facto standard HTML editor, clearly it is a program to be reckoned with. People should be aware of the fact that there are going to be MANY files out there

RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
Microsoft is very COM-based for its actual data access methods and COM uses BSTRs that are BOM-less UTF-16. Because of that, the actual storage format of any database ends up irrelevant since it will be converted to UTF-16 anyway. Given that this is what the data layers do, performance is cer
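
To make "BOM-less UTF-16" concrete, here is a minimal Python sketch of the commonly documented BSTR memory layout: a 4-byte length prefix (a byte count), the UTF-16LE code units, and a two-byte terminator. The helper names are hypothetical, and this only mimics the layout in a bytes object; it does not call any COM allocator.

```python
import struct


def make_bstr_bytes(text: str) -> bytes:
    """Build a BSTR-like byte layout for the given string."""
    # An explicit byte order ("utf-16-le") means no BOM is emitted.
    payload = text.encode("utf-16-le")
    # Length prefix counts bytes, not characters, and excludes the terminator.
    return struct.pack("<I", len(payload)) + payload + b"\x00\x00"


def read_bstr_bytes(blob: bytes) -> str:
    """Read the string back using the length prefix."""
    (byte_len,) = struct.unpack_from("<I", blob, 0)
    return blob[4:4 + byte_len].decode("utf-16-le")


if __name__ == "__main__":
    blob = make_bstr_bytes("Hi")
    print(blob, read_bstr_bytes(blob))
```

Whatever encoding a database uses on disk, data handed across this kind of boundary ends up in this UTF-16 form.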

RE: UTF-8 BOM Nonsense

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
> Sent: Friday, June 23, 2000 11:34 AM > To: Michael Kaplan (Trigeminal Inc.) > Cc: Unicode List > Subject: RE: UTF-8 BOM Nonsense > > At 11:31 AM 06/22/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote: > >I do not believe that this will require it to be added to a

RE: Bengali: variants of same conjunct

2000-06-22 Thread Michael Kaplan (Trigeminal Inc.)
>But what is the semantic intent, then? >In other words, what may mean the use of "elephant-trunk" ai vs the "normal" one? >What may mean the use of the rounded naa vs the "normal", two parts, one? I do not know enough about Tamil usage to understand THAT part. :-)

RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
is case is hiding the differences. Michael > -- > From: [EMAIL PROTECTED][SMTP:[EMAIL PROTECTED]] > Sent: Friday, June 23, 2000 2:27 PM > To: Michael Kaplan (Trigeminal Inc.) > Cc: Unicode List; [EMAIL PROTECTED] > Subject: RE: Java

RE: Arabic Script converting Between different Code Pages

2000-06-27 Thread Michael Kaplan (Trigeminal Inc.)
If you are on a Microsoft platform and have the code page support for the Arabic code page, then a simple MultiByteToWideChar call will take care of it. Here are the code page numbers to use: Arabic (ASMO 708): 708 Arabic (DOS): 720 Arabic (ISO): 28596 Arabic (Mac
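
The same conversion can be sketched off Windows by going through Unicode in Python. The table below covers only two of the code pages listed above (720 and 28596), using Python's cp720 and iso8859_6 codecs; the function name convert_code_page is made up for this example, and the sketch does nothing about logical vs. visual ordering.

```python
# Hypothetical table: Windows code page number -> Python codec name.
ARABIC_CODE_PAGES = {
    720: "cp720",        # Arabic (DOS)
    28596: "iso8859_6",  # Arabic (ISO)
}


def convert_code_page(data: bytes, source_cp: int, target_cp: int) -> bytes:
    """Convert bytes between two code pages by way of Unicode."""
    # Step 1: bytes -> Unicode (the MultiByteToWideChar direction).
    text = data.decode(ARABIC_CODE_PAGES[source_cp])
    # Step 2: Unicode -> bytes (the WideCharToMultiByte direction).
    return text.encode(ARABIC_CODE_PAGES[target_cp])


if __name__ == "__main__":
    word = "\u0633\u0644\u0627\u0645"  # seen, lam, alef, meem
    iso_bytes = word.encode("iso8859_6")
    dos_bytes = convert_code_page(iso_bytes, 28596, 720)
    print(iso_bytes, dos_bytes)
```

Characters that exist in one code page but not the other will raise an error at the encode step, which is usually the first sign that the two Arabic code pages do not cover the same repertoire.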

Not all Arabics are created equal...

2000-06-28 Thread Michael Kaplan (Trigeminal Inc.)
I have heard the same thing, and think it underscores a point that MANY companies forget: not all dialects of Arabic are the same, despite the fact that most software packages have *one* Arabic version. Issues such as this one can obviously cause major issues since it even affects logical vs.