Re: How to distinguish UTF-8 from Latin-* ?

2000-06-23 Thread Doug Ewell
Kent Karlsson <[EMAIL PROTECTED]> wrote: > A hacker may try to hide characters that trigger the undesired, and > potentially dangerous, interpretation, by using overlong UTF-8 > sequences. If the security scanner program does not "decode" overlong > UTF-8 sequences, but the interpreter accepts t
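
[Editor's illustration] The mismatch described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: `naive_decode_2byte` and `strict_decode` are hypothetical helper names, standing in for a lenient interpreter and a strict scanner respectively; neither comes from the thread.

```python
def naive_decode_2byte(seq: bytes) -> str:
    """Lenient decoder (hypothetical): applies the 2-byte UTF-8 bit layout
    without checking that the result actually needs two bytes."""
    return chr(((seq[0] & 0x1F) << 6) | (seq[1] & 0x3F))


def strict_decode(seq: bytes):
    """Strict decoder: Python's built-in UTF-8 codec rejects overlong forms.
    Returns None when the input is not valid UTF-8."""
    try:
        return seq.decode("utf-8")
    except UnicodeDecodeError:
        return None


# 0xC0 0xAF is an overlong two-byte encoding of "/" (U+002F).
overlong_slash = b"\xc0\xaf"

lenient_result = naive_decode_2byte(overlong_slash)  # the lenient side sees "/"
strict_result = strict_decode(overlong_slash)        # the strict side sees nothing valid
```

If the scanner behaves like `strict_decode` and merely skips what it cannot decode, while the interpreter behaves like `naive_decode_2byte`, the "/" reaches the interpreter unexamined. Rejecting the whole input on any ill-formed sequence avoids that gap.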

Re: UTF-8N?

2000-06-23 Thread Doug Ewell
Kenneth Whistler <[EMAIL PROTECTED]> wrote: >> It all stems from the fact that U+FEFF is not only what is used for >> the BOM, but also a valid Unicode/ISO 10646 codepoint. The issue >> would be solved by deprecating the use of U+FEFF as a Unicode >> character (for example by defining a new code

Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Tex Texin
Addison, thanks for this. Good points. I am sure if we bear down on it, there can be many more than 2 problems. JDBC driver differences will be a third. We went thru similar issues programming for double-byte databases a few years back. At least with Unicode, we are doing this for the last time. ;

µ vs. M (was: Case mapping errors?)

2000-06-23 Thread Otto Stolz
On 2000-06-22 at 10:29, Antoine Leca wrote: > On the other hand, about the capitalization of µ, [...] using M > would be definitively wrong (the value will be multiplied by 1000), Actually, it would be multiplied by 1 000 000 000 000, cf.
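
[Editor's illustration] The factor quoted above follows from the SI prefixes: micro is 10^-6 and mega is 10^6, so replacing µ with M scales a value by 10^12. A small Python check, offered as a sketch; the variable names are illustrative only:

```python
from fractions import Fraction

MICRO = Fraction(1, 10**6)  # SI prefix µ (micro) = 10^-6
MEGA = Fraction(10**6)      # SI prefix M (mega)  = 10^6

# Factor by which a quantity grows if "µ" is mistakenly written as "M".
factor = MEGA / MICRO

# Separately, Unicode's default uppercase mapping turns the micro sign
# U+00B5 into GREEK CAPITAL LETTER MU (U+039C), which looks like a Latin "M".
uppercased = "\u00b5".upper()
```

Here `factor` comes out as 10**12, matching the figure in the message, and `uppercased` is U+039C rather than the Latin letter, which is why blind uppercasing of unit symbols is risky.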

Re: UTF-8N?

2000-06-23 Thread Peter_Constable
On 06/22/2000 10:54:35 PM <[EMAIL PROTECTED]> wrote: >Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, programs that >*expect* UTF-8 instead of SBCS will be able to throw away an initial U+FEFF >with even greater confidence. It may even be possible for operating system >devel
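
[Editor's illustration] A program that expects UTF-8 can discard an initial U+FEFF along the lines below. This is a minimal sketch, assuming the input is already known to be UTF-8; `strip_utf8_signature` is a hypothetical helper name, while `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of Python's standard library.

```python
import codecs


def strip_utf8_signature(data: bytes) -> bytes:
    """Drop one leading UTF-8 signature (EF BB BF, i.e. U+FEFF) if present.

    Only the first occurrence at the very start is removed; any U+FEFF
    later in the stream is left untouched.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_signature = codecs.BOM_UTF8 + "text".encode("utf-8")
without_signature = "text".encode("utf-8")

stripped = strip_utf8_signature(with_signature)
unchanged = strip_utf8_signature(without_signature)

# The standard "utf-8-sig" codec does the same thing while decoding.
decoded = with_signature.decode("utf-8-sig")
```

Both inputs end up as the same four bytes, so downstream code never has to care whether the signature was there.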

Re: UTF-8N?

2000-06-23 Thread Peter_Constable
Ken: >Yes. The Unicode Standard will deprecate the use of U+FFEF (Note: not U+FFFE) >as a zero-width non-breaking space (despite its formal name). > >And U+FFEF should *only* be used as a byte order mark and/or signature. (That >is already ambiguous and trouble enough -- without tossing in the o

Re: UTF-8N?

2000-06-23 Thread Markus Scherer
would it still be possible to disunify bom/signature (which would have to remain at feff) from zwnbsp? this seems to be the natural solution to this since then a signature character could always be ignored or stripped. markus

Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Joe_Ross
I think that this is also true for DB2 using UTF-8 as the database encoding. From an application perspective, MS SQL Server is the one that gives us the most trouble, because it doesn't support UTF-8 as a database encoding for char, etc. Joe Kenneth Whistler <[EMAIL PROTECTED]> on 06/22/2000 0

RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
Microsoft is very COM-based for its actual data access methods and COM uses BSTRs that are BOM-less UTF-16. Because of that, the actual storage format of any database ends up irrelevant since it will be converted to UTF-16 anyway. Given that this is what the data layers do, performance is cer

Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Tex Texin
Joe, Can you expand on this a bit more? Privately if you prefer. Do you mean version 7 of MS SQL Server? I assume if it doesn't have UTF-8, it uses UTF-16. How does this being the storage encoding, become problematic? tex [EMAIL PROTECTED] wrote: > > I think that this is also true for DB2 us

Unicode in XML and other Markup Languages

2000-06-23 Thread Misha Wolf
The latest version of: Unicode in XML and other Markup Languages is now available. The document is published jointly by the Unicode Consortium and the World Wide Web Consortium (W3C). The latest version, dated 2000-06-23, is now available at: http://www.w3.org/TR/unicode-xml and will

Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Tex Texin
Michael, Thanks for this. Are there any programming adjustments needed that arise when accessing MS SQL Server to store/retrieve UTF-16 using SQL, Java and JDBC? (If the MS JDBC driver goes thru COM to the database, I didn't know that.) Tex "Michael Kaplan (Trigeminal Inc.)" wrote: > > Micr

17th International Unicode Conference - coming up

2000-06-23 Thread lisam
Unicoders, Time to start making your plans...we will have our 17th Unicode Conference in San Jose in September, just after Labor Day. The registration information should be posted real soon...stop by the web site (see below) and check it out - and see you there! Best regards, Lisa

RE: How to distinguish UTF-8 from Latin-* ?

2000-06-23 Thread Robert A. Rosenberg
At 09:41 AM 06/22/2000 -0800, Karlsson Kent - keka wrote: > >"Be liberal with what you accept and conservative with what you create"]). > > >Well, there is a security aspect to this: sometimes given texts >need to be scanned to try to determine if they are "harmless" >or may trigger some undesirab

RE: UTF-8 BOM Nonsense

2000-06-23 Thread Robert A. Rosenberg
At 11:31 AM 06/22/2000 -0800, Michael Kaplan (Trigeminal Inc.) wrote: >I do not believe that this will require it to be added to a standard, and >this is a non-standard usage, but life is about dealing with things as they >are (and this is how they are!). I assume that you also feel that the char

Re: UTF-8N?

2000-06-23 Thread Robert A. Rosenberg
At 10:54 PM 06/22/2000 -0800, Doug Ewell wrote: >Now that Unicode plans to deprecate the use of U+FEFF as ZWNBSP, >programs that *expect* UTF-8 instead of SBCS will be able to throw away >an initial U+FEFF with even greater confidence. It may even be possible >for operating system developers to b

RE: UTF-8 BOM Nonsense

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
Yes, I do feel this way, actually. :-) The standard is quite clear in its language, one does not have to be a semanticist to understand that: 1) XML *is* considered to be UTF-8 if there is no BOM and UTF-16 if there is. 2) The encoding tag was added in recognition of the fact that #1 will not be

Re: UTF-8N?

2000-06-23 Thread John Cowan
"Robert A. Rosenberg" wrote: > It would be very UNCool unless the application can tell the operating > system that it wants this done for it. Otherwise it will have no way of > KNOWING that the edited stream that the operating system is passing it IS > UTF-8 (and was so identified by the deleted

Re: UTF-8N?

2000-06-23 Thread Kenneth Whistler
John Cowan wrote: > I think the implication is that the OS provides an interface to read > characters out of a text file, in which case BOM-eating BOMophagy, aka FEFFagy ;-) > (and masking the > difference between various text encodings) is very sensible. Historic > OSes have not had such an

Re: Java, SQL, Unicode and Databases

2000-06-23 Thread Joe_Ross
Yes, version 7. It requires us to use a different data type (nchar) if we want to store multilingual text as UTF-16. We want our applications to be database vendor independent so that customers can use any database under the covers. If all databases supported UTF-8 as an encoding for char, we c

RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Joe_Ross
Michael, are you saying that the data type (char or nchar) doesn't matter? Are you saying that if we just use UTF-16 or wchar_t interfaces to access the data all will be fine and we will be able to store multilingual data even in fields defined as char? Maybe things aren't as bad as I feared. W

RE: Java, SQL, Unicode and Databases

2000-06-23 Thread Michael Kaplan (Trigeminal Inc.)
The datatype *does* matter in that sense: you would use UTF-16 data fields (NTEXT and NCHAR and NVARCHAR) and access it with your favorite data access method, which will convert as needed to whatever format IS uses. You will never know or care what the underlying engine stores. The web site st

RE: UTF-8N?

2000-06-23 Thread Preethi Balaji
resending the message -Original Message- From: Preethi Balaji Sent: Friday, June 23, 2000 2:14 PM To: 'Kenneth Whistler'; Unicode List Cc: [EMAIL PROTECTED] Subject: RE: UTF-8N? Sorry to intrude, I am a new member to the group and was curious to know few terms being used here,