Looking for a C library that converts UTF-8 strings from their decomposed to pre-composed form

2004-11-08 Thread Tay, William
Hi, It seems that accented characters generated in MacOS X are represented in UTF-8 decomposed form, e.g. the character é is represented as 65 cc 81, instead of c3 a9 (the pre-composed form

Problem with accented characters

2004-08-23 Thread Tay, William
Hi, Can anyone explain why an accented character is sometimes represented as a base character plus its accent? For example, the UTF-8 representation for é is 65 CC 81, which is the UTF-8 representation for e and the accent, instead of C3 A9? I find
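The two byte sequences quoted in this post can be told apart by decoding them into code points. The following is a minimal C sketch, not anything taken from the thread: the function name `utf8_decode` and its signature are hypothetical, and the decoder only checks lead bytes, continuation bytes, and overlong forms rather than validating fully.

```c
#include <stddef.h>

/*
 * Decode a UTF-8 byte sequence into Unicode code points.
 * Returns the number of code points written to cps, or -1 if the input
 * is malformed or cps (capacity max) is too small.
 * Hypothetical helper for illustration; it does not reject surrogates.
 */
long utf8_decode(const unsigned char *s, size_t len,
                 unsigned long *cps, size_t max)
{
    /* Smallest code point that legitimately needs 1, 2, 3, or 4 bytes. */
    static const unsigned long min_cp[4] = { 0x0, 0x80, 0x800, 0x10000 };
    size_t i = 0, n = 0;

    while (i < len) {
        unsigned char b = s[i++];
        unsigned long cp;
        size_t extra, k;

        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;              /* stray continuation or invalid lead byte */

        if (extra > len - i) return -1;   /* sequence is cut off */

        for (k = 0; k < extra; ++k) {
            unsigned char c = s[i++];
            if ((c & 0xC0) != 0x80) return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (cp < min_cp[extra] || cp > 0x10FFFF) return -1;  /* overlong or out of range */
        if (n == max) return -1;                             /* output buffer full */
        cps[n++] = cp;
    }
    return (long)n;
}
```

Under these assumptions, the bytes 65 CC 81 decode to two code points, U+0065 (e) followed by U+0301 (combining acute accent), while C3 A9 decodes to the single code point U+00E9 (é). The two forms are canonically equivalent in Unicode; converting between them is the job of a normalization library, which this sketch does not attempt.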

MacOS character sets

2004-07-12 Thread Tay, William
Hi, I'd like to understand what character encoding an application that runs on MacOS uses. Just as Windows applications generally use code pages and UNIX applications use ISO-8859-X character sets, what about MacOS applications? Is there any website that shows the encoding of characters

Tools that analyze C/C++ code and report potential internationalization problems

2003-12-29 Thread Tay, William
Hi, I would like to use free tools that can help me analyze Visual C/C++ code so as to track down potential internationalization problems in the code. Would appreciate your recommendations. Will

UTF8 file transfer and interoperability problem

2002-06-07 Thread Tay, William
Hi, I'd like to know how file transfer works, for filenames encoded in UTF8, using the FTP, Netware and SMB protocols. From what I know Win NT/2000 encodes filenames in UTF16 LE, right? So what happens when Windows receives the UTF8 filenames via file transfer from a Linux/Unix machine? 1. Using FTP, w

Unicode in email

2002-05-28 Thread Tay, William
Hi, Can an email address contain any Unicode characters? Why, and which protocol support makes it possible, or not? Thanks. Will

RE: How to print the byte representation of a wchar_t string with non-ASCII ...

2001-11-02 Thread Tay, William
Dear Unicoders & C gurus, Thank you for your comments on my previous posting. They help. Have a question while digesting them on machine, would appreciate your help. At Solaris 2.6 shell prompt execute the program below by doing: > setenv LC_ALL en_US.UTF-8 > a.out fôó #include , , , main

How to print the byte representation of a wchar_t string with non-ASCII chars?

2001-10-31 Thread Tay, William
Hi, For debugging purposes, I'd like to find out how I can print the byte representation of a wchar_t string. Say in C, I have wchar_t wstr[10] = L"frán"; Is there any printf or wchar equivalent function (using appropriate format template) that prints out the string as 66 72 C3 A1 6E in en_US
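One way to get output like 66 72 C3 A1 6E is to encode each wide character as UTF-8 by hand and print the resulting bytes in hex. The following is a minimal C sketch under a stated assumption: each `wchar_t` holds one Unicode code point, which is true of glibc but is not guaranteed by the C standard and does not hold for 16-bit `wchar_t` with surrogate pairs. The names `utf8_encode` and `wcs_to_utf8_hex` are hypothetical helpers, not standard library functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <wchar.h>

/* Encode one code point as UTF-8. Returns bytes written (1-4), or 0 if invalid. */
static size_t utf8_encode(unsigned long cp, unsigned char buf[4])
{
    if (cp < 0x80) {
        buf[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        buf[0] = (unsigned char)(0xC0 | (cp >> 6));
        buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* lone surrogate */
        buf[0] = (unsigned char)(0xE0 | (cp >> 12));
        buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/*
 * Write the UTF-8 bytes of ws into out as space-separated uppercase hex,
 * e.g. "66 72 C3 A1 6E". Returns 0 on success, -1 if a wide character is
 * not a valid code point or out (capacity outsz) is too small.
 */
int wcs_to_utf8_hex(const wchar_t *ws, char *out, size_t outsz)
{
    size_t used = 0;

    assert(ws != NULL && out != NULL);
    if (outsz == 0) return -1;
    out[0] = '\0';

    for (; *ws != L'\0'; ++ws) {
        unsigned char buf[4];
        size_t i, n = utf8_encode((unsigned long)*ws, buf);

        if (n == 0) return -1;
        for (i = 0; i < n; ++i) {
            /* Worst case per byte: a space, two hex digits, and the NUL. */
            if (used + 4 > outsz) return -1;
            used += (size_t)snprintf(out + used, outsz - used,
                                     used ? " %02X" : "%02X", buf[i]);
        }
    }
    return 0;
}
```

With that assumption, calling `wcs_to_utf8_hex(L"fr\u00e1n", buf, sizeof buf)` and printing `buf` with `printf("%s\n", buf)` should give the byte sequence quoted in the post. A locale-based alternative is to call `setlocale` and `wcstombs` and hex-dump the result, but that depends on a UTF-8 locale being installed on the machine.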

Character encoding at the prompt

2001-10-24 Thread Tay, William
Hi, Do you have any idea what the default code page and encoding scheme are for the MS DOS box in WinNT 4? Is there any command that can give me the info? I am trying to input a string, say "fráç", at the prompt, wondering how the characters are encoded. How about at the Unix (Solaris 2.6) prompt, what