On Jun 29, 2011, at 2:37 AM, Graham Bloice wrote:

> For reference, here's the test executable output on Win7, using the SDK 7.0 
> build environment (a cmd.prompt):

Not surprisingly, it doesn't work.

Microsoft introduced Unicode support when they introduced Win32; as they were
introducing a new API, they could make the versions of the API that support
Unicode take UCS-2 (later UTF-16) strings as arguments.  They also offered
"ASCII" versions, which took strings in the local code page as arguments.  The
same split applies to the C library's routines: open()/_open() take strings in
the local code page, while _wopen() takes UTF-16 strings.

UN*X systems already had a well-established API when they introduced Unicode
support, and they had what amounted to code pages (the various ISO 8859/x
encodings, the EUC encodings, assorted other encodings); so, rather than adding
wide-character versions of calls such as open(), they added a new "code page",
with UTF-8 encoding.
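On UN*X, then, "do UTF-8 strings work here?" comes down to which codeset the
locale selects.  A minimal sketch of that check, assuming a POSIX system with
<langinfo.h>; locale_is_utf8() is a hypothetical helper, not code from the test
program:

```c
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the current LC_CTYPE locale's codeset
 * (its "code page") is UTF-8, 0 otherwise.  Assumes the codeset is reported
 * as "UTF-8", which is what glibc and macOS return. */
static int locale_is_utf8(void)
{
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

A program would normally call setlocale(LC_ALL, "") first, so that LANG,
LC_ALL or LC_CTYPE from the environment take effect; without that call it stays
in the "C" locale, whose codeset is ASCII.  Other systems may spell the codeset
name differently, so a real check might need to be more lenient.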

The program was written for UN*X, to test whether UTF-8 strings work in the
user's locale.  On Windows, the "ASCII" API it was using to create a file
interprets the file name in your local code page, not UTF-8, and I suspect
cmd.exe also expects "ASCII" output from programs (such as the test program's
printing of Stig's name) to be in the local code page, not UTF-8.
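To make the failure concrete: in UTF-8, "ø" (U+00F8) is the two bytes 0xC3
0xB8.  An API that treats the string as a single-byte code page sees two
characters instead; in ISO 8859-1 (and in Windows-1252, which agrees with it for
bytes 0xA0-0xFF) those are "Ã" and "¸".  The exact glyphs depend on which code
page is in effect.  The sketch below uses a hypothetical helper,
latin1_view_as_utf8(), that performs that Latin-1 reading and re-encodes the
result as UTF-8 so it can be inspected:

```c
#include <stddef.h>

/* Hypothetical helper: reads each byte of `in` as one ISO 8859-1 (Latin-1)
 * character, as a single-byte code-page API would, and writes that text to
 * `out` re-encoded as UTF-8.  Bytes 0x00-0x7F are copied unchanged; bytes
 * 0x80-0xFF become two UTF-8 bytes.  `out` must hold 2 * strlen(in) + 1. */
static void latin1_view_as_utf8(const char *in, char *out)
{
    const unsigned char *p = (const unsigned char *)in;
    unsigned char *q = (unsigned char *)out;

    for (; *p != '\0'; p++) {
        if (*p < 0x80) {
            *q++ = *p;
        } else {
            *q++ = (unsigned char)(0xC0 | (*p >> 6));
            *q++ = (unsigned char)(0x80 | (*p & 0x3F));
        }
    }
    *q = '\0';
}
```

Fed the UTF-8 bytes of a sample string such as "Tromsø", it produces "TromsÃ¸":
one character became two.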

This is why GLib has file functions that map file names between encodings; the page at

        http://developer.gnome.org/glib/stable/glib-File-Utilities.html

says

        There is a group of functions which wrap the common POSIX functions
        dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(),
        g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these
        wrappers is to make it possible to handle file names with any Unicode
        characters in them on Windows without having to use ifdefs and the
        wide character API in the application code.

        The pathname argument should be in the GLib file name encoding. On
        POSIX this is the actual on-disk encoding which might correspond to
        the locale settings of the process (or the G_FILENAME_ENCODING
        environment variable), or not.

        On Windows the GLib file name encoding is UTF-8. Note that the
        Microsoft C library does not use UTF-8, but has separate APIs for
        current system code page and wide characters (UTF-16). The GLib
        wrappers call the wide character API if present (on modern Windows
        systems), otherwise convert to/from the system code page.
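For illustration, here is roughly what the "convert UTF-8 to UTF-16, then call
the wide character API" step involves.  Real code would use
MultiByteToWideChar(CP_UTF8, ...) on Windows or GLib's g_utf8_to_utf16(); the
hand-written utf8_to_utf16() below is a hypothetical stand-in that writes
uint16_t units (wchar_t is 16 bits on Windows) so that it also compiles
elsewhere:

```c
#include <stddef.h>
#include <stdint.h>

/* Smallest code point allowed for a sequence with 0..3 continuation bytes;
 * anything smaller is an overlong encoding. */
static const uint32_t utf8_min_cp[4] = { 0, 0x80, 0x800, 0x10000 };

/* Hypothetical helper: converts NUL-terminated UTF-8 to NUL-terminated UTF-16
 * in host byte order.  Returns the number of UTF-16 units written (excluding
 * the terminator), or -1 on invalid UTF-8 or if `out` (capacity `cap` units)
 * is too small. */
static long utf8_to_utf16(const char *in, uint16_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    if (cap == 0)
        return -1;
    while (*p != '\0') {
        uint32_t cp;
        int extra;

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return -1;               /* stray continuation or bad lead byte */
        p++;

        for (int i = 0; i < extra; i++, p++) {
            if ((*p & 0xC0) != 0x80)
                return -1;            /* truncated sequence (also stops at NUL) */
            cp = (cp << 6) | (*p & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values past U+10FFFF. */
        if (cp < utf8_min_cp[extra] || (cp >= 0xD800 && cp <= 0xDFFF) ||
            cp > 0x10FFFF)
            return -1;

        if (cp < 0x10000) {
            if (n + 1 >= cap)         /* keep room for the terminator */
                return -1;
            out[n++] = (uint16_t)cp;
        } else {                      /* encode as a surrogate pair */
            if (n + 2 >= cap)
                return -1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    out[n] = 0;
    return (long)n;
}
```

Code points above U+FFFF come out as surrogate pairs; for example, U+1F600
becomes 0xD83D 0xDE00.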

        Another group of functions allows to open and read directories in
        the GLib file name encoding. These are g_dir_open(),
        g_dir_read_name(), g_dir_rewind(), g_dir_close().

This is also why we have our own copies of some of those functions on Windows,
and wrap them ourselves: that way we don't require GLib 2.6, which introduced
them, on all platforms.
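A rough sketch of what such a wrapper can look like, assuming the Win32
MultiByteToWideChar() call and the Microsoft C library's _wopen(); the name
utf8_open() is made up for this example.  On UN*X it simply passes the bytes
through to open():

```c
#include <fcntl.h>

#ifdef _WIN32
#include <errno.h>
#include <io.h>
#include <stdlib.h>
#include <windows.h>

/* Hypothetical wrapper, Windows: convert the UTF-8 path to UTF-16 and call
 * the wide character CRT routine, so names outside the code page work. */
static int utf8_open(const char *path, int flags, int mode)
{
    int wlen = MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS,
                                   path, -1, NULL, 0);
    if (wlen == 0) {
        errno = EINVAL;               /* not valid UTF-8 */
        return -1;
    }
    wchar_t *wpath = malloc((size_t)wlen * sizeof *wpath);
    if (wpath == NULL) {
        errno = ENOMEM;
        return -1;
    }
    MultiByteToWideChar(CP_UTF8, MB_ERR_INVALID_CHARS, path, -1, wpath, wlen);
    int fd = _wopen(wpath, flags, mode);
    int saved_errno = errno;          /* keep _wopen()'s error across free() */
    free(wpath);
    errno = saved_errno;
    return fd;
}
#else
/* Hypothetical wrapper, UN*X: file names are byte strings, so pass the
 * UTF-8 bytes through unchanged. */
static int utf8_open(const char *path, int flags, int mode)
{
    return open(path, flags, mode);
}
#endif
```

Callers then use one call with UTF-8 names on every platform, and the #ifdef
lives in one place instead of in the application code.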
___________________________________________________________________________
Sent via:    Wireshark-dev mailing list <wireshark-dev@wireshark.org>
Archives:    http://www.wireshark.org/lists/wireshark-dev
Unsubscribe: https://wireshark.org/mailman/options/wireshark-dev
             mailto:wireshark-dev-requ...@wireshark.org?subject=unsubscribe
