OpenVMS supports a variation of the UCS-2 Unicode encoding, known as
VTF-7, for filenames on ODS-5 volumes.
Several programs on OpenVMS, such as Advanced Server (LANMAN Server),
already generate and display those filenames.
This can be translated to or from a UTF-8 encoded UNIX file specification.
The issues are:
1. Currently a VMS file specification containing a VTF-7 (UCS-2) sequence
cannot be translated to a usable UNIX file specification in Perl.
Interestingly enough, a UTF-8 specification might make it into an ODS-5
VMS specification through tovmsspec(), and might actually get
reconstituted back to the original binary with tounixspec(). The UTF-8
flag, however, will have been lost in the translation.
2. The translation routine from UNIX format to VMS format needs to know
that the input string is UTF-8 encoded. This is because it normally
expects the string to be encoded with DEC-MCS which can contain
characters with the MSB set.
3. The translation routine from VMS to UNIX needs to explicitly set that
the output string is UTF-8 encoded when it translates a VTF-7 sequence.
4. The VMS C runtime library cannot handle UTF-8 encoded file
specifications. These need to be converted to VTF-7 encoded file
specifications before use.
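As a small illustration of issue 2, the sketch below shows why the
routine cannot simply inspect the bytes to decide how they are encoded.
The function name spec_has_high_bit() is hypothetical and is not part of
vms.c. Both a DEC-MCS name and a UTF-8 name can contain bytes with the
MSB set, so the caller has to say which encoding it is passing.

```c
#include <stddef.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if any byte of the specification has its most significant
 * bit set, 0 otherwise.
 */
static int spec_has_high_bit(const char *spec)
{
    const unsigned char *p = (const unsigned char *)spec;

    for (; *p != '\0'; p++) {
        if (*p & 0x80)
            return 1;
    }
    return 0;
}

/*
 * "caf\xE9.txt"      one DEC-MCS byte 0xE9 (e with acute accent)
 * "caf\xC3\xA9.txt"  the same character encoded as UTF-8
 *
 * Both strings make spec_has_high_bit() return 1. The two bytes 0xC3
 * 0xA9 are also two legal DEC-MCS characters, so the bytes alone do not
 * identify the encoding.
 */
```

This is why the proposal passes the encoding as an explicit parameter
instead of guessing it from the content.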
The implementation that I can see would mean that the conversion
routines behind the macros tovmsspec() and tovmsspec_ts() would need an
additional input parameter indicating if the input specification is in
UTF-8 format.
tovmsspec() is used in ext/Cwd/CWD.C and ext/DynaLoader/dl_vms.c outside
of vms.c.
The routines behind the macros tounixspec() and tounixspec_ts() would
need to be passed a pointer to a parameter initialized to the UTF-8
state of the input string and to reflect the UTF-8 state of the output
string. A UTF-8 input string implies that the specification is already
in UNIX format.
tounixspec_ts() is used in perl.c and in pp_ctl.c.
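A minimal sketch of the in/out flag idea follows. It is not the vms.c
code: the name sketch_name_to_unix(), the explicit buffer length, and
the hex_val() and ucs2_to_utf8() helpers are all hypothetical. It also
assumes that a VTF-7 character appears in a VMS name as "^U" followed by
four hexadecimal digits, and it handles only that escape, not the rest
of the VMS to UNIX syntax conversion. Leaving the escape untouched when
the caller passes NULL is a choice made for this sketch only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int hex_val(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Encode one UCS-2 value (0..0xFFFF) as UTF-8; returns bytes written (1..3). */
static size_t ucs2_to_utf8(unsigned int cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (char)(0xE0 | (cp >> 12));
    out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (char)(0x80 | (cp & 0x3F));
    return 3;
}

/*
 * Hypothetical sketch of the flag handling only.
 * On entry *utf8_fl says whether "in" is UTF-8; on return it says
 * whether "out" is UTF-8. utf8_fl may be NULL for legacy callers.
 * Returns out, or NULL if the buffer is too small.
 */
static char *sketch_name_to_unix(const char *in, char *out, size_t outlen,
                                 int *utf8_fl)
{
    size_t o = 0;
    int saw_escape = 0;

    if (outlen == 0)
        return NULL;

    /* UTF-8 input implies the name is already in UNIX format. */
    if (utf8_fl != NULL && *utf8_fl) {
        if (strlen(in) + 1 > outlen)
            return NULL;
        strcpy(out, in);
        return out;
    }

    while (*in != '\0') {
        int h0, h1, h2, h3;

        /* Assumed escape form: ^U followed by four hex digits. */
        if (utf8_fl != NULL && in[0] == '^' && in[1] == 'U'
            && (h0 = hex_val((unsigned char)in[2])) >= 0
            && (h1 = hex_val((unsigned char)in[3])) >= 0
            && (h2 = hex_val((unsigned char)in[4])) >= 0
            && (h3 = hex_val((unsigned char)in[5])) >= 0) {
            unsigned int cp = ((unsigned int)h0 << 12) | ((unsigned int)h1 << 8)
                            | ((unsigned int)h2 << 4) | (unsigned int)h3;

            if (o + 3 >= outlen)      /* up to 3 bytes plus the NUL */
                return NULL;
            o += ucs2_to_utf8(cp, out + o);
            in += 6;
            saw_escape = 1;
        } else {
            if (o + 1 >= outlen)      /* 1 byte plus the NUL */
                return NULL;
            out[o++] = *in++;
        }
    }
    out[o] = '\0';

    /* Tell the caller that the output now needs the UTF-8 flag. */
    if (utf8_fl != NULL && saw_escape)
        *utf8_fl = 1;
    return out;
}
```

A caller would initialize an int to the UTF-8 state of its input, pass
its address, and check it afterwards to decide whether to set the UTF-8
flag on the resulting Perl string.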
Perl_rmsexpand() is special, as it could have two input parameters in
UTF-8 format and it could produce UTF-8 output.
Perl_rmsexpand() is used in ext/Cwd/CWD.C.
New macros would be needed for these routines that allow setting the
flag. The old macros would have to stay, as there may be modules outside
of core that assume they exist; they would assume that the input string
is not UTF-8 and pass a NULL pointer for the output UTF-8 state.
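The old and new macros could coexist roughly as sketched here for the
UNIX to VMS direction. Every name in it is hypothetical
(sketch_name_to_vms(), sketch_tovmsspec(), sketch_tovmsspec_utf8(),
utf8_to_ucs2()), the explicit buffer length is an addition for the
sketch, and the "^U" plus four hex digits escape form is an assumption.
Only the encoding of the name is handled, not the directory syntax.

```c
#include <stddef.h>

/*
 * Hypothetical helper: decode one UTF-8 sequence of up to 3 bytes.
 * Returns the number of bytes consumed, or 0 if the sequence is
 * malformed or outside UCS-2. It does not reject overlong forms.
 */
static size_t utf8_to_ucs2(const unsigned char *s, unsigned int *cp)
{
    if (s[0] < 0x80) {
        *cp = s[0];
        return 1;
    }
    if ((s[0] & 0xE0) == 0xC0 && (s[1] & 0xC0) == 0x80) {
        *cp = ((s[0] & 0x1Fu) << 6) | (s[1] & 0x3Fu);
        return 2;
    }
    if ((s[0] & 0xF0) == 0xE0 && (s[1] & 0xC0) == 0x80
        && (s[2] & 0xC0) == 0x80) {
        *cp = ((s[0] & 0x0Fu) << 12) | ((s[1] & 0x3Fu) << 6)
            | (s[2] & 0x3Fu);
        return 3;
    }
    return 0;
}

/*
 * Hypothetical routine behind the macros. With in_is_utf8 set,
 * non-ASCII characters become ^Uxxxx escapes; otherwise bytes with the
 * MSB set are copied through as DEC-MCS.
 * Returns out, or NULL on malformed input or a buffer that is too small.
 */
static char *sketch_name_to_vms(const char *in, char *out, size_t outlen,
                                int in_is_utf8)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *p = (const unsigned char *)in;
    size_t o = 0;

    if (outlen == 0)
        return NULL;

    while (*p != '\0') {
        if (in_is_utf8 && *p >= 0x80) {
            unsigned int cp;
            size_t n = utf8_to_ucs2(p, &cp);

            if (n == 0 || o + 6 >= outlen)   /* 6 bytes plus the NUL */
                return NULL;
            out[o++] = '^';
            out[o++] = 'U';
            out[o++] = hex[(cp >> 12) & 0xF];
            out[o++] = hex[(cp >> 8) & 0xF];
            out[o++] = hex[(cp >> 4) & 0xF];
            out[o++] = hex[cp & 0xF];
            p += n;
        } else {
            if (o + 1 >= outlen)             /* 1 byte plus the NUL */
                return NULL;
            out[o++] = (char)*p++;
        }
    }
    out[o] = '\0';
    return out;
}

/* Old-style macro: existing callers keep working, input assumed not UTF-8. */
#define sketch_tovmsspec(in, out, len) \
    sketch_name_to_vms((in), (out), (len), 0)

/* New-style macro: the caller states whether the input is UTF-8. */
#define sketch_tovmsspec_utf8(in, out, len, fl) \
    sketch_name_to_vms((in), (out), (len), (fl))
```

Code that already uses the old macro compiles unchanged and gets the
current behaviour, while callers that know the UTF-8 state of their
string can switch to the new macro one at a time.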
My plan would be to first modify the routines to accept the needed
parameters, create the new macros, and make sure that everything still
works. Then I would start making the routines actually use the parameters.
Eventually any C library routine that uses a filename outside of vms.c
will need to have a wrapper that converts from UTF-8 if needed. vms.c
is probably the place for such wrapper code.
It may take a while to get full UTF-8 filename support, but most of the
work I need to do is also required to get the ASCII format UNIX filename
support fully working on ODS-5 volumes. It's just that without at
least some partial UTF-8 filename support, I will run into some VMS
format file specifications that can not be translated to usable UNIX
filenames.
I am also wondering whether, instead of "tovmsspec()" and "tounixspec()",
or vmsify() and unixify(), it would be better to have generic macros or
Perl routines named "to_hostspec()", "to_hostspec_short()", and
"to_perlspec()".
-John
Personal Opinion Only