OpenVMS supports a variation of the UCS-2 Unicode encoding, known as VTF-7, for filenames on ODS-5 volumes.

Several programs on OpenVMS, such as Advanced Server (LANMAN Server), already generate and display such filenames.

A VTF-7 file specification can be translated to or from a UTF-8 encoded UNIX file specification.

The issues are:

1. Currently a VMS file specification containing a VTF-7 (UCS-2) sequence cannot be translated to a usable UNIX file specification in Perl.

Interestingly enough, a UTF-8 specification might make it into an ODS-5 VMS specification through tovmsspec(), and might actually get reconstituted back to the original binary with tounixspec(). The UTF-8 flag, however, will have been lost in the translation.

2. The translation routine from UNIX format to VMS format needs to know that the input string is UTF-8 encoded. This is because it normally expects the string to be encoded with DEC-MCS which can contain characters with the MSB set.

3. The translation routine from VMS to UNIX needs to explicitly set that the output string is UTF-8 encoded when it translates a VTF-7 sequence.

4. The VMS C runtime library can not handle UTF-8 encoded file specifications. These need to be converted to VTF-7 encoded file specifications before use.


The implementation I have in mind would give the conversion routines behind the macros tovmsspec() and tovmsspec_ts() an additional input parameter indicating whether the input specification is in UTF-8 format.
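As a rough illustration of what such a flag changes, here is a minimal sketch. It is not code from vms.c: the name sketch_escape_name() and its signature are hypothetical, it handles only a single name component rather than a full path, and it assumes the ODS-5 escape form ^Uxxxx (a caret, "U", and four hex digits) for every non-ASCII code point. Real code would also have to escape other special characters and might prefer a single-byte form where one exists.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Perl: escape one name component.
 * Returns 0 on success, -1 on malformed UTF-8, a code point outside
 * the 16-bit range, or an output buffer that is too small. */
int sketch_escape_name(const char *in, int in_is_utf8,
                       char *out, size_t outlen)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t used = 0;

    if (outlen == 0)
        return -1;

    while (*p) {
        unsigned long cp;

        if (*p < 0x80 || !in_is_utf8) {
            /* ASCII, or an 8-bit (e.g. DEC-MCS) byte: copy unchanged. */
            if (used + 1 >= outlen)
                return -1;
            out[used++] = (char)*p++;
            continue;
        }

        if ((p[0] & 0xE0) == 0xC0 && (p[1] & 0xC0) == 0x80) {
            /* Two-byte UTF-8 sequence. */
            cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
            if (cp < 0x80)
                return -1;              /* overlong encoding */
            p += 2;
        } else if ((p[0] & 0xF0) == 0xE0 && (p[1] & 0xC0) == 0x80
                   && (p[2] & 0xC0) == 0x80) {
            /* Three-byte UTF-8 sequence. */
            cp = ((unsigned long)(p[0] & 0x0F) << 12)
               | ((unsigned long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
            if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))
                return -1;              /* overlong or surrogate */
            p += 3;
        } else {
            /* Malformed, or beyond what a 16-bit escape can hold. */
            return -1;
        }

        /* Need room for six characters plus the terminating NUL. */
        if (used + 6 >= outlen)
            return -1;
        snprintf(out + used, outlen - used, "^U%04lX", cp);
        used += 6;
    }

    out[used] = '\0';
    return 0;
}
```

With the flag set, the bytes 0xC3 0xA9 are decoded as one character and escaped; with the flag clear, the same bytes are treated as two 8-bit characters and copied through, which is why the caller has to say which encoding it is passing.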

tovmsspec() is used in ext/Cwd/CWD.C and ext/DynaLoader/dl_vms.c outside of vms.c.

The routines behind the macros tounixspec() and tounixspec_ts() would need to be passed a pointer to a flag that is initialized to the UTF-8 state of the input string and is updated on return to reflect the UTF-8 state of the output string. A UTF-8 input string implies that the specification is already in UNIX format.
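The in/out flag could behave roughly as in the sketch below. Again this is only an illustration under assumptions: sketch_unescape_name() and sketch_hexval() are hypothetical names, only a single name component is handled, and the only VTF-7 form recognized is ^Uxxxx. A NULL flag pointer stands for a caller that does not know about UTF-8, in which case the sequence is left untranslated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit. */
static int sketch_hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Hypothetical helper, not part of Perl.  On entry *utf8_fl holds the
 * UTF-8 state of the input; on return it holds that of the output.
 * Returns 0 on success, -1 on a surrogate value or a full buffer. */
int sketch_unescape_name(const char *in, char *out, size_t outlen,
                         int *utf8_fl)
{
    size_t used = 0;
    int in_utf8 = (utf8_fl != NULL && *utf8_fl);
    int out_utf8 = in_utf8;

    if (outlen == 0)
        return -1;

    while (*in) {
        int d0 = 0, d1 = 0, d2 = 0, d3 = 0;

        /* A UTF-8 input is taken to be in UNIX format already, and a
         * NULL flag means the caller cannot accept UTF-8 output. */
        if (!in_utf8 && utf8_fl != NULL
            && in[0] == '^' && in[1] == 'U'
            && (d0 = sketch_hexval(in[2])) >= 0
            && (d1 = sketch_hexval(in[3])) >= 0
            && (d2 = sketch_hexval(in[4])) >= 0
            && (d3 = sketch_hexval(in[5])) >= 0) {
            unsigned cp = ((unsigned)d0 << 12) | ((unsigned)d1 << 8)
                        | ((unsigned)d2 << 4) | (unsigned)d3;
            unsigned char tmp[3];
            size_t n;

            if (cp >= 0xD800 && cp <= 0xDFFF)
                return -1;              /* surrogates are not characters */
            if (cp < 0x80) {
                tmp[0] = (unsigned char)cp;
                n = 1;
            } else if (cp < 0x800) {
                tmp[0] = (unsigned char)(0xC0 | (cp >> 6));
                tmp[1] = (unsigned char)(0x80 | (cp & 0x3F));
                n = 2;
            } else {
                tmp[0] = (unsigned char)(0xE0 | (cp >> 12));
                tmp[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
                tmp[2] = (unsigned char)(0x80 | (cp & 0x3F));
                n = 3;
            }
            if (used + n >= outlen)
                return -1;
            memcpy(out + used, tmp, n);
            used += n;
            in += 6;
            out_utf8 = 1;               /* output now carries UTF-8 */
        } else {
            if (used + 1 >= outlen)
                return -1;
            out[used++] = *in++;
        }
    }

    out[used] = '\0';
    if (utf8_fl != NULL)
        *utf8_fl = out_utf8;
    return 0;
}
```

The sketch does not deal with a name that mixes 8-bit DEC-MCS bytes with a ^Uxxxx sequence; a real routine would have to convert those bytes as well before claiming that the result is UTF-8.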

tounixspec_ts() is used in perl.c and in pp_ctl.c.

Perl_rmsexpand() is special, as it could have two input parameters in UTF-8 format and it could produce a UTF-8 output.

Perl_rmsexpand() is used in ext/Cwd/CWD.C.

New macros would be needed for these routines that allow setting the flag. The old macros would assume that the input string is not UTF-8 and would pass a NULL pointer for the output UTF-8 state, as there may be modules outside of core that assume these macros exist.

My plan would be to first modify the routines to accept the needed parameters, create the new macros, and make sure that everything still works. Then I would start making the routines actually use the parameters.
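The macro layering could look something like the following sketch. All names here are hypothetical stand-ins (the real macros also pass the interpreter context, which is left out), and the routines are stubs that only copy the string and record what they were given, so that the effect of the old and new macros can be seen.

```c
#include <stddef.h>
#include <string.h>

/* Records the flag last seen by the stub below, for illustration only. */
int sketch_last_in_utf8 = -1;

/* Hypothetical extended routine behind the to-VMS macros (stub). */
char *sketch_do_tovmsspec(const char *spec, char *buf, size_t buflen,
                          int in_is_utf8)
{
    sketch_last_in_utf8 = in_is_utf8;
    if (strlen(spec) >= buflen)
        return NULL;
    return strcpy(buf, spec);
}

/* Hypothetical extended routine behind the to-UNIX macros (stub).
 * A real routine would update *utf8_fl; this one leaves it alone and
 * simply tolerates a NULL pointer from old callers. */
char *sketch_do_tounixspec(const char *spec, char *buf, size_t buflen,
                           int *utf8_fl)
{
    (void)utf8_fl;
    if (strlen(spec) >= buflen)
        return NULL;
    return strcpy(buf, spec);
}

/* Old-style macros: same arguments as before, so existing callers keep
 * compiling.  They assume non-UTF-8 input and ignore the output state. */
#define sketch_tovmsspec(spec, buf, len) \
    sketch_do_tovmsspec((spec), (buf), (len), 0)
#define sketch_tounixspec(spec, buf, len) \
    sketch_do_tounixspec((spec), (buf), (len), NULL)

/* New macros: expose the extra parameter to callers that need it. */
#define sketch_tovmsspec_utf8(spec, buf, len, is_utf8) \
    sketch_do_tovmsspec((spec), (buf), (len), (is_utf8))
#define sketch_tounixspec_utf8(spec, buf, len, utf8_fl) \
    sketch_do_tounixspec((spec), (buf), (len), (utf8_fl))
```

Because the old macro names keep their argument lists, code outside of core that uses them would not need to change; only callers that care about UTF-8 would switch to the new forms.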

Eventually any C library routine that uses a filename outside of vms.c will need to have a wrapper that converts from UTF-8 if needed. vms.c is probably the place for such wrapper code.
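A wrapper of that kind might be shaped like the sketch below, using fopen() as the example routine. The names sketch_fopen(), sketch_conv_fn and sketch_conv_ascii_only() are hypothetical, and the conversion step is passed in as a function pointer only to keep the example self-contained; in vms.c the wrapper would presumably call the real UTF-8 to VTF-7 conversion directly.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical converter type: UTF-8 in, native (VTF-7) name out.
 * Returns 0 on success and nonzero on failure. */
typedef int (*sketch_conv_fn)(const char *in, char *out, size_t outlen);

/* Stand-in converter for demonstration: accepts pure ASCII only. */
int sketch_conv_ascii_only(const char *in, char *out, size_t outlen)
{
    size_t len = strlen(in);
    size_t i;

    if (len >= outlen)
        return -1;
    for (i = 0; i < len; i++) {
        if ((unsigned char)in[i] >= 0x80)
            return -1;          /* would need a real VTF-7 escape */
    }
    memcpy(out, in, len + 1);
    return 0;
}

/* Hypothetical wrapper: convert the name only when it is flagged as
 * UTF-8, then hand the result to the C library routine. */
FILE *sketch_fopen(const char *name, int name_is_utf8, const char *mode,
                   sketch_conv_fn to_native)
{
    char native[4096];

    if (name_is_utf8) {
        if (to_native == NULL
            || to_native(name, native, sizeof native) != 0) {
            errno = EINVAL;     /* name cannot be represented */
            return NULL;
        }
        name = native;
    }
    return fopen(name, mode);
}
```

Names that are not flagged as UTF-8 go straight through to fopen(), so existing callers would see no change in behavior.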

It may take a while to get full UTF-8 filename support, but most of the work I need to do is also required to get the ASCII format UNIX filename support fully working on ODS-5 volumes. It's just that without at least some partial UTF-8 filename support, I will run into some VMS format file specifications that cannot be translated to usable UNIX filenames.

I am also wondering whether, instead of "tovmsspec()" and "tounixspec()", or vmsify() and unixify(), it would be better to have generic macros or Perl routines named "to_hostspec()", "to_hostspec_short()", and "to_perlspec()".

-John
[EMAIL PROTECTED]
Personal Opinion Only
