Re: UTF8 help/review of possible implementation on VMS

John E. Malmberg Thu, 13 Apr 2006 19:08:03 -0700

Nicholas Clark wrote:

On Tue, Apr 11, 2006 at 11:30:41AM -0400, John E. Malmberg wrote:

What I would like to know is whether I have got this patch fragment
correct for passing the UTF8 attribute back and forth.
Specifically, when I am returning a UTF8-encoded string back to Perl, do I need to run it through sv_utf8_upgrade(), or is there a better method?


No, sv_utf8_upgrade is for converting an SV holding a sequence of bytes that
are ISO-8859-1 characters into an SV holding a (longer) sequence of bytes
that are those same characters encoded in UTF-8.

What I think you need here is

   ST(0) = sv_newmortal();
-  if (rslt != NULL) sv_usepvn(ST(0),rslt,strlen(rslt));
+  if (rslt != NULL) {
+    sv_usepvn(ST(0),rslt,strlen(rslt));
+    if (fs_utf8) {
+       SvUTF8_on(ST(0));
+    }
+  }

because you need to signal to the internals that the sequence of bytes in
the SV is in UTF-8.

(I'm assuming that the sequence of bytes in rslt was in ISO-8859-1 if fs_utf8
was false, and UTF-8 if fs_utf8 was true. If not, I've misunderstood something.)

Yes: DEC-MCS, ISO-8859-1 (ISO-LATIN-1) for U.S. users, or one of the other 8-bit character sets.

I was not sure whether the SV member that holds the length was supposed to be in characters or bytes. I will be returning it in bytes.

If you're re-using an existing SV (rather than the new one created here by
sv_newmortal()), I'd add an else block with SvUTF8_off(...), as there have
been bugs in the core caused by scalars getting SvUTF8(...) turned on, but
then never turned off, so it "leaks" through on scalar re-use.


I will keep that in mind.

This allows me to complete the interfaces between Perl and the internal vmsify/unixify type routines, so that they are ready for when I get the UTF8 <==> VTF7 (7-bit encoding of UCS-2) conversion working.


Thanks,
-John
[EMAIL PROTECTED]
Personal Opinion Only
