Nicholas Clark wrote:
On Tue, Apr 11, 2006 at 11:30:41AM -0400, John E. Malmberg wrote:
What I would like to know is if I have figured out this patch fragment
correctly for getting the UTF8 attribute passed back and forth.
Specifically, when I am returning a UTF8 encoded string back to Perl, do
I need to run it through sv_utf8_upgrade(), or is there a better method?
No, sv_utf8_upgrade is for converting an SV holding a sequence of bytes that
are ISO-8859-1 characters into an SV holding a (longer) sequence of bytes
that are those same characters encoded in UTF-8.
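To illustrate why the upgraded form is longer, here is a minimal standalone sketch of that kind of re-encoding. It is not Perl's implementation; latin1_to_utf8 is a hypothetical helper name, and the caller is assumed to supply a buffer of at least 2*len + 1 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Perl API.
 * Re-encodes `len` ISO-8859-1 bytes from `src` as UTF-8 into `dst`.
 * `dst` must have room for 2*len + 1 bytes.
 * Returns the number of UTF-8 bytes written (excluding the trailing NUL). */
static size_t latin1_to_utf8(const unsigned char *src, size_t len,
                             unsigned char *dst)
{
    size_t out = 0;
    size_t i;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < len; i++) {
        unsigned char c = src[i];
        if (c < 0x80) {
            /* ASCII is unchanged in UTF-8 */
            dst[out++] = c;
        } else {
            /* 0x80..0xFF become a two-byte sequence */
            dst[out++] = (unsigned char)(0xC0 | (c >> 6));
            dst[out++] = (unsigned char)(0x80 | (c & 0x3F));
        }
    }
    dst[out] = '\0';
    return out;
}
```

A string that is already UTF-8 must not go through this kind of conversion, since each of its high bytes would be encoded a second time.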
What I think you need here is
ST(0) = sv_newmortal();
- if (rslt != NULL) sv_usepvn(ST(0),rslt,strlen(rslt));
+ if (rslt != NULL) {
+ sv_usepvn(ST(0),rslt,strlen(rslt));
+ if (fs_utf8) {
+ SvUTF8_on(ST(0));
+ }
+ }
because you need to signal to the internals that the sequence of bytes in
the SV is in UTF-8.
(I'm assuming that the sequence of bytes in rslt was in ISO-8859-1 if fs_utf8
was false, and UTF-8 if fs_utf8 was true. If not, I misunderstood something)
Yes, DEC-MCS, ISO-8859-1 or ISO-LATIN-1 for U.S. users, or one of the
8-bit character sets.
I was not sure if the member of the SV that indicated the length was
supposed to be in characters or bytes. I will be returning it in bytes.
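For what it is worth, the length argument to sv_usepvn() is a byte count, so strlen(rslt) is the right value to pass. The small standalone sketch below only shows how the two counts differ for UTF-8 data; utf8_char_count is a hypothetical helper, not a Perl function, and it assumes well-formed UTF-8 input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the Perl API.
 * Counts characters in a NUL-terminated, well-formed UTF-8 string
 * by skipping continuation bytes (those of the form 10xxxxxx). */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For a string such as "caf" followed by the bytes 0xC3 0xA9, strlen() reports the bytes while the helper reports the characters, and the byte count is the one the SV stores.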
If you're re-using an existing SV (rather than the new one created here by
sv_newmortal()), I'd add an else block with SvUTF8_off(...), as there have
been bugs in the core caused by scalars getting SvUTF8(...) turned on, but
then never turned off, so it "leaks" through on scalar re-use.
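The re-use problem can be shown with a toy model. The struct and function below are hypothetical stand-ins, not Perl's SV or any real API; is_utf8 only plays the role of the SvUTF8 flag, and the else branch corresponds to the SvUTF8_off(...) call suggested above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a scalar; this is NOT Perl's SV. */
struct toy_scalar {
    const char *bytes;
    size_t      len;      /* length in bytes */
    int         is_utf8;  /* plays the role of the SvUTF8 flag */
};

/* Store a result in a (possibly re-used) scalar and set the flag
 * explicitly in both directions. */
static void toy_store(struct toy_scalar *sv, const char *rslt, int fs_utf8)
{
    assert(sv != NULL && rslt != NULL);
    sv->bytes = rslt;
    sv->len = strlen(rslt);
    if (fs_utf8) {
        sv->is_utf8 = 1;
    } else {
        /* Without this branch a flag left over from an earlier
         * UTF-8 result would survive onto the 8-bit string. */
        sv->is_utf8 = 0;
    }
}
```

With a freshly created mortal the else branch is redundant, since the flag starts out off.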
I will keep that in mind.
This allows me to complete the interfaces between Perl and the internal
vmsify/unixify type routines to be ready for when I get the UTF8 <==>
VTF7 (7bit encoding of UCS-2) working.
Thanks,
-John
Personal Opinion Only