On Tue, Apr 11, 2006 at 11:30:41AM -0400, John E. Malmberg wrote:
> What I would like to know is if I have figured out this patch fragment
> correct for getting the UTF8 attribute passed back and forth.
>
> Specifically, when I am returning a UTF8 encoded string back to Perl, do
> I need to run it through sv_utf8_upgrade(), or is there a better method?

Sorry, missed this question, which I knew the answer to.

> +    if (rslt != NULL) {
> +        sv_usepvn(ST(0),rslt,strlen(rslt));
> +        if (fs_utf8) {
> +            sv_utf8_upgrade(ST(0));
> +        }
> +    }

No, sv_utf8_upgrade is for converting an SV holding a sequence of bytes
that are ISO-8859-1 characters into an SV holding a (longer) sequence of
bytes that are those same characters encoded in UTF-8.

What I think you need here is

     ST(0) = sv_newmortal();
-    if (rslt != NULL) sv_usepvn(ST(0),rslt,strlen(rslt));
+    if (rslt != NULL) {
+        sv_usepvn(ST(0),rslt,strlen(rslt));
+        if (fs_utf8) {
+            SvUTF8_on(ST(0));
+        }
+    }

because you need to signal to the internals that the sequence of bytes in
the SV is already in UTF-8. (I'm assuming that the sequence of bytes in rslt
is in ISO-8859-1 if fs_utf8 is false, and in UTF-8 if fs_utf8 is true. If
not, I misunderstood something.)

If you're re-using an existing SV (rather than the new one created here by
sv_newmortal()), I'd add an else block with SvUTF8_off(...), as there have
been bugs in the core caused by scalars getting SvUTF8(...) turned on, but
then never turned off, so it "leaks" through on scalar re-use.

Nicholas Clark
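
To illustrate the byte-level difference described above, here is a minimal
standalone C sketch. It does not use the Perl headers, and latin1_to_utf8()
is a hypothetical helper, not a Perl API function. It only models what an
ISO-8859-1 to UTF-8 conversion does to the bytes of an SV whose UTF8 flag is
off; it is not the core's implementation of sv_utf8_upgrade().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the Perl API): re-encode `len` bytes of
 * ISO-8859-1 text from `in` as UTF-8 into `out`, which must have room for
 * 2 * len + 1 bytes. Returns the number of bytes written, excluding the
 * terminating NUL. */
static size_t latin1_to_utf8(const unsigned char *in, size_t len,
                             unsigned char *out)
{
    size_t n = 0;
    size_t i;

    assert(in != NULL && out != NULL);
    for (i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII bytes are identical in both encodings */
            out[n++] = in[i];
        } else {
            /* 0x80..0xFF become two bytes: a lead byte (0xC2 or 0xC3)
             * followed by one continuation byte */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[n] = '\0';
    return n;
}
```

Fed the single ISO-8859-1 byte E9 (e-acute), the helper produces C3 A9, which
is the conversion you want for Latin-1 input. Fed bytes that are already
UTF-8 (C3 A9), it produces C3 83 C2 A9, i.e. the text is encoded twice. That
is why a buffer that already holds UTF-8 should only have its flag set with
SvUTF8_on() rather than be converted again.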