Hi Tom,

That makes a lot of sense, thanks.  The distinction between variables 
containing UTF-8 bytes and variables containing a Unicode string had not 
clicked before, but it has now!  I'll take a look and see if I can trace what 
is happening.

Thanks again for your help.


Best Regards,
David


From: [email protected] 
[mailto:[email protected]] On Behalf Of Tom Molesworth
Sent: 21 October 2014 09:19
To: [email protected]
Subject: Re: [Templates] Template Toolkit and UTF-8 template files

On 21/10/14 09:07, David Hickman wrote:
> Hi Tom,
>
> Thanks for your quick reply.  The output is being generated by Perl. The 
> strange thing is that using the ENCODING => 'UTF-8' configuration parameter 
> breaks the output.  If I don't include it, the £ sign is returned in the 
> correct UTF-8 format.
>
> I'll have a look at the debugging options that you suggest and see if I can 
> find anything else out.


In that case it sounds like you might be passing around UTF-8 in the code, 
rather than Unicode strings.

I'd suggest starting from the part of the code that sends the output to the 
browser and working backwards from there. Usually there's either an output 
layer on the filehandle, or something like Encode::encode('UTF-8' => $output).
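
As a rough sketch of the two things to look for (not taken from the code 
under discussion; the variable names are placeholders), these are the usual 
ways a Unicode string gets turned into UTF-8 bytes on the way out:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A Unicode (character) string holding one character, U+00A3 POUND SIGN.
my $output = "\x{a3}";

# Option 1: encode explicitly just before writing.
my $bytes = encode('UTF-8', $output);
printf "%d character(s), %d byte(s)\n", length($output), length($bytes);
# prints "1 character(s), 2 byte(s)"

# Option 2: put an encoding layer on the filehandle and print characters.
binmode(STDOUT, ':encoding(UTF-8)');
print $output, "\n";
```

If neither of these appears anywhere between the template output and the 
browser, the data is most likely being passed through as raw bytes.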

If you don't have either, that means you're not really dealing with characters, 
just bytes. One way to test that: with the version of the code where the £ sign 
is being rendered correctly in the browser, try removing everything else from 
the template apart from the £ sign. Then see what length($output) returns, 
where $output is the result of rendering the template ($tt->process('template', 
{...}, \$output) for example). If length() returns 1, then it's probably a 
Unicode string. If it's >1, you have something else - presumably UTF-8 bytes.
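
A minimal illustration of that length() check, using a hard-coded stand-in 
for the rendered output rather than a real $tt->process call (the variable 
names here are made up for the example):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for template output that is still UTF-8 bytes:
# the pound sign is the two bytes 0xC2 0xA3.
my $as_bytes = "\xC2\xA3";

# The same data after decoding to a Unicode string.
my $as_chars = decode('UTF-8', $as_bytes);

printf "bytes: %d\n", length($as_bytes);   # prints "bytes: 2"
printf "chars: %d\n", length($as_chars);   # prints "chars: 1"
```

Note that a trailing newline in the template file would add one to either 
count, so it is worth stripping that before comparing.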

I'd generally recommend using Unicode strings inside the app rather than UTF-8 
bytes, otherwise even basic things like substr() are likely to result in 
corrupted output. Decode UTF-8 to a Unicode string as soon as you read the 
data, and encode Unicode to UTF-8 just before you write to a socket/filehandle.
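
To show the substr() problem concretely, here is a small sketch assuming the 
input arrives as UTF-8 bytes (the input value and variable names are invented 
for illustration):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Pound sign followed by "100", as UTF-8 bytes read from a file or socket.
my $input_bytes = "\xC2\xA3100";

# substr() on the byte string cuts the two-byte pound sign in half.
my $broken = substr($input_bytes, 0, 1);
print "broken: ", unpack('H*', $broken), "\n";   # prints "broken: c2"

# Decode on input, work with characters, encode on output.
my $text  = decode('UTF-8', $input_bytes);
my $first = substr($text, 0, 1);                 # the whole pound sign
my $out   = encode('UTF-8', $first);
print "intact: ", unpack('H*', $out), "\n";      # prints "intact: c2a3"
```

The lone 0xC2 byte in the first case is not valid UTF-8 on its own, which is 
the kind of corruption that shows up as a broken character in the browser.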

cheers,

Tom
_______________________________________________
templates mailing list
[email protected]
http://mail.template-toolkit.org/mailman/listinfo/templates