Hi Tom,

Thanks for your quick reply. The output is being generated by Perl. The 
strange thing is that using the ENCODING => 'UTF-8' configuration parameter 
breaks the output; if I leave it out, the £ sign is returned correctly 
encoded as UTF-8.

I’ll have a look at the debugging options that you suggest and see if I can 
find anything else out.



Thanks again,
David


From: Tom Molesworth [mailto:[email protected]]
Sent: 21 October 2014 09:00
To: [email protected]; David Hickman
Subject: Re: [Templates] Template Toolkit and UTF-8 template files

Hi,

On 21/10/14 08:44, David Hickman wrote:
I wondered if anyone had any experience of using the Perl Template Toolkit and 
UTF-8 encoded files? I appear to be facing a rather strange issue with a UTF-8 
encoded template file (although the only non-ASCII characters in the file are 
£ signs):

• If I don’t tell the template toolkit anything about the encoding of the 
template file then the resulting output from the template is correct and the £ 
signs are output in UTF-8 format

How are you generating output? Is this via tpage/ttree, or Perl? A test case 
should make it much easier to track down any encoding issues.



• If I specifically tell the template toolkit that the file is encoded in UTF-8 
(either using the configuration parameters, a BOM on the template file, or 
both) then the £ signs in the template are converted to character code 163 (the 
ANSI equivalent). This breaks my intended output of the template as the 
character encoding reported to the browser in the response header is UTF-8 even 
though the file now contains characters that are not compatible with UTF-8.

The £ Unicode character is codepoint 163 (U+00A3) - 
http://www.fileformat.info/info/unicode/char/a3/index.htm - so it sounds like 
you might be writing Unicode character strings directly to the output without 
encoding them as UTF-8 first?
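
To illustrate the difference between the codepoint and its UTF-8 encoding, 
here is a minimal sketch using only the core Encode module. The variable names 
are just for illustration and nothing here is specific to TT2:

```perl
use strict;
use warnings;
use Encode qw(encode);

# A character string holding the single character U+00A3 (the pound sign).
my $chars = "\x{a3}";

# The same character encoded as UTF-8 for output: two bytes.
my $bytes = encode('UTF-8', $chars);

printf "codepoint: %d, characters: %d\n", ord($chars), length($chars);
printf "UTF-8 bytes: %s\n",
    join ' ', map { sprintf '%02x', ord } split //, $bytes;
```

If the character string is printed without an encoding layer or an explicit 
encode step, a single byte 0xA3 can end up in the output, which is not valid 
UTF-8 on its own.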

For what it's worth, tpage does the right thing:

$ echo '[% "test: " %] £' > test.tt2
$ tpage test.tt2
test:  £
$ tpage test.tt2 | od -t x1z
0000000 74 65 73 74 3a 20 20 c2 a3 0a                    >test:  ...<

That 0xC2 0xA3 is the expected UTF-8 encoding. In Perl, I think you'd need the 
ENCODING parameter if TT2 is reading the template file:

#!/usr/bin/env perl
use strict;
use warnings;
use Template;

# The default ->process target is STDOUT, so encode output as UTF-8 there.
binmode STDOUT, ':encoding(UTF-8)';

my $tt = Template->new(ENCODING => 'UTF-8');
$tt->process('test.tt2', {}) or die $tt->error;

Maybe try writing the template output to a scalar, and see if it's a valid 
Unicode (not UTF-8) string? Data::Dumper should report \x{a3} as the character 
in this case.
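
As a rough sketch of that check: the scalar could be filled with 
$tt->process('test.tt2', {}, \$output) and then inspected. The helper below, 
describe_string, is a hypothetical name of my own and not part of TT2, and the 
two sample strings stand in for the template output. It is only a heuristic, 
since a character string can happen to look like well-formed UTF-8 bytes:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of TT2): guess whether a scalar holds
# decoded characters or UTF-8 encoded bytes.
sub describe_string {
    my ($str) = @_;
    return 'ASCII only' unless $str =~ /[^\x00-\x7f]/;

    # Strict decoding only succeeds for well-formed UTF-8 bytes. Work on a
    # copy, because decode() may modify its argument when a check is given.
    my $copy = $str;
    my $ok = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 };
    return $ok ? 'UTF-8 bytes' : 'character string';
}

# Stand-ins for the scalar that ->process would fill in.
for my $sample ("\x{a3}", "\xc2\xa3") {
    my $hex = join ' ', map { sprintf '%02x', ord } split //, $sample;
    print describe_string($sample), ": $hex\n";
}
```

A result of "character string" means the output still needs encoding (via 
binmode or Encode::encode) before it goes out, whereas "UTF-8 bytes" means it 
has already been encoded and should not be encoded a second time.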

cheers,

Tom
_______________________________________________
templates mailing list
[email protected]
http://mail.template-toolkit.org/mailman/listinfo/templates
