Re: Character Encoding

Daniel Dekany Sat, 25 Oct 2003 18:14:09 -0700

Saturday, October 25, 2003, 3:29:52 PM, Ilkka Priha wrote:

[snip]
> why not to apply the XML (and XHTML and WML as well) declaration instead
> of a velocity specific one as most markups are based on XML:
>
> <?xml version="1.0" encoding="UTF-8" ?>
[snip]


It can't be used for Velocity templates, because it would be bad if
Velocity did not output <?xml ...?> as is (and after all, Velocity
templates are not XML).

As for the possibility of automatic charset detection, the problem is
with the charsets that are not US-ASCII "compatible", such as the
EBCDIC-based charsets or UTF-16. XML charset detection works only
because it knows that a file that is not UTF-8 or UTF-16 must start
with '<', so 4C must mean that the file uses EBCDIC characters, and
FE FF and the like must be a BOM (neither FE, nor FF, nor 00 is '<' in
any charset). But in the case of Velocity templates, the first
character can be anything.
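
To make the above concrete, here is a rough Python sketch of that kind
of first-bytes sniffing. The function name and the returned labels are
made up for illustration; the byte signatures are the ones I recall
from the autodetection appendix of the XML 1.0 specification. It only
works because the input is assumed to start with "<?xml" or a BOM,
which is exactly what a template does not guarantee.

```python
def sniff_xml_encoding_family(head: bytes) -> str:
    """Guess the encoding family of an XML file from its first bytes.

    Hypothetical helper: relies on the file starting with a BOM or
    with the characters "<?xm" in some encoding.
    """
    signatures = [
        # 4-byte BOMs first, so FF FE 00 00 is not mistaken for UTF-16LE.
        (b"\x00\x00\xfe\xff", "UTF-32BE"),
        (b"\xff\xfe\x00\x00", "UTF-32LE"),
        (b"\xef\xbb\xbf", "UTF-8"),
        (b"\xfe\xff", "UTF-16BE"),
        (b"\xff\xfe", "UTF-16LE"),
        # No BOM: recognise "<?" or "<?xm" as encoded in each family.
        (b"\x00\x3c\x00\x3f", "UTF-16BE"),
        (b"\x3c\x00\x3f\x00", "UTF-16LE"),
        (b"\x3c\x3f\x78\x6d", "ASCII-compatible"),
        (b"\x4c\x6f\xa7\x94", "EBCDIC"),
    ]
    for prefix, family in signatures:
        if head.startswith(prefix):
            return family
    # Anything else: the first character gives no usable hint.
    return "unknown"
```

A template that begins with, say, "#set" or plain text matches none of
these signatures and ends up as "unknown".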

But still, a practical solution would be to read the file as ISO-8859-1,
and if it starts with #encoding=Foo, then use charset Foo to re-decode
the file. Of course, with this method, the special comment can't be
detected if the file uses UTF-16 or some EBCDIC variant, but in that
case it just uses the default encoding as it does now, so you have lost
nothing compared to the current situation. At least it works for
ISO-8859-X, UTF-8, cpXXX, Shift_JIS, etc. These are the charsets almost
everybody uses anyway.
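
A minimal Python sketch of this two-pass idea follows. It is not
Velocity code: the function name, the "default" parameter, and the
fallback for an unknown charset name are assumptions made for the
example, and only the "#encoding=" marker comes from the proposal
above. The sketch leaves the marker line in the decoded text; whether
it should be stripped is a separate decision.

```python
import codecs

MARKER = "#encoding="  # marker syntax as proposed above


def decode_template(raw: bytes, default: str = "ISO-8859-1") -> str:
    """Decode template bytes, honouring a leading #encoding=Foo line.

    Hypothetical helper, not part of any Velocity API.
    """
    # First pass: ISO-8859-1 maps every byte to a character, so
    # decoding the first line this way can never fail.
    first_line = raw.split(b"\n", 1)[0].decode("iso-8859-1").strip()
    if not first_line.startswith(MARKER):
        # No marker seen (also the UTF-16 / EBCDIC case): behave as before.
        return raw.decode(default)
    name = first_line[len(MARKER):].strip()
    try:
        codecs.lookup(name)  # raises LookupError for unknown charsets
    except LookupError:
        name = default  # assumption: fall back rather than fail
    # Second pass: re-decode the whole file with the declared charset.
    return raw.decode(name)
```

For example, a file whose bytes are "#encoding=UTF-8" followed by UTF-8
text is first read as ISO-8859-1 just far enough to see the marker, and
is then decoded again as UTF-8. A UTF-16 file never shows the marker in
the first pass, so it simply gets the default, as described above.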

p.s. OK, it is possible to create an EBCDIC file that would be misread
in ISO-8859-1 as "#encoding=...", but, well... there is no real chance
of that happening.

-- 
Best regards,
 Daniel Dekany


