> > 1.
> > JSP pages must include the header:
> > 
> > <%@ page
> >  contentType="text/html; charset=UTF-8"
> > %>
> 
> This is if you use JSP.  If you work with servlets, then you should output the
> appropriate headers.

Actually, it also sets the encoding of the output stream, or at least it used to in
some versions of Tomcat. The full declaration would look like this:

<%@ page
  contentType="text/html; charset=UTF-8"
  pageEncoding="UTF-8"
  import="..."
  info="..."
  ...
%> 

> > 2.
> > In the catalina.bat (Windows), catalina.sh (Unix), or apache$jakarta_config.com
> > (OpenVMS) file, there must be a switch added to the call to java.exe.  The
> > switch is:
> > 
> > -Dfile.encoding=UTF-8
> > 
> > I cannot find documentation for this environment variable anywhere or what it
> > actually does but it is essential.
> 
> It's not Tomcat-specific; it should probably be somewhere in the Java specifications.

Java/Tomcat should be independent of local settings and encodings. Each JSP carries
sufficient information on its static (pageEncoding) and output encoding
(contentType). Servlets have to specify this explicitly (set the "Content-Type:" header
in the ServletResponse and the encoding of the output stream).

Resource files are a different story; again, servlets have to set their encoding
manually. Again, it should not be a global setting of the JVM.
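
To illustrate the point in plain Java (a minimal sketch, not Tomcat code; the class
and method names are made up for this example): if every conversion between text and
bytes names its charset, the result should not depend on -Dfile.encoding or on the
platform locale.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {

    // Encode with a named charset. text.getBytes() without an argument
    // would fall back to the JVM default charset instead.
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with a named charset, e.g. for bytes read from a resource file.
    public static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u0161\u0107\u045F"; // two Latin letters and one Cyrillic letter
        byte[] bytes = toUtf8(text);

        // The default varies between machines; the two lines after it do not.
        System.out.println("JVM default charset: " + Charset.defaultCharset());
        System.out.println(bytes.length);                 // each of these characters takes 2 bytes in UTF-8
        System.out.println(fromUtf8(bytes).equals(text)); // true
    }
}
```

The same idea applies to streams: readers and writers that wrap a stream can be
constructed with an explicit charset rather than relying on the default.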

> > 3.
> > For translation of inputs coming back from the browser there must be a method
> > that translates from the browser's ISO-8859-1 to UTF-8.  It seems to me that
> > -1 is used in all regions as I have had people in countries such as Greece &
> > Bulgaria test this and they always send input back in -1 encoding.  The
> > method which you will use constantly should go something like this:
> 
> I wonder why you need this.  I have no need to convert anything into UTF-8 by
> hand - Tomcat does it for me (and I work not only with European languages).  My
> code includes the following line:
> 
> req.setCharacterEncoding("UTF-8");
> 
> and everything works OK with IE and Mozilla.

Yup. One additional word of warning: browsers should be able to support multiple
client encodings (on Windows, switching from one keyboard to another). And they should
be able to tell the server which encoding was used for the data; HTTP/HTML have support
for this. The problem is that most browsers ignore this, so you'll have to assume that
the data was encoded using some fixed encoding. The problem is present in my country,
where we use both the Cyrillic and Latin alphabets. If the page is designated to be
windows-1250 encoded (Latin) and the user enters data using a Cyrillic keyboard with
windows-1251, the server will have no way of knowing this. The servlet author is forced
to assume that the encoding is CP1250, and it will be wrong.
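
A rough sketch of that failure in plain Java, with no servlet API involved. The class
and method names are hypothetical, and it assumes the JVM ships the windows-1250 and
windows-1251 charsets, which typical JDK builds do. The bytes are produced in one
encoding and decoded in the other, the way a server would if it assumed CP1250:

```java
import java.nio.charset.Charset;

public class WrongCharsetDemo {

    // Encode the typed text in the charset the client actually used, then
    // decode those bytes in the charset the server assumes.
    public static String misdecode(String typed, Charset actual, Charset assumed) {
        byte[] wire = typed.getBytes(actual);
        return new String(wire, assumed);
    }

    public static void main(String[] args) {
        Charset cp1250 = Charset.forName("windows-1250");
        Charset cp1251 = Charset.forName("windows-1251");

        String typed = "\u0434\u0430"; // a short Cyrillic word

        // Server assumes CP1250 although the data arrived as CP1251.
        String seen = misdecode(typed, cp1251, cp1250);
        System.out.println(seen.equals(typed)); // false

        // With the right charset the text survives.
        System.out.println(misdecode(typed, cp1251, cp1251).equals(typed)); // true
    }
}
```

Both encodings are single-byte, so nothing in the byte stream itself reveals which one
was used; the wrong guess decodes without any error and simply yields different letters.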

Nix.
