Thanks for your help.
1.
If I use
res.setContentType("text/html; charset=UTF-8");
PrintWriter out = new PrintWriter(
    new OutputStreamWriter(res.getOutputStream(), "UTF-8"), true);
it will convert the data on output, but I need the data converted
before that, so that I can write UTF-8 to the database.
2.
I tried
String mypar = new String("ě"); // one Czech letter, not in ISO-8859-1
// but in ISO-8859-2; req.getParameter() returns the same string
byte[] utf8Bytes = mypar.getBytes("UTF-8");
String roundTrip = new String(utf8Bytes, "UTF-8");
Now utf8Bytes.length returns 2, which is OK because this character is
non-ASCII and takes 2 bytes in UTF-8, but
roundTrip.length() returns 1: the string is the same as mypar!
I need to write strings to the database in UTF-8, so in this example it
should write 2 characters to the database for the one-character string "ě".
I know that if I want to write to a file I can use the writeUTF() function,
and the file will be OK, but how do I write it to the database?
That is, it needs to be converted before the output stream.
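A minimal sketch of why the two lengths above differ, assuming Java 7 or later for StandardCharsets. The class and method names are made up for illustration. A Java String holds Unicode characters rather than bytes in any particular encoding, so "ě" always has length 1; UTF-8 only comes into play when the string is turned into bytes (a stream, a file, or typically the database driver).

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    // Number of bytes the string occupies once it is encoded as UTF-8.
    public static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Encode to UTF-8 bytes and decode them again as UTF-8.
    public static String roundTrip(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String mypar = "\u011B"; // the Czech letter from the message, written as an escape
        System.out.println(mypar.length());                 // characters in the String
        System.out.println(utf8ByteCount(mypar));           // bytes after encoding
        System.out.println(roundTrip(mypar).equals(mypar)); // round trip is lossless
    }
}
```

The round trip returns an equal string because encoding and decoding with the same charset cancel out, which is why roundTrip.length() is 1.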
By the way, does anybody have experience with UTF-8, databases and
servlets?
Thanks a lot
Tomas Zeman
-----------------------------------------------------------
OLD messages :
Date: Thu, 26 Apr 2001 08:42:45 -0700
From: Alex Amies <[EMAIL PROTECTED]>
Subject: Re: character encoding problem
There is a bug in most browsers where, even if you set the encoding
in the form to UTF-8 using
<form name="menu" action="" method="POST" accept-charset="UTF-8">
the browser still does not set the content type properly when it sends
data back to the server. You will probably find that
ServletRequest.getContentType() reports ISO-8859-1 or Cp1250. However,
if the browser has its encoding set to UTF-8 and you are using a UTF-8
input method, such as by selecting UTF-8 in a CJK input tool, the browser
will actually send back UTF-8 characters. I noticed that you tried
new String(byte[], "UTF-8") and said that it doesn't work. You might want
to alter it as follows, first examining the request encoding and then
getting the bytes:
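private static String transformEncoding(HttpServletRequest request,
                                        String raw) {
    String encoding = request.getCharacterEncoding();
    if (encoding == null) {
        // Assumption: with no declared charset, the container decoded
        // the parameter as ISO-8859-1.
        encoding = "ISO-8859-1";
    }
    String transformed = null;
    if (raw != null) {
        try {
            // Recover the bytes the container decoded, then read them as UTF-8.
            byte[] bytes = raw.getBytes(encoding);
            transformed = new String(bytes, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            . . .
        }
    }
    return transformed;
}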
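The re-decoding idea above can be tried without a servlet container. The following is only a sketch under stated assumptions: the class and method names are hypothetical, the "wrong" decoding is assumed to be ISO-8859-1, and StandardCharsets requires Java 7 or later. It simulates a browser sending UTF-8 bytes that the server decodes as ISO-8859-1, then repairs the string.

```java
import java.nio.charset.StandardCharsets;

public class RedecodeDemo {
    // Undo a wrong ISO-8859-1 decode of bytes that were really UTF-8.
    public static String redecodeLatin1AsUtf8(String misdecoded) {
        // ISO-8859-1 maps every byte to exactly one character, so this
        // recovers the original bytes unchanged.
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Bytes a browser would send for the Czech letter when it posts UTF-8.
        byte[] sent = "\u011B".getBytes(StandardCharsets.UTF_8);
        // What getParameter() would hand back if the bytes were read as ISO-8859-1.
        String asSeenByServlet = new String(sent, StandardCharsets.ISO_8859_1);
        String repaired = redecodeLatin1AsUtf8(asSeenByServlet);
        System.out.println(asSeenByServlet.length());   // two wrong characters
        System.out.println(repaired.equals("\u011B"));  // original letter restored
    }
}
```

The repair is only reliable when the wrong decoding was lossless, as it is for ISO-8859-1. With a charset that leaves some byte values undefined, the original bytes may not be recoverable.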
I have had luck with this in the past, but I have to admit that I am not
out of the forest with my UTF-8 problems.
Alex Amies
-----Original Message-----
From: Mark Galbreath [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 26, 2001 6:46 AM
To: [EMAIL PROTECTED]
Subject: Re: character encoding problem
Setting the request object's charset is supported in API 2.2 (and earlier)
if you import the com.oreilly.servlet.ParameterParser class.
Also, are you sure you are using "UTF-8" and not "UTF8"? Your last sentence
makes this questionable. I know the alias for UTF-8 in Java 1.1.5 and
earlier was "UTF8", and most browsers choked on the malformed content type.
The work-around for this now is:
res.setContentType( "text/html; charset=UTF-8");
PrintWriter out = new PrintWriter(
new OutputStreamWriter( res.getOutputStream(), "UTF8"), true);
Finally, do you need to set the locale?
Locale locale = new Locale( "en", "US");
for English.
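To see what the writer in the work-around actually emits, here is a small sketch that swaps the servlet response stream for a ByteArrayOutputStream. The class and method names are illustrative only, and the in-memory stream stands in for res.getOutputStream(). It also checks that current Java runtimes accept both "UTF8" and "UTF-8" as names for the same charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.util.Arrays;

public class WriterEncodingDemo {
    // Write text through a PrintWriter that encodes with the named charset
    // and return the bytes that reached the underlying stream.
    public static byte[] encodeThroughWriter(String text, String charsetName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stand-in for res.getOutputStream()
        Charset charset = Charset.forName(charsetName);
        PrintWriter out = new PrintWriter(new OutputStreamWriter(sink, charset), true);
        out.print(text);
        out.flush(); // print() does not auto-flush, so flush explicitly
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        byte[] viaAlias = encodeThroughWriter("\u011B", "UTF8");
        byte[] viaName = encodeThroughWriter("\u011B", "UTF-8");
        System.out.println(viaAlias.length);                  // bytes written for one letter
        System.out.println(Arrays.equals(viaAlias, viaName)); // both names give the same bytes
    }
}
```

The conversion happens inside the writer, at the moment characters become bytes; the String passed in is not changed.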
Cheers!
Mark
----- Original Message -----
From: "Tomas Zeman" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Thursday, April 26, 2001 8:54 AM
Subject: character encoding problem
> Hi all,
>
> I am still having trouble with character encoding in servlets.
>
> I want to convert all data from getParameter("form_param") to UTF-8
>
> I have this servlet
> -------------------------------------------------
> import java.io.*;
> import java.util.*;
> import javax.servlet.http.*;
> import javax.servlet.*;
>
> public class HelloServlet extends HttpServlet {
> public void doGet (HttpServletRequest req,HttpServletResponse res)
>     throws ServletException, IOException
> {
>
> // this line doesn't work, it needs servlet 2.3!
> //req.setCharacterEncoding("UTF-8");
>
> res.setContentType("text/html;charset=UTF-8;");
> PrintWriter pw = res.getWriter();
>
> String par = req.getParameter("text");
>
> // What to write here to convert the par String ?? (from iso-8859-2 and
> // Cp1250)
> // TODO
> String convertedPar = new String(par.getBytes(),"UTF-8"); // but it doesn't work
>
> pw.println("<head><meta http-equiv='Content-Type' "
>     + "content='text/html;charset=UTF-8;'></head>");
>
> pw.println("Hi");
> pw.println("<form method=\"POST\"><textarea cols='50' rows='8' "
>     + "name='text'></textarea><br><input type='Submit'></form>");
> pw.println("<hr> Parameter : " + par);
> pw.println("<hr> ConvertedParameter : " + convertedPar);
>
> pw.close();
>
> /*
> // this code will write my parameter to the file in good encoding, but I
> // need to have par string converted
> // to UTF-8 before that to display it on the page
>
> try {
> FileOutputStream fos = new FileOutputStream("/tmp/1.1");
> Writer out = new OutputStreamWriter(fos , "UTF-8");
> out.write(par);
> out.flush();
> out.close();
> } catch (IOException e) {
> e.printStackTrace();
> }
> */
>
> }
>
> public void doPost (HttpServletRequest req,HttpServletResponse res)
>     throws ServletException, IOException
> {
> doGet(req,res);
> }
> }
>
> ------------------------------------------------
>
> Could anybody help me with what code to add to this servlet to convert all
> characters properly?
> (I am looking for a toUTF8(String s) function)
>
> Thanks a lot
>
> Tomas Zeman
> email: [EMAIL PROTECTED]
>
>