I'm trying to track down a character encoding issue that I've been having, but don't really understand. Hopefully one of you will know what the answer is.

I am using CKEditor to generate some user-specified HTML. CKEditor offers an "insert special character" function that often creates named HTML entities like "&yen;", but it also inserts a few, like the solid black right arrow, as literal UTF-8 characters rather than entities. I then generate a JSP file that includes the HTML produced by CKEditor.

Initially I was using the Java 6 FileWriter without specifying a character encoding, and I ended up with a generated JSP where the HTML entities were fine but the other special characters appeared as just '?' in the file. I changed to FileOutputStream/OutputStreamWriter, specified "UTF-8", and the JSP looked good:

<%@ page contentType="text/html; charset=utf-8" session="true" isELIgnored="true" %>
...
<p>These have issues: ► Ŵ but these don&#39;t: &trade; &hArr; &diams; &aacute; &para; &yen;</p>
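
A minimal sketch of the kind of change described above, assuming the generated JSP is built up as a String before it is written. The class name JspWriterSketch and the helper writeUtf8 are hypothetical, not taken from my actual code; try/finally is used so that it still compiles on Java 6.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;

public class JspWriterSketch {

    // Hypothetical helper: writes the text as UTF-8 bytes no matter what
    // the JVM's default charset (e.g. Cp1252 on Windows) happens to be.
    public static void writeUtf8(File target, String content) throws IOException {
        Writer out = new OutputStreamWriter(
                new FileOutputStream(target), Charset.forName("UTF-8"));
        try {
            out.write(content);
        } finally {
            out.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File jsp = File.createTempFile("generated", ".jsp");
        // Unicode escapes keep this source file itself ASCII-only:
        // \u25BA is the solid right arrow, \u0174 is W with circumflex.
        writeUtf8(jsp, "<p>These have issues: \u25BA \u0174</p>\n");
        System.out.println("Wrote " + jsp.length() + " bytes to " + jsp);
    }
}
```

A plain FileWriter(File) would instead encode with the platform default charset, which is where the '?' characters in the file came from.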

With UTF-8 specified when writing the JSP, the right arrow and the Latin W with circumflex appeared in the JSP file instead of two question marks. I thought maybe I had won, but when I look at the .java file that Tomcat generates from the JSP, I see this instead:

out.write("<p>These have issues: â–º Å´ but these don&#39;t: &trade; &hArr; &diams; &aacute; &para; &yen;</p>\n");
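
The garbled sequences in that line are what one gets when the UTF-8 bytes of the two characters are decoded as windows-1252 (Cp1252). The sketch below reproduces that pattern; the class name MojibakeSketch and the method misdecode are hypothetical. It only illustrates the byte-level mismatch and does not show where in the JSP-to-.java pipeline the wrong decoding actually happens.

```java
import java.nio.charset.Charset;

public class MojibakeSketch {

    // Encodes the text as UTF-8, then deliberately decodes those bytes
    // with the wrong charset (windows-1252) to reproduce the garbling.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(Charset.forName("UTF-8"));
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // \u25BA (solid right arrow) is E2 96 BA in UTF-8;
        // \u0174 (W with circumflex) is C5 B4 in UTF-8.
        String garbled = misdecode("\u25BA \u0174");
        // Print code points rather than glyphs so the console's own
        // encoding cannot confuse the result.
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```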

And when I view that in a web browser, I'm back to question marks again. View source in the browser shows:

<p>These have issues: ? ? but these don&#39;t:&trade;  &hArr;  &diams;  &aacute;  &para;  
&yen;</p>

So I figured the JVM's default character encoding was causing me some grief. I checked, and the default on my Windows PC is Cp1252. But when I change this with the JVM argument -Dfile.encoding=UTF8, I am no better off: the JSP looks okay, but the generated .java file looks the same as above. I did note that, with that argument set, I could revert to writing the JSP using FileWriter and it produced the correct JSP file, but the Tomcat-generated .java file was still wrong.
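
For reference, a small sketch of how the default encoding can be checked. The class name DefaultCharsetCheck and the method describe are hypothetical, and to be meaningful the check has to run inside the same JVM as Tomcat (for example from a scratch JSP or servlet), which is an assumption about the setup rather than something shown here.

```java
import java.nio.charset.Charset;

public class DefaultCharsetCheck {

    // Reports both the file.encoding system property and the charset the
    // JVM actually uses as its default; the two are worth comparing.
    public static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```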

What do I need to do to ensure that Tomcat both reads my JSP with the correct encoding and writes the generated .java file with the correct encoding, so that these special characters display properly? Tomcat itself doesn't seem to be the issue, since CKEditor is running in Vaadin, which is running in Tomcat, and the characters look fine there. But as soon as I run the generated JSP, the characters get lost and I end up with question marks instead.

Thanks for any ideas,
David
