Hi Laurie,
And thanks for your quick answer! Here are my comments.
I tried that first, changing the default encoding (in struts.xml) to utf-8.
That works fine, in java and in our web application. The problem is our
Sybase database which is configured to ISO-8859-1. And as our JDBC driver
(jconn2) does not convert from utf-8 to iso-8859-1, it will throw an
exception when trying to update or insert the characters it does not
understand.
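One way to see in advance which values would trip the driver is to ask an ISO-8859-1 encoder whether it can represent them. This is only a minimal sketch using the standard java.nio.charset API (Java 7 or later for StandardCharsets); the class and method names are hypothetical and not part of our application:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {

    /** Returns true if every character of the text can be encoded as ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        // Encoders are not thread-safe, so a fresh one is created per call.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // Norwegian letters are part of ISO-8859-1.
        System.out.println(fitsLatin1("Bj\u00f8rn \u00c5se")); // prints true
        // The euro sign (U+20AC) is not.
        System.out.println(fitsLatin1("100 \u20ac"));           // prints false
    }
}
```

A check like this only detects the problem; it does not decide what to store instead of the offending character.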
So I had to convert them myself. There is also a special case when it comes
to the Euro (€) character. It did not exist when iso-8859-1 was created, but
was added as part of iso-8859-15. Our Sybase database still only understands
iso-8859-1, so a conversion needs to take place. What I did was first encode
the string as iso-8859-15, replace the Euro byte (0xa4) with 0x80, and then
decode the bytes as iso-8859-1. Here is the code:
    byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");
    for (int i = 0; i < characters.length; i++) {
        if (characters[i] == (byte) 0xa4) {
            // 0x80 is a control character and has no symbol in iso-8859-1.
            // It is used for € in windows-1252.
            characters[i] = (byte) 0x80;
        }
    }
    return new String(characters, "iso-8859-1");
Kind of a hassle, but it works.
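To make the byte values concrete, here is a small standalone sketch that repeats the steps of the snippet above and prints what the Euro sign becomes in each charset. It assumes Java 7 or later and a JDK that ships ISO-8859-15 and windows-1252; the class name and the firstByte helper are hypothetical and exist only for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EuroRoundTrip {
    static final Charset LATIN1 = StandardCharsets.ISO_8859_1;
    static final Charset LATIN9 = Charset.forName("ISO-8859-15");
    static final Charset CP1252 = Charset.forName("windows-1252");

    /** Same steps as the snippet above, using Charset objects instead of names. */
    static String fixCharset(String text) {
        byte[] bytes = text.getBytes(LATIN9);
        for (int i = 0; i < bytes.length; i++) {
            if (bytes[i] == (byte) 0xa4) {  // euro in ISO-8859-15
                bytes[i] = (byte) 0x80;     // euro in windows-1252
            }
        }
        return new String(bytes, LATIN1);
    }

    /** First byte of the text in the given charset, as an unsigned value. */
    static int firstByte(String text, Charset charset) {
        return text.getBytes(charset)[0] & 0xff;
    }

    public static void main(String[] args) {
        String euro = "\u20ac";
        // ISO-8859-1 cannot represent the euro, so getBytes substitutes '?'.
        System.out.printf("ISO-8859-1:   0x%02x%n", firstByte(euro, LATIN1)); // 0x3f
        System.out.printf("ISO-8859-15:  0x%02x%n", firstByte(euro, LATIN9)); // 0xa4
        System.out.printf("windows-1252: 0x%02x%n", firstByte(euro, CP1252)); // 0x80

        // After the fix the string holds U+0080, which is written as byte 0x80.
        String fixed = fixCharset("5 " + euro);
        System.out.printf("stored byte:  0x%02x%n", fixed.getBytes(LATIN1)[2] & 0xff); // 0x80
    }
}
```

Note that the same round trip also affects the other positions where iso-8859-15 differs from iso-8859-1: for example Š is encoded as 0xa6, which decodes as ¦ in iso-8859-1. Whether that matters depends on the data.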
It was a good idea to override the setCharacterEncoding method. This would
open the opportunity to move my conversion logic from the filter to an
interceptor. But then another problem occurs. If I do the conversion in an
interceptor, I need to know exactly which parameters have to be converted.
We are working on a solution for maintaining CVs. I would then have to do
something like this (pseudocode):
- String firstName = request.getParameter("firstName");
- get CV object from the value stack
- firstName = performConversion(firstName)
- cv.setFirstName(firstName)
- put cv back on the value stack
In some cases this would work fine, but I have so many parameters to
retrieve and convert that it would not be a proper solution. My filter
takes care of all request parameters without the need to specify which
parameter it is.
To improve my code, I will move the conversion logic to a utility class, so
the filter can stay as thin as possible.
I will post the entire code in case you would like to take a look at it. Any
comments would be appreciated!
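As a rough idea of what such a utility class could look like, here is a minimal sketch that converts every value in a parameter map without knowing the parameter names. The class name, method name and the converter argument are all hypothetical (Java 8 or later for UnaryOperator); unlike the filter below it copies the arrays instead of changing them in place:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public final class ParameterConversions {

    private ParameterConversions() {
    }

    /**
     * Returns a new map in which every parameter value has been passed
     * through the converter. The input map and its arrays are not modified.
     */
    static Map<String, String[]> convertAll(Map<String, String[]> params,
                                            UnaryOperator<String> converter) {
        Map<String, String[]> converted = new HashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            String[] copy = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                // Leave null values alone instead of passing them to the converter.
                copy[i] = values[i] == null ? null : converter.apply(values[i]);
            }
            converted.put(entry.getKey(), copy);
        }
        return converted;
    }

    public static void main(String[] args) {
        Map<String, String[]> params = new HashMap<>();
        params.put("firstName", new String[] {"asgaut"});
        // Any String-to-String function works; upper-casing is only a stand-in.
        Map<String, String[]> result = convertAll(params, String::toUpperCase);
        System.out.println(result.get("firstName")[0]); // prints ASGAUT
    }
}
```

In the filter, the converter would be the charset fix, and the wrapper would only need to cache the returned map.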
Thanks
import com.google.common.collect.Maps;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Map;

/**
 * Filter to fix utf-8 to iso-8859-1 conversion
 *
 * @author Asgaut Mjolne
 * @version $Revision: 1.6 $, 05.feb.2008, modified by: $Author: fiasmjol
 */
public class CharsetEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest,
                         ServletResponse servletResponse,
                         FilterChain filterChain)
            throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) servletRequest;
        if ("utf-8".equalsIgnoreCase(req.getCharacterEncoding())) {
            req = new CharsetRequestWrapper(req);
            req.getParameter("foo"); // Needed to fill params. Do not remove
        }
        filterChain.doFilter(req, servletResponse);
    }

    @Override
    public void destroy() {
    }

    static class CharsetRequestWrapper extends HttpServletRequestWrapper {

        private static final byte ISO_8859_15_EURO_CODE_POINT = (byte) 0xa4;

        /**
         * Not in use in ISO-8859-1
         */
        private static final byte CP_1252_EURO_CODE_POINT = (byte) 0x80;

        /** Converted parameters, filled on the first getParameterMap call. */
        private Map<String, String[]> iso88591EncodedParams = null;

        public CharsetRequestWrapper(HttpServletRequest httpServletRequest) {
            super(httpServletRequest);
        }

        @Override
        public String getParameter(String s) {
            return super.getParameter(s);
        }

        /**
         * Looping through all parameters on the request, checking for
         * special characters.
         * If any found, convert them with the fixCharset method
         */
        @Override
        public Map<String, String[]> getParameterMap() {
            if (iso88591EncodedParams == null) {
                iso88591EncodedParams = Maps.newHashMap();
                Map<String, String[]> params = super.getParameterMap();
                for (String key : params.keySet()) {
                    // The arrays from the wrapped request are changed in place
                    String[] values = params.get(key);
                    for (int j = 0; j < values.length; j++) {
                        values[j] = fixCharset(values[j]);
                    }
                    iso88591EncodedParams.put(key, values);
                }
            }
            return iso88591EncodedParams;
        }

        /**
         * Converting special chars from utf-8 to iso-8859-1
         * Add more conversions here when needed
         */
        static String fixCharset(String charsBeforeConvert) {
            try {
                byte[] characters =
                        charsBeforeConvert.getBytes("iso-8859-15");
                for (int i = 0; i < characters.length; i++) {
                    if (characters[i] == ISO_8859_15_EURO_CODE_POINT) {
                        characters[i] = CP_1252_EURO_CODE_POINT;
                    }
                }
                return new String(characters, "iso-8859-1");
            } catch (UnsupportedEncodingException e) {
                // Fall back to the unconverted value
                return charsBeforeConvert;
            }
        }

        @Override
        public String[] getParameterValues(String s) {
            return super.getParameterValues(s);
        }
    }
}
Laurie Harper wrote:
>
> Asgaut wrote:
>> I have recently been struggling with a utf-8 to ISO-8859-1 problem with
>> Ajax and Struts2.
>>
>> The problem is basically that our application requires iso-8859-1
>> characters and Ajax is configured to only post utf-8 (ajax is utf-8
>> either way, can not be changed). So some kind of conversion has to take
>> place at some level.
>>
>> My problem can be divided into two parts:
>> 1. Make Struts2 understand that there is an incoming utf-8 POST, even
>> though struts.xml (which sets the struts2 default encoding) is configured
>> to use iso-8859-1
>> 2. Convert the characters from utf-8 to iso-8859-1
>
> 3. Change your default encoding to utf-8, which should have no effect on
> any of your code but will allow greater flexibility in the range of
> characters you can display and read. Is there any reason you must use
> iso-8859-1?
>
>> [...]
>>
>> If you take a look at this piece of code, you can see that it overrides
>> the encoding if it is set as defaultEncoding (from struts.xml). This is
>> OK, the problem is this check:
>>     if (encoding != null) {
>>         try {
>>             request.setCharacterEncoding(encoding);
>>         } catch (Exception e) {
>>             LOG.error("Error setting character encoding to '" + encoding
>>                     + "' - ignoring.", e);
>>         }
>>     }
>>
>> I think the correct thing would be to also check whether
>> request.getCharacterEncoding() was already set. It should look like this:
>>     if (encoding != null && request.getCharacterEncoding() == null) {
>>         try {
>>             request.setCharacterEncoding(encoding);
>>         } catch (Exception e) {
>>             LOG.error("Error setting character encoding to '" + encoding
>>                     + "' - ignoring.", e);
>>         }
>>     }
>> With this change utf-8 would be kept as the request character encoding
>> and I could do my conversion in my interceptor.
>> This would solve my problem number 1. Am I correct when I say this is a
>> bug?
>
> I don't know if I'd call that a bug, but it does seem like a reasonable
> enhancement. It would probably require some testing with different
> browsers to make sure getCharacterEncoding() really is returning null in
> the 'normal' cases, but assuming that's true you could open a ticket in
> JIRA and attach a patch.
>
>> The way I went around it was to create a filter which is executed before
>> FilterDispatcher in struts2. In this filter I check if it is a utf-8 post
>> and if it is, I wrap the HttpServletRequest into my own
>> CharsetRequestWrapper. In my wrapper I override getParameterMap, which
>> converts my characters, puts them back into the map and returns them. I
>> also run a req.getParameter("foo"); after my wrapping to populate the
>> parameters on the request.
>>
>> It works, but it took me a couple of days to work it out.
>>
>> Any comments on this?
>
> It might be simpler for your filter to call
> setCharacterEncoding("utf-8") and use a trivial request wrapper that
> delegates all calls to the wrapped request *except*
> setCharacterEncoding(), making that a no-op. It would make it clearer
> what the filter was actually doing with less code :-) Otherwise, seems
> like a reasonable work-around.
>
> L.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>
>
>
--
View this message in context:
http://www.nabble.com/CharacterEncoding-bug-in-Struts2--tp15408328p15497775.html
Sent from the Struts - User mailing list archive at Nabble.com.