Re: CharacterEncoding bug in Struts2?

Asgaut Fri, 15 Feb 2008 02:16:46 -0800

Hi Laurie,
And thanks for your quick answer! Here are my comments.

I tried that first: changing the default encoding (in struts.xml) to utf-8.
That works fine in Java and in our web application. The problem is our
Sybase database, which is configured for ISO-8859-1. Since our JDBC driver
(jconn2) does not convert from utf-8 to iso-8859-1, it throws an
exception when we try to update or insert characters it does not
understand.
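
To see up front which values would trip the driver, one option is to ask an ISO-8859-1 encoder whether it can represent the text. This is only a minimal sketch: the class and method names (Latin1Check, fitsLatin1) are made up for illustration and are not part of the filter further down.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the original filter.
final class Latin1Check {
    private Latin1Check() {}

    /** Returns true if every character of the text can be encoded as ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        // Encoders are stateful and not thread-safe, so create one per call.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // The o-slash is U+00F8, which ISO-8859-1 contains.
        System.out.println(fitsLatin1("Mj\u00f8lne"));
        // The euro sign is U+20AC, which ISO-8859-1 does not contain.
        System.out.println(fitsLatin1("100 \u20ac"));
    }
}
```

A check like this only reports the problem; the conversion itself still has to happen somewhere.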


So I had to convert them myself. There is also a special case for the
Euro (€) character. It did not exist when iso-8859-1 was created; it was
added later as part of iso-8859-15. But our Sybase database still only
understands iso-8859-1, so a conversion needs to take place. What I did was
first convert from utf-8 to iso-8859-15, then from iso-8859-15 to
iso-8859-1. Here is the code:

byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");
for (int i = 0; i < characters.length; i++) {
    if (characters[i] == (byte) 0xa4) {
        // 0x80 is a control character and has no symbol in iso-8859-1.
        // It is used for € in windows-1252.
        characters[i] = (byte) 0x80;
    }
}
return new String(characters, "iso-8859-1");

Kind of a hassle, but it works.
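
One side effect that may be worth checking: iso-8859-15 differs from iso-8859-1 in a handful of other positions besides 0xa4. For example, Š (U+0160) is byte 0xa6 in iso-8859-15, and reading that byte back as iso-8859-1 gives ¦. Going through the two byte encodings therefore silently swaps those characters as well. A possible alternative, sketched below, maps characters directly and replaces anything outside iso-8859-1 with '?'. The names (EuroMapper, toLatin1WithEuro) are hypothetical, and whether '?' is an acceptable replacement depends on the application.

```java
// Hypothetical alternative to the two-step byte conversion; names are made up.
final class EuroMapper {
    private static final int EURO = 0x20AC;          // Unicode euro sign
    private static final char CP_1252_EURO = '\u0080'; // euro position in windows-1252

    private EuroMapper() {}

    /** Maps the euro sign to 0x80 and replaces anything else outside ISO-8859-1 with '?'. */
    static String toLatin1WithEuro(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            if (codePoint == EURO) {
                out.append(CP_1252_EURO);
            } else if (codePoint <= 0xFF) {
                // ISO-8859-1 covers exactly the first 256 Unicode code points.
                out.append((char) codePoint);
            } else {
                out.append('?');
            }
        }
        return out.toString();
    }
}
```

With this version a value such as "Š" comes out as "?" instead of "¦", which at least makes the loss visible.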

Overriding the setCharacterEncoding method was a good idea. It would let me
move my conversion logic from the filter to an interceptor. But then another
problem occurs: if I do the conversion in an interceptor, I need to know
exactly which parameters have to be converted. We are working on a solution
for maintaining CVs. I would then have to do something like this
(pseudocode):
-       String firstName = request.getParameter("firstName");
-       get CV object from the value stack
-       firstName = performConversion(firstName)
-       cv.setFirstName(firstName)
-       put cv back on the value stack

In some cases this would work fine, but I have so many parameters to
retrieve and convert that it is not a practical solution. My filter takes
care of all request parameters without having to name each one.
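
The "convert everything" idea can be sketched independently of the servlet API. The following is only an illustration, assuming Java 8 or later; ParameterConverter and convertAll are hypothetical names, and the conversion itself is passed in as a function. Unlike the wrapper further down, it copies each value array instead of changing it in place, so the map it was given is left untouched.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical helper for illustration; not part of the original filter.
final class ParameterConverter {
    private ParameterConverter() {}

    /** Applies the conversion to every value of every parameter, without naming any of them. */
    static Map<String, String[]> convertAll(Map<String, String[]> params,
                                            UnaryOperator<String> conversion) {
        Map<String, String[]> converted = new LinkedHashMap<>();
        for (Map.Entry<String, String[]> entry : params.entrySet()) {
            String[] values = entry.getValue();
            // Copy the array so the caller's map is not modified.
            String[] copy = new String[values.length];
            for (int i = 0; i < values.length; i++) {
                copy[i] = conversion.apply(values[i]);
            }
            converted.put(entry.getKey(), copy);
        }
        return converted;
    }
}
```

A request wrapper could call something like convertAll(super.getParameterMap(), CharsetRequestWrapper::fixCharset) once and cache the result.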

To improve my code, I will move the conversion logic to a utility class, so
the filter can stay as thin as possible.

I am posting the entire code below in case you would like to take a look at
it. Any comments would be appreciated!

Thanks


import com.google.common.collect.Maps;

import javax.servlet.*;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Map;

/**
 * Filter to fix utf-8 to iso-8859-1 conversion
 *
 * @author Asgaut Mjolne
 * @version $Revision: 1.6 $, 05.feb.2008, modified by: $Author: fiasmjol
 */
public class CharsetEncodingFilter implements Filter {

    @Override
    public void init(FilterConfig filterConfig) throws ServletException {
    }

    @Override
    public void doFilter(ServletRequest servletRequest, ServletResponse servletResponse,
                         FilterChain filterChain) throws IOException, ServletException {
        HttpServletRequest req = (HttpServletRequest) servletRequest;

        if ("utf-8".equalsIgnoreCase(req.getCharacterEncoding())) {
            req = new CharsetRequestWrapper(req);
            req.getParameter("foo"); //Needed to fill params. Do not remove
        }

        filterChain.doFilter(req, servletResponse);
    }

    @Override
    public void destroy() {
    }

    static class CharsetRequestWrapper extends HttpServletRequestWrapper {
        private static final byte ISO_8859_15_EURO_CODE_POINT = (byte) 0xa4;
        /**
         * Not in use in ISO-8859-1
         */
        private static final byte CP_1252_EURO_CODE_POINT = (byte) 0x80;

        public CharsetRequestWrapper(HttpServletRequest httpServletRequest) {
            super(httpServletRequest);
        }

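        @Override
        public String getParameter(String s) {
            // Read from the converted map so this agrees with getParameterMap()
            String[] values = getParameterMap().get(s);
            return (values == null || values.length == 0) ? null : values[0];
        }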

        Map<String, String[]> iso88591EncodedParams = null;


        /**
         * Looping through all parameters on the request, checking for
         * special characters.
         * If any found, convert them with the fixCharset method
         */
        @Override
        public Map<String, String[]> getParameterMap() {
            if (iso88591EncodedParams == null) {
                iso88591EncodedParams = Maps.newHashMap();
                Map<String, String[]> params = super.getParameterMap();
                for (String key : params.keySet()) {
                    String[] values = params.get(key);
                    for (int j = 0; j < values.length; j++) {
                        values[j] = fixCharset(values[j]);
                    }
                    iso88591EncodedParams.put(key, values);
                }
            }
            return iso88591EncodedParams;
        }

        /**
         * Converting special chars from utf-8 to iso-8859-1
         * Add more conversions here when needed
         */
        static String fixCharset(String charsBeforeConvert) {
            try {
                byte[] characters = charsBeforeConvert.getBytes("iso-8859-15");

                for (int i = 0; i < characters.length; i++) {
                    if (characters[i] == ISO_8859_15_EURO_CODE_POINT) {
                        characters[i] = CP_1252_EURO_CODE_POINT;
                    }
                }
                return new String(characters, "iso-8859-1");

            } catch (UnsupportedEncodingException e) {
                return charsBeforeConvert;
            }
        }       

        @Override
        public String[] getParameterValues(String s) {
            // Read from the converted map so this agrees with getParameterMap()
            return getParameterMap().get(s);
        }
    }
}








Laurie Harper wrote:
> 
> Asgaut wrote:
>> I have recently been struggling with a utf-8 to ISO-8859-1 problem with
>> Ajax and Struts2.
>> 
>> The problem is basically that our application requires iso-8859-1
>> characters and Ajax is configured to only post utf-8 (ajax is utf-8 either
>> way, can not be changed). So some kind of conversion has to take place at
>> some level.
>> 
>> My problem can be divided into two parts:
>> 1. Make Struts2 understand that there is an incoming utf-8 POST, even
>> though struts.xml (which sets the struts2 default encoding) is configured
>> to use iso-8859-1
>> 2. Convert the characters from utf-8 to iso-8859-1
> 
> 3. Change your default encoding to utf-8, which should have no effect on 
> any of your code but will allow greater flexibility in the range of 
> characters you can display and read. Is there any reason you must use 
> iso-8859-1?
> 
>> [...]
>> 
>> If you take a look at this piece of code, you can see that it overrides
>> the encoding if it is set as defaultEncoding (from struts.xml). This is
>> OK, the problem is this check:
>> if (encoding != null) {
>>     try {
>>         request.setCharacterEncoding(encoding);
>>     } catch (Exception e) {
>>         LOG.error("Error setting character encoding to '" + encoding + "' - ignoring.", e);
>>     }
>> }
>> 
>> I think the correct thing would be to also check whether
>> request.getCharacterEncoding() was already set. It should look like this:
>> if (encoding != null && request.getCharacterEncoding() == null) {
>>     try {
>>         request.setCharacterEncoding(encoding);
>>     } catch (Exception e) {
>>         LOG.error("Error setting character encoding to '" + encoding + "' - ignoring.", e);
>>     }
>> }
>> With this change utf-8 would be kept as the request character encoding
>> and I could do my conversion in my interceptor.
>> This would solve my problem number 1. Am I correct when I say this is a
>> bug?
> 
> I don't know if I'd call that a bug, but it does seem like a reasonable 
> enhancement. It would probably require some testing with different 
> browsers to make sure getCharacterEncoding() really is returning null in 
> the 'normal' cases, but assuming that's true you could open a ticket in 
> JIRA and attach a patch.
> 
>> The way I went around it was to create a filter which is executed before
>> FilterDispatcher in struts2. In this filter I check if it is a utf-8 post
>> and if it is, I wrap the HttpServletRequest into my own
>> CharsetRequestWrapper. In my wrapper I override getParameterMap, which
>> converts my characters, puts them back into the map and returns them. I
>> also run a req.getParameter("foo"); after my wrapping to populate the
>> parameters on the request.
>> 
>> It works, but it took me a couple of days to work it out.
>> 
>> Any comments on this?
> 
> It might be simpler for your filter to call 
> setCharacterEncoding("utf-8") and use a trivial request wrapper that 
> delegates all calls to the wrapped request *except* 
> setCharacterEncoding(), making that a no-op. It would make it clearer 
> what the filter was actually doing with less code :-) Otherwise, seems 
> like a reasonable work-around.
> 
> L.
> 
> 