Chris Mannion wrote:
Hi All

I've recently started having a problem with one of the servlets I'm
running on a Tomcat 5.5 system.  The code of the servlet hasn't
changed at all so I'm wondering if there are any Tomcat settings that
could affect this kind of thing or if anyone has come across a similar
problem before.

The servlet in question accepts XML data that is posted to it as a URL
parameter called 'xml'.  The code to retrieve the XML as a String
(which is then used to build a document object) is simply -

String xmlMessage = req.getParameter("xml");

- where req is the HttpServletRequest object.  Until recently this has
worked fine with the XML being received properly formatted -
<?xml version="1.0" encoding="UTF-8"?>
  <records>
    <record>...
etc.

However, recently something has changed and the XML is now being
retrieved from the request object with escape characters in, so the
above has become -
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
  &lt;records&gt;
    &lt;record&gt;

Before sending, the XML is encoded using java.net.URLEncoder with the
UTF-8 character set, but using java.net.URLDecoder on receiving it
does not get rid of the escaped characters.  I did some reading about
a possible Tomcat 6.0 bug and so tried explicitly setting the
character encoding (req.setCharacterEncoding("UTF-8")) before
retrieving the parameter, but that had no effect either.  Even if
there were something that could explicitly decode the &lt; &gt; etc.,
I couldn't use it, as the XML data often contains characters like
&amp; which have to remain encoded to keep the XML valid.

As I said, this problem started without the servlet code having
changed at all so is there any Tomcat setting that could be
responsible for this?

Just a couple of indirect comments on the above.

In your post, you seem to indicate that you also control the client which sends the request to Tomcat. If so, and for that kind of data, might it not be better to send the data in the body of a request, instead of in the URL? That is probably not the root cause of the issue you describe above, but it may avoid similar encoding questions in the future.
(check the HTTP POST method, and enctype=multipart/form-data)
It will also avoid the case where your data gets so long that the request URLs (and thus your data) get cut off at a certain length.
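
As a rough illustration of sending the data in the request body, here is a minimal client-side sketch. It is not taken from the original poster's code: the class name XmlPoster and the servlet URL passed on the command line are hypothetical. It also uses application/x-www-form-urlencoded rather than multipart/form-data, on the assumption that the servlet should keep reading the data with req.getParameter("xml"); a multipart body would need separate parsing on the servlet side.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical client: posts an XML string as the form parameter "xml".
class XmlPoster {

    // Builds the request body "xml=<percent-encoded document>".
    static String buildFormBody(String xml) {
        try {
            return "xml=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends the body with HTTP POST and returns the response status code.
    static int post(URL servletUrl, String body) throws IOException {
        byte[] bytes = body.getBytes("US-ASCII"); // the encoded body is pure ASCII
        HttpURLConnection conn = (HttpURLConnection) servletUrl.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        conn.setFixedLengthStreamingMode(bytes.length);
        OutputStream out = conn.getOutputStream();
        try {
            out.write(bytes);
        } finally {
            out.close();
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        String body = buildFormBody("<records><record>a &amp; b</record></records>");
        System.out.println(body);
        if (args.length > 0) {
            // args[0] is an assumed servlet URL supplied by the caller.
            System.out.println(post(new URL(args[0]), body));
        }
    }
}
```

Because the data travels in the body, its length is no longer bound by any limit on the request URL, and the &amp; inside the XML arrives at the servlet unchanged after the container decodes the parameter.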

Next, the way you indicate that the data is now received shows an "HTML-style" encoding, rather than a "URL-style" encoding. If the data were URL-encoded, it would not have (for example) "&quot;" replacing a quotation mark; it would have a %xy sequence instead (where xy is the value, in hexadecimal digits, of one byte of the character in the encoding used, which is UTF-8 in your case). What I mean is that it is very unlikely that this encoding just happens "automatically" due to some protocol layer at the browser or HTTP server level. There must be something that explicitly encodes your original request data in this way, before it even gets put in a URL.
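
To make the difference between the two styles concrete, here is a small sketch. The class EncodingStyles and its htmlEscape helper are hypothetical and written only for comparison; they are not the code that is actually escaping the data on the sending side, which is still unknown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class contrasting URL-style and HTML-style encoding.
class EncodingStyles {

    // URL-style (percent) encoding, as produced by java.net.URLEncoder.
    static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses URL-style encoding only; it knows nothing about &lt; or &quot;.
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Minimal HTML/XML-style escape, for illustration only.
    static String htmlEscape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>";
        // prints %3C%3Fxml+version%3D%221.0%22%3F%3E
        System.out.println(urlEncode(xml));
        // prints &lt;?xml version=&quot;1.0&quot;?&gt;
        System.out.println(htmlEscape(xml));
        // URLDecoder leaves the HTML-style entities untouched
        System.out.println(urlDecode(htmlEscape(xml)));
    }
}
```

The last line of output is identical to the second, which matches the observation that URLDecoder does not remove the &lt; and &quot; sequences: they were never URL encoding in the first place.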

I guess what I am trying to say is that you may be looking in the wrong place for your problem by focusing on the receiving Tomcat side first. I believe you should first have a good look at the sending side.


