Hi.

Christopher Schultz wrote:

André,

André Warnier wrote:
an existing webapp reads from a socket connected to an external program.
The input stream is created as follows :
fromApp = socket.getInputStream();
The read is as follows :
StringBuffer buf = new StringBuffer(2000);
int ic;
while ((ic = fromApp.read()) != 26 && ic != -1) // hex 1A (SUB)
    buf.append((char) ic);

This is wrong, because it assumes that the input stream is always in an
8-bit default platform encoding, which it isn't.

Does it?

The only assumption I see here is that the byte code 0x1a has a special
meaning. Since ASCII is usually the lowest common denominator for
character encodings, is this a bad assumption?

Considering the often devious ways in which character encoding questions can come back to bite one, I am not so sure. By doing a read(), the app currently consumes one byte, whether it matches 0x1A or not. If the input stream were in a multi-byte encoding such as UTF-16, for instance, that byte might be one half of a two-byte code unit which happens to have the integer value 0x1A, although its meaning would be totally different. (UTF-8 itself is safe on this point: every byte of a multi-byte UTF-8 sequence is 0x80 or above, so a 0x1A byte there is always the SUB character.)
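Whether a 0x1A byte can hide inside a multi-byte character is easy to check directly. The following is only a small sketch: the class name SubByteCheck, the helper containsSub, and the choice of U+011A as test character are mine for illustration and are not part of the webapp.

```java
import java.nio.charset.StandardCharsets;

public class SubByteCheck {
    // Returns true if any byte in the encoded form equals 0x1A (SUB).
    static boolean containsSub(byte[] bytes) {
        for (byte b : bytes) {
            if (b == 0x1A) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // U+011A, LATIN CAPITAL LETTER E WITH CARON
        String s = "\u011A";
        // UTF-8 encodes it as C4 9A: both bytes are >= 0x80.
        System.out.println(containsSub(s.getBytes(StandardCharsets.UTF_8)));    // false
        // UTF-16BE encodes it as 01 1A: the second byte is 0x1A.
        System.out.println(containsSub(s.getBytes(StandardCharsets.UTF_16BE))); // true
    }
}
```

So a byte-by-byte test against 26 is reliable for ASCII-compatible encodings (including UTF-8 and the iso-8859-x family), but not for UTF-16 and similar.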


How do I do this correctly, assuming that I do know that the incoming
stream is an 8-bit stream (like iso-8859-x), and I do know which 8-bit
encoding is being used (such as iso-8859-1 or iso-8859-2) ?
I cannot change the InputStream into something else, because there are a
zillion other places where this webapp tests on the read byte's value,
numerically.

And there are other places where the "byte" is tested against values other than 0x1A.


I like Chuck's suggestion to use an InputStreamReader because the
interfaces are (at least accidentally) the same, at least for the method
in question.

Me too. It is the most logical, and the one which I would apply if I were to rewrite this app from scratch. I would also have the other app (the one which sends this stream to the webapp) send some kind of prefix to the stream, indicating the encoding used. (Or at least have both that app and the webapp have some external parameter telling them respectively what to send and what to expect).
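For the record, here is roughly what the InputStreamReader variant would look like. This is a sketch under assumptions, not the webapp's actual code: the class name SubTerminatedReader and the method readUntilSub are made up, the charset is hard-coded to ISO-8859-2 for the demonstration, and a ByteArrayInputStream stands in for socket.getInputStream().

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

public class SubTerminatedReader {
    // Reads decoded characters until SUB (0x1A) or end of stream.
    // Reader.read() returns an int just like InputStream.read(),
    // so the loop itself is unchanged from the original.
    static String readUntilSub(Reader fromApp) throws IOException {
        StringBuilder buf = new StringBuilder(2000);
        int ic;
        while ((ic = fromApp.read()) != 26 && ic != -1) {
            buf.append((char) ic);
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the socket stream: 0xA3 is U+0141 in ISO-8859-2.
        byte[] data = { (byte) 0xA3, 'o', 'd', 'z', 0x1A, 'x' };
        InputStream raw = new ByteArrayInputStream(data);

        // Assumed charset; in the real app it would come from configuration.
        Reader fromApp = new InputStreamReader(raw, Charset.forName("ISO-8859-2"));

        String result = readUntilSub(fromApp);
        System.out.println(result.equals("\u0141odz")); // true
    }
}
```

Two caveats. The test against 26 keeps working because the iso-8859-x encodings map the ASCII range to the same code points, but any numeric test against a value of 0xA0 or above would now see the decoded character rather than the raw byte. And InputStreamReader reads ahead from the underlying stream, so the Reader has to be created once per socket and used for all reads; mixing it with direct reads on the InputStream would lose data.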

I'm not sure how you would modify an entire application to
"fix" this code everywhere, though.

Right. I was trying to find a magic shortcut. At first I was hoping that I could just do some kind of "string replace patch" with Notepad, directly on the compiled classes. Unfortunately, considering these byte tests in several places, I can't.

Thanks again for all the suggestions though.

