On Fri, Sep 11, 2009 at 8:37 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
Isn't that an argument in favor of having an explicit option to control
how we do the counting? otherwise we're still at risk of the scenario i
described (ie: jetty fixes the byte conversion code, but we're still
: A code point (unicode character) outside of the BMP (basic
: multilingual plane, fits in 16 bits) is represented as two java chars
: - a surrogate pair. It's a single logical character - see
: String.codePointAt(). In correct UTF-8 it should be encoded as a
: single code point... but Jetty is
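To make the surrogate-pair point above concrete, here is a minimal Java sketch. The class name `SurrogatePairDemo`, the helper `utf8ByteCount`, and the choice of U+1D11E as the sample character are illustrative assumptions, not anything from the thread or the patch. It shows one code point outside the BMP occupying two Java chars while standard UTF-8 encodes it as a single 4-byte sequence.

```java
import java.nio.charset.StandardCharsets;

public class SurrogatePairDemo {
    // Hypothetical helper: number of bytes the JDK's standard UTF-8 encoder produces.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP, so Java stores it as a surrogate pair.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(clef.length());                           // 2 (Java chars)
        System.out.println(clef.codePointCount(0, clef.length()));   // 1 (logical character)
        System.out.println(Integer.toHexString(clef.codePointAt(0))); // 1d11e
        System.out.println(utf8ByteCount(clef));                     // 4 (one UTF-8 sequence)
    }
}
```

An encoder that treats each of the two chars as its own character would emit two 3-byte sequences (6 bytes) instead, which is the CESU-8 behaviour the thread is discussing.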
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
I must be misunderstanding something still ... based on your description,
it sounds like it shouldn't matter if the encoder knows that it's one
logical character or not, either way it should wind up outputting the
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
why don't we just output the raw bytes ourselves?
That would require writing TextResponseWriter and friends as binary
writers, right? Or did you have a different way in mind for injecting
bytes into the output
: why don't we just output the raw bytes ourselves?
:
: That would require writing TextResponseWriter and friends as binary
: writers, right? Or did you have a different way in mind for injecting
: bytes into the output stream?
Grr, you're right. i got so turned around thinking about
On Fri, Sep 11, 2009 at 6:21 PM, Donovan Jimenez
djime...@conduit-it.com wrote:
Is it possible (and would it even help) to normalize all strings with
regards to surrogate pairs at indexing time instead?
Already done, in a way... there's only one way to represent a
character outside the BMP in
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
if the container can't correctly output
some characters, i see no reason to hide the bug
Another problem is that it won't reliably break. The bug breaks our
encapsulation (before the patch) and thus the client
you are correct, it was my misunderstanding of the problem - now that
I've read more than I ever wanted to know about UCS-2, UTF-16 and
modified UTF-8, I'm more up to speed.
Thanks for the patience.
On Sep 11, 2009, at 6:32 PM, Yonik Seeley wrote:
On Fri, Sep 11, 2009 at 6:21 PM, Donovan
: if the container can't correctly output
: some characters, i see no reason to hide the bug
:
: Another problem is that it won't reliably break. The bug breaks our
: encapsulation (before the patch) and thus the client reads the wrong
: number of chars for the string, and who knows what
: : + static boolean modifiedUTF8 = System.getProperty("jetty.home") != null;
:
: ...that seems really hackish to me, particularly since for all we know
: there are other servlet containers that might have the same problem.
:
: Yeah, it is.
: But it's not really a valid option, it's a
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
The modifiedUTF8 boolean only influences the numeric length returned as the
s option ... the actual val string is still written as is by the
servlet container.
Yep.
A code point (unicode character) outside of the
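The exchange above says the flag changes only the declared length, not the string the container writes. A rough Java sketch of that idea follows. The class `SerializedLengthSketch`, the methods `byteLength` and `lengthPrefix`, and the parameter `perCharEncoding` are hypothetical names for illustration and are not taken from the Solr patch. The sketch assumes well-formed input (no unpaired surrogates) and relies on PHP's serialized string form `s:<byte length>:"<value>";`.

```java
public class SerializedLengthSketch {
    // Hypothetical: bytes the container is expected to emit for s.
    // perCharEncoding = true models an encoder that writes each half of a
    // surrogate pair as its own 3-byte sequence (6 bytes total) instead of
    // one 4-byte UTF-8 sequence.
    static int byteLength(String s, boolean perCharEncoding) {
        int bytes = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) bytes += 1;
            else if (cp < 0x800) bytes += 2;
            else if (cp < 0x10000) bytes += 3;
            else bytes += perCharEncoding ? 6 : 4;
            i += Character.charCount(cp); // step over both chars of a pair
        }
        return bytes;
    }

    // Only the numeric prefix depends on the flag; the value itself is
    // handed to the container unchanged.
    static String lengthPrefix(String s, boolean perCharEncoding) {
        return "s:" + byteLength(s, perCharEncoding) + ":";
    }

    public static void main(String[] args) {
        String value = "a\u00E9" + new String(Character.toChars(0x1D11E));
        System.out.println(lengthPrefix(value, false)); // s:7:
        System.out.println(lengthPrefix(value, true));  // s:9:
    }
}
```

If the declared length and the bytes actually written disagree, a PHP client reads the wrong number of bytes for the string, which is the failure mode described in the thread.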
On Thu, Sep 3, 2009 at 8:24 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP
: + resulted in invalid output from the serialized PHP writer. (yonik)
...
: + static boolean modifiedUTF8 =
: +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP
: +resulted in invalid output from the serialized PHP writer. (yonik)
...
: + static boolean modifiedUTF8 = System.getProperty("jetty.home") != null;
...that seems really hackish to me, particularly since for