On Fri, Sep 11, 2009 at 8:37 PM, Chris Hostetter
wrote:
> Isn't that an argument in favor of having an explicit option to control
> how we do the counting? otherwise we're still at risk of the scenario i
> described (ie: jetty fixes the byte conversion code, but we're still
> counting the bytes "w
: > if the container can't correctly output
: > some characters, i see no reason to hide the bug
:
: Another problem is that it won't reliably break. The bug breaks our
: encapsulation (before the patch) and thus the client reads the wrong
: number of chars for the string, and who knows what hap
you are correct, it was my misunderstanding of the problem - now that
I've read more than I ever wanted to know about UCS-2, UTF-16 and
modified UTF-8, I'm more up to speed.
Thanks for the patience.
On Sep 11, 2009, at 6:32 PM, Yonik Seeley wrote:
On Fri, Sep 11, 2009 at 6:21 PM, Donovan Ji
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter
wrote:
> if the container can't correctly output
> some characters, i see no reason to hide the bug
Another problem is that it won't reliably break. The bug breaks our
encapsulation (before the patch) and thus the client reads the wrong
number of c
On Fri, Sep 11, 2009 at 6:21 PM, Donovan Jimenez
wrote:
> Is it possible (and would it even help) to normalize all strings with
> regards to surrogate pairs at indexing time instead?
Already done, in a way... there's only one way to represent a
character outside the BMP in UTF-16 (which is the i
Is it possible (and would it even help) to normalize all strings
with regard to surrogate pairs at indexing time instead? Or could
containers still differ in their byte-for-byte output?
- Donovan
On Sep 11, 2009, at 5:34 PM, Chris Hostetter wrote:
: > why don't we just output the raw bytes ourselves?
:
: That would require writing TextResponseWriter and friends as binary
: writers, right? Or did you have a different way in mind for injecting
: bytes into the output stream?
Grr you're right. i got so turned around thinking about
co
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter
wrote:
> why don't we just output the raw bytes ourselves?
That would require writing TextResponseWriter and friends as binary
writers, right? Or did you have a different way in mind for injecting
bytes into the output stream?
-Yonik
http://www.l
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter
wrote:
> I must be misunderstanding something still ... based on your description,
> it sounds like it shouldn't matter if the encoder knows that it's one
> logical character or not, either way it should wind up outputting the same
> number of byte
: A code point (unicode character) outside of the BMP (basic
: multilingual plane, fits in 16 bits) is represented as two java chars
: - a surrogate pair. It's a single logical character - see
: String.codePointAt(). In correct UTF-8 it should be encoded as a
: single code point... but Jetty is
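To make the length mismatch concrete, here is a minimal sketch (not code from the patch) that compares the byte count of a string in standard UTF-8 with its byte count in CESU-8, where each half of a surrogate pair is encoded on its own. The class name SurrogateLengths and both helper methods are hypothetical names chosen for illustration, and the sketch assumes well-formed input with no unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

final class SurrogateLengths {
    // Byte length in standard UTF-8: a code point outside the BMP takes 4 bytes.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Byte length in CESU-8: every Java char is encoded independently, so each
    // half of a surrogate pair becomes its own 3-byte sequence (6 bytes total).
    static int cesu8Length(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else {
                n += 3; // includes high and low surrogates
            }
        }
        return n;
    }

    public static void main(String[] args) {
        // U+1D11E (musical symbol G clef) lies outside the BMP.
        String s = "a" + new String(Character.toChars(0x1D11E));
        System.out.println("java chars:  " + s.length());
        System.out.println("code points: " + s.codePointCount(0, s.length()));
        System.out.println("UTF-8 bytes: " + utf8Length(s));
        System.out.println("CESU-8 bytes: " + cesu8Length(s));
    }
}
```

For strings that stay inside the BMP the two counts agree, which is why a length computed for one encoding only goes wrong once a surrogate pair shows up in the data.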
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter wrote:
> The modifiedUTF8 boolean only influences the numeric length returned as the
> "s" option ... the actual "val" string is still written "as is" by the
> servlet container.
Yep.
A code point (unicode character) outside of the BMP (basic
multi
: > : + static boolean modifiedUTF8 = System.getProperty("jetty.home") != null;
: >
: > ...that seems really hackish to me, particularly since for all we know
: > there are other servlet containers that might have the same problem.
:
: Yeah, it is.
: But it's not really a valid option, it's a bug
On Thu, Sep 3, 2009 at 8:24 PM, Chris Hostetter wrote:
>
> : +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP
> : + resulted in invalid output from the serialized PHP writer. (yonik)
>
> ...
>
> : + static boolean modifiedUTF8 = System.getProperty("jetty.home") != nu
: +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP
: +resulted in invalid output from the serialized PHP writer. (yonik)
...
: + static boolean modifiedUTF8 = System.getProperty("jetty.home") != null;
...that seems really hackish to me, particularly since for a