Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-13 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 8:37 PM, Chris Hostetter hossman_luc...@fucit.org wrote: Isn't that an argument in favor of having an explicit option to control how we do the counting? Otherwise we're still at risk of the scenario I described (ie: jetty fixes the byte conversion code, but we're still

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: A code point (Unicode character) outside of the BMP (basic : multilingual plane, fits in 16 bits) is represented as two Java chars : - a surrogate pair. It's a single logical character - see : String.codePointAt(). In correct UTF-8 it should be encoded as a : single code point... but Jetty is
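
To make the surrogate-pair point above concrete, here is a minimal Java sketch. The class name `SurrogatePairDemo`, its helper methods, and the choice of U+1D11E as the sample character are illustrative assumptions and do not come from the Solr patch.

```java
import java.nio.charset.StandardCharsets;

final class SurrogatePairDemo {
    // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP; chosen only as a sample.
    static final String CLEF = new String(Character.toChars(0x1D11E));

    // Number of 16-bit Java chars (UTF-16 code units).
    static int charCount(String s) {
        return s.length();
    }

    // Number of logical characters (Unicode code points).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes in standard UTF-8, as produced by the JDK encoder.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(charCount(CLEF));                           // 2
        System.out.println(codePointCount(CLEF));                      // 1
        System.out.println(Integer.toHexString(CLEF.codePointAt(0)));  // 1d11e
        System.out.println(utf8ByteCount(CLEF));                       // 4
    }
}
```

One logical character therefore occupies two Java chars but a single 4-byte sequence in standard UTF-8.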

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Robert Muir
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote: I must be misunderstanding something still ... based on your description, it sounds like it shouldn't matter if the encoder knows that it's one logical character or not, either way it should wind up outputting the

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote: why don't we just output the raw bytes ourselves? That would require writing TextResponseWriter and friends as binary writers, right? Or did you have a different way in mind for injecting bytes into the output

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: why don't we just output the raw bytes ourselves? : : That would require writing TextResponseWriter and friends as binary : writers, right? Or did you have a different way in mind for injecting : bytes into the output stream? Grr you're right. I got so turned around thinking about

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 6:21 PM, Donovan Jimenez djime...@conduit-it.com wrote: Is it possible (and would it even help) to normalize all strings with regard to surrogate pairs at indexing time instead? Already done, in a way... there's only one way to represent a character outside the BMP in

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: if the container can't correctly output some characters, I see no reason to hide the bug Another problem is that it won't reliably break. The bug breaks our encapsulation (before the patch) and thus the client

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Donovan Jimenez
You are correct, it was my misunderstanding of the problem - now that I've read more than I ever wanted to know about UCS-2, UTF-16 and modified UTF-8, I'm more up to speed. Thanks for the patience. On Sep 11, 2009, at 6:32 PM, Yonik Seeley wrote: On Fri, Sep 11, 2009 at 6:21 PM, Donovan

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: if the container can't correctly output : some characters, I see no reason to hide the bug : : Another problem is that it won't reliably break. The bug breaks our : encapsulation (before the patch) and thus the client reads the wrong : number of chars for the string, and who knows what

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-08 Thread Chris Hostetter
: : +  static boolean modifiedUTF8 = System.getProperty("jetty.home") != null; : : ...that seems really hackish to me, particularly since for all we know : there are other servlet containers that might have the same problem. : : Yeah, it is. : But it's not really a valid option, it's a

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-08 Thread Yonik Seeley
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: The modifiedUTF8 boolean only influences the numeric length returned as the s option ... the actual val string is still written as-is by the servlet container. Yep. A code point (Unicode character) outside of the
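
As a rough illustration of why the numeric length matters: a PHP serialized string has the form `s:<byte length>:"<value>";`, so the declared length must match the bytes the container actually writes. The sketch below is not the code from r808988; the class `PhpStringLength`, its method names, and the `cesu8` parameter are hypothetical. It counts 4 bytes per surrogate pair for standard UTF-8 and 3 bytes per surrogate (6 per pair) for CESU-8. As a simplification, an unpaired surrogate is counted as 3 bytes in both modes.

```java
import java.nio.charset.StandardCharsets;

final class PhpStringLength {
    // Byte length of s when encoded as standard UTF-8 (cesu8 == false)
    // or as CESU-8, where each surrogate is encoded separately (cesu8 == true).
    static int encodedLength(String s, boolean cesu8) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                n += 1;
            } else if (c < 0x800) {
                n += 2;
            } else if (!cesu8
                    && Character.isHighSurrogate(c)
                    && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                n += 4;   // one supplementary code point, one 4-byte sequence
                i++;      // skip the low surrogate already accounted for
            } else {
                n += 3;   // BMP char, or a single surrogate in CESU-8 mode
            }
        }
        return n;
    }

    // Builds the PHP serialized form; the value itself is left for the
    // output layer to encode, only the declared length changes.
    static String serialize(String s, boolean cesu8) {
        return "s:" + encodedLength(s, cesu8) + ":\"" + s + "\";";
    }

    public static void main(String[] args) {
        String sample = "a" + new String(Character.toChars(0x1D11E));
        System.out.println(encodedLength(sample, false)); // 5
        System.out.println(encodedLength(sample, true));  // 7
        // Cross-check the UTF-8 count against the JDK encoder.
        System.out.println(sample.getBytes(StandardCharsets.UTF_8).length); // 5
    }
}
```

If the declared length and the written bytes disagree by even one byte, a PHP client calling `unserialize()` reads past or short of the closing quote, which is the encapsulation failure discussed in this thread.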

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-05 Thread Yonik Seeley
On Thu, Sep 3, 2009 at 8:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP : +    resulted in invalid output from the serialized PHP writer. (yonik) ... : +  static boolean modifiedUTF8 =

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-03 Thread Chris Hostetter
: +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP : +    resulted in invalid output from the serialized PHP writer. (yonik) ... : +  static boolean modifiedUTF8 = System.getProperty("jetty.home") != null; ...that seems really hackish to me, particularly since for