Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-13 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 8:37 PM, Chris Hostetter wrote: > Isn't that an argument in favor of having an explicit option to control > how we do the counting? otherwise we're still at risk of the scenario i > described (ie: jetty fixes the byte conversion code, but we're still > counting the bytes "w

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: > if the container can't correctly output : > some characters, i see no reason to hide the bug : : Another problem is that it won't reliably break. The bug breaks our : encapsulation (before the patch) and thus the client reads the wrong : number of chars for the string, and who knows what hap

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Donovan Jimenez
you are correct, it was my misunderstanding of the problem - now that I've read more than I ever wanted to know about UCS-2, UTF-16 and modified UTF-8, I'm more up to speed. Thanks for the patience. On Sep 11, 2009, at 6:32 PM, Yonik Seeley wrote: On Fri, Sep 11, 2009 at 6:21 PM, Donovan Ji

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter wrote: > if the container can't correctly output > some characters, i see no reason to hide the bug Another problem is that it won't reliably break. The bug breaks our encapsulation (before the patch) and thus the client reads the wrong number of c

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 6:21 PM, Donovan Jimenez wrote: > Is it possible (and would it even help)  to normalize all strings with > regards to surrogate pairs at indexing time instead? Already done, in a way... there's only one way to represent a character outside the BMP in UTF-16 (which is the i

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Donovan Jimenez
Is it possible (and would it even help) to normalize all strings with regards to surrogate pairs at indexing time instead? or will containers still possibly differ in byte-for-byte output? - Donovan On Sep 11, 2009, at 5:34 PM, Chris Hostetter wrote: : > why don't we just output the raw

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: > why don't we just output the raw bytes ourselves? : : That would require writing TextResponseWriter and friends as binary : writers, right? Or did you have a different way in mind for injecting : bytes into the output stream? Grr you're right. i got so turned around thinking about co

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Yonik Seeley
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter wrote: > why don't we just output the raw bytes ourselves? That would require writing TextResponseWriter and friends as binary writers, right? Or did you have a different way in mind for injecting bytes into the output stream? -Yonik http://www.l

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Robert Muir
On Fri, Sep 11, 2009 at 5:06 PM, Chris Hostetter wrote: > I must be misunderstanding something still ... based on your description, > it sounds like it shouldn't matter if the encoder knows that it's one > logical character or not, either way it should wind up outputting the same > number of byte

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-11 Thread Chris Hostetter
: A code point (unicode character) outside of the BMP (basic : multilingual plane, fits in 16 bits) is represented as two java chars : - a surrogate pair. It's a single logical character - see : String.codePointAt(). In correct UTF-8 it should be encoded as a : single code point... but Jetty is

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-08 Thread Yonik Seeley
On Tue, Sep 8, 2009 at 7:46 PM, Chris Hostetter wrote: > The modifiedUTF8 boolean only influences the numeric length returned as the > "s" option ... the actual "val" string is still written "as is" by the > servlet container. Yep. A code point (unicode character) outside of the BMP (basic multi
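
A minimal sketch of the length mismatch discussed in this message, not the code from r808988. It assumes the declared length in the "s" option is a byte count, and compares the byte count of one code point outside the BMP under standard UTF-8 with the count when each half of the surrogate pair is encoded separately (CESU-8 style). The class and method names (Utf8LengthSketch, utf8Length, cesu8Length) are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthSketch {

    /** Byte count in standard UTF-8: a supplementary code point takes 4 bytes. */
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /**
     * Byte count when every Java char is encoded on its own (CESU-8 style),
     * so a surrogate pair costs 3 + 3 = 6 bytes instead of 4.
     */
    static int cesu8Length(String s) {
        int len = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                len += 1;
            } else if (c < 0x800) {
                len += 2;
            } else {
                len += 3; // also covers each half of a surrogate pair
            }
        }
        return len;
    }

    public static void main(String[] args) {
        // U+1D11E lies outside the BMP, so Java stores it as two chars.
        String s = new String(Character.toChars(0x1D11E));
        System.out.println(s.length());                      // 2 Java chars
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(utf8Length(s));                   // 4 bytes
        System.out.println(cesu8Length(s));                  // 6 bytes
    }
}
```

If the writer declares one of these lengths while the container emits the other encoding, the declared count and the bytes on the wire disagree by two for every such character, which is the encapsulation problem raised elsewhere in this thread.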

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-08 Thread Chris Hostetter
: > : +  static boolean modifiedUTF8 = System.getProperty("jetty.home") != null; : > : > ...that seems really hackish to me, particularly since for all we know : > there are other servlet containers that might have the same problem. : : Yeah, it is. : But it's not really a valid option, it's a bug

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-05 Thread Yonik Seeley
On Thu, Sep 3, 2009 at 8:24 PM, Chris Hostetter wrote: > > : +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP > : +    resulted in invalid output from the serialized PHP writer. (yonik) > >        ... > > : +  static boolean modifiedUTF8 = System.getProperty("jetty.home") != nu

Re: svn commit: r808988 - in /lucene/solr/trunk: CHANGES.txt src/java/org/apache/solr/request/PHPSerializedResponseWriter.java

2009-09-03 Thread Chris Hostetter
: +61. SOLR-1091: Jetty's use of CESU-8 for code points outside the BMP : +resulted in invalid output from the serialized PHP writer. (yonik) ... : + static boolean modifiedUTF8 = System.getProperty("jetty.home") != null; ...that seems really hackish to me, particularly since for a