[ 
https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1091:
-------------------------------

    Attachment: SOLR-1091.patch

Here's a patch that handles the modified UTF-8 that Jetty puts out and also 
speeds up the normal UTF-8 case by using Lucene's UTF-8 encoding.

Modified UTF-8 support is switched on if the jetty.home property is set (Jetty 
sets this by default).
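
To illustrate why the two encodings give different lengths, here is a rough 
Python sketch (not the patch itself, which is Java). It assumes "modified 
UTF-8" means that each character outside the BMP is written as two 3-byte 
surrogates, and it ignores any special handling of U+0000. The function names 
are made up for this example.

```python
def utf8_length(text: str) -> int:
    """Byte length of text in standard UTF-8."""
    return len(text.encode("utf-8"))


def surrogate_pairwise_length(text: str) -> int:
    """Byte length when each supplementary character is written as two
    3-byte surrogates (assumed here to be what "modified UTF-8" means)."""
    total = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            total += 1
        elif cp < 0x800:
            total += 2
        elif cp < 0x10000:
            total += 3
        else:
            total += 6  # high surrogate (3 bytes) + low surrogate (3 bytes)
    return total


def phps_string_header(text: str, length_fn) -> str:
    """Hypothetical helper: the s:N: prefix of a serialized PHP string."""
    return f"s:{length_fn(text)}:"


value = "\U00100078"  # the supplementary character from the report
print(phps_string_header(value, utf8_length))
print(phps_string_header(value, surrogate_pairwise_length))
```

The declared length has to match whichever encoding the container actually 
writes to the wire, which is why the writer needs to know about it.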

> "phps" (serialized PHP) writer produces invalid output
> ------------------------------------------------------
>
>                 Key: SOLR-1091
>                 URL: https://issues.apache.org/jira/browse/SOLR-1091
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>    Affects Versions: 1.3
>         Environment: Sun JRE 1.6.0 on CentOS 5
>            Reporter: frank farmer
>            Priority: Minor
>             Fix For: 1.4
>
>         Attachments: SOLR-1091.patch
>
>
> The serialized PHP output writer can output invalid string lengths for 
> certain (unusual) input values.  Specifically, I had a document containing 
> the following 6 byte character sequence: \xED\xAF\x80\xED\xB1\xB8
> I was able to create a document in the index containing this value without 
> issue; however, when fetching the document back out using the serialized PHP 
> writer, it returns a string like the following:
> s:4:"􀁸";
> Note that the string length specified is 4, while the string is actually 6 
> bytes long.
> When using PHP's native serialize() function, it correctly sets the length to 
> 6:
> # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
> string(13) "s:6:"􀁸";"
> The "wt=php" writer, which produces output to be parsed with eval(), doesn't 
> have any trouble with this string.
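
One likely reading of the 4-versus-6 mismatch, sketched in Python as an 
illustration only: the six reported bytes look like a surrogate pair with each 
half encoded separately, and the code point they represent takes four bytes in 
standard UTF-8. The variable names below are just for this example.

```python
# The 6-byte sequence from the report.
raw = b"\xED\xAF\x80\xED\xB1\xB8"

# "surrogatepass" lets Python decode the two 3-byte groups as lone surrogates.
high, low = (ord(ch) for ch in raw.decode("utf-8", "surrogatepass"))

# Combine the surrogate pair into the code point it stands for.
code_point = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# Re-encode that single code point as standard UTF-8.
standard = chr(code_point).encode("utf-8")

print(f"surrogates: U+{high:04X} U+{low:04X}")
print(f"code point: U+{code_point:06X}")
print(f"as sent: {len(raw)} bytes, standard UTF-8: {len(standard)} bytes")
```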

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
