[ https://issues.apache.org/jira/browse/SOLR-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494556 ]

Hoss Man commented on SOLR-231:
-------------------------------

Yonik: agreed that the XML parsing should (eventually) use the raw InputStream 
instead of a Reader if no explicit charset is declared in the content type ... 
but that's a separate issue (SOLR-96) specific to XmlUpdateRequestHandler.

Independent of that is the question: "what should an arbitrary request handler 
get if it calls ContentStream.getReader and the ContentStream doesn't 
explicitly know the charset of the InputStream it has?"

The patch seems clean to me.
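
To make the question above concrete, here is a minimal sketch of the defaulting 
behaviour being discussed: use the charset named in the content type if there is 
one, otherwise fall back to UTF-8. This is not the code from the attached patch; 
the class and method names (CharsetDefaultSketch, charsetFor, readerFor) are 
hypothetical and only illustrate the idea.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Solr: illustrates "UTF-8 unless the
// content type says otherwise" for a posted content stream.
class CharsetDefaultSketch {

    // Returns the charset named in a content type such as
    // "text/xml; charset=ISO-8859-1", or UTF-8 if none is declared.
    static Charset charsetFor(String contentType) {
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "")
                                   .trim();
                    if (!name.isEmpty()) {
                        // Throws an unchecked exception for an illegal or
                        // unsupported charset name; a real handler would
                        // have to decide how to report that.
                        return Charset.forName(name);
                    }
                }
            }
        }
        // Assumed default: no explicit charset means UTF-8.
        return StandardCharsets.UTF_8;
    }

    // Wraps the raw stream in a Reader using the declared or default charset.
    static Reader readerFor(InputStream in, String contentType) {
        return new InputStreamReader(in, charsetFor(contentType));
    }
}
```

With this sketch, readerFor(stream, "text/xml") and readerFor(stream, null) both 
decode as UTF-8, while readerFor(stream, "text/xml; charset=ISO-8859-1") decodes 
as ISO-8859-1. A handler that wants to sniff the encoding itself (the SOLR-96 
case) would bypass the Reader and work on the raw InputStream.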

> By default, use UTF-8 for posted content streams
> ------------------------------------------------
>
>                 Key: SOLR-231
>                 URL: https://issues.apache.org/jira/browse/SOLR-231
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Ryan McKinley
>             Fix For: 1.2
>
>         Attachments: SOLR-231-ContentType-UTF8.patch, 
> SOLR-231-ContentType-UTF8.patch
>
>
> Solr should assume UTF-8 encoding unless the contentType says otherwise.  To 
> change the contentType and encoding, set the header value, for example 
> contentType="text/xml; charset=utf-8".
> Likewise, content passed with stream.body=xxxx will default to UTF-8 unless 
> the stream.contentType says otherwise.
>  
> For previous discussion, see:
> http://www.nabble.com/resin-and-UTF-8-in-URLs-tf3152910.html
> http://www.nabble.com/charset-in-POST-from-browser-tf3153057.html
> http://www.nabble.com/Re%3A-svn-commit%3A-r536048----lucene-solr-trunk-src-webapp-src-org-apache-solr-servlet-SolrRequestParsers.java-tf3712816.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
