SolrJ Unicode problem

Hugh Cayless Fri, 28 May 2010 09:51:17 -0700

Hi, I'm a Solr newbie, and I'm hoping someone can point me in the right 
direction.


I'm trying to index a bunch of documents with Greek text in them.  I can 
successfully index documents by generating add XML and using curl to send it 
to my server, but when I use SolrJ to create and send documents, the encoding 
gets thoroughly messed up.

Instead of the result (from an add doc posted with curl):

<result name="response" numFound="1" start="0">
  <doc>
    <str name="id">c.etiq.mom;;2077</str>
    <str name="transcription">Της Βησο ς Χρη εις Πανοπολίτης</str>
  </doc>
</result>

I get (from a SolrInputDocument loaded with SolrJ):

<result name="response" numFound="1" start="0"> 
 <doc> 
  <str name="id">c.etiq.mom;;2077</str> 
  <str name="transcription">??? ???? ? ??? ??? ????�??????</str> 
 </doc> 
</result>

I can confirm that the SolrInputDocument's transcription field contains Greek 
text before I call .add(documents) on the StreamingUpdateSolrServer (i.e., I 
can get Greek back out of it).  So I don't know what to do next.  Any ideas?
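
A minimal sketch of that kind of check, using only the JDK and not SolrJ itself. It confirms that a Java String really holds Greek code points, then shows that a round trip through a charset without Greek letters (ISO-8859-1 here) produces question marks similar to the bad result above, while UTF-8 preserves the text. The class and method names are hypothetical, and the idea that a non-UTF-8 charset is involved somewhere between the client and the server is only an assumption, not something confirmed in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; it is not part of SolrJ.
public class EncodingCheck {

    // Encode the text to bytes in the given charset, then decode it again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    // True if at least one code point is in the Greek or Greek Extended block.
    static boolean containsGreek(String text) {
        return text.codePoints().anyMatch(cp -> {
            Character.UnicodeBlock block = Character.UnicodeBlock.of(cp);
            return block == Character.UnicodeBlock.GREEK
                    || block == Character.UnicodeBlock.GREEK_EXTENDED;
        });
    }

    public static void main(String[] args) {
        // The word "Πανοπολίτης", written with escapes so the result does not
        // depend on the encoding used to compile this source file.
        String sample =
            "\u03A0\u03B1\u03BD\u03BF\u03C0\u03BF\u03BB\u03AF\u03C4\u03B7\u03C2";

        // The in-memory String holds Greek code points.
        System.out.println("contains Greek: " + containsGreek(sample));

        // UTF-8 can represent every character, so the round trip is lossless.
        System.out.println("UTF-8 lossless: "
            + roundTrip(sample, StandardCharsets.UTF_8).equals(sample));

        // ISO-8859-1 has no Greek letters, so each one becomes '?'.
        System.out.println("ISO-8859-1 round trip: "
            + roundTrip(sample, StandardCharsets.ISO_8859_1));

        // The platform default charset, which may be worth inspecting as well.
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

If the UTF-8 line reports a lossless round trip while the stored field still shows question marks, the String itself is fine and the loss would have to happen later, when the document is serialized or received.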

Thanks,
Hugh
