If you are running on Windows, Java does not default to UTF-8. There is a Java 
system property that changes the default to UTF-8. Unfortunately, not all 
libraries pick up this setting, and some of the String converters don't take a 
character-encoding argument. I learned this the hard way.
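
A minimal sketch of the defensive approach, assuming the property meant here is `file.encoding` (usually passed as `-Dfile.encoding=UTF-8` at JVM startup; the message does not name it): pass the charset explicitly wherever bytes and Strings are converted, so the result does not depend on the platform default. The class and method names are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch: name the charset instead of relying on the JVM default.
final class ExplicitCharsetSketch {

    // The Hebrew sample text from this thread, written as Unicode escapes so the
    // source file does not depend on the compiler's own encoding setting.
    static final String HEBREW = "\u05D9\u05EA\u05D9\u05E8";

    // String -> bytes with an explicit charset (no-argument getBytes() uses the default).
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // bytes -> String with an explicit charset.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Reads the first line of a file as UTF-8, regardless of the JVM default.
    static String firstLineUtf8(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JVM default charset: " + Charset.defaultCharset());

        // Write a one-line tab-separated sample and read it back as UTF-8.
        Path sample = Files.createTempFile("sample", ".tsv");
        try {
            Files.write(sample, toUtf8("dispname\t" + HEBREW + "\n"));
            String line = firstLineUtf8(sample);
            System.out.println("round trip ok: " + line.endsWith(HEBREW));
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

The first printed line shows which default the JVM actually picked, which is a quick way to check whether the property took effect.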

  _____  

From: Ben Shlomo, Yatir [mailto:[EMAIL PROTECTED] 
Sent: Monday, August 20, 2007 8:40 AM
To: solr-user@lucene.apache.org
Subject: problem with quering solr after indexing UTF-8 encoded CSV files



Hi!

 

I have UTF-8 encoded data in a CSV file (actually a tab-separated file, 
attached). I can index it with no apparent errors, and I did set the following 
in my Tomcat configuration:

 

 

<Server ...>
 <Service ...>
   <Connector ... URIEncoding="UTF-8"/>

 

When I query for a document using the UTF-8 text, I get zero matches:

 

Request: http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">יתיר</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0" />
</response>

Note that I can see the correct UTF-8 text (Hebrew characters) in the q 
parameter.
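
As an illustration of what the `URIEncoding="UTF-8"` setting affects, here is a small sketch that decodes the percent-escaped `q` value from the request URL in two ways. It uses `java.net.URLDecoder` as a stand-in and is not Tomcat's actual decoding code; the class name is hypothetical. Older Tomcat releases decoded query strings as ISO-8859-1 unless told otherwise, which is the second case shown.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical sketch: the same percent-escaped bytes decoded with two charsets.
final class QueryDecodingSketch {

    // The q parameter exactly as it appears in the request URL of this thread.
    static final String RAW_Q = "%D7%99%D7%AA%D7%99%D7%A8";

    // Wraps URLDecoder so callers do not have to handle the checked exception.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String asUtf8 = decode(RAW_Q, "UTF-8");
        String asLatin1 = decode(RAW_Q, "ISO-8859-1");

        // Eight bytes become four Hebrew letters under UTF-8,
        // but eight unrelated characters under ISO-8859-1.
        System.out.println("UTF-8 decoding:      " + asUtf8.length() + " chars");
        System.out.println("ISO-8859-1 decoding: " + asLatin1.length() + " chars");
    }
}
```

Since the echoed `q` parameter in the response already shows the correct Hebrew text, the request side appears to be decoded as UTF-8, which suggests looking at how the file was read at indexing time instead.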

 

 

When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: it comes back as garbled 
characters instead of יתיר

Do you have any ideas?

Thanks…

 

Here is the response:

Request: http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
    <lst name="params">
      <str name="indent">on</str>
      <str name="start">0</str>
      <str name="q">*:*</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="country">1</str>
      <str name="desc">desc is a very good camera</str>
      <str name="dispname">display is יתיר ABC res123 </str>
      <str name="form">1</str>
      <str name="lang">1</str>
      <str name="manu">ABC</str>
      <str name="model"> res123 </str>
      <str name="pn">C123</str>
      <str name="productid">123456</str>
      <str name="upc">72900010123</str>
    </doc>
  </result>
</response>
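
The symptom described (garbled text stored in the index, zero matches for the correct text) is what one would expect if the UTF-8 bytes of the file were decoded with a single-byte charset at indexing time. The sketch below simulates that under stated assumptions: ISO-8859-1 stands in for whatever default charset the JVM actually picked, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: how UTF-8 text turns into garbled text, and back.
final class MojibakeSketch {

    // The Hebrew sample text from this thread, as Unicode escapes.
    static final String HEBREW = "\u05D9\u05EA\u05D9\u05E8";

    // Simulates the suspected bug: UTF-8 bytes decoded with a single-byte charset.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage. This works for ISO-8859-1 because it maps every
    // byte to exactly one char, so no information is lost on the way.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = misdecode(HEBREW);

        // The indexed (garbled) value no longer equals the query text,
        // so an exact-term search for the Hebrew word finds nothing.
        System.out.println("garbled equals original:  " + garbled.equals(HEBREW));
        System.out.println("garbled length:           " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(HEBREW));
    }
}
```

The repair step is only for diagnosis. The cleaner fix is to read the file as UTF-8 in the first place and reindex.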

 

 

yatir
