Hi Daniel,

Make sure UTF-8 is used everywhere: Solr, the web server, the app server, HTTP headers, etc.

Also, do not send the raw query q=Бамбарбиа Киркуду
send the URL-encoded form (percent-encoded UTF-8) instead:
q=%D0%91%D0%B0%D0%BC%D0%B1%D0%B0%D1%80%D0%B1%D0%B8%D0%B0+%D0%9A%D0%B8%D1%80%D0%BA%D1%83%D0%B4%D1%83
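
A minimal Python sketch of how that encoded form can be produced, in case it helps. The endpoint URL and the helper names (encode_query, build_post_request) are assumptions made up for illustration; they are not taken from your setup. The second helper only builds a form POST with an explicit UTF-8 charset and does not send it.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

# Hypothetical Solr endpoint (an assumption); adjust host, port, and path.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def encode_query(text):
    """Return the 'q' parameter percent-encoded as UTF-8 (spaces become '+')."""
    return urlencode({"q": text}, encoding="utf-8")


def build_post_request(text):
    """Build (but do not send) a form POST whose body is percent-encoded UTF-8."""
    body = encode_query(text).encode("ascii")  # percent-encoded output is pure ASCII
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    return Request(SOLR_SELECT_URL, data=body, headers=headers)


if __name__ == "__main__":
    encoded = encode_query("Бамбарбиа Киркуду")
    print(SOLR_SELECT_URL + "?" + encoded)   # URL that is safe to send as a GET
    print(parse_qs(encoded)["q"][0])         # decodes back to the original text
```

Because the encoded string is plain ASCII, it passes through logs and proxies unchanged. Whether the receiving side decodes it as UTF-8 still depends on how the servlet container in front of Solr is configured, so that setting may need checking as well.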

http://www.tokenizer.org is a Solr-powered search engine, and I still need to add some large Russian online shops to its crawler.

Quoting Daniel Alheiros:

Hi

I'm having trouble issuing queries against Solr when my "q" parameter
contains Russian text (the same applies to Chinese and Arabic).

The problem is that I can't send Russian characters in URLs because
they fall outside the ASCII range, so I'm doing a POST instead.

My application receives the request and logs it (the Russian characters
appear correctly in my logs) and then calls the Solr server, but Solr does
not receive it correctly: in the Solr log the special characters show up
as question marks.

Has anyone faced problems like this? My whole system is set to work in UTF-8
(browser, application servers).

Regards,
Daniel

