I know Russian better than Russians ;)
I currently use the default configuration for "dismax" provided by SOLR
1.1; I can add a few URLs to the crawler tonight to see what happens. As
far as I know, Lucene/Nutch can even detect the language of a web page
(pdf, txt, html) by checking the raw byte array (raw HTTP Respon
Thanks.
Yes I will do it.
So you may be the best person to talk about the Russian content indexing. :)
My indexing process is as follows:
1. RussianTokenizer
2. RussianLowerCaseFilter
3. RussianStopFilter
4. RussianStemFilter
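
To make the order of the four steps concrete, here is a toy sketch in Python. It is not Lucene's implementation: the stop-word set, the suffix list, and the function names are all made up for illustration, and the suffix stripper is only a crude stand-in for a real stemmer.

```python
import re

# Hypothetical stand-ins: NOT Lucene's real stop list or stemming rules.
STOP_WORDS = {"и", "в", "на", "не"}
SUFFIXES = ("ами", "ов", "ы", "а")


def tokenize(text):
    # Step 1: keep runs of Cyrillic letters, drop everything else.
    return re.findall(r"[А-Яа-яЁё]+", text)


def strip_suffix(token):
    # Step 4 stand-in: remove the first matching suffix, but only if
    # at least three characters of the word remain.
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[:-len(suffix)]
    return token


def analyze(text):
    tokens = tokenize(text)                               # 1. tokenize
    tokens = [t.lower() for t in tokens]                  # 2. lower-case
    tokens = [t for t in tokens if t not in STOP_WORDS]   # 3. stop words
    return [strip_suffix(t) for t in tokens]              # 4. stem


result = analyze("Книги и столы")
```

The order matters: lower-casing has to happen before the stop-word lookup, otherwise a capitalised stop word would slip through.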
Seems OK to me as I'm using the same structure used by the
Thanks a lot!
Now it is working. It was the Tomcat connector setup
Regards,
Daniel
On 28.06.2007 17:19, "Chris Hostetter" <[EMAIL PROTECTED]> wrote:
>
> : You can also ensure the browser sends an utf8 encoded post by
> : : It works even if the page the form is in is not an UTF-8 page.
>
Hi Daniel,
Ensure that UTF-8 is everywhere... SOLR, WebServer, AppServer, HTTP
Headers, etc.
And do not use
q=Бамбарбиа Киркуду
use this instead (encoded URL):
q=%D0%91%D0%B0%D0%BC%D0%B1%D0%B0%D1%80%D0%B1%D0%B8%D0%B0+%D0%9A%D0%B8%D1%80%D0%BA%D1%83%D0%B4%D1%83
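
If you build the URL in code rather than by hand, the encoding can be done for you. A minimal sketch in Python, assuming the server decodes the query string as UTF-8; the variable names are just for illustration:

```python
from urllib.parse import quote_plus, unquote_plus

raw_query = "Бамбарбиа Киркуду"

# quote_plus encodes the text as UTF-8 bytes, percent-escapes every
# non-ASCII byte, and turns the space into '+'.
encoded = quote_plus(raw_query, encoding="utf-8")
param = "q=" + encoded

# Decoding with the same charset gives back the original text.
round_trip = unquote_plus(encoded, encoding="utf-8")
```

Here `param` comes out as exactly the encoded string shown above, and it is plain ASCII, so it is safe to put in a URL.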
http://www.tokenizer.org is
: You can also ensure the browser sends an utf8 encoded post by
: http://www.nabble.com/Cyrillic-characters-t1963293.html#a5402562
http://wiki.apache.org/solr/SolrTomcat (see URI charset section)
-Hoss
On 6/28/07, Yonik Seeley <[EMAIL PROTECTED]> wrote:
On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
> I'm having trouble issuing queries against Solr with Russian content in my
> "q" parameter (the same applies to Chinese and Arabic).
>
> The problem is I can't send any Ru
On 6/28/07, Daniel Alheiros <[EMAIL PROTECTED]> wrote:
I'm having trouble issuing queries against Solr with Russian content in my
"q" parameter (the same applies to Chinese and Arabic).
The problem is that I can't send Russian characters in URLs because
they don't fit in
Hi
I'm having trouble issuing queries against Solr with Russian content in my
"q" parameter (the same applies to Chinese and Arabic).
The problem is that I can't send Russian characters in URLs because
they fall outside the ASCII range, so I'm doing a POST to get around that.
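
For reference, a rough sketch of what such a POST might look like when built from Python. The URL is an assumption (a local Solr with a /select handler), as is the helper name; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint for illustration only; adjust to the real Solr location.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_request(query):
    # urlencode percent-escapes the UTF-8 bytes of the value, so the
    # resulting form body is pure ASCII.
    body = urlencode({"q": query}).encode("ascii")
    return Request(
        SOLR_SELECT_URL,
        data=body,
        headers={
            # State the charset explicitly so the server does not guess.
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        },
        method="POST",
    )


req = build_query_request("Бамбарбиа Киркуду")
```

Passing `req` to `urllib.request.urlopen` would send it. Whether the server honours the declared charset depends on its own configuration.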