Another way to handle multilingual indexing is to use a separate field for each language, each with its own analyzer. Solr/Lucene ship language-specific analysis (stemmers, for example) for a number of languages.
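
As a rough sketch of the per-language field idea, something like the following could go in schema.xml. The type names (text_es, text_pt, text_en) and field names (description_es, description_pt, description_en) are made up for illustration, and the tokenizer/filter chain is only an assumption about what you would want, not a tested configuration.

```xml
<!-- schema.xml fragment (sketch). Type and field names are hypothetical. -->
<types>
  <!-- One field type per language, each with its own Snowball stemmer. -->
  <fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
    <!-- A single <analyzer> with no type attribute is used for both indexing and querying. -->
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
    </analyzer>
  </fieldType>
  <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English"/>
    </analyzer>
  </fieldType>
</types>

<fields>
  <!-- The indexing client fills only the field matching the document's language. -->
  <field name="description_es" type="text_es" indexed="true" stored="true"/>
  <field name="description_pt" type="text_pt" indexed="true" stored="true"/>
  <field name="description_en" type="text_en" indexed="true" stored="true"/>
</fields>
```

With a layout like this, the client has to know (or guess) each document's language at index time, and queries go against the field for the language being searched, or against several fields at once. As with removing the English porter, a schema change of this kind needs a re-index before it takes effect.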

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli <germanbiozz...@gmail.com> wrote:
> Thanks Ahmet. Definitely using analyzer appears the english porter as
> the killer ;)
> Regards
> German
>
> On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN <iori...@yahoo.com> wrote:
>>
>>> Hi everybody
>>>
>>> I have a simple but (for me) annoying problem. I'm happy user of Solr
>>> 1.4 with a small collection of documents. Today one of the users has
>>> reported that a query returns documents that are non-pertinent to the
>>> expression. I have spanish, portuguese and english text inside the
>>> collection. Using the Solr administration interface I've found that
>>> she was right, if I search for the spanish term "represion", I found
>>> just only the word root, I mean it returns every document with the
>>> term "repres". Using the admin-debug search I found this:
>>>
>>> <lst name="debug">
>>> <str name="rawquerystring">description:represion</str>
>>> <str name="querystring">description:represion</str>
>>> <str name="parsedquery">description:repres</str>
>>> <str name="parsedquery_toString">description:repres</str>
>>>
>>> the "ion" part of the term was deleted by the query parser. The first
>>> question is: I don't know now where should I see to correct this, at
>>> the schema.xml or at the solrconfig.xml.
>>
>>> The only thing that is suspicious to me is the EnglishPorter.
>>
>> Yes you are right. "ion" part of the term was deleted by it. You can verify
>> this using /admin/analysis.jsp page. It will tell you which
>> TokenFilterFactory removes it.
>>
>>> I've deleted from the configuration but nothing changes. Should
>>> I reindex the collection to see the changes?
>>
>> Yes re-index is necessary.
>>
>>> Should I delete also from the index section?
>>
>> You should remove English porter from both query and index analyzer.
>>
>>> What I will loose deleting English porter?
>>
>> You will lose stemming functionality. But since you have spanish, portuguese
>> and english documents using English porter for all the documents is not
>> meaningful.
>>
>

--
Lance Norskog
goks...@gmail.com