Another way to do multi-lingual indexing is to have a separate field
for each language. Solr/Lucene have custom processing for some languages.
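
Below is a minimal schema.xml sketch of the per-language field approach. It assumes Solr 1.4's `solr.SnowballPorterFilterFactory`, which picks a Snowball stemmer through its `language` attribute. The type and field names (`text_es`, `text_pt`, `text_en`, `description_es`, `description_pt`, `description_en`) are hypothetical. The point is that Spanish text gets Spanish stemming rules instead of English ones.

```xml
<!-- Hypothetical per-language text types (inside <types>).
     An <analyzer> with no type attribute is used for both indexing
     and querying, so both sides of a field are stemmed the same way. -->
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
  </analyzer>
</fieldType>
<fieldType name="text_pt" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Portuguese"/>
  </analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields (inside <fields>): one per language. -->
<field name="description_es" type="text_es" indexed="true" stored="true"/>
<field name="description_pt" type="text_pt" indexed="true" stored="true"/>
<field name="description_en" type="text_en" indexed="true" stored="true"/>
```

With this layout the indexing client must know each document's language and fill the matching field. A Spanish query would then target that field, e.g. `description_es:represion`.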

On Sun, Oct 18, 2009 at 12:25 PM, Germán Biozzoli
<> wrote:
> Thanks Ahmet. Using the analysis page, the English porter definitely shows up
> as the killer ;)
> Regards
> German
> On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN <> wrote:
>>> Hi everybody,
>>> I have a simple but (for me) annoying problem. I'm a happy user of Solr
>>> 1.4 with a small collection of documents. Today one of the users
>>> reported that a query returns documents that are not pertinent to the
>>> expression. I have Spanish, Portuguese and English text inside the
>>> collection. Using the Solr administration interface I found that she
>>> was right: if I search for the Spanish term "represion", only the word
>>> root is used, so it returns every document with the term "repres".
>>> Using the admin debug search I found this:
>>> <lst name="debug">
>>>   <str name="rawquerystring">description:represion</str>
>>>   <str name="querystring">description:represion</str>
>>>   <str name="parsedquery">description:repres</str>
>>>   <str name="parsedquery_toString">description:repres</str>
>>> The "ion" part of the term was deleted by the query parser. My first
>>> question is: where should I look to correct this, in schema.xml or in
>>> solrconfig.xml? The only thing that looks suspicious to me is the
>>> EnglishPorter.
>> Yes, you are right. The "ion" part of the term was removed by it. You can
>> verify this using the /admin/analysis.jsp page; it will tell you which
>> TokenFilterFactory removes it.
>>> I've deleted it from the configuration but nothing changes. Should
>>> I reindex the collection to see the changes?
>> Yes, re-indexing is necessary.
>>> Should I also delete it from the index section?
>> You should remove the English porter from both the query and the index
>> analyzers.
>>> What will I lose by deleting the English porter?
>> You will lose stemming functionality. But since you have Spanish,
>> Portuguese and English documents, using the English porter for all the
>> documents is not meaningful.
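
Here is a sketch of a field type with the English porter removed from both analyzers, as Ahmet suggests. It assumes a stock Solr 1.4 style chain. The type name `text_nostem` and the filters kept are assumptions, not German's actual schema. Documents already in the index still carry stemmed terms, so they must be re-indexed after the change.

```xml
<!-- Hypothetical unstemmed type: no EnglishPorterFilterFactory
     in either the index-time or the query-time analyzer. -->
<fieldType name="text_nostem" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- removed: <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The existing field pointed at the unstemmed type. -->
<field name="description" type="text_nostem" indexed="true" stored="true"/>
```

Keeping the index and query chains identical means "represion" is indexed and searched as the same token.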

Lance Norskog
