Re: Searching with or without diacritics

Koji Sekiguchi Thu, 17 Sep 2009 09:02:47 -0700

The sequence of the TokenizerChain is not correct... Filters must come after the tokenizer, and the charFilter before it:


     <analyzer>
       <!-- <tokenizer class='solr.StandardTokenizerFactory' /> -->
       <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class='solr.StandardFilterFactory' />
       <filter class='solr.LowerCaseFilterFactory' />
     </analyzer>
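
To see why the order matters, here is a minimal Python sketch of the three stages. This is not Solr code: CHAR_MAP and the function names are made up for illustration, and the map only covers the letters mentioned in this thread.

```python
# Conceptual sketch only: mimics the order of stages in an analyzer chain.
# CHAR_MAP is a hypothetical stand-in for the entries of a mapping file.
CHAR_MAP = {"š": "s", "č": "c", "ť": "t", "ä": "a", "ô": "o"}


def char_filter(text):
    """Stage 1: rewrite raw characters before any tokens exist."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def whitespace_tokenizer(text):
    """Stage 2: split the already-mapped character stream on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Stage 3: token filters run last and operate on whole tokens."""
    return [token.lower() for token in tokens]


def analyze(text):
    """Apply the stages in order: char filter, tokenizer, token filter."""
    return lowercase_filter(whitespace_tokenizer(char_filter(text)))


if __name__ == "__main__":
    print(analyze("Kôš čaj"))
    print(analyze("Kos caj"))
```

If the same chain is applied at index time and at query time, the accented and the unaccented spelling end up as the same tokens, which is the behaviour asked for in the original question.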



Koji


György Frivolt wrote:

I tried to use ISOLatin1AccentFilterFactory under Solr 1.3. It partly
works, but does not recognize most of the characters I need to map. So I
tried MappingCharFilterFactory instead. According to the documentation it
needs a different tokenizer, which I set, and a mapping file, which is a
simple txt file with char mappings. This would be fine for me, but when I
tried it, it did nothing. I suspect that it cannot locate the mapping file.

The mapping-ISOLatin1Accent.txt file is placed in my conf directory. I tried
changing the path in the schema, but nothing happens. How can I tell Solr to
read this mapping file?

This is my schema.xml:

<?xml version='1.0' encoding='utf-8' ?>
<schema name='sunspot' version='0.9'>
  <types>
    <fieldtype class='solr.TextField' name='text'
positionIncrementGap='100'>
      <analyzer>
        <!-- <tokenizer class='solr.StandardTokenizerFactory' /> -->
        <filter class='solr.StandardFilterFactory' />
        <filter class='solr.LowerCaseFilterFactory' />
        <charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
    <fieldtype class='solr.RandomSortField' name='rand'></fieldtype>
    <fieldtype class='solr.BoolField' name='boolean' omitNorms='true' />
    <fieldtype class='solr.SortableFloatField' name='sfloat'
omitNorms='true' />
    <fieldtype class='solr.DateField' name='date' omitNorms='true' />
    <fieldtype class='solr.SortableIntField' name='sint' omitNorms='true' />
    <fieldtype class='solr.StrField' name='string' omitNorms='true' />
  </types>
  <fields>
    <field indexed='true' multiValued='false' name='id' stored='true'
type='string' />
    <field indexed='true' multiValued='true' name='type' stored='false'
type='string' />
    <field indexed='true' multiValued='false' name='class_name'
stored='false' type='string' />
    <field indexed='true' multiValued='true' name='text' stored='false'
type='text' />
    <dynamicField indexed='true' multiValued='true' name='*_text'
stored='false' type='text' />
    <dynamicField indexed='true' name='random_*' stored='false' type='rand'
/>
    <dynamicField indexed='true' multiValued='false' name='*_b'
stored='false' type='boolean' />
    <dynamicField indexed='true' multiValued='false' name='*_f'
stored='false' type='sfloat' />
    <dynamicField indexed='true' multiValued='false' name='*_d'
stored='false' type='date' />
    <dynamicField indexed='true' multiValued='false' name='*_i'
stored='false' type='sint' />
    <dynamicField indexed='true' multiValued='false' name='*_s'
stored='false' type='string' />
    <dynamicField indexed='true' multiValued='true' name='*_bm'
stored='false' type='boolean' />
    <dynamicField indexed='true' multiValued='true' name='*_fm'
stored='false' type='sfloat' />
    <dynamicField indexed='true' multiValued='true' name='*_dm'
stored='false' type='date' />
    <dynamicField indexed='true' multiValued='true' name='*_im'
stored='false' type='sint' />
    <dynamicField indexed='true' multiValued='true' name='*_sm'
stored='false' type='string' />
    <dynamicField indexed='true' multiValued='false' name='*_bs'
stored='true' type='boolean' />
    <dynamicField indexed='true' multiValued='false' name='*_fs'
stored='true' type='sfloat' />
    <dynamicField indexed='true' multiValued='false' name='*_ds'
stored='true' type='date' />
    <dynamicField indexed='true' multiValued='false' name='*_is'
stored='true' type='sint' />
    <dynamicField indexed='true' multiValued='false' name='*_ss'
stored='true' type='string' />
    <dynamicField indexed='true' multiValued='true' name='*_bms'
stored='true' type='boolean' />
    <dynamicField indexed='true' multiValued='true' name='*_fms'
stored='true' type='sfloat' />
    <dynamicField indexed='true' multiValued='true' name='*_dms'
stored='true' type='date' />
    <dynamicField indexed='true' multiValued='true' name='*_ims'
stored='true' type='sint' />
    <dynamicField indexed='true' multiValued='true' name='*_sms'
stored='true' type='string' />
  </fields>
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>text</defaultSearchField>
  <solrQueryParser defaultOperator='AND' />
  <copyField dest='text' source='*_text' />
</schema>

On Fri, Sep 4, 2009 at 1:14 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:

Take a look at the MappingCharFilterFactory (in Solr 1.4) and/or the
ISOLatin1AccentFilterFactory.

: Date: Thu, 27 Aug 2009 16:30:08 +0200
: From: "[ISO-8859-1] György Frivolt" <gyorgy.friv...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user <solr-user@lucene.apache.org>
: Subject: Searching with or without diacritics
:
: Hello,
:
:      I started to use solr only recently, through the ruby/rails
: sunspot-solr client. I use solr on a Slovak/Czech data set and noticed
: one unwanted behaviour of the search. When a user searches for an
: expression or word which contains diacritics, letters like š, č, ť, ä,
: ô,... the special characters are usually omitted in the search query. In
: this case solr does not return the records which contain the expression
: the user intended to find.
:      How can I configure solr so that it finds records containing
: special characters even if they are written without accents in the
: query?
:
:      Some info about my solr instance:
: Solr Specification Version: 1.3.0
: Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
: Lucene Specification Version: 2.4-dev
: Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16
:
: Thanks for your help, regards,
:
:      Georg
:



-Hoss
