The order of the TokenizerChain is not correct. Filters must come
after the tokenizer:
<analyzer>
<!-- <tokenizer class='solr.StandardTokenizerFactory' /> -->
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class='solr.StandardFilterFactory' />
<filter class='solr.LowerCaseFilterFactory' />
</analyzer>
Koji
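To make the ordering concrete, here is a minimal Python sketch of the three stages in the analyzer above: the character mapping runs on the raw text, then the text is split on whitespace, then the token filters run on each token. This is only a conceptual illustration, not Solr code; CHAR_MAP and all function names are hypothetical, and the mapping table is a tiny hand-written stand-in for a real mapping file.

```python
# Conceptual sketch only: mimics the stage order of an analysis chain.
# CHAR_MAP and the function names are hypothetical, not Solr APIs.

CHAR_MAP = {"š": "s", "č": "c", "ť": "t", "ä": "a", "ô": "o"}


def char_filter(text):
    """Stage 1: rewrite characters in the raw text, before any tokens exist."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def whitespace_tokenizer(text):
    """Stage 2: split the already-mapped text into tokens on whitespace."""
    return text.split()


def lowercase_filter(tokens):
    """Stage 3: token filters run last and see one token at a time."""
    return [tok.lower() for tok in tokens]


def analyze(text):
    """Run the stages in order: char filter, tokenizer, token filter."""
    return lowercase_filter(whitespace_tokenizer(char_filter(text)))


print(analyze("Stôl čaša ťava"))  # ['stol', 'casa', 'tava']
```

Because the same chain would normally be applied to both the indexed text and the query, "čaša" and "casa" end up as the same token in this sketch.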
György Frivolt wrote:
I tried to use ISOLatin1AccentFilterFactory under Solr 1.3. It partly
works, but it does not recognize most of the characters I need to map. So I
tried MappingCharFilterFactory instead. According to the documentation it
needs a different tokenizer, so I set that, and it also needs a mapping file,
which is a simple text file with character mappings. This would be fine for
me, but when I tried it, it did nothing. I suspect that Solr cannot locate
the mapping file.
The mapping-ISOLatin1Accent.txt file is in my conf directory. I tried changing
the path in the schema, but nothing happened. How can I tell Solr to read this
mapping file?
This is my schema.xml:
<?xml version='1.0' encoding='utf-8' ?>
<schema name='sunspot' version='0.9'>
<types>
<fieldtype class='solr.TextField' name='text'
positionIncrementGap='100'>
<analyzer>
<!-- <tokenizer class='solr.StandardTokenizerFactory' /> -->
<filter class='solr.StandardFilterFactory' />
<filter class='solr.LowerCaseFilterFactory' />
<charFilter class="solr.MappingCharFilterFactory"
mapping="mapping-ISOLatin1Accent.txt"/>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
</analyzer>
</fieldtype>
<fieldtype class='solr.RandomSortField' name='rand'></fieldtype>
<fieldtype class='solr.BoolField' name='boolean' omitNorms='true' />
<fieldtype class='solr.SortableFloatField' name='sfloat'
omitNorms='true' />
<fieldtype class='solr.DateField' name='date' omitNorms='true' />
<fieldtype class='solr.SortableIntField' name='sint' omitNorms='true' />
<fieldtype class='solr.StrField' name='string' omitNorms='true' />
</types>
<fields>
<field indexed='true' multiValued='false' name='id' stored='true'
type='string' />
<field indexed='true' multiValued='true' name='type' stored='false'
type='string' />
<field indexed='true' multiValued='false' name='class_name'
stored='false' type='string' />
<field indexed='true' multiValued='true' name='text' stored='false'
type='text' />
<dynamicField indexed='true' multiValued='true' name='*_text'
stored='false' type='text' />
<dynamicField indexed='true' name='random_*' stored='false' type='rand'
/>
<dynamicField indexed='true' multiValued='false' name='*_b'
stored='false' type='boolean' />
<dynamicField indexed='true' multiValued='false' name='*_f'
stored='false' type='sfloat' />
<dynamicField indexed='true' multiValued='false' name='*_d'
stored='false' type='date' />
<dynamicField indexed='true' multiValued='false' name='*_i'
stored='false' type='sint' />
<dynamicField indexed='true' multiValued='false' name='*_s'
stored='false' type='string' />
<dynamicField indexed='true' multiValued='true' name='*_bm'
stored='false' type='boolean' />
<dynamicField indexed='true' multiValued='true' name='*_fm'
stored='false' type='sfloat' />
<dynamicField indexed='true' multiValued='true' name='*_dm'
stored='false' type='date' />
<dynamicField indexed='true' multiValued='true' name='*_im'
stored='false' type='sint' />
<dynamicField indexed='true' multiValued='true' name='*_sm'
stored='false' type='string' />
<dynamicField indexed='true' multiValued='false' name='*_bs'
stored='true' type='boolean' />
<dynamicField indexed='true' multiValued='false' name='*_fs'
stored='true' type='sfloat' />
<dynamicField indexed='true' multiValued='false' name='*_ds'
stored='true' type='date' />
<dynamicField indexed='true' multiValued='false' name='*_is'
stored='true' type='sint' />
<dynamicField indexed='true' multiValued='false' name='*_ss'
stored='true' type='string' />
<dynamicField indexed='true' multiValued='true' name='*_bms'
stored='true' type='boolean' />
<dynamicField indexed='true' multiValued='true' name='*_fms'
stored='true' type='sfloat' />
<dynamicField indexed='true' multiValued='true' name='*_dms'
stored='true' type='date' />
<dynamicField indexed='true' multiValued='true' name='*_ims'
stored='true' type='sint' />
<dynamicField indexed='true' multiValued='true' name='*_sms'
stored='true' type='string' />
</fields>
<uniqueKey>id</uniqueKey>
<defaultSearchField>text</defaultSearchField>
<solrQueryParser defaultOperator='AND' />
<copyField dest='text' source='*_text' />
</schema>
On Fri, Sep 4, 2009 at 1:14 AM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
Take a look at the MappingCharFilterFactory (in Solr 1.4) and/or the
ISOLatin1AccentFilterFactory.
: Date: Thu, 27 Aug 2009 16:30:08 +0200
: From: György Frivolt <gyorgy.friv...@gmail.com>
: Reply-To: solr-user@lucene.apache.org
: To: solr-user <solr-user@lucene.apache.org>
: Subject: Searching with or without diacritics
:
: Hello,
:
: I started to use Solr only recently, through the Ruby/Rails sunspot-solr
: client. I use Solr on a Slovak/Czech data set and noticed one unwanted
: behaviour of the search. When a user searches for an expression or word
: that contains diacritics, letters like š, č, ť, ä, ô, ..., the special
: characters are usually omitted from the search query. In this case Solr
: does not return the records that contain the expression the user intended
: to find.
: How can I configure Solr so that it finds records containing special
: characters even when they are typed without accents in the query?
:
: Some info about my Solr instance:
: Solr Specification Version: 1.3.0
: Solr Implementation Version: 1.3.0 694707 - grantingersoll - 2008-09-12 11:06:47
: Lucene Specification Version: 2.4-dev
: Lucene Implementation Version: 2.4-dev 691741 - 2008-09-03 15:25:16
:
: Thanks for your help, regards,
:
: Georg
:
-Hoss
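As a rough illustration of the behaviour asked for in the original question, the Python sketch below folds the example letters š, č, ť, ä, ô to their base letters, and also lists which of them lie above U+00FF, that is, outside the Latin-1 range. The second check is offered only as a plausible explanation (an assumption, not confirmed in this thread) for why a Latin-1 oriented accent filter handled some letters but not others. The function names are hypothetical, and Unicode decomposition is used here for illustration only; it is not how the Solr factories mentioned above are implemented.

```python
import unicodedata


def fold_diacritics(text):
    """Decompose to NFD, then drop the combining marks (accents)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def outside_latin1(text):
    """Return the characters whose code point lies above U+00FF."""
    return [ch for ch in text if ord(ch) > 0xFF]


letters = "ščťäô"
print(fold_diacritics(letters))  # sctao
print(outside_latin1(letters))   # ['š', 'č', 'ť']
```

If both the indexed text and the query pass through the same folding step, a query typed without accents can match text that contains them.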