Re: Problem with UTF-8 and Solr ISOLatin1AccentFilterFactory

aerox7 Fri, 20 Mar 2009 02:56:10 -0700

I added:
"Ã¨" => "e" to mapping-ISOLatin1Accent.txt


and added the following fieldType:

<fieldType name="textCharNorm" class="solr.TextField" 
positionIncrementGap="100" > 
  <analyzer> 
    <charFilter class="solr.MappingCharFilterFactory" 
mapping="mapping-ISOLatin1Accent.txt"/> 
    <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/> 
  </analyzer> 
</fieldType> 

But I still have the same problem! It only works when I store the ISO string
in the UTF-8 database (e.g. storing solène, not solÃ¨ne)... :,(
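
For reference, here is a small sketch of what may be going on. It assumes the rows were written as UTF-8 bytes and then read back (or stored again) as ISO-8859-1; nothing in this thread confirms how the data was inserted. The helper names to_mojibake and repair_mojibake are hypothetical, and Python is used purely for illustration; it is not part of Solr or DataImportHandler.

```python
def to_mojibake(text: str) -> str:
    """Encode as UTF-8, then (wrongly) decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_mojibake(text: str) -> str:
    """Undo the wrong decode; return the text unchanged if it does not round-trip."""
    try:
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters outside ISO-8859-1, or its bytes are
        # not valid UTF-8, so it was probably not garbled this way.
        return text


if __name__ == "__main__":
    garbled = to_mojibake("Solène")
    print(garbled)                   # the two-character sequence seen in the DB
    print(repair_mojibake(garbled))  # back to the intended name
```

If the data really is garbled this way, the fix probably belongs upstream (the connection charset, or a one-time repair of the rows) rather than in the Solr analyzer, since the accent filter only ever sees the two characters "Ã" and "¨".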




aerox7 wrote:
> 
> ==> where are you seeing it as "SolÃ¨ne" as opposed to the
> correct way of solène?
> 
> I have "SolÃ¨ne" in my MySQL database, so I don't know whether this is
> correct or not. I guess that "SolÃ¨ne" is solène in UTF-8?!
> 
> I tried the analysis page at http://localhost:8983/solr/admin/analysis.jsp.
> When I try with solène everything is OK, but when I try with SolÃ¨ne
> (which is what I have in the DB) the analysis converts Ã to A and deletes ¨,
> so I get SolAne!
> 
> I think that ISOLatin1AccentFilterFactory only accepts strings in the
> ISO-8859-1 charset.
> 
> So is there any way to transform my string to ISO-8859-1 before the indexing
> process? Maybe by creating a transformer in DataImportHandler? (I have never
> coded in Java :( )
> 
> Thank you all.
> 
> 
> Koji Sekiguchi-2 wrote:
>> 
>> aerox7 wrote:
>>> Hi,
>>> I have a MySQL database in UTF-8. I have a row with "SolÃ¨ne" (solène).
>>> I want to transform this to solene, so I use Solr's
>>> ISOLatin1AccentFilterFactory to perform this task, but it doesn't work?!
>>>
>>> I guess that "SolÃ¨ne" is "solène" in UTF-8?! I also set Tomcat to UTF-8,
>>> so normally ISOLatin1AccentFilterFactory should replace the accent.
>>>
>>> Any ideas?
>>>
>>> I use DataImportHandler.
>>>   
>> 
>> If a mapping rule "Ã¨" to "e" is always true in your field, you can try 
>> to use MappingCharFilter
>> instead of ISOLatin1AccentFilter. Add the following line to 
>> mapping-ISOLatin1Accent.txt:
>> 
>> "Ã¨" => "e"
>> 
>> and add the following fieldType:
>> 
>> <fieldType name="textCharNorm" class="solr.TextField" 
>> positionIncrementGap="100" >
>>   <analyzer>
>>     <charFilter class="solr.MappingCharFilterFactory" 
>> mapping="mapping-ISOLatin1Accent.txt"/>
>>     <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
>>   </analyzer>
>> </fieldType>
>> 
>> MappingCharFilter and mapping-ISOLatin1Accent.txt are in the nightly build.
>> 
>> Koji
>> 
>> 
>> 
>> 
> 
> 
