Re: indexing Chinese language

Fer-Bj Wed, 03 Jun 2009 23:20:35 -0700

We are trying Solr 1.3 with the Paoding Chinese analyzer, and after reindexing
the index size went from 1.5 GB to 2.7 GB.

Is that expected behavior?

Is there any switch or trick to avoid the index nearly doubling in size?

Koji Sekiguchi-2 wrote:
> 
> CharFilter can normalize (convert) Traditional Chinese to Simplified
> Chinese, or vice versa,
> if you define mapping.txt. Here is a sample of Chinese character
> normalization:
> 
> https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> 
> See SOLR-822 for details:
> 
> https://issues.apache.org/jira/browse/SOLR-822
> 
> Koji
> 
> 
> revathy arun wrote:
>> Hi,
>>
>> When I index Chinese content using the Chinese tokenizer and analyzer in Solr
>> 1.3, some of the Chinese text files are getting indexed but others are
>> not.
>>
>> Chinese has many different language subtypes, such as standard
>> Chinese, simplified Chinese, etc. Which of these does the Chinese tokenizer
>> support, and is there any method to find the type of Chinese language
>> from the file?
>>
>> Rgds
>>
>>   
> 
> 
> 
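For anyone reading along, here is a rough sketch of the idea behind the
mapping-based normalization Koji describes. It is plain Python, not Solr
code, and it is only an illustration: the '"source" => "target"' rule
syntax, the two sample character pairs, and the names parse_rules and
normalize are my own assumptions, so check SOLR-822 for the real file
format and behavior.

```python
import re

# Hypothetical rules in a '"source" => "target"' style. The two sample
# pairs (traditional -> simplified) are illustrative, not a real mapping.txt.
SAMPLE_RULES = '''
# traditional => simplified
"說" => "说"
"語" => "语"
'''

# One rule per line: a single quoted source character and a quoted target.
RULE = re.compile(r'^"(.)"\s*=>\s*"(.*)"$')


def parse_rules(text):
    """Parse rules into a dict, skipping blank lines and # comments."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = RULE.match(line)
        if match is None:
            raise ValueError(f"bad mapping rule: {line!r}")
        table[match.group(1)] = match.group(2)
    return table


def normalize(text, table):
    """Replace each mapped character; unmapped characters pass through."""
    return "".join(table.get(ch, ch) for ch in text)


table = parse_rules(SAMPLE_RULES)
normalized = normalize("說漢語", table)
```

The point is that the conversion happens on the raw character stream before
the tokenizer sees it, so both variants end up as the same terms in the
index. Characters without a rule (漢 in the sample input) are left as they
are.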

-- 
View this message in context: 
http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
Sent from the Solr - User mailing list archive at Nabble.com.
