I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying the Smart Chinese Analyzer from https://issues.apache.org/jira/browse/LUCENE-1629 instead?
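For reference, one way to wire that analyzer into Solr is to point a field type directly at the analyzer class. This is only a sketch, assuming the LUCENE-1629 contrib jar is on Solr's classpath; the field type name here is a placeholder:

```
<!-- schema.xml fragment (sketch): use the Smart Chinese Analyzer
     for a whole field type; "text_zh_smart" is a made-up name -->
<fieldType name="text_zh_smart" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>
```

You'd then reference that field type from your Chinese-content fields in the usual way.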
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Fer-Bj <fernando.b...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 4, 2009 2:20:03 AM
> Subject: Re: indexing Chienese langage
>
> We are trying Solr 1.3 with the Paoding Chinese analyzer, and after
> reindexing the index size went from 1.5 GB to 2.7 GB.
>
> Is that expected behavior?
>
> Is there any switch or trick to keep the index from nearly doubling in size?
>
> Koji Sekiguchi-2 wrote:
> >
> > CharFilter can normalize (convert) traditional Chinese to simplified
> > Chinese or vice versa, if you define mapping.txt. Here is a sample of
> > Chinese character normalization:
> >
> > https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> >
> > See SOLR-822 for the details:
> >
> > https://issues.apache.org/jira/browse/SOLR-822
> >
> > Koji
> >
> > revathy arun wrote:
> >> Hi,
> >>
> >> When I index Chinese content using the Chinese tokenizer and analyzer
> >> in Solr 1.3, some of the Chinese text files are getting indexed but
> >> others are not.
> >>
> >> Since Chinese has many different variants, such as standard Chinese,
> >> simplified Chinese, etc., which of these does the Chinese tokenizer
> >> support, and is there any method to detect the variant of Chinese
> >> from the file?
> >>
> >> Rgds
> >>
>
> --
> View this message in context:
> http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
> Sent from the Solr - User mailing list archive at Nabble.com.
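For anyone following along, the CharFilter Koji describes is declared in schema.xml ahead of the tokenizer. A minimal sketch, assuming a Solr build with SOLR-822 applied; the field type name, tokenizer choice, and mapping filename are placeholders for illustration:

```
<!-- schema.xml fragment (sketch): normalize characters before
     tokenizing; "text_zh" and "mapping-zh.txt" are made-up names -->
<fieldType name="text_zh" class="solr.TextField">
  <analyzer>
    <!-- mapping-zh.txt holds lines like:  "SOURCE" => "TARGET"  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-zh.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The mapping file itself is plain text, one `"from" => "to"` pair per line, so a traditional-to-simplified table can be dropped in without code changes.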