I can't tell what that analyzer does, but I'm guessing it uses n-grams? Maybe consider trying the Smart Chinese Analyzer from https://issues.apache.org/jira/browse/LUCENE-1629 instead?
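For reference, one way to wire that analyzer into Solr is to point a field type directly at the analyzer class. This is only a sketch, assuming the LUCENE-1629 contrib jar is on Solr's classpath; the field type name here is a placeholder:

```
<!-- schema.xml fragment (sketch): use the Smart Chinese Analyzer
     for a whole field type; "text_zh_smart" is a made-up name -->
<fieldType name="text_zh_smart" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer"/>
</fieldType>
```

You'd then reference that field type from your Chinese-content fields in the usual way.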
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: Fer-Bj <fernando.b...@gmail.com>
> To: solr-user@lucene.apache.org
> Sent: Thursday, June 4, 2009 2:20:03 AM
> Subject: Re: indexing Chienese langage
>
> We are trying Solr 1.3 with the Paoding Chinese analyzer, and after
> reindexing the index size went from 1.5 GB to 2.7 GB.
>
> Is that expected behavior?
>
> Is there any switch or trick to keep the index from nearly doubling in size?
>
> Koji Sekiguchi-2 wrote:
> >
> > CharFilter can normalize (convert) traditional Chinese to simplified
> > Chinese or vice versa, if you define mapping.txt. Here is a sample of
> > Chinese character normalization:
> >
> > https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> >
> > See SOLR-822 for the details:
> >
> > https://issues.apache.org/jira/browse/SOLR-822
> >
> > Koji
> >
> > revathy arun wrote:
> >> Hi,
> >>
> >> When I index Chinese content using the Chinese tokenizer and analyzer
> >> in Solr 1.3, some of the Chinese text files are getting indexed but
> >> others are not.
> >>
> >> Since Chinese has many different variants, such as standard Chinese,
> >> simplified Chinese, etc., which of these does the Chinese tokenizer
> >> support, and is there any method to detect the variant of Chinese
> >> from the file?
> >>
> >> Rgds
> >>
>
> --
> View this message in context:
> http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
> Sent from the Solr - User mailing list archive at Nabble.com.
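For anyone following along, the CharFilter Koji describes is declared in schema.xml ahead of the tokenizer. A minimal sketch, assuming a Solr build with SOLR-822 applied; the field type name, tokenizer choice, and mapping filename are placeholders for illustration:

```
<!-- schema.xml fragment (sketch): normalize characters before
     tokenizing; "text_zh" and "mapping-zh.txt" are made-up names -->
<fieldType name="text_zh" class="solr.TextField">
  <analyzer>
    <!-- mapping-zh.txt holds lines like:  "SOURCE" => "TARGET"  -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-zh.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The mapping file itself is plain text, one `"from" => "to"` pair per line, so a traditional-to-simplified table can be dropped in without code changes.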