Re: indexing Chinese language

Erick Erickson Thu, 04 Jun 2009 06:37:43 -0700

Hmmm, are you quite sure that you emptied the index first and didn't just
add all the documents a second time to the index?
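One way to rule out that first explanation is to empty the index explicitly before reindexing. Below is a minimal Python sketch. It assumes the stock example server's update URL (`http://localhost:8983/solr/update`; adjust host, port and path for your install) and Solr's XML update messages: a delete-by-query for `*:*` followed by a `<commit/>`. The helper names `update_request` and `empty_index` are hypothetical.

```python
import urllib.request

# Assumption: the default update URL of the Solr example server.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def update_request(xml_body, url=SOLR_UPDATE_URL):
    """Build (without sending) a POST request carrying one XML update command."""
    return urllib.request.Request(
        url,
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def empty_index(url=SOLR_UPDATE_URL, timeout=30):
    """Delete every document, then commit so the deletion takes effect."""
    for body in ("<delete><query>*:*</query></delete>", "<commit/>"):
        # urlopen raises HTTPError if Solr answers with an error status.
        with urllib.request.urlopen(update_request(body, url), timeout=timeout) as resp:
            resp.read()  # drain Solr's XML status response
```

Calling `empty_index()` once before rerunning the indexing job means the size you measure afterwards reflects only the new documents. Building the request separately keeps it easy to inspect without a running server.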


Also, when you say the index almost doubled, were you looking only
at the size of the *directory*? SOLR might have been holding a copy
of the old index open while you built a new one...
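To see what is actually on disk, a small script can total the index directory and break it down by segment. This sketch assumes the usual Lucene file naming (per-segment files such as `_0.cfs` or `_0_1.del`, commit points named `segments_N`); the function name `index_report` is hypothetical. Old segments lingering next to the new ones would show up as extra entries. On Unix, files that Solr has deleted but still holds open disappear from a listing yet still use space, so it is worth comparing the result with `df` as well.

```python
import os
import re
from collections import defaultdict

# Assumed Lucene naming: per-segment files look like "_0.cfs" or "_0_1.del"
# (segment names are base-36), and commit points are "segments_<N>".
_SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)[._]")


def index_report(index_dir):
    """Return (total bytes, bytes per segment, list of segments_N files)."""
    total = 0
    per_segment = defaultdict(int)
    commits = []
    for name in sorted(os.listdir(index_dir)):
        path = os.path.join(index_dir, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and anything that is not a regular file
        size = os.path.getsize(path)
        total += size
        if name.startswith("segments_"):
            commits.append(name)
        match = _SEGMENT_FILE.match(name)
        if match:
            per_segment[match.group(1)] += size
    return total, dict(per_segment), commits
```

For example, `index_report("/path/to/solr/data/index")` (a hypothetical path; the index usually lives under the Solr home's `data/index`) returns the byte total, a per-segment breakdown, and the commit-point files currently present.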

Best
Erick

On Thu, Jun 4, 2009 at 2:20 AM, Fer-Bj <fernando.b...@gmail.com> wrote:

>
> We are trying SOLR 1.3 with the Paoding Chinese Analyzer, and after
> reindexing, the index size went from 1.5 GB to 2.7 GB.
>
> Is that expected behavior?
>
> Is there any switch or trick to avoid the index file size almost doubling?
>
> Koji Sekiguchi-2 wrote:
> >
> > CharFilter can normalize (convert) traditional Chinese to simplified
> > Chinese or vice versa, if you define the mappings in mapping.txt. Here
> > is a sample of Chinese character normalization:
> >
> >
> > https://issues.apache.org/jira/secure/attachment/12392639/character-normalization.JPG
> >
> > See SOLR-822 for details:
> >
> > https://issues.apache.org/jira/browse/SOLR-822
> >
> > Koji
> >
> >
> > revathy arun wrote:
> >> Hi,
> >>
> >> When I index Chinese content using the Chinese tokenizer and analyzer
> >> in Solr 1.3, some of the Chinese text files are getting indexed but
> >> others are not.
> >>
> >> Since Chinese has many different variants, such as standard Chinese,
> >> simplified Chinese, etc., which of these does the Chinese tokenizer
> >> support? And is there any method to determine which variant of Chinese
> >> a file is written in?
> >>
> >> Rgds
> >>
> >>
> >
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/indexing-Chienese-langage-tp22033302p23864358.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
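Koji's quoted suggestion relies on a CharFilter that rewrites characters according to mapping.txt before tokenization. The Python sketch below only illustrates that idea; it is not Solr's implementation. It assumes rules in the `"source" => "target"` form used by the sample mapping files attached to SOLR-822 (with optional `\uXXXX` escapes) and applies the longest matching source string at each position. The names `parse_mapping` and `normalize` are hypothetical.

```python
import re

# One rule per line in the form "source" => "target" (assumed to match the
# sample mapping files attached to SOLR-822); \uXXXX escapes are expanded.
_RULE = re.compile(r'^"(.*)"\s*=>\s*"(.*)"$')
_UNICODE_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def _unescape(s):
    return _UNICODE_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), s)


def parse_mapping(lines):
    """Parse mapping rules, skipping blank lines and '#' comments."""
    rules = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = _RULE.match(line)
        if match is None:
            raise ValueError(f"bad mapping rule: {raw!r}")
        source = _unescape(match.group(1))
        if not source:
            raise ValueError("empty source string in mapping rule")
        rules[source] = _unescape(match.group(2))
    return rules


def normalize(text, rules):
    """Rewrite text, replacing the longest matching source string at each position."""
    lengths = sorted({len(source) for source in rules}, reverse=True)
    out, i = [], 0
    while i < len(text):
        for n in lengths:
            chunk = text[i:i + n]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            out.append(text[i])  # no rule applies: keep the character as-is
            i += 1
    return "".join(out)
```

For example, `normalize("中國語", parse_mapping(['"國" => "国"', '"語" => "语"']))` returns `"中国语"`. These two traditional-to-simplified pairs are illustrative only; a usable mapping.txt would need a full conversion table.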

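On the quoted question of telling which kind of Chinese a file contains, one rough heuristic is to count characters that exist only in traditional or only in simplified form. The sketch below is an illustration under stated assumptions, not a feature of the Chinese tokenizer. The ten character pairs are a tiny hand-picked sample (a real check would need a much larger table), text with no marker characters is reported as unknown, and `guess_file_script` assumes UTF-8 unless told otherwise (files in, say, GB2312 or Big5 need the matching `encoding` argument). All names here are hypothetical.

```python
# Hypothetical, hand-picked sample of characters whose traditional and
# simplified forms differ; a real check would need a far larger table.
TRAD_TO_SIMP = {
    "國": "国", "語": "语", "說": "说", "這": "这", "們": "们",
    "學": "学", "為": "为", "來": "来", "會": "会", "時": "时",
}
TRADITIONAL_ONLY = set(TRAD_TO_SIMP)
SIMPLIFIED_ONLY = set(TRAD_TO_SIMP.values())


def guess_script(text):
    """Guess 'traditional' or 'simplified' by counting marker characters."""
    trad = sum(ch in TRADITIONAL_ONLY for ch in text)
    simp = sum(ch in SIMPLIFIED_ONLY for ch in text)
    if trad == simp:
        return "unknown"  # no markers found, or a tie
    return "traditional" if trad > simp else "simplified"


def guess_file_script(path, encoding="utf-8"):
    """Apply guess_script to a whole file; the encoding is an assumption."""
    with open(path, encoding=encoding) as f:
        return guess_script(f.read())
```

For example, `guess_script("我們說中國話")` returns `"traditional"` and `guess_script("我们说中国话")` returns `"simplified"`.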