I'm looking to build CJK applications by the middle of next year, and European and Russian ones as well. Are the analyzers for all of those up and running?
Dennis Gearon

Signature Warning
----------------
EARTH has a Right To Life, otherwise we all die.

Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php

--- On Fri, 9/24/10, Andy <angelf...@yahoo.com> wrote:

> From: Andy <angelf...@yahoo.com>
> Subject: RE: bi-grams for common terms - any analyzers do that?
> To: solr-user@lucene.apache.org
> Date: Friday, September 24, 2010, 10:04 PM
>
> --- On Thu, 9/23/10, Burton-West, Tom <tburt...@umich.edu> wrote:
>
> > It also splits on whitespace which causes all CJK queries
> > to be treated as phrase queries regardless of the CJK
> > tokenizer you use.
>
> But I thought specialized analyzers like CJKAnalyzer are
> designed for those languages, which don't use whitespace to
> separate words.
>
> Isn't it up to the tokenizer, not the QueryParser, to
> decide how to split the query into tokens?
>
> I'm really confused.
>
> If Solr's QueryParser will only split on whitespace no
> matter what then what is the point of using CJKAnalyzer?
>
> It sounds like Solr would be pretty useless for languages
> like CJK. Is there any work around for this? Any CJK sites
> using Solr?
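
For what it's worth, here is a toy sketch of the behaviour Tom describes in the quoted thread, as I understand it. It is plain Python, not Lucene or Solr code: `cjk_bigrams` and `parse_query` are made-up names, and the bigram logic is only a rough stand-in for what a CJK bigram tokenizer does. The assumption is that the query parser splits on whitespace first, hands each chunk to the analyzer, and turns any chunk that yields more than one token into a phrase query.

```python
# Toy model only: none of these functions are Lucene or Solr APIs.

def cjk_bigrams(text):
    """Return overlapping character bigrams (a rough stand-in for a CJK bigram tokenizer)."""
    if len(text) < 2:
        return [text] if text else []
    return [text[i:i + 2] for i in range(len(text) - 1)]


def parse_query(query):
    """Split on whitespace first, then analyze each chunk on its own.

    A chunk that produces more than one token is treated as a phrase,
    which is the behaviour described in the quoted message.
    """
    clauses = []
    for chunk in query.split():
        tokens = cjk_bigrams(chunk)
        kind = "phrase" if len(tokens) > 1 else "term"
        clauses.append((kind, tokens))
    return clauses


if __name__ == "__main__":
    # The three-character chunk becomes a two-bigram phrase;
    # the two-character chunk stays a single term.
    print(parse_query("東京都 大学"))
```

If that model is right, the tokenizer still decides how each chunk is split, but the parser decides how the resulting tokens are combined, so any chunk of three or more CJK characters ends up as a phrase.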