Wow, I never heard of autoGeneratePhraseQueries before. Is there any documentation of what it does?
My initial reaction is confusion, because this sounds kind of like the opposite of the original issue. The original issue is that the query parsers split on whitespace _before_ they give tokens to the field analyzers. The query parsers actually do this only with queries that are NOT explicit phrase queries. I wouldn't call this behavior "automatically generating phrase queries" exactly, and wouldn't expect that turning off "automatic generating of phrase queries" would prevent the pre-tokenization by the query parser. But... it does somehow? Can anyone point me to more info about what autoGeneratePhraseQueries does exactly?

If I can use it to turn off that behavior (in a way that only turns it off for some fields but not others, even in a multi-field dismax query somehow?) that would be pretty darn useful. I've been struggling with that for a while.

Jonathan

________________________________________
From: Robert Muir [rcm...@gmail.com]
Sent: Saturday, September 25, 2010 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: bi-grams for common terms - any analyzers do that?

On Sat, Sep 25, 2010 at 1:04 AM, Andy <angelf...@yahoo.com> wrote:

> But I thought specialized analyzers like CJKAnalyzer are designed for those
> languages, which don't use whitespace to separate words.

yes

> Isn't it up to the tokenizer, not the QueryParser, to decide how to split
> the query into tokens?

yes

> I'm really confused.

actually it sounds like you understand the situation perfectly!!

> If Solr's QueryParser will only split on whitespace no matter what then
> what is the point of using CJKAnalyzer?
> It sounds like Solr would be pretty useless for languages like CJK. Is
> there any work around for this? Any CJK sites using Solr?

if you do not want all queries to be phrasequeries, you should use:

<fieldType name="text" class="solr.TextField" autoGeneratePhraseQueries="false">

then the lack of whitespace between words will not cause phrase queries.
if you use this option, phrase queries will only be caused if the user explicitly puts terms in double quotes.

--
Robert Muir
rcm...@gmail.com
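
For reference, a minimal sketch of how the per-field-type setting described above might sit in a schema.xml. The autoGeneratePhraseQueries attribute comes from Robert's message. The field type names (text_cjk, text_ws), the tokenizer and filter choices, and the enclosing <types> element are assumptions for illustration only. Because the attribute is set on the fieldType, it would presumably affect only fields of that type, but nothing in this thread confirms how a multi-field dismax query treats a mix of the two.

```xml
<types>
  <!-- Hypothetical CJK field type: with autoGeneratePhraseQueries="false",
       several tokens produced from one whitespace-delimited chunk are not
       turned into a phrase query (per the description in the message above). -->
  <fieldType name="text_cjk" class="solr.TextField" autoGeneratePhraseQueries="false">
    <analyzer>
      <tokenizer class="solr.CJKTokenizerFactory"/>
    </analyzer>
  </fieldType>

  <!-- Hypothetical field type for whitespace-delimited text that keeps
       the automatic phrase behavior. -->
  <fieldType name="text_ws" class="solr.TextField" autoGeneratePhraseQueries="true">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
</types>
```

With a layout like this, explicit double-quoted phrases should still produce phrase queries on either field type, as stated in the message above.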