Wow, I never heard of autoGeneratePhraseQueries before. Is there any 
documentation of what it does?  

My initial reaction is confusion, because this sounds kind of like the 
opposite of the original issue. The original issue is that the query parsers 
split on whitespace _before_ they give tokens to the field analyzers.  
The query parsers actually do this only with queries that are NOT explicit 
phrase queries.  I wouldn't exactly call this behavior "automatically 
generating phrase queries", and I wouldn't expect that turning off automatic 
generation of phrase queries would prevent the pre-tokenization by the query 
parser.  But... it does somehow?

Can anyone point me to more info about what exactly autoGeneratePhraseQueries 
does?  If I can use it to turn off that behavior (ideally in a way that turns 
it off for some fields but not others, even in a multi-field dismax query) 
that would be pretty darn useful. I've been struggling with that for a while. 

Jonathan
________________________________________
From: Robert Muir [rcm...@gmail.com]
Sent: Saturday, September 25, 2010 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: bi-grams for common terms - any analyzers do that?

On Sat, Sep 25, 2010 at 1:04 AM, Andy <angelf...@yahoo.com> wrote:

>
> But I thought specialized analyzers like CJKAnalyzer are designed for those
> languages, which don't use whitespace to separate words.
>

yes


>
> Isn't it up to the tokenizer, not the QueryParser, to decide how to split
> the query into tokens?
>

yes


> I'm really confused.
>

actually it sounds like you understand the situation perfectly!!


> If Solr's QueryParser will only split on whitespace no matter what then
> what is the point of using CJKAnalyzer?
>
> It sounds like Solr would be pretty useless for languages like CJK. Is
> there any work around for this? Any CJK sites using Solr?
>

if you do not want all queries to be phrase queries, you should use:

<fieldType name="text" class="solr.TextField"
autoGeneratePhraseQueries="false">

then the lack of whitespace between words will not cause phrase queries. if
you use this option, phrase queries will only be caused if the user
explicitly puts terms in double quotes.
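
as a rough sketch of how this could look in schema.xml for two field types
side by side: the names "text_cjk" and "text_std" are made up for
illustration, and this assumes the attribute is read per fieldType, as in the
snippet above. whether a multi-field dismax query honors the setting
separately for each field is something to verify on your own build.

```xml
<!-- illustrative sketch only; "text_cjk" and "text_std" are hypothetical names -->

<!-- CJK text: several tokens produced from one whitespace-delimited chunk
     should NOT be turned into a phrase query -->
<fieldType name="text_cjk" class="solr.TextField"
           autoGeneratePhraseQueries="false">
  <analyzer>
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- other text: keep generating a phrase query when the analyzer splits
     one whitespace-delimited chunk into several tokens -->
<fieldType name="text_std" class="solr.TextField"
           autoGeneratePhraseQueries="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

note that this only changes what the parser does with the tokens the analyzer
returns for each chunk; the query string is still split on whitespace before
analysis.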

--
Robert Muir
rcm...@gmail.com
