Re: analyzer, indexAnalyzer and queryAnalyzer

Doug Turnbull Wed, 29 Apr 2015 18:26:21 -0700

So Solr has the idea of a query parser. The query parser is a convenient
way of passing a search string to Solr and having Solr parse it into
underlying Lucene queries. You can see a list of query parsers here:
http://wiki.apache.org/solr/QueryParser


What this means is that the query parser does work to pull terms into
individual clauses *before* analysis is run. It's a parsing layer that sits
outside the analysis chain. This creates problems like the "sea biscuit"
problem, whereby we declare "sea biscuit" as a query-time synonym of
"seabiscuit". As you may know, synonyms are checked during analysis.
However, if the query parser splits up "sea" from "biscuit" before running
analysis, the query-time synonym never fires. The string "sea" is brought by
itself to the query-time analyzer and of course won't match "sea biscuit".
Same with the string "biscuit" in isolation. If the full string "sea
biscuit" were brought to the analyzer, it would see [sea] next to [biscuit]
and declare it a synonym of seabiscuit. Because of the query parser, the
analyzer has lost the association between the terms: they are never brought
to the analyzer together.
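
To make the ordering issue concrete, here is a toy Python sketch. It is not
Solr or Lucene code: the SYNONYMS table and both functions are made up for
illustration, and the "synonym filter" simply replaces the matched terms
rather than expanding them. It only shows why splitting on whitespace before
analysis hides a multi-term synonym from the analyzer.

```python
# Toy model only, not Solr code. The "synonym filter" can match a
# multi-term synonym only if it sees the terms in the same token stream.
SYNONYMS = {("sea", "biscuit"): "seabiscuit"}  # hypothetical synonym rule


def analyze(text):
    """Tokenize on whitespace, lowercase, then apply multi-term synonyms."""
    tokens = text.lower().split()
    out = []
    i = 0
    while i < len(tokens):
        for phrase, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                out.append(replacement)
                i += len(phrase)
                break
        else:
            # No synonym rule starts at this position; keep the token.
            out.append(tokens[i])
            i += 1
    return out


def parse_then_analyze(query):
    """Mimic a whitespace-splitting query parser: each chunk is analyzed alone."""
    terms = []
    for chunk in query.split():
        terms.extend(analyze(chunk))
    return terms


whole = analyze("sea biscuit")                    # analyzer sees both terms
split_first = parse_then_analyze("sea biscuit")   # parser splits first
print(whole)        # ['seabiscuit']
print(split_first)  # ['sea', 'biscuit']
```

In the second call the analyzer is invoked once per word, so the rule for
"sea biscuit" never gets a chance to match.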

My colleague John Berryman wrote a pretty good blog post on this
http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

There are several solutions out there that attempt to address this problem.
One is from Ted Sullivan at Lucidworks:
https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

Another popular one is the hon-lucene-synonyms plugin:
https://github.com/healthonnet/hon-lucene-synonyms

Yet another work-around is to use the field query parser:
http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
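
As a rough sketch of what that looks like in a request, the field query
parser can be selected with local params in the q parameter. The field name
"title" and the search string below are just assumptions for illustration;
the Python only builds the URL-encoded query string and does not talk to
Solr.

```python
from urllib.parse import urlencode

# Hypothetical field name and user input, for illustration only.
field = "title"
user_query = "sea biscuit"

# {!field f=...} selects the field query parser via local params. The whole
# string after the closing brace is handed to that field's query analyzer,
# instead of being split on whitespace first.
params = {"q": "{!field f=%s}%s" % (field, user_query)}
query_string = urlencode(params)
print(query_string)
```

The resulting string would be appended to a select URL for whatever core or
collection you are querying.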

I also tend to write my own query parsers. So on the one hand it's annoying
that query parsers have the problems above; on the flip side, Solr makes it
very easy to implement whatever parsing you think is appropriate with a
small bit of Java/Lucene knowledge.

Hopefully that explanation wasn't too deep, but it's an important thing to
know about Solr. Are you asking out of curiosity, or do you have a specific
problem?

Thanks
-Doug

On Wed, Apr 29, 2015 at 6:32 PM, Steven White <swhite4...@gmail.com> wrote:

> Hi Doug,
>
> I don't understand what you mean by the following:
>
> > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > body the *query parser* *not* the *analyzer* first turns the query into:
>
> If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
> identical, the example you provided, does it stand?  If so, why?  Or do you
> mean something totally different by "query parser"?
>
> Thanks
>
> Steve
>
>
> On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > *> 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > same, that's the same as if I have an analyzer only, right?*
> > 1) Yes
> >
> > *> 2) Under the hood, all three are the same thing when it comes to what
> > kind of data and configuration attributes can take, right?*
> > 2) Yes. Both take in text and output a token stream.
> >
> > *> What I'm trying to figure out is this: beside being able to configure a
> > fieldType to have different analyzer setting at index and query time,
> > there is nothing else that's unique about each.*
> >
> > The only thing to look out for in Solr land is the query parser. Most Solr
> > query parsers treat whitespace as meaningful.
> >
> > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > body the *query parser* *not* the *analyzer* first turns the query into:
> >
> > (title:hot title:dog) | (body:hot body:dog)
> >
> > each word of which *then* gets analyzed. This is because the query parser
> > tries to be smart and turn "hot dog" into hot OR dog, or more specifically
> > making them two must clauses.
> >
> > This trips quite a few folks up. You can use the field query parser, which
> > hands the whole query string to the field's analyzer and treats the result
> > as a phrase query. Hope that helps
> >
> >
> > --
> > *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections,
> > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > Author: Taming Search <http://manning.com/turnbull> from Manning
> > Publications
> > This e-mail and all contents, including attachments, is considered to be
> > Company Confidential unless explicitly stated otherwise, regardless
> > of whether attachments are marked as such.
> > On Wed, Apr 29, 2015 at 3:41 PM, Steven White <swhite4...@gmail.com>
> > wrote:
> >
> > > Hi Everyone,
> > >
> > > Looking at Solr's schema.xml, there are three kinds of analyzers:
> > > analyzer, indexAnalyzer and queryAnalyzer.  I have two questions about
> > > them:
> > >
> > > 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > same, that's the same as if I have an analyzer only, right?
> > >
> > > 2) Under the hood, all three are the same thing when it comes to what
> > > kind of data and configuration attributes can take, right?
> > >
> > > What I'm trying to figure out is this: beside being able to configure a
> > > fieldType to have different analyzer setting at index and query time,
> > > there is nothing else that's unique about each.
> > >
> > > Thanks
> > >
> > > Steve
> > >
> >
>



