Re: analyzer, indexAnalyzer and queryAnalyzer

Doug Turnbull Thu, 30 Apr 2015 08:35:07 -0700

> - You write your own QParser plugins - can one keep the features of edismax
> for field boosting/phrase-match boosting by subclassing edismax?   Assuming
> yes...


hon-lucene-synonyms does this, but largely by copy pasting the code (sorry
about the broken link!)

pf2 and pf3 take the query "hello my name is doug" and chop it up into two
word phrase searches and three word phrase searches respectively.

For example, q=hello my name is doug&pf2=title body generates

title:"hello my" title:"my name" title:"name is" ... body:"hello my" and so
on.

pf3 does the same for three word phrases.
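
A rough sketch of that chopping in Python, in case it helps. This is not
edismax's actual code: the function names shingles and phrase_clauses are made
up for illustration, and the real parser also applies boosts and slop that are
left out here.

```python
def shingles(words, n):
    """Return every run of n consecutive words as one phrase string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_clauses(q, fields, n):
    """Build field:"phrase" clauses for each n-word shingle of q.

    Hypothetical helper: it only mimics the shape of the clauses that
    pf2 (n=2) and pf3 (n=3) add, not how Solr builds them internally.
    """
    words = q.split()  # split on whitespace, as the query parser does
    return ['%s:"%s"' % (field, phrase)
            for field in fields
            for phrase in shingles(words, n)]


pf2 = phrase_clauses("hello my name is doug", ["title", "body"], 2)
pf3 = phrase_clauses("hello my name is doug", ["title", "body"], 3)

print(" ".join(pf2))
print(" ".join(pf3))
```

A five word query gives four two-word phrases and three three-word phrases per
field. As far as I know these clauses only boost documents that already match
the main query; they don't change which documents match.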

-Doug





On Thu, Apr 30, 2015 at 10:58 AM, Dan Davis <dansm...@gmail.com> wrote:

> Hi Doug, nice write-up and 2 questions:
>
> - You write your own QParser plugins - can one keep the features of edismax
> for field boosting/phrase-match boosting by subclassing edismax?   Assuming
> yes...
>
> - What do pf2 and pf3 do in the edismax query parser?
>
> hon-lucene-synonyms plugin links corrections:
>
> http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/
> https://github.com/healthonnet/hon-lucene-synonyms
>
>
> On Wed, Apr 29, 2015 at 9:24 PM, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
>
> > So Solr has the idea of a query parser. The query parser is a convenient
> > way of passing a search string to Solr and having Solr parse it into
> > underlying Lucene queries: You can see a list of query parsers here
> > http://wiki.apache.org/solr/QueryParser
> >
> > What this means is that the query parser does work to pull terms into
> > individual clauses *before* analysis is run. It's a parsing layer that
> > sits outside the analysis chain. This creates problems like the "sea
> > biscuit" problem, whereby we declare "sea biscuit" as a query time
> > synonym of "seabiscuit". As you may know, synonyms are checked during
> > analysis. However, if the query parser splits up "sea" from "biscuit"
> > before running analysis, the query time synonym lookup will fail. The
> > string "sea" is brought by itself to the query time analyzer and of
> > course won't match "sea biscuit". Same with the string "biscuit" in
> > isolation. If the full string "sea biscuit" was brought to the analyzer,
> > it would see [sea] next to [biscuit] and declare it a synonym of
> > seabiscuit. Thanks to the query parser, the analyzer has lost the
> > association between the terms, because the two terms aren't brought
> > together to the analyzer.
> >
> > My colleague John Berryman wrote a pretty good blog post on this
> >
> >
> http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/
> >
> > There are several solutions out there that attempt to address this problem.
> > One from Ted Sullivan at Lucidworks
> >
> >
> https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
> >
> > Another popular one is the hon-lucene-synonyms plugin:
> >
> >
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
> >
> > Yet another work-around is to use the field query parser:
> >
> >
> http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html
> >
> > I also tend to write my own query parsers, so on the one hand it's
> > annoying that query parsers have the problems above; on the flip side,
> > Solr makes it very easy to implement whatever parsing you think is
> > appropriate with a small bit of Java/Lucene knowledge.
> >
> > Hopefully that explanation wasn't too deep, but it's an important thing to
> > know about Solr. Are you asking out of curiosity, or do you have a
> > specific problem?
> >
> > Thanks
> > -Doug
> >
> > On Wed, Apr 29, 2015 at 6:32 PM, Steven White <swhite4...@gmail.com>
> > wrote:
> >
> > > Hi Doug,
> > >
> > > I don't understand what you mean by the following:
> > >
> > > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > > body the *query parser* *not* the *analyzer* first turns the query into:
> > >
> > > If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
> > > identical, the example you provided, does it stand?  If so, why?  Or do
> > > you mean something totally different by "query parser"?
> > >
> > > Thanks
> > >
> > > Steve
> > >
> > >
> > > On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull <
> > > dturnb...@opensourceconnections.com> wrote:
> > >
> > > > *> 1) If the content of indexAnalyzer and queryAnalyzer are exactly
> > > > the same, that's the same as if I have an analyzer only, right?*
> > > > 1) Yes
> > > >
> > > > *> 2) Under the hood, all three are the same thing when it comes to
> > > > what kind of data and configuration attributes can take, right?*
> > > > 2) Yes. Both take in text and output a token stream.
> > > >
> > > > *> What I'm trying to figure out is this: beside being able to
> > > > configure a fieldType to have different analyzer setting at index and
> > > > query time, there is nothing else that's unique about each.*
> > > >
> > > > The only thing to look out for in Solr land is the query parser. Most
> > > > Solr query parsers treat whitespace as meaningful.
> > > >
> > > > For example, if a user searches for q=hot dogs&defType=edismax&qf=title
> > > > body the *query parser* *not* the *analyzer* first turns the query into:
> > > >
> > > > (title:hot title:dog) | (body:hot body:dog)
> > > >
> > > > Each word *then* gets analyzed. This is because the query parser
> > > > tries to be smart and turn "hot dog" into hot OR dog, or more
> > > > specifically making them two must clauses.
> > > >
> > > > This trips quite a few folks up. You can use the field query parser,
> > > > which treats the whole query string as a phrase query on the field.
> > > > Hope that helps
> > > >
> > > >
> > > > --
> > > > *Doug Turnbull **| *Search Relevance Consultant | OpenSource
> > Connections,
> > > > LLC | 240.476.9983 | http://www.opensourceconnections.com
> > > > Author: Taming Search <http://manning.com/turnbull> from Manning
> > > > Publications
> > > > This e-mail and all contents, including attachments, is considered to
> > be
> > > > Company Confidential unless explicitly stated otherwise, regardless
> > > > of whether attachments are marked as such.
> > > > On Wed, Apr 29, 2015 at 3:41 PM, Steven White <swhite4...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Everyone,
> > > > >
> > > > > Looking at Solr's schema.xml, there are three kinds of analyzers:
> > > > > analyzer, indexAnalyzer and queryAnalyzer.  I have two questions
> > > > > about them:
> > > > >
> > > > > 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
> > > > > same, that's the same as if I have an analyzer only, right?
> > > > >
> > > > > 2) Under the hood, all three are the same thing when it comes to what
> > > > > kind of data and configuration attributes can take, right?
> > > > >
> > > > > What I'm trying to figure out is this: beside being able to configure
> > > > > a fieldType to have different analyzer setting at index and query
> > > > > time, there is nothing else that's unique about each.
> > > > >
> > > > > Thanks
> > > > >
> > > > > Steve
> > > > >
> > > >
> > >
> >
> >
> >
> >
>



