Re: Query parsing - difference between Analysis and parsedquery_toString output

Erick Erickson Sun, 19 Oct 2014 15:05:49 -0700

This trips _everybody_ up. Analysis doesn't happen until after the query
parser has split the input into clauses. So, let's assume your query is

q=manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000


The problem is that the query _parser_ doesn't understand that
your intent is for "the hershey company" to be evaluated against
the manufacture_t field and the title_t field. All it sees is
manufacture_t:the followed by two "naked" tokens, hershey and company.
So it does the best it can and assumes that "hershey" and "company"
should be evaluated against your default text field, in this case "text".
That is exactly what your parsedquery_toString shows:
manufacture_t:the text:Hershey text:Company^100.0
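
To make the field-binding rule concrete, here is a toy sketch in Python. It
is NOT Lucene's or Solr's parser, just a whitespace splitter that mimics the
one rule described above: a field prefix applies only to the token it is
attached to, and everything else goes to the default field. The function
name assign_fields and the default field name "text" are assumptions for
illustration, and the sketch ignores quotes, parentheses, and analysis.

```python
import re

def assign_fields(query, default_field="text"):
    """Toy model: a 'field:' prefix binds only to the term it is attached to."""
    clauses = []
    for token in query.split():
        if token in ("AND", "OR", "NOT"):
            continue  # boolean operators are not terms
        match = re.match(r"(\w+):(.+)", token)
        if match:
            # Explicit field prefix: applies to this one token only.
            field, term = match.group(1), match.group(2)
        else:
            # "Naked" token: falls through to the default field.
            field, term = default_field, token
        clauses.append(f"{field}:{term}")
    return clauses

query = "manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000"
print(" ".join(assign_fields(query)))
```

The printed clauses have the same shape as the parsedquery_toString in the
original message: only "The" lands in manufacture_t and title_t, and the
boost ends up on the last naked token.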

You have two choices here:
1> form your query as a phrase, manufacture_t:"The Hershey Company", or
2> group the terms, manufacture_t:(The Hershey Company).

The first form requires that the words "The", "Hershey", and "Company"
appear in sequence. The second form lets the terms appear anywhere in
the field, in any order, but how many of them must match depends on the
default operator: assuming your default q.op is OR, only one of the terms
has to appear in the field. If you require all three, either define the
default operator to be AND or enter it as
manufacture_t:(The AND Hershey AND Company).
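
As a rough sketch of how a client might build either form, the Python below
assembles the q parameter for both fields and URL-encodes it with the
standard library. The helper names field_clause and build_params are
hypothetical, the boosts are the ones from the original query, and the
sketch assumes the search text contains no quotes or parentheses that would
need escaping.

```python
from urllib.parse import urlencode

def field_clause(field, text, boost, phrase=False):
    # Phrase form: field:"a b c"   Grouped form: field:(a b c)
    body = f'"{text}"' if phrase else f"({text})"
    return f"{field}:{body}^{boost}"

def build_params(text, phrase=False, require_all=False):
    q = " OR ".join([
        field_clause("manufacture_t", text, 100, phrase),
        field_clause("title_t", text, 1000, phrase),
    ])
    params = {"q": q}
    if require_all:
        # Make AND the default operator so every grouped term must match.
        params["q.op"] = "AND"
    return params

grouped = build_params("The Hershey Company", require_all=True)
phrase = build_params("The Hershey Company", phrase=True)
print(grouped["q"])
print(phrase["q"])
print(urlencode(grouped))  # query string to append to the select URL
```

Checking parsedquery_toString with debugQuery=true after a change like this
is a quick way to confirm that every term now lands in the intended field.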

Best,
Erick

On Sun, Oct 19, 2014 at 4:49 PM, tinush <tanya.karpin...@gmail.com> wrote:
> Hi,
>
> I use Solr 4.9 and imported about 20K documents from CSV data.
>
> The schema has the following definition for the text_general field type,
> which I want to process with tokenization, stop word removal, and stemming.
>
>     <fieldType name="text_general" class="solr.TextField"
>                positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory"
>                 generateWordParts="1" generateNumberParts="1" catenateWords="1"
>                 catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 enablePositionIncrements="true"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.StandardTokenizerFactory"/>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
>                 enablePositionIncrements="true"/>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>                 ignoreCase="true" expand="true"/>
>         <filter class="solr.ASCIIFoldingFilterFactory"/>
>         <filter class="solr.SnowballPorterFilterFactory" language="English"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>       </analyzer>
>     </fieldType>
>
> Using the Solr Admin Analysis page for that field type, I see that both
> index and query values are processed as expected: Hershey's -> *hershey*,
> The Hershey's Company -> the *hershey* compani
>
> I expected the same processing for the select query, but that doesn't seem
> to happen, and no results are found in the example below:
>  "q": "manufacture_t:The Hershey Company^100 OR title_t:The Hershey Company^1000"
>  "parsedquery_toString": "manufacture_t:the text:Hershey text:Company^100.0 title_t:the text:Hershey text:Company^1000.0",
>
> indexed document:
>    "docs": [
>       {
>         "id": "00010700501806",
>         "description_t": [
>           "Hershey's Whoppers Carton - 12 Pack "
>         ],
>         "title_t": [
>           "Whoppers Carton - 12 Pack"
>         ],
>         "manufacture_t": [
>           "Hershey's"
>         ],
>
> What am I missing?
>
> Thanks in advance,
> Tanya
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Query-parsing-difference-between-Analysis-and-parsedquery-toString-output-tp4164851.html
> Sent from the Solr - User mailing list archive at Nabble.com.
