Re: Dismax: Impossible to search for a _phrase_ in tokenized and untokenized fields at the same time

Yonik Seeley Sat, 10 Oct 2009 05:03:09 -0700

On Sat, Oct 10, 2009 at 6:34 AM, Alex Baranov <alex.barano...@gmail.com> wrote:
>
> Hello,
>
> It seems to me that there is no way to use the dismax handler to
> search both tokenized and untokenized fields at once when I'm searching
> for a phrase.
>
> Consider the following example. I have two fields in the index:
> product_name and product_name_un. The schema looks like:
>
>    <fieldType name="string_ignore_case" class="solr.TextField"
>               positionIncrementGap="100" omitNorms="true">
>      <analyzer>
>        <tokenizer class="solr.KeywordTokenizerFactory"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>      </analyzer>
>    </fieldType>
>
>    <fieldType name="text_no_stopwords_en" class="solr.TextField"
>               positionIncrementGap="100">
>      <analyzer>
>        <tokenizer class="solr.StandardTokenizerFactory"/>
>        <filter class="solr.WordDelimiterFilterFactory"
>                generateWordParts="1" generateNumberParts="1"
>                catenateWords="0" catenateNumbers="0" catenateAll="0"
>                splitOnCaseChange="0"/>
>        <filter class="solr.LowerCaseFilterFactory"/>
>        <filter class="solr.ISOLatin1AccentFilterFactory"/>
>        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        <filter class="solr.SnowballPorterFilterFactory"
>                language="English"/>
>      </analyzer>
>    </fieldType>
>
>    <field name="product_name" type="text_no_stopwords_en"
>           indexed="true" stored="true"/>
>    <field name="product_name_un" type="string_ignore_case"
>           indexed="true" stored="true"/>
>
>    <copyField source="product_name" dest="product_name_un"/>
>
> I'm using dismax to search in both of them at the same time:
> "defType=dismax&qf=product_name product_name_un^2.0". (This is done to
> bring to the top of the results the products whose name _equals_ the
> entered criteria.)
>
> 1. When I'm searching for a phrase (two or more keywords), e.g. <blue
> car>, the input string is tokenized, and even though I have
> product_name_un="blue car" in the index, the "product_name_un^2.0" part
> of the dismax config has no effect.


Hmmm, right.  This is due to the fact that the Lucene query parser
(still actually used in dismax) breaks things up by whitespace
*before* analysis (so the analyzer for the untokenized field never
sees the two tokens together).
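
A toy illustration of that ordering, in Python rather than Solr or Lucene code. The function names (keyword_analyzer, word_analyzer, parse) are made up for this sketch, and the analyzers are rough stand-ins for the two field types in the schema above (no stemming, no word-delimiter handling):

```python
# Toy model only: shows why a keyword-tokenized field never sees
# "blue car" as a single term when the query is split on whitespace
# before analysis. None of these functions are Solr/Lucene APIs.

def keyword_analyzer(text):
    """Stand-in for KeywordTokenizer + LowerCaseFilter: one token."""
    return [text.lower()]

def word_analyzer(text):
    """Rough stand-in for a tokenizing analyzer: lowercase, split."""
    return text.lower().split()

def parse(query, analyzer):
    """Split on whitespace BEFORE analysis unless the query is quoted."""
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        chunks = [query[1:-1]]   # quoted text reaches the analyzer intact
    else:
        chunks = query.split()   # each word is analyzed on its own
    return [analyzer(chunk) for chunk in chunks]

# What ends up in the index for product_name_un.
indexed_term = keyword_analyzer("Blue Car")[0]

# Unquoted: the analyzer is called once per word, so neither resulting
# term equals the indexed value.
unquoted = parse("blue car", keyword_analyzer)

# Quoted: the analyzer sees the whole string and produces the indexed term.
quoted = parse('"blue car"', keyword_analyzer)

# Quoted, against the tokenized field: two tokens, i.e. a phrase.
phrase = parse('"blue car"', word_analyzer)

print(indexed_term, unquoted, quoted, phrase)
```

In this model the unquoted query yields the separate terms "blue" and "car" for product_name_un, neither of which matches the indexed value "blue car"; only the quoted form hands the analyzer both words together.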

> 2. When I enter <"blue car"> (in quotes) the string is not tokenized and
> the "product_name_un^2.0" part works, but nothing can be found in the
> product_name field.

Using explicit quotes will make a phrase query, so blue and car must
appear right next to each other in product_name.
If it's OK to require both blue and car in product_name, then you can
just set a slop for explicit phrase queries with the qs parameter.
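
For illustration, such a request could be put together as sketched below. The host, port and path, and the slop value of 100, are assumptions made for this example and are not taken from the thread; only defType, q, qf and qs are the parameters being discussed.

```python
from urllib.parse import urlencode

# Request parameters for an explicit phrase query with phrase slop.
# The slop value is an arbitrary illustrative choice.
params = {
    "defType": "dismax",
    "q": '"blue car"',                         # explicit quotes: phrase query
    "qf": "product_name product_name_un^2.0",  # fields and boosts from the thread
    "qs": "100",                               # slop for explicit phrase queries
}

query_string = urlencode(params)

# Hypothetical endpoint; adjust to the actual Solr instance.
url = "http://localhost:8983/solr/select?" + query_string
print(url)
```

With a slop set, the quoted query should still reach product_name_un as a single term, while in product_name the two words only need to occur within the slop distance of each other rather than side by side.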

-Yonik
http://www.lucidimagination.com





> I.e. there is no way to run a proper search against the two fields at the
> same time. The workaround that I found is to use the "bq" parameter to
> specify a boost query for the product_name_un field. But I don't think
> that this should be the only solution.
>
>
> Another note, related to that: when I set product_name_un as the default
> search field and query with ../select/?q=blue car&rows=10&... I get
> empty results despite the fact that I have the value "blue car" in the
> index in that field. I have to use quotes again to fix that... Shouldn't
> it determine the field type and apply the corresponding
> analyzers/tokenizers/etc.?
>
> --
> View this message in context: 
> http://www.nabble.com/Dismax%3A-Impossible-to-search-for-a-_phrase_-in-tokenized-and-untokenized-fields-at-the-same-time-tp25832932p25832932.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>
