On Sat, Oct 10, 2009 at 6:34 AM, Alex Baranov <alex.barano...@gmail.com> wrote:
>
> Hello,
>
> It seems to me that there is no way I can use the dismax handler for
> searching in both tokenized and untokenized fields while I'm searching for a
> phrase.
>
> Consider the next example. I have two fields in the index: product_name and
> product_name_un. The schema looks like:
>
> <fieldType name="string_ignore_case" class="solr.TextField"
>            positionIncrementGap="100" omitNorms="true">
>   <analyzer>
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <fieldType name="text_no_stopwords_en" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" generateNumberParts="1" catenateWords="0"
>             catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ISOLatin1AccentFilterFactory"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>     <filter class="solr.SnowballPorterFilterFactory"
>             language="English"/>
>   </analyzer>
> </fieldType>
>
> <field name="product_name" type="text_no_stopwords_en" indexed="true"
>        stored="true"/>
> <field name="product_name_un" type="string_ignore_case" indexed="true"
>        stored="true"/>
>
> <copyField source="product_name" dest="product_name_un"/>
>
> I'm using dismax to search in both of them at the same time:
> "defType=dismax&qf=product_name product_name_un^2.0". (This is done to bring
> to the top of the results the products whose name _equals_ the entered
> criteria.)
>
> 1. When I'm searching for a phrase (two or more keywords), e.g. <blue
> car>, the input string is tokenized and even though I have
> product_name_un="blue car" in the index, the "product_name_un^2.0" part of
> the dismax config has no effect.

Hmmm, right. This is due to the fact that the Lucene query parser
(still actually used in dismax) breaks things up by whitespace
*before* analysis (so the analyzer for the untokenized field never
sees the two tokens together).

> 2. When I enter <"blue car"> (in quotes) the string is not tokenized and
> the "product_name_un^2.0" part works, but nothing could be found in the
> product_name field.

Using explicit quotes will make a phrase query, so blue and car must
appear right next to each other in product_name. If it's OK to require
both blue and car in product_name, then you can just set a slop
for explicit phrase queries with the qs parameter.

-Yonik
http://www.lucidimagination.com

> I.e. there is no way to have a proper search against two fields at the same
> time. The workaround that I found is using the "bq" parameter to specify a
> boost query for the search in field product_name_un. But I don't think that
> this should be the only solution.
>
>
> Another note, related to that: when I set product_name_un as the default
> field for search, and query with ../select/?q=blue car&rows=10&... I get
> empty results despite the fact that I have the "blue car" value in the index
> in that field. I have to use quotes again to fix that... Shouldn't it
> determine the field type and apply the corresponding
> analyzers/tokenizers/etc.?
>
> --
> View this message in context:
> http://www.nabble.com/Dismax%3A-Impossible-to-search-for-a-_phrase_-in-tokenized-and-untokenized-fields-at-the-same-time-tp25832932p25832932.html
> Sent from the Solr - User mailing list archive at Nabble.com.
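
For illustration, here is a rough sketch of how a client might apply the qs
suggestion from the reply: wrap the whole user input in one pair of quotes so
it reaches product_name_un as a single chunk, and pass qs so the same phrase
can still match loosely in product_name. The helper names, the base URL, and
the slop value of 10 are assumptions made up for this example; only the
defType, q, qf and qs parameters and the field names come from the thread.

```python
from urllib.parse import urlencode


def build_dismax_params(user_input, slop=10):
    """Return dismax parameters that send the input as one quoted phrase.

    The slop default is an arbitrary illustrative value, not a recommendation.
    """
    # Drop any quotes the user typed, then wrap the whole input in one pair,
    # so the query is a single explicit phrase.
    phrase = '"' + user_input.replace('"', " ").strip() + '"'
    return {
        "defType": "dismax",
        "q": phrase,
        # Same field list and boost as in the original question.
        "qf": "product_name product_name_un^2.0",
        # Slop applied to explicit phrase queries in q.
        "qs": str(slop),
    }


def build_select_url(base_url, user_input, slop=10):
    """Append the URL-encoded dismax parameters to a select handler URL."""
    return base_url + "?" + urlencode(build_dismax_params(user_input, slop))


if __name__ == "__main__":
    # Hypothetical local Solr endpoint, used only for demonstration.
    print(build_select_url("http://localhost:8983/solr/select", "blue car"))
```

With a large enough slop, the phrase on product_name behaves roughly like
"both words required", while an exact match on product_name_un still picks
up the ^2.0 boost.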