Re: Question about Edismax - Solr 4.0

Jack Krupansky Thu, 16 May 2013 21:36:05 -0700

Ah... I think your issue is the preserveOriginal="1" on the query analyzer, as well as the fact that you have all of these catenateXX="1" options on the query analyzer - I indicated that you should remove them all.

The problem is that the whitespace tokenizer leaves the leading comma in place, and preserveOriginal="1" also generates an extra token for the term, with the comma in place. But with the space, the comma and "10" are separate terms and get analyzed independently.

The query results probably indicate that you don't have that exact combination of the term and leading punctuation in your data - or that there is no standalone comma in your input data.


Try the following replacement for the query-time WDF:

<filter class="solr.WordDelimiterFilterFactory" stemEnglishPossessive="0"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
        splitOnNumerics="0" preserveOriginal="0" />
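Put together, a sketch of the whole text_wc field type with this change applied might look like the following. It assumes the index-time analyzer stays exactly as posted further down the thread, and it adds autoGeneratePhraseQueries="true" on the field type, as suggested earlier. Since only query-time settings change, re-indexing should not be needed, but it is worth confirming the token output on the Analysis page before relying on it.

```xml
<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
    <analyzer type="index">
        <!-- index-time chain unchanged: catenation and preserveOriginal are fine here -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- query-time WDF: no catenation and no preserveOriginal, so a standalone ","
             should produce no token and ",10" should reduce to just "10" -->
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="0" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With this query analyzer, fq=title:(, 10) and fq=title:(,10) should both end up searching for "10" alone.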


-- Jack Krupansky

-----Original Message----- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 5:50 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0

Hi Jack,

Thanks again for your response and for helping me get through this.

The URL is definitely encoded for spaces, and it looks like the one below. As I
mentioned in my previous mail, I can't add the title restriction to the q
parameter, as that searches on multiple fields.

The title field is defined as below:
<field name="title" type="text_wc" indexed="true" stored="false"
multiValued="true"/>

q=countryside&rows=20&qt=assdismax&fq=%28title%3A%28,10%29%29&fq=collection:assets

<requestHandler name="assdismax" class="solr.SearchHandler">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">title^10 description^5 annotations^3 notes^2 categories</str>
<str name="pf">title</str>
<int name="ps">0</int>
<str name="q.alt">*:*</str>
<str name="fl">*,score</str>
<str name="mm">100%</str>
<str name="q.op">AND</str>
<str name="sort">score desc</str>
<str name="facet">true</str>
<str name="facet.limit">-1</str>
<str name="facet.mincount">1</str>
<str name="facet.field">uniq_subtype_id</str>
<str name="facet.field">component_type</str>
<str name="facet.field">genre_type</str>
</lst>
<lst name="appends">
<str name="fq">collection:assets</str>
</lst>
</requestHandler>

The term 'countryside' needs to be searched against multiple fields,
including titles, descriptions, annotations, categories and notes, but the UI
also has a feature to limit results by providing a title.

I can see that filter queries are always parsed by LuceneQueryParser;
however, I'd expect it to generate the parsed_filter_queries debug output in
every situation.

I have tried it as the main query with both the edismax and lucene defType, and
it gives me the correct output and correct results.
But there is some problem when this is used as a filter query, as the
parser is not able to parse a comma followed by a space.

Thanks again, Jack. Please let me know if you need more input from my
side.

Best Regards,
Sandeep

On 16 May 2013 18:03, Jack Krupansky <j...@basetechnology.com> wrote:

Could you show us the full query URL - spaces must be encoded in URL query
parameters.

Also show the actual field XML - you omitted that.

Try the same query as a main query, using both defType=edismax and
defType=lucene.

Note that the filter query is parsed using the Lucene query parser, not
edismax, independent of the defType parameter. But you don't have any
edismax features in your fq anyway.

But you can stick {!edismax} in front of the query to force edismax to be
used for the fq, although it really shouldn't change anything:

fq={!edismax}title:(, 10)

Also, the catenate options are fine for indexing, but they will mess up your
queries at query time, so set them to "0" in the query analyzer.

Also, make sure you have autoGeneratePhraseQueries="true" on the field
type, but that's not the issue here.


-- Jack Krupansky

-----Original Message----- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 12:42 PM
To: solr-user@lucene.apache.org
Subject: Re: Question about Edismax - Solr 4.0


Thanks for your reply, Jack.

The problem is, I'm finding results for fq=title:(,10) but not for
fq=title:(, 10) - apologies if that was not clear from my first mail.
I have already mentioned the debug analysis in my previous mail.

Additionally, the title field is defined as below:

<fieldType name="text_wc" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                stemEnglishPossessive="0" generateWordParts="1" generateNumberParts="1"
                catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
                splitOnNumerics="0" preserveOriginal="1" />
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

I have set the catenate options to 1 for both analyzer types.
I can understand ',' being ignored when it is on its own (title:(, 10)), but:
- Why is Solr not searching for 10 in that case, just as it did when the
query was title:(,10)?

- And why did the other filter query (collection:assets) not show up in the
debug section?


Thanks,
Sandeep


On 16 May 2013 13:57, Jack Krupansky <j...@basetechnology.com> wrote:

You haven't indicated any problem here! What is the symptom that you
actually think is a problem?

There is no comma operator in any of the Solr query parsers. Comma is just
another character that may be kept or discarded depending on the specific
field type and analyzer. For example, a whitespace analyzer will keep commas,
but the standard analyzer or the word delimiter filter will discard them. If
"title" were a "string" type, all punctuation would be preserved, including
commas and spaces (but spaces would need to be escaped or the term text
enclosed in double quotes).
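To illustrate the "string" alternative, here is a minimal schema.xml sketch. The StrField type definition follows the stock Solr example schema; the title_str field and the copyField are hypothetical additions for illustration only, not part of Sandeep's schema.

```xml
<!-- "string" type as in the Solr example schema: each whole value is indexed as one term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true"/>

<!-- hypothetical exact-match copy of title; the name title_str is illustrative only -->
<field name="title_str" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="title_str"/>
```

With something like this in place, fq=title_str:", 10" (quoted, so the space stays inside the term) should match only documents where one title value is exactly ", 10", commas and spaces included. That is why a string field suits exact matching rather than searching for words inside a longer title.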

First, though, let us know what your symptom is.

I mean, the filter query looks perfectly reasonable from an abstract
perspective.

-- Jack Krupansky

-----Original Message----- From: Sandeep Mestry
Sent: Thursday, May 16, 2013 6:51 AM
To: solr-user@lucene.apache.org
Subject: Question about Edismax - Solr 4.0

-- *Edismax and Filter Queries with Commas and spaces* --


Dear Experts,

This appears to be a bug; please let me know if I'm wrong.

If I search with the following filter query,

1) fq=title:(, 10)

- I get no results.
- The debug output does NOT show the section containing
parsed_filter_queries

If I run a search with the filter query,

2) fq=title:(,10) - (No space between , and 10)

- I get results and the debug output shows the parsed filter queries
section as,
<arr name="filter_queries">
<str>(titles:(,10))</str>
<str>(collection:assets)</str>

As you can see above, I'm also passing in other filter queries
(collection:assets), which appear correctly here but do not appear in case 1
above.

I can't make this part of the query parameter, as that needs to be
searched against multiple fields.

Can someone please suggest a fix for this case? I'm using Solr 4.0.

Many Thanks,
Sandeep
