Your index-time and query-time analysis chains are defined very
differently. Omitting WordDelimiterFilterFactory from the
query-time analysis chain will lead to endless problems.

With the definition you have, here are the terms in the index and
their term positions. This is available from the admin/analysis page
if you click the "verbose" checkbox, although I admit it's kind of
hard to read:

position   1       2              3         4
term       fatty   acid-binding   binding   protein
                   acid

But at query time, this is how they're being analyzed:

position   1       2              3
term       fatty   acid-binding   protein

So searching for "fatty acid-binding protein" requires that the tokens
"fatty", "acid-binding" and "protein" appear in term positions 1, 2, 3
rather than where they actually are (1, 2, 4). Searching for "fatty
acid-binding protein"~1 would actually find this; the "~1" means allow
one gap in there.

HOWEVER, that's the least of your problems. WordDelimiterFilterFactory
will _also_ "split on intra-word delimiters (all non alpha-numeric
characters)". While that doesn't really say so explicitly, it has the
effect of removing punctuation from the word parts it generates. So
searching for "fatty acid-binding protein."~1 (note the period) will
fail, since the query-time token never passes through
WordDelimiterFilterFactory and will still include the period.

I'd _really_ advise you to use the stock WordDelimiterFilterFactory
settings at both index and query time, as in the field types included
in the stock Solr release such as text_en_splitting, or even a single
analyzer like text_en_splitting_tight.
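
As a rough sketch of what a symmetric setup could look like: this is
not the stock definition, just your own text_ws_finer type rewritten
with one shared analyzer, so the same filters run at index and query
time. The attribute values are copied from your schema and are
assumptions rather than recommendations; check the result on the
admin/analysis page before relying on it.

```xml
<!-- Sketch only: an <analyzer> element with no type attribute is
     applied at both index time and query time, so the two chains
     cannot drift apart. -->
<fieldType name="text_ws_finer" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same WordDelimiterFilterFactory settings as the original
         index-time chain; now the query sees them too. -->
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnNumerics="0" splitOnCaseChange="0"
            generateWordParts="1" generateNumberParts="0"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            preserveOriginal="1"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The index-time chain is unchanged in this sketch, so only the query
side behaves differently. If you switch to one of the stock types
instead, the index-time analysis changes too and you would need to
re-index.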

Best,
Erick

On Mon, Sep 22, 2014 at 6:33 AM, aaguilar <antelmo.aguilar...@nd.edu> wrote:
> Hello Erick.
>
> Below is the information you requested.   Thanks for your help!
>
> <fieldType name="text_ws_finer" class="solr.TextField"
>            positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             splitOnNumerics="0" splitOnCaseChange="0"
>             generateWordParts="1" generateNumberParts="0"
>             catenateWords="0" catenateNumbers="0" catenateAll="0"
>             preserveOriginal="1"/>
>     <filter class="solr.StopFilterFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> <field name="description" type="text_ws_finer" indexed="true" stored="true"/>
>
> On Fri, Sep 19, 2014 at 7:36 PM, Erick Erickson [via Lucene] <
> ml-node+s472066n4160122...@n3.nabble.com> wrote:
>
>> Hmmm, I'd have to see the schema definition for your description
>> field. For this, the admin/analysis page is very helpful. Here's my
>> guess:
>>
>> Your analysis chain doesn't break the incoming tokens up quite like
>> you think it does. Thus you have tokens in your index like
>> 'protein,' (notice the comma) and 'protein-like' rather than just
>> 'protein'. However, I can't quite reconcile this with your statement:
>> "Another weird thing is that if I used description:"fatty
>> acid-binding" AND description:"protein"
>>
>> so I'm at something of a loss. If you paste in your schema definition
>> for the 'description' field _and_ the corresponding <fieldType>
>> definition I can give it a quick whirl.
>>
>> Best,
>> Erick
>>
>> On Fri, Sep 19, 2014 at 11:53 AM, aaguilar <[hidden email]> wrote:
>>
>> > Hello Erick,
>> >
>> > Thanks for the response. I tried adding debug=True to the query, but I
>> > do not know exactly what I am looking for in the output. Would it be
>> > possible for you to look at the results? I would really appreciate it. I
>> > attached two files: one is with the filter query description:"fatty
>> > acid-binding" and the other is with the filter query description:"fatty
>> > acid-binding protein". If you look at the file that has the results for
>> > description:"fatty acid-binding", you can see that the hits do have "fatty
>> > acid-binding protein" and nothing in between. I really appreciate any help
>> > you can provide.
>> >
>> > Thank you
>> >
>> > On Fri, Sep 19, 2014 at 2:03 PM, Erick Erickson [via Lucene] <[hidden email]> wrote:
>> >
>> >> Your very best friend here is attaching &debug=query to the URL and
>> >> looking at the parsed query results. Upon occasion there's some
>> >>
>> >> One possible explanation is that description field has something like
>> >> "fatty acid-binding some words protein" in which case your query
>> >> "fatty acid-binding protein" would fail, but "fatty acid-binding
>> >> protein"~4 would succeed.
>> >>
>> >> The other possibility is that your query parsing isn't quite doing
>> >> what you think, but adding &debug=query should help there.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Sep 19, 2014 at 8:10 AM, aaguilar <[hidden email]> wrote:
>> >>
>> >> > Hello All,
>> >> >
>> >> > I recently came across a problem when I tried using description:"fatty
>> >> > acid-binding protein" as a filter query when doing a query through the
>> >> > query interface for Solr in the Tomcat server. Using that filter query
>> >> > did not give me any results at all; however, if I used
>> >> > description:"fatty acid-binding" as the filter query, it would give me
>> >> > the results I wanted.
>> >> >
>> >> > The thing is that some of the results I got back from Solr did have the
>> >> > words "fatty acid-binding protein" in the description field. So I really
>> >> > do not know what might be causing the issue of Solr not being able to
>> >> > find those hits.
>> >> >
>> >> > Another weird thing is that if I used description:"fatty acid-binding"
>> >> > AND description:"protein" as the filter query when doing a query, it
>> >> > gave me the results I anticipated (with some extra results that did not
>> >> > have the exact phrase "fatty acid-binding protein"). Does anyone have an
>> >> > idea as to what might be happening? Just in case this is helpful, the
>> >> > version of Solr we are using is 4.0.0.2012.10.06.03.04.33. I appreciate
>> >> > any help anyone can provide.
>> >> >
>> >> > Thanks!
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > View this message in context:
>> >> > http://lucene.472066.n3.nabble.com/Issue-Adding-Filter-Query-tp4159990.html
>> >> > Sent from the Solr - User mailing list archive at Nabble.com.
>> >>
>> >>
>> >
>> >
>> > fatty_acid-binding_protein.xml (1K) <http://lucene.472066.n3.nabble.com/attachment/4160048/0/fatty_acid-binding_protein.xml>
>> > fatty_acid-binding.xml (63K) <http://lucene.472066.n3.nabble.com/attachment/4160048/1/fatty_acid-binding.xml>
>>
>> >
>> >
>> >
>> >
>>
>>
>
>
>
>
