Glad your problem isn't one any longer. Yeah, there are a lot of nooks and crannies that one gets used to with Solr!
I'd estimate that between learning how to read the debug output and the analysis page 80-90% of the "my search isn't working" questions on the list can be answered, but it takes a while to get comfortable with those tools (and to even know they exist!)...

Best
Erick

On Wed, Sep 24, 2014 at 6:57 AM, aaguilar <antelmo.aguilar...@nd.edu> wrote:

> Hello Erick,
>
> Just wanted to let you know that I did the change you suggested and
> everything works as expected. Also, thanks for letting me know about the
> Analysis page in solr. I did not know about it and I have found it very
> useful.
>
> Thanks!
>
> On Mon, Sep 22, 2014 at 5:41 PM, Antelmo Aguilar <antelmo.aguilar...@nd.edu> wrote:
>
> > Hello Erick,
> >
> > Thank you so much for your help. That makes perfect sense. I will do the
> > changes you suggest and let you know how it goes.
> >
> > Thanks!
> >
> > On Mon, Sep 22, 2014 at 4:12 PM, Erick Erickson [via Lucene] <ml-node+s472066n4160547...@n3.nabble.com> wrote:
> >
> >> You have your index and query time analysis chains defined much
> >> differently. Omitting the WordDelimiterFilterFactory from the
> >> query-time analysis chain will lead to endless problems.
> >>
> >> With the definition you have, here are the terms in the index and
> >> their term positions as below. This is available from the
> >> admin/analysis page if you click the "verbose" checkbox, although I
> >> admit it's kind of hard to read:
> >>   1       2              3         4
> >>   fatty   acid-binding   binding   protein
> >>           acid
> >>
> >> But at query time, this is how they're being analyzed
> >>   1       2              3
> >>   fatty   acid-binding   protein
> >>
> >> So searching for "fatty acid-binding protein" requires that the tokens
> >> "fatty" "acid-binding" and "protein" appear in term positions 1, 2, 3
> >> rather than where they actually are (1, 2, 4). Searching for "fatty
> >> acid-binding protein"~1 would actually find this, the "~1" means allow
> >> one gap in there.
> >>
> >> HOWEVER, that's the least of your problems.
> >> WordDelimiterFilterFactory will _also_ "split on intra-word
> >> delimiters (all non alpha-numeric characters)". While that doesn't
> >> really say so explicitly, that will have the effect of removing
> >> punctuation. So searching for "fatty acid-binding protein."~1 (note
> >> the period) will fail since the query-time token will include the
> >> period.
> >>
> >> I'd _really_ advise you to use the stock WordDelimiterFilterFactory
> >> settings at both index and query time included in the stock Solr
> >> release for, say, text_en_splitting or even a single analyzer like
> >> text_en_splitting_tight.
> >>
> >> Best,
> >> Erick
> >>
> >> On Mon, Sep 22, 2014 at 6:33 AM, aaguilar <[hidden email]> wrote:
> >>
> >> > Hello Erick.
> >> >
> >> > Below is the information you requested. Thanks for your help!
> >> >
> >> > <fieldType name="text_ws_finer" class="solr.TextField" positionIncrementGap="100">
> >> >   <analyzer type="index">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.WordDelimiterFilterFactory" splitOnNumerics="0" splitOnCaseChange="0"
> >> >             generateWordParts="1" generateNumberParts="0" catenateWords="0"
> >> >             catenateNumbers="0" catenateAll="0" preserveOriginal="1"/>
> >> >     <filter class="solr.StopFilterFactory"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> >   <analyzer type="query">
> >> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >> >     <filter class="solr.LowerCaseFilterFactory"/>
> >> >   </analyzer>
> >> > </fieldType>
> >> >
> >> > <field name="description" type="text_ws_finer" indexed="true" stored="true"/>
> >> >
> >> > On Fri, Sep 19, 2014 at 7:36 PM, Erick Erickson [via Lucene] <[hidden email]> wrote:
> >> >
> >> >> Hmmm, I'd have to see the schema definition for your description
> >> >> field.
> >> >> For this, the admin/analysis page is very helpful. Here's my
> >> >> guess:
> >> >>
> >> >> Your analysis chain doesn't break the incoming tokens up quite like
> >> >> you think it is. Thus you have the tokens in your index like
> >> >> 'protein,' (notice the comma) and 'protein-like' rather than just
> >> >> 'protein'. However, I can't quite reconcile this with your statement:
> >> >> "Another weird thing is that if I used description:"fatty
> >> >> acid-binding" AND description:"protein"
> >> >>
> >> >> so I'm at something of a loss. If you paste in your schema definition
> >> >> for the 'description' field _and_ the corresponding <fieldType>
> >> >> definition I can give it a quick whirl.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Fri, Sep 19, 2014 at 11:53 AM, aaguilar <[hidden email]> wrote:
> >> >>
> >> >> > Hello Erick,
> >> >> >
> >> >> > Thanks for the response. I tried adding the debug=True to the query, but I
> >> >> > do not know exactly what I am looking for in the output. Would it be
> >> >> > possible for you to look at the results? I would really appreciate it. I
> >> >> > attached two files, one of them is with the filter query description:"fatty
> >> >> > acid-binding" and the other is with the filter query description:"fatty
> >> >> > acid-binding protein". If you see the file that has the results for
> >> >> > description:"fatty acid-binding", you can see that the hits do have "fatty
> >> >> > acid-binding protein" and nothing in between. I really appreciate any help
> >> >> > you can provide.
> >> >> >
> >> >> > Thank you
> >> >> >
> >> >> > On Fri, Sep 19, 2014 at 2:03 PM, Erick Erickson [via Lucene] <[hidden email]> wrote:
> >> >> >
> >> >> >> Your very best friend here is attaching &debug=query to the URL and
> >> >> >> looking at the parsed query results. Upon occasion there's some
> >> >> >>
> >> >> >> One possible explanation is that the description field has something
> >> >> >> like "fatty acid-binding some words protein" in which case your query
> >> >> >> "fatty acid-binding protein" would fail, but "fatty acid-binding
> >> >> >> protein"~4 would succeed.
> >> >> >>
> >> >> >> The other possibility is that your query parsing isn't quite doing
> >> >> >> what you think, but adding &debug=query should help there.
> >> >> >>
> >> >> >> Best,
> >> >> >> Erick
> >> >> >>
> >> >> >> On Fri, Sep 19, 2014 at 8:10 AM, aaguilar <[hidden email]> wrote:
> >> >> >>
> >> >> >> > Hello All,
> >> >> >> >
> >> >> >> > I recently came across a problem when I tried using description:"fatty
> >> >> >> > acid-binding protein" as a filter query when doing a query through the query
> >> >> >> > interface for Solr in the Tomcat server. Using that filter query did not
> >> >> >> > give me any results at all, however if I used description:"fatty
> >> >> >> > acid-binding" as the filter query, it would give me the results I wanted.
> >> >> >> >
> >> >> >> > The thing is that some of the results I got back from Solr did have the
> >> >> >> > words "fatty acid-binding protein" in the description field. So I really do
> >> >> >> > not know what might be causing the issue of Solr not being able to find
> >> >> >> > those hits.
> >> >> >> >
> >> >> >> > Another weird thing is that if I used description:"fatty acid-binding" AND
> >> >> >> > description:"protein" as the filter query when doing a query, it gave me the
> >> >> >> > results I anticipated (with some extra results that did not have the exact
> >> >> >> > phrase "fatty acid-binding protein"). Does anyone have an idea as to what
> >> >> >> > might be happening? Just in case this is helpful, the version of Solr we
> >> >> >> > are using is 4.0.0.2012.10.06.03.04.33. I appreciate any help anyone can
> >> >> >> > provide.
> >> >> >> >
> >> >> >> > Thanks!
> >> >> >> >
> >> >> >> > --
> >> >> >> > View this message in context:
> >> >> >> > http://lucene.472066.n3.nabble.com/Issue-Adding-Filter-Query-tp4159990.html
> >> >> >> > Sent from the Solr - User mailing list archive at Nabble.com.
> >> >> >
> >> >> > fatty_acid-binding_protein.xml (1K)
> >> >> > <http://lucene.472066.n3.nabble.com/attachment/4160048/0/fatty_acid-binding_protein.xml>
> >> >> > fatty_acid-binding.xml (63K)
> >> >> > <http://lucene.472066.n3.nabble.com/attachment/4160048/1/fatty_acid-binding.xml>
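
For anyone finding this thread later: the position mismatch Erick describes can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not Lucene's actual phrase matcher. The token positions are copied from Erick's analysis-page listing in the thread, and the names INDEXED and phrase_matches are made up for illustration. It treats a phrase as matching when the indexed positions, after subtracting each term's offset within the query, differ by no more than the slop.

```python
from itertools import product

# Index-time tokens and their positions, as listed in the thread
# (WordDelimiterFilterFactory with preserveOriginal="1").
INDEXED = {
    "fatty": [1],
    "acid-binding": [2],
    "acid": [2],
    "binding": [3],
    "protein": [4],
}


def phrase_matches(index, query_terms, slop=0):
    """Toy in-order phrase check (an illustration, not Lucene's algorithm).

    Each candidate position is shifted back by the term's offset in the
    query. An exact phrase lines up on a single value; slop allows the
    shifted values to spread by at most that many positions.
    """
    try:
        position_lists = [index[term] for term in query_terms]
    except KeyError:
        # A query token that was never indexed (e.g. "protein.") cannot match.
        return False
    for combo in product(*position_lists):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


# Query-time tokens: whitespace tokenizer + lowercase only, so no splitting.
QUERY = ["fatty", "acid-binding", "protein"]

print(phrase_matches(INDEXED, QUERY))                      # False: needs 1,2,3 but has 1,2,4
print(phrase_matches(INDEXED, QUERY, slop=1))              # True: "~1" allows the one gap
print(phrase_matches(INDEXED, ["fatty", "acid-binding"]))  # True: positions 1,2 are adjacent
print(phrase_matches(INDEXED, ["fatty", "acid-binding", "protein."], slop=1))  # False
```

The last call mirrors the trailing-period case from the thread: the query-time chain leaves "protein." intact, and no such token exists in this toy index, so no amount of slop helps.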