[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated SOLR-1852:
------------------------------
Attachment: SOLR-1852_testcase.patch
Attached is a test case demonstrating the bug.
The problem is that if you have, for example, "the lucene.solr", where "the" is
a stopword, the Solr 1.4 WordDelimiterFilter bumps the position increment of
*both* the "lucene" and "solr" tokens:
* lucene (posInc=2)
* solr (posInc=2)
* lucenesolr (posInc=0)
Instead it should look like:
* lucene (posInc=2)
* solr (posInc=1)
* lucenesolr (posInc=0)
In my opinion the behavior of trunk is correct, and the Solr 1.4 behavior is a bug.
But I don't know how to fix just Solr 1.4's WDF (WordDelimiterFilter) in a better
way than dropping in the entire rewritten WDF...
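
To make the difference between the two token streams above concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class PosIncSketch and its methods are hypothetical names made up for illustration. It assumes the usual convention that a token's absolute position is the running sum of position increments, starting from -1, and that an exact phrase match needs the second term exactly one position after the first.

```java
import java.util.Arrays;

public class PosIncSketch {
    /** Turns position increments into absolute positions (running sum starting at -1). */
    public static int[] absolutePositions(int[] posIncs) {
        int[] positions = new int[posIncs.length];
        int pos = -1;
        for (int i = 0; i < posIncs.length; i++) {
            pos += posIncs[i];
            positions[i] = pos;
        }
        return positions;
    }

    /** True if some occurrence of `second` sits exactly one position after `first`. */
    public static boolean adjacent(String[] terms, int[] positions, String first, String second) {
        for (int i = 0; i < terms.length; i++) {
            if (!terms[i].equals(first)) continue;
            for (int j = 0; j < terms.length; j++) {
                if (terms[j].equals(second) && positions[j] == positions[i] + 1) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Tokens emitted for "the lucene.solr" after "the" is removed as a stopword.
        String[] terms = {"lucene", "solr", "lucenesolr"};
        int[] solr14 = absolutePositions(new int[] {2, 2, 0}); // increments reported for Solr 1.4
        int[] trunk  = absolutePositions(new int[] {2, 1, 0}); // increments reported for trunk

        System.out.println("Solr 1.4 positions: " + Arrays.toString(solr14)); // [1, 3, 3]
        System.out.println("trunk positions:    " + Arrays.toString(trunk));  // [1, 2, 2]
        System.out.println("1.4 adjacent:   " + adjacent(terms, solr14, "lucene", "solr")); // false
        System.out.println("trunk adjacent: " + adjacent(terms, trunk, "lucene", "solr"));  // true
    }
}
```

Under these assumptions, the Solr 1.4 increments leave a one-position hole between "lucene" and "solr", so a zero-slop phrase query for the two parts would not line up with the indexed positions. With the trunk increments the two parts are adjacent.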
> enablePositionIncrements="true" can cause searches to fail when they are
> parsed as phrase queries
> -------------------------------------------------------------------------------------------------
>
> Key: SOLR-1852
> URL: https://issues.apache.org/jira/browse/SOLR-1852
> Project: Solr
> Issue Type: Bug
> Affects Versions: 1.4
> Reporter: Peter Wolanin
> Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch
>
>
> Symptom: searching for a string like a domain name containing a '.', the Solr
> 1.4 analyzer tells me that I will get a match, but when I enter the search
> either in the client or directly in Solr, the search fails.
> test string: Identi.ca
> queries that fail: IdentiCa, Identi.ca, Identi-ca
> query that matches: Identi ca
> schema in use is:
> http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1
> Screen shots:
> analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png
> dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png
> dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png
> standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png
> Whether or not the bug appears is determined by the surrounding text:
> "would be great to have support for Identi.ca on the follow block"
> fails to match "Identi.ca", but putting the content on its own or in another
> sentence:
> "Support Identi.ca"
> the search matches. Testing suggests the word "for" is the problem, and it
> looks like the bug occurs when a stop word precedes a word that is split up
> by the word delimiter filter.
> Setting enablePositionIncrements="false" in the stop filter and reindexing
> causes the searches to match.
> According to Mark Miller in #solr, this bug appears to be fixed already in
> Solr trunk, either due to the upgraded Lucene or changes to the
> WordDelimiterFactory.