Rupert Westenthaler created STANBOL-654:
-------------------------------------------
Summary: The SolrYard does not correctly enclose multi-word query
terms in quotes
Key: STANBOL-654
URL: https://issues.apache.org/jira/browse/STANBOL-654
Project: Stanbol
Issue Type: Bug
Components: Entity Hub
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Priority: Critical
STANBOL-607 changed the encoding of natural language constraints that contain
multiple words: they are now encoded as a phrase, "Frankfurt am Main", instead
of (Frankfurt AND am AND Main).
However, the implementation does not correctly put quotes around multi-word
tokens.
Because of that, a query for the rdfs:label "Frankfurt am Main" is encoded as
(_\!@/rdfs\:label/:Frankfurt am Main)
instead of
(_\!@/rdfs\:label/:"Frankfurt am Main")
which causes Solr to search for
* "Frankfurt" in the values of rdfs:label OR
* "am" in the full text field OR
* "Main" in the full text field
instead of "Frankfurt am Main" in the values of rdfs:label.
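
The intended encoding can be illustrated with a minimal sketch. This is not the SolrYard code (which is Java); the function names `quote_if_multi_word` and `encode_constraint` are hypothetical, and the sketch assumes the field name is already encoded and escaped as in the example above. It only shows the quoting step: a value containing whitespace is wrapped in double quotes so that Solr treats it as one phrase on the given field.

```python
import re


def quote_if_multi_word(value: str) -> str:
    """Wrap a value in double quotes when it contains whitespace.

    Backslashes and embedded double quotes are escaped first so the
    resulting phrase stays well-formed. Escaping of other Solr special
    characters in single-word values is out of scope for this sketch.
    """
    if re.search(r"\s", value):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{escaped}"'
    return value


def encode_constraint(encoded_field: str, value: str) -> str:
    """Build '(field:value)' for an already-encoded field name (assumption)."""
    return f"({encoded_field}:{quote_if_multi_word(value)})"


# Encoded field name copied from the example in this report.
FIELD = r"_\!@/rdfs\:label/"

print(encode_constraint(FIELD, "Frankfurt am Main"))
# prints (_\!@/rdfs\:label/:"Frankfurt am Main")
```

Without the quotes, only the first word stays bound to the field prefix, which is why the remaining words end up being matched against the full text field.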
Sadly, all unit tests pass because, for the DBpedia test data used, Solr ranking
"ensures" that the wrongly encoded query returns the same results as a correctly
encoded one.
However, on bigger data sets with more data in the full text field, this has a
big impact on query results.
NOTE: the 0.9.0-incubating release is NOT affected, as this bug was only
introduced in trunk while working on 0.10.0!