Rupert Westenthaler created STANBOL-654:
-------------------------------------------

             Summary: The SolrYard does not correctly enclose multi-word query 
terms in quotes
                 Key: STANBOL-654
                 URL: https://issues.apache.org/jira/browse/STANBOL-654
             Project: Stanbol
          Issue Type: Bug
          Components: Entity Hub
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
            Priority: Critical


STANBOL-607 changed the encoding of natural language constraints that consist 
of multiple words to the phrase query "Frankfurt am Main" instead of 
(Frankfurt AND am AND Main). 

However, the implementation does not actually put the quotes around multi-word 
tokens.

Because of that, a query for the rdfs:label "Frankfurt am Main" is encoded as

    (_\!@/rdfs\:label/:Frankfurt am Main) 

instead of 

    (_\!@/rdfs\:label/:"Frankfurt am Main") 

which causes Solr to search for

* "Frankfurt" in the values of rdfs:label OR
* "am" in the full text field OR
* "Main" in the full text field

instead of "Frankfurt am Main" in the values of rdfs:label.
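
The intended encoding can be illustrated with a minimal sketch. The class 
QueryTermEncoder and the method encodeTerm are hypothetical names used only 
for illustration and are not the actual SolrYard code. The sketch assumes the 
field name is already escaped and only handles the quoting of multi-word 
values; escaping of other Solr special characters in single-word values is 
left out.

```java
public class QueryTermEncoder {

    /**
     * Hypothetical helper: builds "(field:value)" and wraps values that
     * contain whitespace in double quotes so Solr treats them as one phrase.
     * Assumes escapedField has already been escaped for the Solr query syntax.
     */
    public static String encodeTerm(String escapedField, String value) {
        String v = value.trim();
        boolean multiWord = v.chars().anyMatch(Character::isWhitespace);
        if (multiWord) {
            // Escape backslashes and quotes inside the phrase, then quote it.
            v = "\"" + v.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        }
        return "(" + escapedField + ":" + v + ")";
    }

    public static void main(String[] args) {
        // Escaped field name as it appears in the examples above.
        String field = "_\\!@/rdfs\\:label/";
        System.out.println(encodeTerm(field, "Frankfurt am Main"));
        System.out.println(encodeTerm(field, "Frankfurt"));
    }
}
```

Without the quotes, only the first word stays bound to the field and the 
remaining words are parsed as separate clauses, which is the behaviour 
described above.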

Sadly, all unit tests pass, because for the DBpedia test data used, Solr 
ranking "ensures" that the wrongly encoded query returns the same results as 
a correctly encoded one. 

However, on bigger data sets with more data in the full text field, this has 
a big impact on query results.

NOTE: the 0.9.0-incubating release is NOT affected by this, as the bug was 
only introduced in the trunk while working on 0.10.0!
