Exact name match should get boosted in the entity hub SolrYard indices
----------------------------------------------------------------------

                 Key: STANBOL-246
                 URL: https://issues.apache.org/jira/browse/STANBOL-246
             Project: Stanbol
          Issue Type: Bug
            Reporter: Olivier Grisel
            Assignee: Rupert Westenthaler


For instance, using the default embedded SolrYard index:

 curl -X POST -d "name=United States&limit=10&offset=0" \
   http://localhost:8080/entityhub/site/dbpedia/find

The first two results are "United States Navy" and "United States Air Force";
the exact match "United States" only comes in third position. See the attached
JSON output.

Exact (or near-exact) name matches should get a score boost. This could
probably be implemented with a FuzzyQuery, for instance with a minimum
similarity of 0.8f.

https://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/FuzzyQuery.html
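To make the intended effect concrete, here is a minimal, library-free sketch of
boosting near-exact name matches. It is not Lucene or SolrYard code: the
similarity function is a plain normalised edit distance (not Lucene's exact
formula), the helper names (edit_distance, similarity, rerank), the boost
factor of 10 and the input scores are all assumptions made for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(query: str, name: str) -> float:
    """Case-insensitive similarity in [0, 1]; 1.0 means identical."""
    q, n = query.lower(), name.lower()
    longest = max(len(q), len(n))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, n) / longest


def rerank(query, results, min_similarity=0.8, boost=10.0):
    """Multiply the score of near-exact name matches by `boost`, then sort."""
    rescored = []
    for name, score in results:
        if similarity(query, name) >= min_similarity:
            score *= boost
        rescored.append((name, score))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Hypothetical scores, chosen only to reproduce the ordering reported above.
results = [
    ("United States Navy", 3.0),
    ("United States Air Force", 2.5),
    ("United States", 2.0),
]
ranked = rerank("United States", results)
```

With these made-up scores only "United States" reaches the 0.8 threshold, so it
moves to the top while the relative order of the other two results is kept.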

Maybe the popularity boost misbehaves in this case because it relies on a naive
count of incoming links. A PageRank-style centrality score might work better:

https://github.com/julienledem/Pig-scripting-examples/tree/master/Page%20Rank
https://github.com/mesos/spark/blob/master/bagel/src/main/scala/spark/bagel/examples/WikipediaPageRank.scala
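As a rough illustration of why a PageRank-style score can differ from a raw
incoming-link count, here is a small power-iteration sketch. It is not taken
from the scripts linked above; the toy graph, the node names, the damping
factor of 0.85 and the iteration count are assumptions for illustration only.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread over all pages.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


def in_degree(links):
    """Naive popularity: number of incoming links per page."""
    counts = {}
    for targets in links.values():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts


# Hypothetical graph: A has two links from obscure pages, while B has a
# single link from H, a page that is itself linked from three pages.
links = {
    "x1": ["A"], "x2": ["A"],
    "y1": ["H"], "y2": ["H"], "y3": ["H"],
    "H": ["B"],
}
counts = in_degree(links)
ranks = pagerank(links)
```

In this toy graph the incoming-link count prefers A (two links against one),
whereas the PageRank score prefers B, because B's single link comes from a
page that is itself well linked.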



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
