Exact name match should get boosted in the entity hub SolrYard indices
----------------------------------------------------------------------
Key: STANBOL-246
URL: https://issues.apache.org/jira/browse/STANBOL-246
Project: Stanbol
Issue Type: Bug
Reporter: Olivier Grisel
Assignee: Rupert Westenthaler
For instance, using the default embedded SolrYard index:
curl -X POST -d "name=United States&limit=10&offset=0" \
    http://localhost:8080/entityhub/site/dbpedia/find
The first two results are "United States Navy" and "United States Air Force";
"United States" itself only comes in third position. See the attached JSON
output.
Exact name matches (or close-to-exact matches) should get a score boost. This
can probably be implemented with a FuzzyQuery and a minSimilarity of 0.8f, for
instance:
https://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/FuzzyQuery.html
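To make the intended behaviour concrete, here is a minimal sketch of the idea in Python. It is not Lucene's FuzzyQuery scoring and not the SolrYard implementation: it simply re-ranks results, multiplying the score of any label whose normalized edit-distance similarity to the query is at least 0.8. The function names, the boost factor of 10 and the base scores in the sample list are all assumptions made up for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def name_similarity(query: str, label: str) -> float:
    """Similarity in [0, 1]; 1.0 means a case-insensitive exact match."""
    q, l = query.casefold().strip(), label.casefold().strip()
    longest = max(len(q), len(l))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(q, l) / longest


def rerank(query, results, min_similarity=0.8, boost=10.0):
    """Boost (label, score) pairs whose label nearly equals the query.

    min_similarity and boost are illustrative values, not tuned ones.
    """
    rescored = []
    for label, score in results:
        if name_similarity(query, label) >= min_similarity:
            score *= boost
        rescored.append((label, score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical base scores mimicking the ordering reported above.
sample_results = [
    ("United States Navy", 3.0),
    ("United States Air Force", 2.5),
    ("United States", 2.0),
]
reranked = rerank("United States", sample_results)
```

With these made-up scores, "United States" moves to the first position, because the two longer labels fall below the 0.8 similarity threshold and keep their original scores.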
Maybe in this case the popularity boost is misleading because it relies on naive
incoming link counts. Using a PageRank-style centrality score might work better:
https://github.com/julienledem/Pig-scripting-examples/tree/master/Page%20Rank
https://github.com/mesos/spark/blob/master/bagel/src/main/scala/spark/bagel/examples/WikipediaPageRank.scala
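As a rough illustration of why a PageRank-style score can differ from a raw incoming-link count, here is a small power-iteration sketch in Python. The link graph is a toy example invented for this purpose, not real DBpedia data, and the function name, damping factor and iteration count are assumptions rather than anything the Entityhub currently does.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of link targets."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n
                    for node in nodes}
        for source, targets in links.items():
            if not targets:
                continue
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        rank = new_rank
    return rank


# Toy graph (made up): the Navy page has three incoming links, the
# country page only two, but the country's links come from heavier pages.
toy_links = {
    "United_States": ["United_States_Navy", "United_States_Air_Force"],
    "United_States_Navy": ["United_States"],
    "United_States_Air_Force": ["United_States"],
    "Some_Ship": ["United_States_Navy"],
    "Another_Ship": ["United_States_Navy"],
}
ranks = pagerank(toy_links)
```

In this toy graph a naive incoming-link count would put "United_States_Navy" first, whereas the PageRank score puts "United_States" first, since it receives all of the rank flowing out of the Navy and Air Force pages.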