[
https://issues.apache.org/jira/browse/STANBOL-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059250#comment-13059250
]
Olivier Grisel commented on STANBOL-246:
----------------------------------------
Actually the natural Lucene way to implement this is probably to use a
PhraseQuery with some slop (so as to favor names with the terms in the right
order) and then rerank the results based on some distance metric: results whose
names contain terms that were not part of the initial query should be
downgraded somehow w.r.t. results whose label matches the query term set.
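
To make the reranking idea concrete, here is a minimal sketch in plain Python
(not Lucene code). The helper names, the penalty formula and the input scores
are all assumptions made up for illustration; the real scores would come from
the SolrYard index.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive analyzer)."""
    return text.lower().split()


def label_penalty(query, label):
    """Count label terms absent from the query plus query terms absent from the label."""
    query_terms = set(tokenize(query))
    label_terms = set(tokenize(label))
    extra = len(label_terms - query_terms)
    missing = len(query_terms - label_terms)
    return extra + missing


def rerank(query, results):
    """Downgrade each (label, score) pair by its term-set distance to the query."""
    adjusted = []
    for label, score in results:
        penalty = label_penalty(query, label)
        # Hypothetical downgrade: divide the original score by (1 + penalty).
        adjusted.append((label, score / (1.0 + penalty)))
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical scores that mimic the ordering reported in the issue.
results = [
    ("United States Navy", 3.0),
    ("United States Air Force", 2.8),
    ("United States", 2.5),
]
reranked = rerank("United States", results)
```

With these made-up scores the exact label moves to the first position, because
it is the only one with a penalty of zero. Any other set-based or edit-based
distance could be substituted for label_penalty.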
> Exact name match should get boosted in the entity hub SolrYard indices
> ----------------------------------------------------------------------
>
> Key: STANBOL-246
> URL: https://issues.apache.org/jira/browse/STANBOL-246
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
> Assignee: Rupert Westenthaler
> Attachments: united_states_dbpedia_solrindex.json
>
>
> For instance, using the default embedded solryard index:
> {code}
> curl -X POST -d "name=United States&limit=10&offset=0"
> http://localhost:8080/entityhub/site/dbpedia/find
> {code}
> The first results are "United States Navy" and "United States Air Force";
> "United States" only comes in third position. See the attached JSON
> output.
> Exact name match (or close to exact matches) should get a score boost. This
> can probably be implemented with FuzzyQuery and minSimilarity of 0.8f for
> instance.
> https://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/FuzzyQuery.html
> Maybe in this case the popularity boosts are bad because they are based on
> naive incoming link counts. Using a Page Rank style centrality score might
> work better in this case:
> https://github.com/julienledem/Pig-scripting-examples/tree/master/Page%20Rank
> https://github.com/mesos/spark/blob/master/bagel/src/main/scala/spark/bagel/examples/WikipediaPageRank.scala
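
For reference, the Page Rank style centrality score mentioned in the issue
description can be sketched as a small power iteration in plain Python. This
is an illustration only: the function name, the damping factor of 0.85, the
iteration count and the toy link graph are assumptions, not taken from the
linked Pig or Spark examples.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping each node to its outgoing links."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the teleportation share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: three pages link to "Hub", which links back to "A".
links = {
    "A": ["Hub"],
    "B": ["Hub"],
    "C": ["Hub", "Minor"],
    "Hub": ["A"],
    "Minor": [],
}
ranks = pagerank(links)
```

Unlike a raw incoming link count, the score of a page here depends on the
scores of the pages linking to it, so a link from a central page counts for
more than a link from an obscure one.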