[
https://issues.apache.org/jira/browse/STANBOL-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058580#comment-13058580
]
Rupert Westenthaler commented on STANBOL-246:
---------------------------------------------
BTW: I have experimented a little with filtering (minimum number of matching
tokens), following redirects, etc. in the new TaxonomyLinkingEngine, and I
plan to try some more things over the weekend. I had not yet thought about
boosting the ranking of exact matches, but that looks like another thing worth
trying. If we see that this is useful, we can move this functionality to a
common place so that both engines can use it.
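
A minimal sketch of the two ideas mentioned above (dropping candidates that
share too few tokens with the query, and boosting exact label matches), purely
for illustration. The function names, the whitespace tokenizer, the boost
factor, and the scores are hypothetical; this is not the TaxonomyLinkingEngine
or SolrYard code.

```python
def tokenize(label):
    """Lowercase a label and split it on whitespace (a deliberately naive tokenizer)."""
    return label.lower().split()


def rerank(query, candidates, min_matching_tokens=1, exact_boost=2.0):
    """Filter and re-rank (label, score) pairs for a query.

    Candidates sharing fewer than `min_matching_tokens` distinct tokens with
    the query are dropped. Candidates whose token sequence equals the query's
    get their score multiplied by `exact_boost` (an assumed factor).
    """
    query_tokens = tokenize(query)
    results = []
    for label, score in candidates:
        label_tokens = tokenize(label)
        matching = len(set(query_tokens) & set(label_tokens))
        if matching < min_matching_tokens:
            continue  # too few tokens in common with the query
        if label_tokens == query_tokens:
            score *= exact_boost  # exact name match
        results.append((label, score))
    # Highest score first; sorted() is stable, so ties keep their input order.
    return sorted(results, key=lambda item: item[1], reverse=True)


# Made-up scores that mimic the ordering reported in this issue.
candidates = [
    ("United States Navy", 1.2),
    ("United States Air Force", 1.1),
    ("United States", 1.0),
    ("United Kingdom", 0.9),
]
ranked = rerank("United States", candidates, min_matching_tokens=2)
```

With these made-up scores, "United Kingdom" is filtered out (only one token in
common) and "United States" moves to the first position.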
> Exact name match should get boosted in the entity hub SolrYard indices
> ----------------------------------------------------------------------
>
> Key: STANBOL-246
> URL: https://issues.apache.org/jira/browse/STANBOL-246
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
> Assignee: Rupert Westenthaler
> Attachments: united_states_dbpedia_solrindex.json
>
>
> For instance, using the default embedded solryard index:
> {code}
> curl -X POST -d "name=United States&limit=10&offset=0" \
>   http://localhost:8080/entityhub/site/dbpedia/find
> {code}
> The first two results are "United States Navy" and "United States Air
> Force"; "United States" only comes in third position. See the attached JSON
> output.
> Exact name matches (or close-to-exact matches) should get a score boost. This
> can probably be implemented with FuzzyQuery and a minSimilarity of 0.8f, for
> instance.
> https://lucene.apache.org/java/3_3_0/api/all/org/apache/lucene/search/FuzzyQuery.html
> Maybe in this case the popularity boosts are bad because they are based on
> naive incoming link counts. Using a PageRank-style centrality score might
> work better in this case:
> https://github.com/julienledem/Pig-scripting-examples/tree/master/Page%20Rank
> https://github.com/mesos/spark/blob/master/bagel/src/main/scala/spark/bagel/examples/WikipediaPageRank.scala
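
To illustrate the difference between counting incoming links and a
PageRank-style score, here is a small power-iteration sketch. It assumes a
damping factor of 0.85 and spreads the rank of pages without outgoing links
evenly over all pages; the page names and link graph are made up, and this is
not how the dbpedia index currently computes its boosts.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    `links` maps each page to the list of pages it links to. Pages that only
    appear as link targets are included automatically.
    """
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the same "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical graph: "A" has two incoming links from obscure pages, while
# "B" has a single incoming link from a well-linked "hub" page.
links = {
    "p": ["hub"], "q": ["hub"], "r": ["hub"], "s": ["hub"],
    "x": ["A"], "y": ["A"],
    "hub": ["B"],
}
rank = pagerank(links)
```

In this toy graph "A" has more incoming links than "B", but "B" ends up with
the higher PageRank because its single link comes from a page that is itself
well linked.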
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira