Rupert Westenthaler created STANBOL-686:
-------------------------------------------
Summary: Make the "Minimum Token Match Factor" configurable for
the KeywordLinkingEngine
Key: STANBOL-686
URL: https://issues.apache.org/jira/browse/STANBOL-686
Project: Stanbol
Issue Type: Improvement
Components: Engine - KeywordExtraction
Reporter: Rupert Westenthaler
Priority: Minor
When a Token of the text is compared with a Token in the Label of an Entity,
their similarity is expressed as a value in the range [0..1]. This factor
specifies the minimum similarity two Tokens must reach to be considered a
match. Lower values allow more Tokens to match (e.g. inflected forms of words)
but may also produce false positives. Regardless of the configured value, the
similarity influences the confidence of suggestions.
BTW: currently the similarity is calculated by dividing the length of the
longest matching section of the two tokens by the length of the longer of the
two tokens.
e.g. Austrian <-> Austria
match: Austria -> 7
max length: 8
similarity: 7/8=0.875
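
The calculation described above can be sketched roughly as follows. This is
only an illustration, not the engine's actual code: it assumes that the
"longest matching section" is the longest common substring of the two tokens,
and the names token_similarity, tokens_match and min_token_match_factor, as
well as the default value 0.7, are hypothetical.

```python
from difflib import SequenceMatcher


def token_similarity(text_token: str, label_token: str) -> float:
    """Length of the longest matching section divided by the longer token's length.

    Assumption: "longest matching section" means longest common substring.
    """
    if not text_token or not label_token:
        return 0.0
    matcher = SequenceMatcher(None, text_token, label_token, autojunk=False)
    match = matcher.find_longest_match(0, len(text_token), 0, len(label_token))
    return match.size / max(len(text_token), len(label_token))


def tokens_match(text_token: str, label_token: str,
                 min_token_match_factor: float = 0.7) -> bool:
    """Two tokens match if their similarity reaches the configured minimum.

    The default of 0.7 is a placeholder, not a value taken from the engine.
    """
    return token_similarity(text_token, label_token) >= min_token_match_factor


if __name__ == "__main__":
    similarity = token_similarity("Austrian", "Austria")
    print(f"similarity: {similarity:.3f}")
    print("match:", tokens_match("Austrian", "Austria"))
```

Raising min_token_match_factor in this sketch makes tokens_match stricter,
while the similarity value itself stays available for use in the confidence
of a suggestion.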