Rupert Westenthaler created STANBOL-686:
-------------------------------------------

             Summary: Make the "Minimum Token Match Factor" configurable for 
the KeywordLinkingEngine
                 Key: STANBOL-686
                 URL: https://issues.apache.org/jira/browse/STANBOL-686
             Project: Stanbol
          Issue Type: Improvement
          Components: Engine - KeywordExtraction
            Reporter: Rupert Westenthaler
            Priority: Minor


When a Token of the text is compared with a Token in the Label of an Entity, 
their similarity is expressed as a value in the range [0..1]. This factor 
specifies the minimum similarity two Tokens must have to be considered a match. 
Lower values allow more Tokens to match (e.g. inflected forms of words) but may 
also result in false positives. Regardless of the configured value, the 
similarity will influence the confidence of suggestions.

BTW: currently the similarity is calculated by dividing the length of the 
longest matching section of the two tokens by the length of the longer of the 
two tokens.

e.g. Austrian <-> Austria

match: Austria -> 7
max length: 8
similarity: 7/8=0.875
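
The calculation described above can be sketched roughly as follows. This is 
only an illustration, not the engine's actual code: it assumes that the 
"longest matching section" means the longest common contiguous substring, that 
the comparison is case-sensitive, and that a token matches when its similarity 
is greater than or equal to the configured factor. All function and parameter 
names are hypothetical.

```python
def longest_matching_section(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b.

    Assumption: "longest matching section" is read as the longest common
    substring; the real engine may compare tokens differently.
    """
    best = 0
    # prev[j] holds the length of the common run ending at the previous
    # character of a and at b[j - 1]
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def token_similarity(text_token: str, label_token: str) -> float:
    """Similarity in [0..1]: matching section divided by the longer length."""
    longer = max(len(text_token), len(label_token))
    if longer == 0:
        return 0.0  # two empty tokens: treated as no match (assumption)
    return longest_matching_section(text_token, label_token) / longer


def tokens_match(text_token: str, label_token: str,
                 min_token_match_factor: float) -> bool:
    """True if the similarity reaches the (hypothetical) configured minimum."""
    return token_similarity(text_token, label_token) >= min_token_match_factor
```

Under these assumptions, tokens_match("Austrian", "Austria", 0.8) accepts the 
pair, while a stricter factor of 0.9 rejects it.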



        
