[
https://issues.apache.org/jira/browse/STANBOL-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-685.
-----------------------------------------
Resolution: Fixed
Fixed with revision 1360296.
> Improve POS tag handling of the KeywordLinkingEngine
> ----------------------------------------------------
>
> Key: STANBOL-685
> URL: https://issues.apache.org/jira/browse/STANBOL-685
> Project: Stanbol
> Issue Type: Improvement
> Components: Engine - KeywordExtraction
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Priority: Minor
>
> The KeywordLinkingEngine can use POS tags to decide whether a token (word)
> needs to be processed or can be skipped. If no POS tag is available, or the
> POS tag probability is too low (the current default threshold is 0.8), then
> the minimum token length (default is 3) is used as a fall-back.
> Analysis of POS tagging results has shown that non-noun tags often fall
> below the 0.8 limit. For those tokens the fall-back was used, and in most
> cases this resulted in the KeywordLinkingEngine processing them.
> However, it can also be observed that although some of those POS tags were
> not correct, the confusion was usually between two tags that are both
> non-noun tags. Because of that, decreasing the minimum probability for
> accepting a non-noun POS tag can improve both results and processing time.
> The algorithm will therefore be adjusted as follows:
> Introduce two tag probability thresholds:
> 1. "minPosTypeProb" for accepting POS tags that represent nouns and
> 2. "minPosTypeProb/2" for rejecting tokens whose POS tags are not nouns
> Assuming <code>minPosTypeProb=0.667</code>, a
> * noun with probability 0.8 would return <code>true</code>
> * noun with probability 0.5 would return <code>null</code>
> * verb with probability 0.4 would return <code>false</code>
> * verb with probability 0.3 would return <code>null</code>
> NOTE: <code>null</code> indicates that no POS tag is available or that the
> POS tag has a low probability.
> These changes will need to be applied to the
> "OpenNlpAnalysedContentFactory#processPOS(..)" and the
> "EntityLinker#isProcessableToken(..)" methods.
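
The rule described in the issue can be sketched roughly as follows. This is an illustration only and not the code committed in revision 1360296: the class name PosTagDecision, the method classify(..) and all signatures are hypothetical, the use of ">=" at the threshold boundaries is an assumption, and so is the reading of the minimum token length fall-back as "length >= minimum".

```java
public final class PosTagDecision {

    private PosTagDecision() {}

    /**
     * Hypothetical three-state classification of a POS tag.
     * Returns TRUE for a trusted noun tag, FALSE for a trusted non-noun tag,
     * and null if the tag probability is too low to be trusted.
     * Assumption: thresholds are inclusive (>=).
     */
    public static Boolean classify(boolean isNoun, double prob, double minPosTypeProb) {
        if (isNoun) {
            // nouns need the full minimum probability to be accepted
            return prob >= minPosTypeProb ? Boolean.TRUE : null;
        }
        // non-nouns are rejected already at half the minimum probability
        return prob >= minPosTypeProb / 2 ? Boolean.FALSE : null;
    }

    /**
     * Hypothetical stand-in for the decision whether a token is processed.
     * A null POS decision triggers the minimum token length fall-back.
     */
    public static boolean isProcessableToken(String token, Boolean posDecision,
                                             int minTokenLength) {
        if (posDecision != null) {
            return posDecision;
        }
        // fall-back: no usable POS tag, so decide by token length (assumed >=)
        return token.length() >= minTokenLength;
    }

    public static void main(String[] args) {
        double minPosTypeProb = 0.667; // value used in the issue's example
        System.out.println(classify(true, 0.8, minPosTypeProb));  // true
        System.out.println(classify(true, 0.5, minPosTypeProb));  // null
        System.out.println(classify(false, 0.4, minPosTypeProb)); // false
        System.out.println(classify(false, 0.3, minPosTypeProb)); // null
    }
}
```

With this split, a verb tagged with probability 0.4 is skipped directly instead of falling back to the token length check, which is the behaviour change the issue asks for.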