[ 
https://issues.apache.org/jira/browse/STANBOL-685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-685.
-----------------------------------------

    Resolution: Fixed

Fixed with revision 1360296.
                
> Improve POS tag handling of the KeywordLinkingEngine
> ----------------------------------------------------
>
>                 Key: STANBOL-685
>                 URL: https://issues.apache.org/jira/browse/STANBOL-685
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Engine - KeywordExtraction
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>            Priority: Minor
>
> The KeywordLinkingEngine can use POS tags to decide whether a token (word)
> needs to be processed or can be skipped. If no POS tag is available, or the
> POS tag probability is too low (currently the default is 0.8), then the minimum
> token length (default is 3) is used as a fall-back.
> Analysis of POS tagging results has shown that non-noun tags were often
> below the 0.8 limit. For those tokens the fall-back was used, and in most cases
> this resulted in the KeywordLinkingEngine processing them.
> However, it can also be observed that while some of those POS tags were not
> correct, the incorrect tags were usually a confusion between two tags that
> were both non-noun tags. Because of that, decreasing the minimum probability
> for accepting a non-noun POS tag can improve both results and processing time.
> The algorithm will therefore be adjusted as follows:
> Introduce two tag probabilities:
> 1. "minPosTypeProb" for accepting POS tags that represent nouns and
> 2. "minPosTypeProb/2" for rejecting POS tags that are not nouns
> Assuming <code>minPosTypeProb=0.667</code>, a
>  * noun with prob 0.8 would return <code>true</code>
>  * noun with prob 0.5 would return <code>null</code>
>  * verb with prob 0.4 would return <code>false</code>
>  * verb with prob 0.3 would return <code>null</code>
> NOTE: <code>null</code> indicates that no POS tag is available or that the
> POS tag has a low probability.
> These changes will need to be applied to the
> "OpenNlpAnalysedContentFactory#processPOS(..)" and the
> "EntityLinker#isProcessableToken(..)" methods.
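
The two-threshold rule described above can be sketched roughly as follows.
This is only an illustration, not the code committed in revision 1360296:
the class name PosTagSketch and both method names are hypothetical, treating
the thresholds as inclusive (>=) is an assumption, and so is the way the
minimum token length fall-back is combined with the tri-state result.

```java
/** Hypothetical sketch of the two-threshold POS tag decision. */
public final class PosTagSketch {

    private PosTagSketch() {}

    /**
     * Returns TRUE for a confident noun tag, FALSE for a confident
     * non-noun tag, and null if the tag probability is too low to decide.
     * Assumption: both thresholds are inclusive.
     */
    public static Boolean classify(boolean isNounTag, double prob,
                                   double minPosTypeProb) {
        if (isNounTag) {
            // Nouns are accepted only at the full threshold.
            return prob >= minPosTypeProb ? Boolean.TRUE : null;
        }
        // Non-nouns are rejected already at half the threshold.
        return prob >= minPosTypeProb / 2 ? Boolean.FALSE : null;
    }

    /**
     * Assumed fall-back: with no usable POS decision (null), a token is
     * processed if it reaches the minimum token length.
     */
    public static boolean isProcessable(Boolean posDecision, String token,
                                        int minTokenLength) {
        if (posDecision != null) {
            return posDecision.booleanValue();
        }
        return token.length() >= minTokenLength;
    }
}
```

With minPosTypeProb=0.667 the rejection threshold is 0.3335, so the four
cases listed in the description map to true, null, false and null.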


        
