[ 
https://issues.apache.org/jira/browse/STANBOL-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fabian Christ updated STANBOL-538:
----------------------------------

    Fix Version/s: 0.9.0-incubating
    
> Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the 
> KeywordLinkingEngine
> ------------------------------------------------------------------------------------------
>
>                 Key: STANBOL-538
>                 URL: https://issues.apache.org/jira/browse/STANBOL-538
>             Project: Stanbol
>          Issue Type: Improvement
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>             Fix For: 0.9.0-incubating
>
>
> Currently the KeywordLinkingEngine cannot be used to match against 
> alphanumeric IDs of the kind often used for products. This is because the 
> Tokenizers used by OpenNLP tend to split such IDs into several small tokens, 
> which prevents a correct mapping against these IDs.
> The simplest solution is to implement a simple Tokenizer that is optimized 
> for keyword extraction. Such a Tokenizer should split only on 
> whitespace.
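
The whitespace-only splitting proposed above could be sketched roughly as follows. This is a standalone illustration, not the code committed for this issue: the class name WhitespaceKeywordTokenizer and its tokenize method are hypothetical, and the sketch deliberately does not implement the OpenNLP Tokenizer interface.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a tokenizer that splits only on whitespace,
 * so IDs such as "XJ-900/B2" or "urn:isbn:0451450523" stay in one token.
 */
final class WhitespaceKeywordTokenizer {

    private WhitespaceKeywordTokenizer() {
        // static utility, no instances
    }

    /** Returns the maximal runs of non-whitespace characters in text, in order. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1; // start index of the current token, -1 while between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                if (start >= 0) {
                    // whitespace ends the current token
                    tokens.add(text.substring(start, i));
                    start = -1;
                }
            } else if (start < 0) {
                // first non-whitespace character opens a new token
                start = i;
            }
        }
        if (start >= 0) {
            // flush a token that runs to the end of the input
            tokens.add(text.substring(start));
        }
        return tokens;
    }
}
```

Calling WhitespaceKeywordTokenizer.tokenize("Order XJ-900/B2 or urn:isbn:0451450523 today") yields five tokens, with both IDs intact. The trade-off is that punctuation directly attached to a word (for example a trailing comma) stays part of the token, so a real engine would probably need to handle that separately.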

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
