[ https://issues.apache.org/jira/browse/STANBOL-538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fabian Christ updated STANBOL-538:
----------------------------------
Fix Version/s: 0.9.0-incubating
> Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the KeywordLinkingEngine
> ------------------------------------------------------------------------------------------
>
> Key: STANBOL-538
> URL: https://issues.apache.org/jira/browse/STANBOL-538
> Project: Stanbol
> Issue Type: Improvement
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
> Fix For: 0.9.0-incubating
>
>
> Currently the KeywordLinkingEngine cannot be used to match against
> alphanumeric IDs such as those commonly used for products. This is because
> the Tokenizers used by OpenNLP tend to split such IDs into several small
> tokens, which prevents them from being matched correctly.
> The simplest solution is to implement a Tokenizer optimized for keyword
> extraction. Such a Tokenizer should split only on whitespace.
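A minimal sketch of the proposed whitespace-only tokenizer, assuming a standalone Java class rather than Stanbol's actual tokenizer integration. The class name `WhitespaceKeywordTokenizer` and its `Token` type are hypothetical. It keeps each whitespace-delimited run of characters as one token and records its character offsets, so an ID like `XJ-2000/b` or a URN stays intact and can be mapped back to the text:

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical whitespace-only tokenizer (illustrative name, not an actual
 * Stanbol or OpenNLP class). Text is split only at whitespace, so
 * alphanumeric IDs and URNs are kept together as single tokens.
 */
public final class WhitespaceKeywordTokenizer {

    /** A token together with its character offsets in the original text. */
    public static final class Token {
        public final String text;
        public final int start; // inclusive offset
        public final int end;   // exclusive offset

        public Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    /** Returns the whitespace-delimited tokens of {@code text} with their spans. */
    public static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            // Skip any run of whitespace between tokens.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // Consume everything up to the next whitespace as one token.
            int start = i;
            while (i < n && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new Token(text.substring(start, i), start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("Order part XJ-2000/b via urn:isbn:0451450523 today.")) {
            System.out.println(t);
        }
    }
}
```

The trade-off of splitting only on whitespace is that punctuation stays attached (the last token above is `today.` including the period), so a real implementation might still strip trailing punctuation before matching. Within an OpenNLP-based pipeline, OpenNLP's own `opennlp.tools.tokenize.WhitespaceTokenizer` follows the same splitting rule.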