Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the
KeywordLinkingEngine
------------------------------------------------------------------------------------------
Key: STANBOL-538
URL: https://issues.apache.org/jira/browse/STANBOL-538
Project: Stanbol
Issue Type: Improvement
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Currently the KeywordLinkingEngine cannot be used to match against alphanumeric
IDs, such as those often used for products. This is because the tokenizers used
by OpenNLP tend to split such IDs into several small tokens, which prevents
these IDs from being matched correctly.
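The following minimal sketch illustrates the problem. It assumes a tokenizer that starts a new token whenever the character class changes (letter, digit, whitespace, other). `CharClassTokenizer` is a hypothetical class written only for illustration. It is not OpenNLP's implementation, whose exact splitting rules may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration (not OpenNLP code): a tokenizer that closes the
// current token whenever the character class changes.
final class CharClassTokenizer {

    private enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    private static CharClass classify(char c) {
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        return CharClass.OTHER;
    }

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        CharClass previous = CharClass.WHITESPACE;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            // Close the current token on any class change; every "other"
            // character (e.g. '-', ':', '/') also becomes a token of its own.
            if (current.length() > 0 && (cls != previous || cls == CharClass.OTHER)) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            // Whitespace only separates tokens and is never part of one.
            if (cls != CharClass.WHITESPACE) {
                current.append(c);
            }
            previous = cls;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The product ID is broken into five fragments: XYZ, -, 1234, -, AB
        System.out.println(tokenize("Order XYZ-1234-AB"));
    }
}
```

None of the fragments produced this way equals the full ID, so a lookup that compares tokens against entity labels such as `XYZ-1234-AB` has nothing to match.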
The simplest solution is to implement a simple Tokenizer optimized for keyword
extraction. Such a Tokenizer should split only on whitespace.
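A whitespace-only tokenizer can be sketched as follows. This is an assumption-laden illustration, not the Stanbol implementation: `WhitespaceKeywordTokenizer` and its nested `Token` class are hypothetical names. The sketch also records character offsets, on the assumption that a linking engine needs them to annotate the original text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits only on whitespace, so IDs such as
// "XYZ-1234-AB" or "urn:isbn:0451450523" remain single tokens.
// Names are illustrative and not part of Stanbol or OpenNLP.
final class WhitespaceKeywordTokenizer {

    /** A token plus its [start, end) character offsets in the input text. */
    static final class Token {
        final String text;
        final int start;
        final int end;

        Token(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        int start = -1; // start offset of the current token, -1 between tokens
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                // Whitespace ends the current token, if any.
                if (start >= 0) {
                    tokens.add(new Token(text.substring(start, i), start, i));
                    start = -1;
                }
            } else if (start < 0) {
                // First non-whitespace character starts a new token.
                start = i;
            }
        }
        // Emit a token that runs to the end of the input.
        if (start >= 0) {
            tokens.add(new Token(text.substring(start), start, text.length()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Order XYZ-1234-AB, see urn:isbn:0451450523"));
    }
}
```

Running `main` keeps `urn:isbn:0451450523` intact, but it also leaves sentence punctuation attached (`XYZ-1234-AB,`). A real engine would likely need to handle trailing punctuation separately; how it should do so is a design decision this sketch does not make.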