Improve extraction of Keywords (alpha numeric IDs, URNs ...) with the KeywordLinkingEngine
------------------------------------------------------------------------------------------

                 Key: STANBOL-538
                 URL: https://issues.apache.org/jira/browse/STANBOL-538
             Project: Stanbol
          Issue Type: Improvement
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Currently the KeywordLinkingEngine cannot be used to match against alphanumeric 
IDs such as those often used for products. This is because the tokenizers used 
by OpenNLP tend to split such IDs into several small tokens, which prevents a 
correct mapping against these IDs.
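
To make the problem concrete, the following Java sketch approximates a tokenizer 
that starts a new token whenever the character class changes (letter, digit, 
whitespace, other). This is an assumption for illustration only, not OpenNLP's 
actual code; the class name CharClassTokenizerDemo and the sample ID "XK-2000b" 
are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative approximation (not OpenNLP's implementation) of a tokenizer
 * that ends a token whenever the character class changes.
 */
class CharClassTokenizerDemo {

    // Character classes used to decide where one token ends and the next starts.
    private enum CharClass { LETTER, DIGIT, WHITESPACE, OTHER }

    private static CharClass classify(char c) {
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        return CharClass.OTHER;
    }

    /** Splits text at every change of character class; whitespace runs are dropped. */
    static String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 1; i <= text.length(); i++) {
            boolean boundary = i == text.length()
                    || classify(text.charAt(i)) != classify(text.charAt(i - 1));
            if (boundary) {
                // Emit the run [start, i) unless it consists of whitespace.
                if (classify(text.charAt(start)) != CharClass.WHITESPACE) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The made-up product ID is broken into four small tokens.
        System.out.println(Arrays.toString(tokenize("Order XK-2000b today")));
        // prints [Order, XK, -, 2000, b, today]
    }
}
```

None of the resulting tokens ("XK", "-", "2000", "b") equals the ID stored in 
the vocabulary, so a token-based lookup cannot find it.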

The simplest solution is to implement a simple Tokenizer that is optimized for 
keyword extraction. Such a Tokenizer should split only on whitespace.
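
A minimal sketch of such a whitespace-only tokenizer in Java. The class 
WhitespaceKeywordTokenizer and its methods are hypothetical; to stay 
self-contained the sketch does not depend on OpenNLP, although an integration 
would presumably implement OpenNLP's opennlp.tools.tokenize.Tokenizer interface, 
which declares tokenize(String) and tokenizePos(String). Character offsets are 
returned as well, on the assumption that matched keywords must be mapped back 
to positions in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical keyword-oriented tokenizer that splits only on whitespace,
 * so alphanumeric IDs and URNs stay in one token.
 */
class WhitespaceKeywordTokenizer {

    /** Character offsets of a token: start inclusive, end exclusive. */
    static final class TokenSpan {
        final int start;
        final int end;
        TokenSpan(int start, int end) { this.start = start; this.end = end; }
    }

    /** Returns the offsets of all maximal runs of non-whitespace characters. */
    static List<TokenSpan> tokenizePos(String text) {
        List<TokenSpan> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int start = i;
            // Consume everything up to the next whitespace character.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > start) spans.add(new TokenSpan(start, i));
        }
        return spans;
    }

    /** Returns the token strings themselves. */
    static String[] tokenize(String text) {
        return tokenizePos(text).stream()
                .map(s -> text.substring(s.start, s.end))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("Order XK-2000b today")));
        // prints [Order, XK-2000b, today]
    }
}
```

The trade-off of splitting only on whitespace is that punctuation stays attached 
to the token: a sentence-final "XK-2000b." would not match the ID "XK-2000b" 
unless trailing punctuation is trimmed in an additional step.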

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
