[jira] [Commented] (STANBOL-102) Make the NER enhancement engine able to use different models for different languages

Rupert Westenthaler (Commented) (JIRA) Thu, 05 Jan 2012 05:04:10 -0800

    [ https://issues.apache.org/jira/browse/STANBOL-102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13180353#comment-13180353 ]


Rupert Westenthaler commented on STANBOL-102:
---------------------------------------------

I will implement this the same way as in the KeywordLinkingEngine.

Two options:

* Default Language: If configured, this language is used as the default when no
language was detected for a text (e.g. if no language detection engine is active).
* Processed Languages: Allows configuring the list of languages that are
processed by an engine instance. If the list is empty or missing, all languages
are processed. This makes it possible to create multiple instances of the NER
engine (with different configurations), each of which only processes specific
languages.
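A minimal sketch of how the two options could interact when an engine instance decides whether to handle a text. The class `LanguageFilter` and its method are hypothetical illustrations, not part of Stanbol's API; the sketch assumes a `null` detected language means no language detection result was available.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not Stanbol API: combines the "Default Language"
// and "Processed Languages" options of one engine instance.
class LanguageFilter {
    // Language used when none was detected; null means no default configured.
    private final String defaultLanguage;
    // Languages handled by this instance; empty means all languages.
    private final Set<String> processedLanguages;

    LanguageFilter(String defaultLanguage, Set<String> processedLanguages) {
        this.defaultLanguage = defaultLanguage;
        this.processedLanguages = new HashSet<>();
        if (processedLanguages != null) {
            this.processedLanguages.addAll(processedLanguages);
        }
    }

    /** Returns the language to process, or null if this instance should skip the text. */
    String resolveLanguage(String detectedLanguage) {
        String lang = detectedLanguage != null ? detectedLanguage : defaultLanguage;
        if (lang == null) {
            return null; // nothing detected and no default configured
        }
        if (processedLanguages.isEmpty() || processedLanguages.contains(lang)) {
            return lang;
        }
        return null; // language is handled by some other instance
    }
}
```

For example, an instance configured with default language "en" and processed languages {"en"} would handle undetected and English texts but skip French ones, leaving them to a second instance configured for "fr".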

In addition, I will change this engine to use the ConfigurationFactory. This
will allow multiple instances to be configured. The Stanbol launchers will
include a default configuration with the default values for the default
language (none) and the processed languages (all).

The base framework for dynamically loading OpenNLP NER models for different
languages has already been implemented in the meantime by the OpenNLP utility
(part of the org.apache.stanbol.commons.opennlp module).
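As a rough illustration of per-language model lookup, the following sketch resolves model files by the {language-code}-{entity-class}-ner.bin pattern from the issue description and returns nothing when no model exists, so the caller can refuse to enhance. `NerModelResolver` and the flat model directory are hypothetical assumptions; this is not the API of the OpenNLP utility in org.apache.stanbol.commons.opennlp.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical resolver, not the Stanbol OpenNLP utility API.
class NerModelResolver {
    // Assumed directory that holds all *-ner.bin model files.
    private final Path modelDir;

    NerModelResolver(Path modelDir) {
        this.modelDir = modelDir;
    }

    /** Builds the file name following {language-code}-{entity-class}-ner.bin. */
    static String modelFileName(String language, String entityClass) {
        return language + "-" + entityClass + "-ner.bin";
    }

    /** Returns the model file if it exists, otherwise an empty Optional. */
    Optional<Path> findModel(String language, String entityClass) {
        Path candidate = modelDir.resolve(modelFileName(language, entityClass));
        return Files.isRegularFile(candidate) ? Optional.of(candidate) : Optional.empty();
    }

    /** Collects the available models; an empty result means the engine should refuse. */
    List<Path> findModels(String language, List<String> entityClasses) {
        List<Path> found = new ArrayList<>();
        for (String entityClass : entityClasses) {
            findModel(language, entityClass).ifPresent(found::add);
        }
        return found;
    }
}
```

Returning an empty result instead of falling back to the English models avoids the spurious annotations described in the issue.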
                
> Make the NER enhancement engine able to use different models for different 
> languages
> ------------------------------------------------------------------------------------
>
>                 Key: STANBOL-102
>                 URL: https://issues.apache.org/jira/browse/STANBOL-102
>             Project: Stanbol
>          Issue Type: Improvement
>            Reporter: Olivier Grisel
>            Assignee: Olivier Grisel
>
> Currently, the list of models is hardcoded: the engine always uses
> en-{person,location,organization}-ner.bin. The engine should be adapted to
> look up other models (following the {language-code}-{entity-class}-ner.bin
> filename pattern) according to the language of the text. If no such model is
> found, the engine should refuse to compute enhancements instead of using the
> wrong model, which would output spurious annotations.

