[ 
https://issues.apache.org/jira/browse/STANBOL-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459333#comment-13459333
 ] 

Rupert Westenthaler commented on STANBOL-733:
---------------------------------------------

Status update:

The patch provided by Sebastian Schaffert was applied with revision 1387488 
[1]. I also added the data files used by the contributed Engines to 
{stanbol}/data. The German noun phrase chunker was added to the 
o.a.s.data.opennlp.lang.de module. For the sentiment-related data files, new 
modules and a sentiment bundlelist were created. I also added a special Launcher 
(nlp-launcher) intended for testing developments in the nlp-processing 
branch.

In a second commit [2] I slightly changed the default configuration of the 
Engines so that they can use ConfigurationPolicy.OPTIONAL - meaning that an 
instance of those Engines is active by default. Also a "nlp-processing" chain 
configuration was added to the default launcher.

The nlp-processing branch is now in a state where early adopters can start 
testing it. I will continue to work on the adaptation of the CELI Lemmatizer 
Engine (STANBOL-739) and the usage of the nlp-processing results by the 
KeywordLinkingEngine (STANBOL-740).


[1] http://svn.apache.org/viewvc?rev=1387488&view=rev
[2] http://svn.apache.org/viewvc?rev=1387596&view=rev


                
> Stanbol NLP processing
> ----------------------
>
>                 Key: STANBOL-733
>                 URL: https://issues.apache.org/jira/browse/STANBOL-733
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>            Reporter: Rupert Westenthaler
>         Attachments: srfgkmt-stanbol-nlp.zip
>
>
> This issue covers the NLP processing components as discussed in 
> http://markmail.org/message/qxusiup3mim2lhpx
> Goals
> =====
> 1. provide a modular infrastructure for NLP-related things
> Many tasks in NLP can be computationally intensive, and there is no "one fits
> all" NLP approach when analysing text. Therefore, we wanted to have an NLP
> infrastructure that can be configured and wired together as needed for the
> specific use case, with several specialised modules that can build upon each
> other but many of which are optional. 
> 2. provide a unified data model for representing NLP text annotations
> In many scenarios, it will be necessary to implement custom engines building
> on the results of a previous "generic" analysis of the text (e.g. POS tagging
> and chunking). For example, in one project we identify so-called "noun
> phrases", use a lemmatizer to build the base form, and then convert this to
> singular nominative form to get a grammatically correct label for use in a
> tag cloud. Most of this builds on generic NLP functionality, but the last
> step is very specific to the use case.
> Therefore, we also wanted to implement a generic NLP data model that allows
> representing text annotations attached to individual words or to spans of
> words.
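
To illustrate the second goal, here is a minimal sketch of a data model in
which annotations are attached to character spans, so that a single word and a
multi-word chunk are handled the same way. All class, method, and key names
(AnnotatedTextSketch, Span, "pos", "lemma", "chunk") are assumptions made for
this example; they are not taken from the Stanbol code base or the attached
patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these names come from the Stanbol API.
public class AnnotatedTextSketch {

    /** A [start, end) character range of the text carrying key/value annotations. */
    public static final class Span {
        public final int start;
        public final int end;
        public final Map<String, Object> annotations = new LinkedHashMap<>();

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** Attaches one annotation (e.g. a POS tag) and returns this span for chaining. */
        public Span annotate(String key, Object value) {
            annotations.put(key, value);
            return this;
        }
    }

    private final String text;
    private final List<Span> spans = new ArrayList<>();

    public AnnotatedTextSketch(String text) {
        this.text = text;
    }

    /** Registers a span over the text; a word and a chunk are both just spans. */
    public Span addSpan(int start, int end) {
        if (start < 0 || end < start || end > text.length()) {
            throw new IllegalArgumentException("invalid span: " + start + ".." + end);
        }
        Span span = new Span(start, end);
        spans.add(span);
        return span;
    }

    /** Returns the text covered by the given span. */
    public String coveredText(Span span) {
        return text.substring(span.start, span.end);
    }

    /** Returns all other spans lying completely inside the given one, e.g. the words of a chunk. */
    public List<Span> enclosedBy(Span outer) {
        List<Span> result = new ArrayList<>();
        for (Span span : spans) {
            if (span != outer && span.start >= outer.start && span.end <= outer.end) {
                result.add(span);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AnnotatedTextSketch at = new AnnotatedTextSketch("The red houses are old");
        // Word-level annotations, as a generic POS tagger and lemmatizer might produce them.
        at.addSpan(0, 3).annotate("pos", "DET");
        at.addSpan(4, 7).annotate("pos", "ADJ");
        at.addSpan(8, 14).annotate("pos", "NOUN").annotate("lemma", "house");
        // Chunk-level annotation spanning several words, as a chunker might produce it.
        Span nounPhrase = at.addSpan(0, 14).annotate("chunk", "NP");
        System.out.println(at.coveredText(nounPhrase) + " contains "
                + at.enclosedBy(nounPhrase).size() + " word spans");
    }
}
```

A use-case-specific engine, such as the tag-cloud labelling step described
above, could then read the "lemma" annotations of the words enclosed by a noun
phrase without having to repeat the generic analysis.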
