[jira] [Commented] (STANBOL-303) EntityFetch engine

Rupert Westenthaler (JIRA) Wed, 07 Sep 2011 03:27:46 -0700

    [ 
https://issues.apache.org/jira/browse/STANBOL-303?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098833#comment-13098833
 ]


Rupert Westenthaler commented on STANBOL-303:
---------------------------------------------

Hi Florent

Yesterday I started a third attempt to implement the TaxonomyLinkingEngine 
in a modular fashion.
So far this looks much better than the first attempt (the current version in 
SVN) and the second.

The basic idea is similar to what this issue describes. The main 
component walks through the text that has already been analyzed by the 
TextAnalyzer [1] and looks up Entities via a "Taxonomy" interface. I will provide 
a default implementation of the Taxonomy interface based on the Entityhub, but 
one could also provide an implementation based on an in-memory representation 
(e.g. for smaller taxonomies).
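
To make the idea of a pluggable lookup more concrete, here is a minimal sketch of what such an interface and an in-memory implementation could look like. The method names (lookup, add) and the word-based index are assumptions made for illustration only; they are not the actual Stanbol API, and the Entityhub-backed default implementation is not shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical lookup contract; the real Taxonomy interface may look different.
interface Taxonomy {
    /** Returns the ids of all entities that have a label containing the given word. */
    List<String> lookup(String word);
}

// In-memory variant for small taxonomies: an index from lower-cased
// label word to the ids of the entities whose label contains that word.
class InMemoryTaxonomy implements Taxonomy {
    private final Map<String, List<String>> index = new HashMap<>();

    /** Registers an entity under every word of its label. */
    void add(String entityId, String label) {
        for (String word : label.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            List<String> ids = index.computeIfAbsent(word, k -> new ArrayList<>());
            if (!ids.contains(entityId)) {
                ids.add(entityId);
            }
        }
    }

    @Override
    public List<String> lookup(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptyList());
    }
}
```

A lookup for a single word returns every candidate whose label contains it, so the caller still has to decide whether the surrounding words in the text confirm the complete label.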

The following features will be supported:
  - finding Entities with multiple words (e.g. "Apache Stanbol", "Rupert 
Westenthaler")
  - excluding Entities with multiple words if only a single word matches (e.g. 
"Apache Stanbol" and "Apache Sling" for "Apache"; "Rupert Westenthaler" and 
"Rupert Murdoch" for "Rupert")
  - support for POS (Part-of-Speech) tags: e.g. look up only nouns if users 
are interested in Named Entities, Concepts, ...; look up only verbs as 
required for an Engine as described by STANBOL-322. The presence of POS tags in 
the analyzed content is optional. If no POS tags are available, then all words 
need to be processed.
  - support for Chunks: skip words outside of chunks; skip or process chunks based 
on their type. The presence of Chunks in the analyzed content is optional. If no 
Chunks are available, then no words of the text can be skipped.
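
The first three features in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that will be committed: the Token and LabelLinker classes are hypothetical, labels are matched by exact, case-insensitive comparison of complete word sequences, the POS tag names are whatever the analyzer emits, and chunk handling is left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical token type: pos is null when the analyzer produced no POS tag.
class Token {
    final String word;
    final String pos;
    Token(String word, String pos) { this.word = word; this.pos = pos; }
}

// Hypothetical linker that only accepts complete label matches.
class LabelLinker {
    private final Map<String, String> labelToEntity = new HashMap<>();
    private int maxWords = 1; // word count of the longest registered label

    void addLabel(String label, String entityId) {
        String[] words = label.trim().toLowerCase(Locale.ROOT).split("\\s+");
        maxWords = Math.max(maxWords, words.length);
        labelToEntity.put(String.join(" ", words), entityId);
    }

    /** Returns the ids of entities whose complete label occurs in the tokens. */
    List<String> link(List<Token> tokens, Set<String> allowedPos) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            Token start = tokens.get(i);
            // Untagged tokens are always processed; tagged ones only if the tag is allowed.
            if (start.pos != null && !allowedPos.contains(start.pos)) {
                i++;
                continue;
            }
            int matched = 0;
            // Try the longest span first, so a multi-word label wins over a shorter one.
            for (int len = Math.min(maxWords, tokens.size() - i); len >= 1 && matched == 0; len--) {
                StringBuilder span = new StringBuilder();
                for (int j = i; j < i + len; j++) {
                    if (j > i) span.append(' ');
                    span.append(tokens.get(j).word.toLowerCase(Locale.ROOT));
                }
                String entity = labelToEntity.get(span.toString());
                if (entity != null) {
                    found.add(entity);
                    matched = len;
                }
            }
            // Skip past a matched label, otherwise move on by one token.
            i += Math.max(matched, 1);
        }
        return found;
    }
}
```

Because only complete labels are accepted, a text that mentions just "Apache" suggests neither "Apache Stanbol" nor "Apache Sling". Only the first token of a span is checked against the allowed POS tags here; a real implementation might treat the remaining tokens differently.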

My current plan is to commit this code within the TaxonomyLinkingEngine bundle, 
but in the long run it would be best to move it into its own module, such as 
an EntityFetch engine. 

As soon as I am ready to commit a first version (hopefully in the coming days), 
I will post an update here.

best
Rupert Westenthaler

[1] 
http://svn.apache.org/repos/asf/incubator/stanbol/trunk/commons/opennlp/src/main/java/org/apache/stanbol/commons/opennlp/TextAnalyzer.java

> EntityFetch engine
> ------------------
>
>                 Key: STANBOL-303
>                 URL: https://issues.apache.org/jira/browse/STANBOL-303
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Enhancer
>            Reporter: Florent ANDRE
>
> Hi,
> I extracted the "entity fetching" related code from the taxonomylinking engine and 
> created a new engine based on it.
> I also made query.addSelectedField() configurable via the Felix configuration.
> This engine can run at the ServiceProperties.ORDERING_EXTRACTION_ENHANCEMENT 
> position.
> I see 2 advantages of such an engine: 
> 1) users can develop an extraction engine without thinking about entity retrieval
> 2) if this engine provides a helpful library, entity fetching can easily be embedded 
> into a user's engine, which limits code duplication for entity fetching.
> Could this be an interesting engine for trunk?
> ++

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
