[ 
https://issues.apache.org/jira/browse/STANBOL-706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438640#comment-13438640
 ] 

Rupert Westenthaler commented on STANBOL-706:
---------------------------------------------

Progress update (see [1])

Changes:
-----

* The POM files are now updated to use the versions of the trunk 
(0.10.0-incubating-SNAPSHOT)
* The DBpedia Spotlight Spot engine now behaves as expected for an 
EnhancementEngine
    * It supports asynchronous enhancement (as highly recommended for engines 
calling remote services)
    * It respects OfflineMode, which does not allow connections to external 
services
    * It does not catch any Exceptions - the EnhancementJobManager MUST deal 
with those, as only it knows whether an engine is OPTIONAL or REQUIRED.
    * In addition, I changed the communication with the Spotlight RESTful 
service so that request/response data are no longer held in memory twice (e.g. 
the response as both a String and an XML document)
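
To illustrate the last point, here is a minimal sketch (not the committed 
code) of parsing a response directly from its stream instead of first reading 
it into a String. The class name, the method name and the sample XML are 
assumptions made for this example only; the real engine reads from the HTTP 
connection to the Spotlight service.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, named for illustration only.
public class StreamingXmlSketch {

    /**
     * Parses the response straight from the stream, so the payload is not
     * first copied into an intermediate String.
     */
    static Document parseResponse(InputStream response)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Exceptions are deliberately not caught; the caller decides what to do.
        return factory.newDocumentBuilder().parse(response);
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample payload standing in for a service response.
        String xml = "<annotation><surfaceForm name=\"Berlin\" offset=\"0\"/></annotation>";
        try (InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))) {
            Document doc = parseResponse(in);
            System.out.println(doc.getDocumentElement().getNodeName()); // prints "annotation"
        }
    }
}
```

In a real engine the InputStream would come from the connection to the remote 
service, and any IOException would propagate to the EnhancementJobManager.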

NOTES:
-----

I also added the Spot Engine to the Enhancer Bundlelist. So users who "mvn 
clean install" the branch and then "mvn clean install" the Full/Stable 
Launcher in the trunk ("{stanbol-trunk}/launchers/full") will see the DBpedia 
Spotlight Spot engine.

TODOs
-----

* Similar changes as for the Spot engine need to be done for the other 
Spotlight engines

Questions:
-----

### DBpedia Spotlight Modules/Bundles

I have noticed that some functionality (most noticeably the XMLParser class) is 
duplicated in some/all of the Spotlight engines. I see the following 
possibilities to deal with that:

    1. ignore the duplicated code
    2. create an extra module (bundle) that contains the shared functionality
    3. move all engines into a single module

(1) and (2) would be preferable if typical users only want to install a 
subset of the DBpedia Spotlight engines. (3) works best if it is OK to install 
all of them (but maybe use only a few - e.g. by configuring only the 
corresponding enhancement engines or by deactivating the unused ones).


### Effects on the Stanbol default Configuration

With the addition of the DBpedia Spotlight engines we might need to think about 
changing the default configuration of Apache Stanbol.

Currently the default EnhancementChain of the Stanbol Launchers includes all 
active EnhancementEngines. When we add the DBpedia Spotlight Engines this might 
no longer make sense as the results of the DBpedia Spotlight Engines will be 
very similar to those of the NER+EntityTagging engine with the default DBpedia 
dataset. More concretely, an EnhancementChain containing all active Enhancement 
Engines will result in a lot of duplicate results that might confuse users new 
to Stanbol.

To avoid this I see three possibilities:

    1. Do not include the DBpedia Spotlight Engines in the default Launcher 
    2. Deactivate the DBpedia Spotlight Engines by default.
    3. Switch from the "All active Engines Chain" to an explicitly configured 
Chain for the default configuration and add a DBpedia Spotlight Chain.

I strongly favor (3) and only included (1) and (2) to give people who 
want to keep the "All active Engines Chain" the chance to leave a comment. Note 
that even with (3) we can keep the "All active Engines chain" but it would no 
longer be the default chain.



[1] http://svn.apache.org/viewvc?rev=1375468&view=rev
                
> DBpedia Spotlight EnhancementEngines integration
> ------------------------------------------------
>
>                 Key: STANBOL-706
>                 URL: https://issues.apache.org/jira/browse/STANBOL-706
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancer
>    Affects Versions: 0.9.0-incubating
>            Reporter: Iavor Jelev
>            Assignee: Rupert Westenthaler
>         Attachments: dbpediaspotlightintegration.rar
>
>
> In the process of the early adopters programme of the IKS we developed 4 
> EnhancementEngines, which integrate the different aspects of DBpedia 
> Spotlight in Apache Stanbol. We would like to contribute them, so they can 
> eventually become a part of the Stanbol Stack. The engines are as follows:
> - dbpediaspotlightannotate - spots the potential mentions, retrieves the 
> candidate DBpedia resources, disambiguates them if needed, and links the 
> mentions to the best one
> - dbpediaspotlightcandidates - same as annotate, but does not disambiguate 
> the candidates for each mention. Rather it returns the top K ones.
> - dbpediaspotlightdisambiguate - does not do spotting, it just selects the 
> candidates for the given mentions and does disambiguation.
> - dbpediaspotlightspot - does only spotting, no candidate resource selection, 
> disambiguation or linking

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
