See
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/416/changes>
Changes:
[rwesten] = Summary: =
* STANBOL-303: The new KeywordExtractionEngine replaces the current
TaxonomyLinkingEngine. This Engine is more modular and should be a good base
for STANBOL-303.
* Multilingual support: Added OpenNLP bundles for all available models. Added
multi-language support to the TextAnalyzer utility in the commons.opennlp
bundle. The KeywordExtractionEngine also supports multiple languages (tested
with English, German and also some French, Italian and Spanish texts).
* Updated the dbpedia default data to include more languages and rdfs:comment
values for English
= Detailed list of changes =
== KeywordExtractionEngine (STANBOL-303) ==
* This engine is a re-implementation of the current TaxonomyLinkingEngine
* It has support for multiple languages (tested with English, German, French
and Italian)
* It uses NLP (mainly Sentence detection, POS tagging) to optimize the
enhancement process, but does not depend on it. Only a Tokenizer is required.
I will publish a detailed description of this Engine on the Stanbol website and
also write a mail to the stanbol-dev list later today. Andreas Gruber is also
preparing a blog post explaining how to use this Engine from a user
perspective.
== TaxonomyLinkingEngine ==
Currently this engine is still functional but marked as deprecated. Users are
encouraged to use the KeywordExtractionEngine instead.
I do plan to re-introduce this engine with features specific to taxonomies such
as support for hierarchies, default configuration for skos … . However, this
will be based on the code base of the KeywordExtractionEngine.
== commons.openNLP ==
* TextAnalyzer now supports multiple languages
* PosTypeChunker now loads the list of POS tags to create/extend chunks from an
enumeration. Such tags are language specific. Often there are even different
tag sets used for the same language. Therefore this configuration needs to be
external to the Chunker implementation. In the future it is planned to load such
lists via the DataFileProvider (similar to the models needed for the POS tagger).
== data ==
* STANBOL-315: Updated the dbpedia default data bundle to include rdfs:comment
values for English
* Updated the dbpedia default data bundle to include labels for the following
languages: ar, da, de, en, fi, fr, it, no, pt, ru, sv, tr, zh
* updated version of the dbpedia default data bundle from 1.0.1-incubating to
1.0.2-incubating
* Added OpenNLP language bundles for all available languages (da, de, nl, pt,
se)
* Added OpenNLP NER bundles for all available languages (es, nl)
* The additional OpenNLP bundles are not in the default build profile. Users
who want to build these modules need to activate the profile "opennlp"
== Entityhub Indexing ==
* Added support for using multiple EntityProcessors during indexing.
* Added a FieldValueFilter processor that allows filtering entities based on
the value of a specific field (e.g. to index only entities with the rdf:type
dbpedia:Person, dbpedia:Organisation and dbpedia:Place)
* The SolrYardIndexingDestination now optimizes the created Solr index during
the finalization step.
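
The filtering idea described above can be sketched in a few lines of plain
Java. This is only an illustration, not the Entityhub indexing API: the
Processor interface, the map-based entity representation, the "rdf:type"
field name and all method names are assumptions made for this example. Each
entity is passed through a chain of processors, and a processor that returns
null drops the entity from the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class FieldValueFilterSketch {

    /** Hypothetical processing step: returns the entity, or null to drop it. */
    interface Processor {
        Map<String, Set<String>> process(Map<String, Set<String>> entity);
    }

    /** Keeps an entity only if the given field holds at least one accepted value. */
    static Processor fieldValueFilter(String field, Set<String> accepted) {
        return entity -> {
            Set<String> values = entity.get(field);
            if (values == null) {
                return null; // field missing: entity is filtered out
            }
            for (String value : values) {
                if (accepted.contains(value)) {
                    return entity;
                }
            }
            return null;
        };
    }

    /** Runs every entity through all processors in order; a null result drops it. */
    static List<Map<String, Set<String>>> index(
            List<Map<String, Set<String>>> entities, List<Processor> processors) {
        List<Map<String, Set<String>>> indexed = new ArrayList<>();
        for (Map<String, Set<String>> entity : entities) {
            Map<String, Set<String>> current = entity;
            for (Processor processor : processors) {
                current = processor.process(current);
                if (current == null) {
                    break; // dropped by a filter, skip remaining processors
                }
            }
            if (current != null) {
                indexed.add(current);
            }
        }
        return indexed;
    }

    /** Builds a toy entity with an id and a single type value. */
    static Map<String, Set<String>> entity(String id, String type) {
        Map<String, Set<String>> e = new LinkedHashMap<>();
        e.put("id", new HashSet<>(Arrays.asList(id)));
        e.put("rdf:type", new HashSet<>(Arrays.asList(type)));
        return e;
    }

    /** Indexes two toy entities and returns how many passed the type filter. */
    static int demo() {
        Set<String> accepted = new HashSet<>(Arrays.asList(
                "dbpedia:Person", "dbpedia:Organisation", "dbpedia:Place"));
        List<Map<String, Set<String>>> entities = Arrays.asList(
                entity("Paris", "dbpedia:Place"),
                entity("Berlin_Wall", "dbpedia:Building"));
        List<Processor> chain = Arrays.asList(fieldValueFilter("rdf:type", accepted));
        return index(entities, chain).size();
    }

    public static void main(String[] args) {
        System.out.println("indexed entities: " + demo());
    }
}
```

Letting a filter signal "drop" with null keeps the chain simple: further
processors can be appended to the list without changing the loop.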
== Entityhub ==
* Corrected a bug in the ReferencedSiteImpl (Entityhub) that caused the
EntityDereferencer (remote ID based lookup of Entities) to be called if an
entity was not found in the local index on CacheStrategy FULL.
* Added a QueryResultList.results() method returning the results as a collection
instead of an Iterator.
* Added an optimize() method to the SolrYard allowing to force the optimization
of the underlying Lucene index.
* Corrected a bug in the Jersey EntityhubRootResource that caused the HTML REST
API documentation to be returned on illegal requests with an Accept header not
compatible with HTML.
* Improved and updated the RESTful service API documentation for the
entityhub/find service
== Integration Tests ==
* Improved error logging for the Entityhub-related integration tests
* Added support for the /find and /query tests for the "/entityhub" service
endpoint.
* Tests for entityhub/entity are still missing to complete STANBOL-299
* Updated the expected results of the /query and /find tests to reflect data
changes in the dbpedia default data (this fixes the issues that caused Jenkins
build #415 to fail).
== other changes ==
* improved logging for the BundleInstaller
* changed log level of several messages from info to debug
* updated several log statements to use "message: {}", message instead of
"message:" + message
* changed all ${pom.version} to ${project.version}: Maven 3 warns that
pom.version is deprecated
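
A usual reason to prefer the "{}" placeholder form over string concatenation
is that the message is only built when the log level is enabled. The sketch
below illustrates this with a tiny stand-in logger. MiniLogger and
CountingValue are hypothetical helpers written for this example and are not
the API of the logging library used in Stanbol.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ParameterizedLoggingSketch {

    /** Hypothetical stand-in for a logger with "{}" placeholders. */
    static final class MiniLogger {
        private final boolean debugEnabled;
        final StringBuilder out = new StringBuilder();

        MiniLogger(boolean debugEnabled) {
            this.debugEnabled = debugEnabled;
        }

        /** Eager variant: the caller has already built the complete message. */
        void debug(String message) {
            if (debugEnabled) {
                out.append(message).append('\n');
            }
        }

        /** Deferred variant: the argument is only converted when debug is enabled. */
        void debug(String format, Object arg) {
            if (debugEnabled) {
                out.append(format.replace("{}", String.valueOf(arg))).append('\n');
            }
        }
    }

    /** Value object that counts how often it is converted to a String. */
    static final class CountingValue {
        final AtomicInteger toStringCalls = new AtomicInteger();

        @Override
        public String toString() {
            toStringCalls.incrementAndGet();
            return "value";
        }
    }

    /** toString() calls caused by concatenation while debug is disabled. */
    static int callsWithConcatenation() {
        MiniLogger log = new MiniLogger(false);
        CountingValue value = new CountingValue();
        log.debug("message:" + value); // concatenation happens before the call
        return value.toStringCalls.get();
    }

    /** toString() calls caused by the placeholder form while debug is disabled. */
    static int callsWithPlaceholder() {
        MiniLogger log = new MiniLogger(false);
        CountingValue value = new CountingValue();
        log.debug("message: {}", value); // conversion is skipped entirely
        return value.toStringCalls.get();
    }

    public static void main(String[] args) {
        System.out.println("concatenation: " + callsWithConcatenation());
        System.out.println("placeholder:   " + callsWithPlaceholder());
    }
}
```

With debug disabled, the concatenation variant still converts the value once,
while the placeholder variant never touches it.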
------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Stanbol Enhancer Enhancement Engine and utilities for
extracting keywords form parsed text.
[INFO] task-segment: [clean, install]
[INFO] ------------------------------------------------------------------------
[INFO] [clean:clean {execution: default-clean}]
[INFO] [enforcer:enforce {execution: enforce-java}]
[INFO] [remote-resources:process {execution: default}]
[INFO] [resources:resources {execution: default-resources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] Copying 1 resource
[INFO] Copying 3 resources
[INFO] [compiler:compile {execution: default-compile}]
[INFO] Compiling 13 source files to
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/target/classes>
[INFO] [scr:scr {execution: generate-scr-scrdescriptor}]
[INFO] Generating 2 MetaType Descriptors to
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/target/scr-plugin-generated/OSGI-INF/metatype/metatype.xml>
[INFO] Writing abstract service descriptor
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/target/scr-plugin-generated/OSGI-INF/scr-plugin/scrinfo.xml>
with 1 entries.
[INFO] Generating 1 Service Component Descriptors to
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/target/scr-plugin-generated/OSGI-INF/serviceComponents.xml>
[INFO] [resources:testResources {execution: default-testResources}]
[INFO] Using 'UTF-8' encoding to copy filtered resources.
[INFO] skip non existing resourceDirectory
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/resources>
[INFO] Copying 3 resources
[INFO] [compiler:testCompile {execution: default-testCompile}]
[INFO] Compiling 1 source file to
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/target/test-classes>
[INFO] -------------------------------------------------------------
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR]
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/java/org/apache/stanbol/enhancer/engines/keywordextraction/TestTaxonomyLinker.java>:[38,65]
cannot find symbol
symbol : class ClasspathDataFileProvider
location: package org.apache.stanbol.enhancer.engines.keywordextraction.impl
[ERROR]
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/java/org/apache/stanbol/enhancer/engines/keywordextraction/TestTaxonomyLinker.java>:[39,65]
cannot find symbol
symbol : class TestSearcherImpl
location: package org.apache.stanbol.enhancer.engines.keywordextraction.impl
[ERROR]
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/java/org/apache/stanbol/enhancer/engines/keywordextraction/TestTaxonomyLinker.java>:[73,11]
cannot find symbol
symbol : class TestSearcherImpl
location: class
org.apache.stanbol.enhancer.engines.keywordextraction.TestTaxonomyLinker
[ERROR]
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/java/org/apache/stanbol/enhancer/engines/keywordextraction/TestTaxonomyLinker.java>:[83,34]
cannot find symbol
symbol : class ClasspathDataFileProvider
location: class
org.apache.stanbol.enhancer.engines.keywordextraction.TestTaxonomyLinker
[ERROR]
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/src/test/java/org/apache/stanbol/enhancer/engines/keywordextraction/TestTaxonomyLinker.java>:[84,23]
cannot find symbol
symbol : class TestSearcherImpl
location: class
org.apache.stanbol.enhancer.engines.keywordextraction.TestTaxonomyLinker
[INFO] 5 errors
[INFO] -------------------------------------------------------------
[JENKINS] Archiving
<https://builds.apache.org/job/stanbol-trunk-1.6/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/ws/pom.xml>
to
/home/hudson/hudson/jobs/stanbol-trunk-1.6/modules/org.apache.stanbol$org.apache.stanbol.enhancer.engine.keywordextraction/builds/2011-09-22_07-02-21/archive/org.apache.stanbol/org.apache.stanbol.enhancer.engine.keywordextraction/0.9.0-incubating-SNAPSHOT/org.apache.stanbol.enhancer.engine.keywordextraction-0.9.0-incubating-SNAPSHOT.pom
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Compilation failure
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 24 minutes 16 seconds
[INFO] Finished at: Thu Sep 22 07:27:04 UTC 2011
[INFO] Final Memory: 198M/497M
[INFO] ------------------------------------------------------------------------