Hi
Currently the "org.apache.stanbol.defaultdata" bundle contains all
data needed by the Stanbol Launchers.
This basically includes the following:
1. OpenNLP sentence detection model for English (used by the opennlp.ner
engine and the taxonomylinking engine)
2. OpenNLP POS model for English (used by the taxonomylinking engine)
3. OpenNLP name finder models for persons, locations and organizations
for the English language (used by the opennlp.ner engine)
4. Default DBpedia configuration consisting of a DBpedia index with 43k
entities as well as SolrYard, Cache, ReferencedSite and
EntityLinkingEngine configurations
Having all this in a single bundle makes it hard to change/remove
parts of the default configurations without also affecting other
components.
Because of this I suggest removing the defaultdata bundle in its
current form and instead creating several more focused bundles within
the {stanbol-trunk}/data folder.
The "default data" would then be determined by the "data" bundles
referenced in the bundle list.xml files of the different launchers.
Currently I would suggest using three bundles:
1) OpenNLP models for en (Sentence, POS)
2) OpenNLP name finder models for en (person, location, organization)
3) DBpedia.org default configuration
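As a rough sketch, a launcher's bundle list could then reference these
three bundles along the following lines. Note that the artifact IDs, the
start level and the version are only placeholders I made up for
illustration; none of them exist yet and the final names are open for
discussion.

```xml
<!-- Sketch only: artifact IDs, start level and version are placeholders -->
<bundles>
  <startLevel level="25">
    <!-- 1) OpenNLP models for en (Sentence, POS) -->
    <bundle>
      <groupId>org.apache.stanbol</groupId>
      <artifactId>org.apache.stanbol.data.opennlp.lang.en</artifactId>
      <version>x.y.z</version>
    </bundle>
    <!-- 2) OpenNLP name finder models for en -->
    <bundle>
      <groupId>org.apache.stanbol</groupId>
      <artifactId>org.apache.stanbol.data.opennlp.ner.en</artifactId>
      <version>x.y.z</version>
    </bundle>
    <!-- 3) DBpedia.org default configuration -->
    <bundle>
      <groupId>org.apache.stanbol</groupId>
      <artifactId>org.apache.stanbol.data.sites.dbpedia.default</artifactId>
      <version>x.y.z</version>
    </bundle>
  </startLevel>
</bundles>
```

A launcher that does not need one of these parts would just leave the
corresponding <bundle> entry out of its list.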
For users it would then be easy to deactivate parts of the
default configuration (e.g. the DBpedia-related stuff) by simply
stopping or uninstalling the corresponding bundles.
WDYT?
Rupert
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen