2011/7/18 Rupert Westenthaler <[email protected]>:
> Hi
>
> Currently the "org.apache.stanbol.defaultdata" bundle contains all
> data needed by the Stanbol Launchers.
> This basically includes four things:
>
> 1. OpenNLP sentence detection for English (used by the opennlp.ner
>    engine and the taxonomylinking engine)
> 2. OpenNLP POS model for English (used by the taxonomylinking engine)
> 3. OpenNLP name finder models for location, places and organizations
>    for the English language (used by the opennlp.ner engine)
> 4. Default DBPedia configuration consisting of a 43k entities dbpedia
>    index as well as SolrYard, Cache, ReferencedSite and
>    EntityLinkingEngine configuration
>
> Having all this in a single bundle makes it hard to change or remove
> parts of the default configuration without also affecting other
> components.
>
> Because of this I suggest removing the defaultdata bundle in its
> current form and instead creating several more focused bundles within
> the {stanbol-trunk}/data folder.
> The "default data" would then be determined by the "data" bundles
> referenced in the bundle list.xml files of the different launchers.
>
> Currently I would suggest using three bundles:
>
> 1) OpenNLP models for en (Sentence, POS)
> 2) OpenNLP name finder models for en (location, organization, places)
> 3) DBPedia.org default configuration
>
> For users it would then be easy to deactivate parts of the
> default configuration (e.g. the DBPedia related stuff) by simply
> stopping or uninstalling the corresponding bundles.
>
> WDYT

Yes, +1

-- Fabian
