2011/7/18 Rupert Westenthaler <[email protected]>:
> Hi
>
> Currently the "org.apache.stanbol.defaultdata" bundle contains all
> data needed by the Stanbol Launchers.
> This basically includes four things:
>
> 1. OpenNLP sentence detection for English (used by the opennlp.ner
> engine, taxonomylinking engine)
> 2. OpenNLP POS model for English (used by the taxonomylinking engine)
> 3. OpenNLP name finder models for location, places and organizations
> for the English language (used by the opennlp.ner engine)
> 4. Default DBPedia configuration consisting of a DBPedia index with
> 43k entities as well as SolrYard, Cache, ReferencedSite and
> EntityLinkingEngine configuration
>
> Having all this in a single bundle makes it hard to change/remove
> parts of the default configurations without also affecting other
> components.
>
> Because of this I suggest removing the defaultdata bundle in its
> current form and instead creating several more focused bundles within
> the {stanbol-trunk}/data folder.
> The "default data" would then be determined by the "data" bundles
> referenced in the bundle list.xml files of the different launchers.
>
> Currently I would suggest using three bundles:
>
> 1) OpenNLP models for en (Sentence, POS)
> 2) OpenNLP name finder models for en (location, organization, places)
> 3) DBPedia.org default configuration
>
> For users it would then be easy to deactivate parts of the
> default configuration (e.g. the DBPedia related stuff) by simply
> stopping or uninstalling the corresponding bundles.
>
> WDYT

Yes, +1

-- 
Fabian
