[
https://issues.apache.org/jira/browse/STANBOL-90?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-90.
----------------------------------------
Resolution: Fixed
> Create a maven artifact to embed all the default stanbol models data
> --------------------------------------------------------------------
>
> Key: STANBOL-90
> URL: https://issues.apache.org/jira/browse/STANBOL-90
> Project: Stanbol
> Issue Type: New Feature
> Reporter: Olivier Grisel
> Assignee: Olivier Grisel
>
> To make Stanbol useful, especially in offline mode, it needs some statistical
> models and entity / topic indices. Those indices can be huge (several GB for
> all the entities of DBpedia and GeoNames, for instance) and hence cannot be
> packaged as part of the default distribution. However, it is very desirable
> to embed some default statistical models and indices:
> - opennlp sentence detector for English
> - opennlp name finder models for English for organizations, people, places
> - solr index for the top 10000 most popular entities (of type organizations,
> people, places) as measured by number of incoming links in the Wikipedia
> article graph.
> - solr index for the top 1000 most popular topics, as measured by the number
> of Wikipedia articles categorized in this category or subcategory
> The goal is to keep that maven artifact smaller than 100 MB (ideally even
> smaller) so that it does not create a big barrier to entry for people
> downloading the default distribution of Stanbol.
> To avoid slowing down the svn repo, those data files will not be put under
> version control; only the pom.xml and a script to rebuild the artifact from a
> previous version of the jar will be.