2011/7/20 Rupert Westenthaler <[email protected]>:
> On Wed, Jul 20, 2011 at 9:56 AM, Fabian Christ
> <[email protected]> wrote:
>> 2011/7/19 Olivier Grisel <[email protected]>:
>>> Maybe we could work on extending the DataFileProvider so that the
>>> default data provider only provides download URLs for the existing grey
>>> licensed OpenNLP 1.5 models from
>>> http://opennlp.sourceforge.net/models-1.5/ and lets the
>>> DataFileProvider download them from there automatically the first time
>>> they are required. The issue then is that every integration test job
>>> will re-download the same data from SourceForge over and over again...
>>> That will slow down the builds / tests, waste bandwidth for nothing, and
>>> add a new way for the builds and tests to fail (dependency on the
>>> network / SourceForge availability).
>>
>> I think having the OpenNLP models in our trunk and using them during
>> development in incubation is no problem. So we don't need to change
>> anything for the build and integration tests right now.
>>
> The models were never in the trunk; they were downloaded using a shell
> script. I am currently upgrading this to use the maven-ant-plugin
> instead.
> This will allow data files to be downloaded automatically during the
> normal build process.
>
>> I would propose to exclude the models when a release is made. In this
>> case the OpenNLP engine has to be prepared to recognize that the
>> required model is missing and download it from SourceForge. If the
>> model is not missing, as during development in our trunk, everything is
>> fine.
>>
> In other words we will exclude such bundles from the release forcing users to
>
> * check out /data and build them locally or
> * download those bundles from a Maven repository.
>
> But if we also want to release the launchers, then we would need to
> include those bundles; otherwise the launcher would not work out of the
> box, something very important for adoption.
>
> I would assume that the normal user would double-click the jar, open
> the browser and paste some text to /engines. With the missing models,
> all he would get is "Invalid query" :(
>
> We cannot expect that he will go to the Felix Web Console, open the
> DataFileProvider tab, look at the list of missing files, download them
> from SourceForge and copy them to the /datafiles directory.

That's why I would like to separate releases of ready-to-use Stanbol
versions from releases of the Stanbol framework. When releasing the
framework we can argue that it comes without data. If you want a
ready-to-use Stanbol for a specific scenario, we will release it separately.

See my other thread about splitting the release.

-- 
Fabian
