On 7/19/2012 2:07 AM, Lance Norskog wrote:
> What is the legitimacy of data which is tagged using an encumbered
> model? I mean, if I tag documents with OpenNLP's non-free models on
> sourceforge, the tagged output is a "derived work". Is this tagged
> output considered free? Does this depend on the license of the
> original data?
>
>
Lance,

The problem is two-fold.

(1)  We would like to distribute the models on Apache.  Unfortunately,
to do so, both the models and the source data used to create them would
have to be under the Apache license.  We don't see any way around this
other than to generate our own training data under an open license
compatible with the Apache license.
  Jorn is laying the groundwork for this with the tagging server, which
will allow us to hand-tag and correct data for our own training set.  I
know it is re-doing work that has already been done, but the benefits
will be large in the long run.  Anyone could download the training data
and add to it, remove from it, or otherwise customize the training set
for various situations without worrying about copyright issues.
  The downside is that we have a lot of work to do to get there.

(2)  The models themselves, although available on SourceForge, are for
research purposes ONLY.  The copyright terms and the contracts with
those holding the copyright for the original works state so.  I've
asked many people about this point.  We would not be helping anyone by
breaking the law on this, nor do we suggest that anyone do so.
  The next problem is that we can't distribute the training data for
the models, so it is next to impossible to modify the models by adding
training data for other situations.  The training data comes mainly
from news sources, which limits the models' usefulness for some users.

.....
I guess I'll have to get the FAQ section on our website done soon.

Thanks,
James
