Re: Corpora used for training OpenNLP english models

Rodrigo Agerri Tue, 04 Nov 2014 23:17:17 -0800

Hi Raj,

I believe that the NameFinder models were trained with MUC, but I am
not sure. In any case, if you are going to annotate a domain different
from that of MUC, you will be better off annotating data for that domain,
because supervised approaches do not adapt well when used in other
genres/domains.


HTH

R

On Wed, Nov 5, 2014 at 8:08 AM, Raj Kiran <[email protected]> wrote:
> Hi Rodrigo,
>
> By extending the model I meant combining the base corpora (used to train the
> existing model) with additional annotated text and retraining the model.
> Apart from licensing, this is one of the reasons I am interested in knowing
> the source/base corpora used for training the name finder models.
>
> Thanks,
> Raj
>
> -----Original Message-----
> From: Rodrigo Agerri [mailto:[email protected]]
> Sent: Wednesday, November 5, 2014 12:16 PM
> To: [email protected]
> Subject: Re: Corpora used for training OpenNLP english models
>
> Hi Raj,
>
> I do not know which license the models on SourceForge are distributed under.
> But you cannot extend the existing English models. You need to train new ones
> for your domain based on annotated data.
>
> Best,
>
> R
>
> On Tue, Nov 4, 2014 at 7:05 PM, Raj Kiran <[email protected]> wrote:
>> Hi All,
>>
>> We want to use OpenNLP for NER and other capabilities in commercial
>> software (English only). It looks like the existing OpenNLP English models
>> available on SourceForge might have some license restrictions. Is there any
>> information available on the source corpora used for training the existing
>> OpenNLP English models?
>>
>> Apart from purchasing the source corpora, this information would help us
>> enhance the existing models by adding more training data.
>>
>> Thanks and Regards,
>> Raj