Hi Rahul,

This has been discussed a number of times already. A quick Google
search turns up several answers to the same question. In short:

You need to train on whole sentences in which the sequences you want
to find are annotated, with one tokenized sentence per line. In the
link I sent you in the previous email

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind

you can see the kind of annotations required to train a model in OpenNLP:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .

If you only need one class, say "business", then the tags will be
<START:business> ... <END>.

If the model returns a span covering the whole token array, it could
be because you trained only on a list of words rather than on full
sentences, or because you did not use enough training data, among
other reasons.
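
As a rough illustration of the format (this is not part of OpenNLP; the
class name, the seed list and the helper method are all made up for this
example), here is a minimal Java sketch that takes tokenized sentences
and wraps known business terms in <START:business> ... <END> tags, one
sentence per line. It assumes tokens are separated by single spaces and
treats a run of consecutive seed terms as one span:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class BusinessAnnotator {

    // Hypothetical seed list; a real project would need a larger, curated one.
    static final Set<String> TERMS =
            new HashSet<>(Arrays.asList("loan", "insurance", "mortgage"));

    // Wraps each run of consecutive seed terms in <START:type> ... <END>.
    // Matching is case-insensitive and works on whole tokens only.
    static String annotate(String tokenizedSentence, String type, Set<String> terms) {
        String[] tokens = tokenizedSentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        boolean inside = false;
        for (String token : tokens) {
            boolean match = terms.contains(token.toLowerCase(Locale.ROOT));
            if (match && !inside) {
                out.add("<START:" + type + ">");   // open a new span
                inside = true;
            } else if (!match && inside) {
                out.add("<END>");                  // close the current span
                inside = false;
            }
            out.add(token);
        }
        if (inside) {
            out.add("<END>");                      // span ran to the end of the sentence
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Each printed line is one training sentence.
        System.out.println(annotate("What is health insurance ?", "business", TERMS));
        System.out.println(annotate("What is a mortgage loan ?", "business", TERMS));
    }
}
```

Output produced this way only teaches the model the seed list in
context, so it is best treated as a starting point to be reviewed and
corrected by hand before training.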

Cheers,

Rodrigo


On Thu, May 14, 2015 at 3:48 PM, Vashishth, Rahul
<[email protected]> wrote:
> Hi Rodrigo,
>
> I followed the link below to create a custom model, but it isn't working for me.
> https://gist.github.com/johnmiedema/4020deea875ce306971e
> Test File
> https://gist.github.com/johnmiedema/4020deea875ce306971e/download#
> Training file - I couldn't find a document to design a training file though
> <START> Loan <END>
> <START> Insurance <END>
> <START> Mortgage <END>
>
> I successfully created the model bin file, but when I use this file with
> TokenNameFinderModel, it returns a span covering the whole token array
> instead of the keyword spans. Can you suggest what might be wrong with
> the above code?
>
> Thanks,
> Rahul Vashishth
>
> -----Original Message-----
> From: Rodrigo Agerri [mailto:[email protected]]
> Sent: Thursday, May 14, 2015 4:09 PM
> To: [email protected]
> Subject: Re: Custom model creation openNLP
>
> Hello,
>
> The best thing to do would be to manually annotate some data of the same type 
> you want to analyze. In this case, you will be annotating "loan", "insurance" 
> and so on. Then you can train a model to recognize such sequences.
>
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind
>
> If you have only a limited list of words you want to find you could get away 
> with lookups, but for open-ended terms recognition of those types you will 
> need to try to generate training data.
>
> Brat is a nice tool to do such annotation
>
> http://brat.nlplab.org/
>
> HTH,
>
> R
>
> On Thu, May 14, 2015 at 8:37 AM, Vashishth, Rahul <[email protected]> 
> wrote:
>> Hi,
>> My requirement is to analyze sentences like "What is health insurance." or
>> "What is mortgage loan."
>> For this I need to create custom models to find the business words
>> in a given array of tokens, so that later on I can create a query based on
>> the given sentence.
>> As we have models for person names and location names, I need to
>> have models for business terms (e.g. Loan, Insurance), user actions (e.g.
>> Download, Define) and English grammar words (e.g. What, How).
>> Please let me know how I can achieve this, or if there is any other way to
>> analyze sentences like that.
>>
>> Many Regards,
>> Rahul Vashishth
>>
>> This e-mail, including attachments, may include confidential and/or
>> proprietary information, and may be used only by the person or entity
>> to which it is addressed. If the reader of this e-mail is not the
>> intended recipient or his or her authorized agent, the reader is
>> hereby notified that any dissemination, distribution or copying of
>> this e-mail is prohibited. If you have received this e-mail in error,
>> please notify the sender by replying to this message and delete this e-mail 
>> immediately.
>
