Hi Rahul,

This question has already come up a number of times, and a quick Google search turns up several existing answers to it.
You need to train with whole sentences annotated with the sequences you need, one sentence per line of tokenized text. The link I sent you in the previous email

http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind

shows the kind of annotation required to train a model in OpenNLP:

<START:person> Pierre Vinken <END> , 61 years old , will join the board as a nonexecutive director Nov. 29 .
Mr . <START:person> Vinken <END> is chairman of Elsevier N.V. , the Dutch publishing group .
<START:person> Rudolph Agnew <END> , 55 years old and former chairman of Consolidated Gold Fields PLC , was named a director of this British industrial conglomerate .

If you only need one class, say "business", then the tags will be <START:business> ... <END>, and so on. If you are getting a span that covers the whole token array, it could be because you trained only on a list of words or because you did not use enough data, among other reasons.

Cheers,

Rodrigo

On Thu, May 14, 2015 at 3:48 PM, Vashishth, Rahul <[email protected]> wrote:
> Hi Rodrigo,
>
> I followed the link below to create a custom model, but it isn't working for me.
> https://gist.github.com/johnmiedema/4020deea875ce306971e
> Test File
> https://gist.github.com/johnmiedema/4020deea875ce306971e/download#
> Training file - I couldn't find a document describing how to design a training file, though
> <START> Loan <END>
> <START> Insurance <END>
> <START> Mortgage <END>
>
> I successfully created the model bin file. But when I used this file with
> TokenNameFinderModel, instead of returning the keyword span it returns a span
> for the whole token array. Can you suggest any possible issue with the above
> code?
>
> Thanks,
> Rahul Vashishth
>
> -----Original Message-----
> From: Rodrigo Agerri [mailto:[email protected]]
> Sent: Thursday, May 14, 2015 4:09 PM
> To: [email protected]
> Subject: Re: Custom model creation openNLP
>
> Hello,
>
> The best thing to do would be to manually annotate some data of the same type
> you want to analyze.
> In this case, you will be annotating "loan", "insurance"
> and so on. Then you can train a model to recognize such sequences.
>
> http://opennlp.apache.org/documentation/1.5.3/manual/opennlp.html#tools.namefind
>
> If you have only a limited list of words you want to find, you could get away
> with lookups, but for open-ended recognition of terms of those types you will
> need to try to generate training data.
>
> Brat is a nice tool for doing such annotation:
>
> http://brat.nlplab.org/
>
> HTH,
>
> R
>
> On Thu, May 14, 2015 at 8:37 AM, Vashishth, Rahul <[email protected]>
> wrote:
>> Hi,
>> My requirement is to analyze sentences like "What is health insurance." or
>> "What is mortgage loan."
>> For this I need to create custom models to find the business words
>> in a given array of tokens, so that later on I can create a query based on
>> the given sentence.
>> As we have models created for person names and location names, I need to
>> have a model for business terms (e.g. Loan, Insurance), user actions (e.g.
>> Download, define) and English grammar (e.g. What, How).
>> Please let me know how I can achieve this, or if there is any other way to
>> analyze sentences like that.
>>
>> Many Regards,
>> Rahul Vashishth
>>
>> This e-mail, including attachments, may include confidential and/or
>> proprietary information, and may be used only by the person or entity
>> to which it is addressed. If the reader of this e-mail is not the
>> intended recipient or his or her authorized agent, the reader is
>> hereby notified that any dissemination, distribution or copying of
>> this e-mail is prohibited. If you have received this e-mail in error,
>> please notify the sender by replying to this message and delete this e-mail
>> immediately.
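
As a rough illustration of the annotation format discussed in this thread, here is a minimal Java sketch that wraps known terms in tokenized sentences with <START:business> ... <END> tags, so that each output line is a whole annotated sentence and not a bare keyword. It does not call OpenNLP at all. The class name BusinessTermAnnotator, the term list and the example sentences are made up for illustration, and output produced this way would still need manual review before being used as training data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class BusinessTermAnnotator {

    // Hypothetical term list (an assumption for this sketch);
    // each term is stored as lower-cased tokens.
    private static final List<String[]> TERMS = Arrays.asList(
        new String[] {"health", "insurance"},
        new String[] {"mortgage", "loan"},
        new String[] {"insurance"},
        new String[] {"mortgage"},
        new String[] {"loan"});

    /** Returns the token length of the longest term starting at tokens[start], or 0 if none matches. */
    public static int longestMatch(String[] tokens, int start) {
        int best = 0;
        for (String[] term : TERMS) {
            if (term.length <= best || start + term.length > tokens.length) {
                continue;
            }
            boolean match = true;
            for (int i = 0; i < term.length; i++) {
                if (!tokens[start + i].toLowerCase(Locale.ROOT).equals(term[i])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                best = term.length;
            }
        }
        return best;
    }

    /** Wraps every matched term in <START:type> ... <END>; all other tokens are copied unchanged. */
    public static String annotate(String tokenizedSentence, String type) {
        String[] tokens = tokenizedSentence.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int len = longestMatch(tokens, i);
            if (len > 0) {
                out.add("<START:" + type + ">");
                for (int j = 0; j < len; j++) {
                    out.add(tokens[i + j]);
                }
                out.add("<END>");
                i += len;
            } else {
                out.add(tokens[i]);
                i++;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // One whitespace-tokenized sentence in, one annotated training line out.
        System.out.println(annotate("What is health insurance .", "business"));
        System.out.println(annotate("How do I get a mortgage loan ?", "business"));
    }
}
```

Running main prints:

What is <START:business> health insurance <END> .
How do I get a <START:business> mortgage loan <END> ?

The point of the sketch is only the shape of the data: the terms appear inside full sentences, which is what gives the trainer context to learn from.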
