Hi Ittigi, off the top ... how many lines are in your training set? I assume you have compiled the model and are pointing to it as in the sample code. Sorry, just confirming the most obvious things first.
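
For what it's worth, a two-line training file is almost certainly too small: the OpenNLP manual recommends at least 15000 annotated sentences for a model that performs well, and the name finder learns from the words around each entity. If you already have a list of product names and a pile of ordinary sentences that mention them, a rough sketch like the one below could generate the annotated lines for you. Everything in it is my own assumption for illustration (the class name TrainingDataSketch, the annotate helper, the entity type "product", and the sample sentences). It uses no OpenNLP classes; it only produces text in the <START:type> ... <END> format, one sentence per line, with spaces around the tags. It also assumes the input is already tokenized, with punctuation separated by spaces.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TrainingDataSketch {

    // Hypothetical helper (not part of OpenNLP): wraps every whitespace-separated
    // token that is a known product name in "<START:type> token <END>".
    // The spaces around the tags are deliberate.
    public static String annotate(String sentence, Set<String> products, String type) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            if (products.contains(token)) {
                out.append("<START:").append(type).append("> ")
                   .append(token).append(" <END>");
            } else {
                out.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed lookup list of product names.
        Set<String> products =
            new HashSet<>(Arrays.asList("icm2500", "prd_234", "router_34"));

        // Assumed raw sentences, already tokenized (note the space before the period).
        List<String> raw = Arrays.asList(
            "What is the risk value on icm2500 .",
            "Delivery of prd_234 will be late .",
            "Watson is handling router_34 .");

        // One annotated sentence per line, ready to be saved as a training file.
        for (String sentence : raw) {
            System.out.println(annotate(sentence, products, "product"));
        }
        // First line printed:
        // What is the risk value on <START:product> icm2500 <END> .
    }
}
```

Redirect the output to a file and pass that file to TokenNameFinderTrainer with -data. The more varied the sentences around each product name, the more context the model has to learn from.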

On Tue, Jul 1, 2014 at 8:04 AM, Vivekanand Ittigi wrote:

> Hi John,
>
> I went through your post. It was very impressive, and I started
> implementing it as you described.
>
> This is my training data:
> who is working on <START> phone <END>.
> <START> mobile <END> is a product in our system.
>
> And these are the inputs I'm giving:
> static String sentence = "phone is a product in our system";
> static String sentence = "what is the risk on phone";
> static String sentence = "who is working on switch today";
>
> I should get these entities from the respective lines: "phone", "phone"
> and "switch".
>
> But I got nothing. I know I'm doing something wrong in the training data.
> I'm new to this field. Can you please guide me on what to put in the
> training data to process all these sentences?
>
> If I get a little more knowledge about this, I can implement it in our
> base.
>
> Please help me!
>
> Thanks,
> Vivek
>
> On Tue, Jun 24, 2014 at 9:17 PM, John Miedema wrote:
>
>> I recently wrote up a post on doing this in Java, not using the command
>> line. Maybe it will help. Code samples in Java.
>> http://johnmiedema.com/?p=744
>>
>> On Tue, Jun 24, 2014 at 8:53 AM, Vivekanand Ittigi wrote:
>>
>>> Does that mean you want me to write a small story integrating these
>>> entities?
>>>
>>> On Tue, Jun 24, 2014 at 5:59 PM, Mark G wrote:
>>>
>>>> Hello, you need to annotate the entity within some of the sentences
>>>> it occurs in. The name finder needs context. It's giving you the same
>>>> sentence back because it was trained to find any token anywhere.
>>>> Mg
>>>>
>>>> On Jun 24, 2014, at 8:12 AM, Vivekanand Ittigi wrote:
>>>>
>>>>> Hi Jorn,
>>>>>
>>>>> Let me use the training model itself.
>>>>>
>>>>> Let me just say what I've done so far:
>>>>>
>>>>> 1. I've written the following text into a file called test.train
>>>>> <START:Product_entities>icm2500<END>
>>>>> <START:Product_entities>prd_234<END>
>>>>> .
>>>>> .
>>>>> .
>>>>>
>>>>> 2. I ran the following:
>>>>>
>>>>> ./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin
>>>>>
>>>>> 3. I've added the line below to "sometext.txt":
>>>>>
>>>>> What is the risk value on icm2500. Delivery of prd_234 will be
>>>>> arrived late. Watson is handling router_34.
>>>>>
>>>>> 4. I ran the command:
>>>>>
>>>>> ./opennlp TokenNameFinder en-ner-person.bin < sometext.txt > output/output4.txt
>>>>>
>>>>> Result: it gave me back the same line instead of
>>>>> What is the risk value on <START:Product_entities>icm2500<END>
>>>>> Delivery of <START:Product_entities>prd_234<END> will be arrived
>>>>> late.......
>>>>>
>>>>> Please tell me what I am doing wrong.
>>>>>
>>>>> Thanks,
>>>>> Vivek
>>>>>
>>>>> On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann wrote:
>>>>>
>>>>>> On 06/24/2014 01:10 PM, Vivekanand Ittigi wrote:
>>>>>>
>>>>>>> Hi Jorn,
>>>>>>>
>>>>>>> I read the document
>>>>>>> http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline.
>>>>>>> But I felt I needed more information to put it into code.
>>>>>>>
>>>>>>> I got to know that we need to train the model, but I could not
>>>>>>> work out how. Can you please explain it, so that I can start
>>>>>>> implementing it?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vivek
>>>>>>>
>>>>>>> On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann wrote:
>>>>>>>
>>>>>>>> On 06/24/2014 09:44 AM, Vivekanand Ittigi wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> If I use a command like this on the command line
>>>>>>>>>
>>>>>>>>> ./opennlp TokenNameFinder en-ner-person.bin <input.txt> <output.txt>
>>>>>>>>>
>>>>>>>>> I'll get person names printed in output.txt, but I want to
>>>>>>>>> write my own models so that I can print my own entities.
>>>>>>>>>
>>>>>>>>> E.g.
>>>>>>>>>
>>>>>>>>> 1. what is the risk value on icm2500.
>>>>>>>>> 2. Delivery of prd_234 will be arrived late.
>>>>>>>>> 3. Watson is handling router_34.
>>>>>>>>>
>>>>>>>>> If I pass these lines, it should parse them and extract
>>>>>>>>> product_entities. icm2500, prd_234, router_34, etc. are all
>>>>>>>>> Products (we can save this information in a file and use it
>>>>>>>>> as a kind of lookup for models or OpenNLP).
>>>>>>>>>
>>>>>>>>> Can anyone please tell me how to do this?
>>>>>>>>
>>>>>>>> You need to train your own model. To do that you have to collect
>>>>>>>> some of the texts and annotate them with the entities you wish
>>>>>>>> to detect.
>>>>>>>>
>>>>>>>> Have a look at the documentation about the name finder. It
>>>>>>>> explains how the training works.
>>>>>>
>>>>>> For the training you need to produce annotated texts like the
>>>>>> sample in the documentation. If you have a training data file in
>>>>>> that format you can use the command line interface to actually
>>>>>> train a model.
>>>>>>
>>>>>> The latest trunk version of OpenNLP can also be trained on files in
>>>>>> the brat data format; those can be easily created with brat.
>>>>>>
>>>>>> Have a look here:
>>>>>> http://brat.nlplab.org/index.html
>>>>>>
>>>>>> In my experience brat works quite well in the latest trunk version.
>>>>>>
>>>>>> To train with brat you need to suffix the training command like
>>>>>> this:
>>>>>> bin/opennlp TokenNameFinderTrainer.brat
>>>>>> That command will print a help message explaining the inputs it
>>>>>> needs.
>>>>>>
>>>>>> There is no need to write code to train a name finder model.
>>>>>>
>>>>>> Jörn
>>
>> --
>> _________________________________________
>> johnmiedema.com

--
_________________________________________
johnmiedema.com
