Re: Writing our own models in openNLP.

John Miedema Tue, 24 Jun 2014 08:48:34 -0700

I recently wrote up a post on doing this in Java rather than on the command
line. Maybe it will help. It includes code samples in Java:
http://johnmiedema.com/?p=744
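
The quoted replies below advise annotating each entity inside the sentences
it occurs in, because the name finder learns from context. A minimal Java
sketch of producing such training lines follows, using only the standard
library. The class name TrainingLineBuilder, the annotate method, and the
single-character punctuation handling are illustrative assumptions and not
part of OpenNLP. The output follows the layout of the sample in the OpenNLP
manual: one sentence per line, with tags and tokens separated by spaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not an OpenNLP class): wraps known product IDs in
// name finder training tags, keeping the surrounding sentence as context.
final class TrainingLineBuilder {

    // Returns the sentence as one whitespace-tokenized training line.
    static String annotate(String sentence, Set<String> products, String type) {
        List<String> out = new ArrayList<>();
        for (String raw : sentence.trim().split("\\s+")) {
            String word = raw;
            String trailing = "";
            // Split off a single trailing punctuation mark so that
            // "icm2500." becomes the two tokens "icm2500" and ".".
            int last = word.length() - 1;
            if (last > 0 && ".,;:!?".indexOf(word.charAt(last)) >= 0) {
                trailing = word.substring(last);
                word = word.substring(0, last);
            }
            if (products.contains(word)) {
                out.add("<START:" + type + ">");
                out.add(word);
                out.add("<END>");
            } else {
                out.add(word);
            }
            if (!trailing.isEmpty()) {
                out.add(trailing);
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // Product IDs and sentences are taken from the thread.
        Set<String> products = Set.of("icm2500", "prd_234", "router_34");
        String[] sentences = {
            "What is the risk value on icm2500.",
            "Delivery of prd_234 will be arrived late.",
            "Watson is handling router_34."
        };
        for (String s : sentences) {
            System.out.println(annotate(s, products, "Product_entities"));
        }
    }
}
```

Each printed line can be written to the training file (test.train in the
thread) and passed to TokenNameFinderTrainer as in the quoted command. Three
lines only illustrate the format; the OpenNLP manual recommends a much larger
set of annotated sentences for a model that performs well.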



On Tue, Jun 24, 2014 at 8:53 AM, Vivekanand Ittigi <[email protected]>
wrote:

> Does that mean you want me to write a small story integrating these entities?
>
>
> On Tue, Jun 24, 2014 at 5:59 PM, Mark G <[email protected]> wrote:
>
> > Hello, you need to annotate the entity within some of the sentences it
> > occurs in. The name finder needs context. It's giving you the same
> > sentence back because it was trained to find any token anywhere.
> > Mg
> >
> >
> > > On Jun 24, 2014, at 8:12 AM, Vivekanand Ittigi <[email protected]> wrote:
> > >
> > > Hi Jorn,
> > >
> > > Let me use training model itself.
> > >
> > > > Let me describe what I've done so far.
> > >
> > > 1. I've written the following text into a file called test.train
> > > <START:Product_entities>icm2500<END>
> > > <START:Product_entities>prd_234<END>
> > > .
> > > .
> > > .
> > >
> > > 2.  i ran the following
> > >
> > > > ./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin
> > >
> > > > 3. I've added the line below to "sometext.txt":
> > >
> > > What is the risk value on icm2500. Delivery of prd_234 will be arrived
> > > late. Watson is handling router_34.
> > >
> > > > 4. I ran the command:
> > > >
> > > > ./opennlp TokenNameFinder en-ner-person.bin < sometext.txt > output/output4.txt
> > >
> > > > Result: it returned the same line unchanged instead of: What is the risk
> > > > value on <START:Product_entities>icm2500<END> Delivery of
> > > > <START:Product_entities>prd_234<END> will be arrived late.......
> > >
> > > > Please tell me what I am doing wrong.
> > >
> > > Thanks,
> > > Vivek
> > >
> > >
> > >
> > >
> > >
> > > >> On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann <[email protected]> wrote:
> > >>
> > >>> On 06/24/2014 01:10 PM, Vivekanand Ittigi wrote:
> > >>>
> > >>> Hi Jorn,
> > >>>
> > > >>> I read the document
> > > >>> http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline
> > > >>> but I felt I needed more information to put it into code.
> > > >>>
> > > >>> I understand that we need to train the model, but I could not work out
> > > >>> how. Can you please explain it so that I can start implementing it?
> > >>>
> > >>> Thanks,
> > >>> Vivek
> > >>>
> > >>> Thanks,
> > >>> Vivek
> > >>>
> > >>>
> > > >>> On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann <[email protected]> wrote:
> > >>>
> > >>>> On 06/24/2014 09:44 AM, Vivekanand Ittigi wrote:
> > >>>>
> > > >>>>> Hi,
> > >>>>>
> > > >>>>> If I use a command like this on the command line
> > > >>>>>
> > > >>>>> ./opennlp TokenNameFinder en-ner-person.bin <input.txt> <output.txt>
> > > >>>>>
> > > >>>>> I get person names printed in output.txt, but I want to write my own
> > > >>>>> models so that I can extract my own entities.
> > >>>>>
> > >>>>> E.g.
> > >>>>>
> > >>>>> 1. what is the risk value on icm2500.
> > >>>>> 2. Delivery of prd_234 will be arrived late.
> > >>>>> 3. Watson is handling router_34.
> > >>>>>
> > > >>>>> If I pass these lines, it should parse them and extract
> > > >>>>> product_entities. icm2500, prd_234, router_34, etc. are all products
> > > >>>>> (we can save this information in a file and use it as a kind of
> > > >>>>> lookup for the models or OpenNLP).
> > >>>>>
> > > >>>>> Can anyone please tell me how to do this?
> > >>>>>
> > >>>>>
> > >>>>> You need to train your own model. To do that you have to collect
> some
> > >>>> of
> > >>>> the texts
> > >>>> and annotate them with the entities you wish to detect.
> > >>>>
> > >>>> Have a look at the documentation about the name finder. It explains
> > how
> > >>>> to
> > >>>> the training
> > >>>> works.
> > > >>
> > > >> For the training you need to produce annotated texts like the sample
> > > >> in the documentation. If you have a training data file in that format,
> > > >> you can use the command line interface to actually train a model.
> > >>
> > > >> The latest trunk version of OpenNLP can also be trained on files in
> > > >> the brat data format; those can be easily created with brat.
> > >>
> > >> Have a look here:
> > >> http://brat.nlplab.org/index.html
> > >>
> > >> In my experience brat works quite well in the latest trunk version.
> > >>
> > > >> To train with brat you need to suffix the training command like this:
> > > >> bin/opennlp TokenNameFinderTrainer.brat
> > >> That command will print a help message explaining the inputs it needs.
> > >>
> > >> There is no need to write code to train a name finder model.
> > >>
> > >> Jörn
> > >>
> > >>
> > >>
> > >>
> > >>
> >
>



-- 
_________________________________________
johnmiedema.com
