Re: Writing our own models in openNLP.

John Miedema Wed, 02 Jul 2014 07:23:48 -0700

Hi Ittigi, off the top of my head: how many lines are in your training set? I
assume you have compiled the model and are pointing to it as in the sample
code. Sorry, just confirming the most obvious things first.
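
For a quick sanity check of the training file before compiling the model,
something like the sketch below may help. It is plain Java with no OpenNLP
calls, and the class and method names are made up for illustration. It assumes
the tags have to be standalone, whitespace-separated tokens, as in the samples
in the OpenNLP manual, so a line ending in "<END>." (no space before the
period) is not counted as well formed. As far as I recall, the manual also
suggests at least 15000 sentences for a model that performs well, so a
two-line training set is unlikely to be enough.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: checks annotation markup only.
public class TrainingDataCheck {

    // True when the line has at least one <START> ... <END> span and every
    // span is opened and closed in order. Assumption: tags must be separate
    // whitespace-delimited tokens, optionally typed as <START:some_type>.
    static boolean hasAnnotatedSpan(String line) {
        boolean open = false;
        int spans = 0;
        for (String token : line.trim().split("\\s+")) {
            if (token.matches("<START(:[^>\\s]+)?>")) {
                if (open) {
                    return false; // a second START before the previous END
                }
                open = true;
            } else if (token.equals("<END>")) {
                if (!open) {
                    return false; // END without a preceding START
                }
                open = false;
                spans++;
            }
        }
        return !open && spans > 0;
    }

    // Number of lines that contain at least one well-formed span.
    static long countAnnotated(List<String> lines) {
        return lines.stream().filter(TrainingDataCheck::hasAnnotatedSpan).count();
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "who is working on <START> phone <END>.",
            "<START> mobile <END> is a product in our system.");
        System.out.println(lines.size() + " lines, "
            + countAnnotated(lines) + " with a well-formed span");
    }
}
```

Run on the two training lines from your message, this counts only the second
one as well formed, because "<END>." in the first is not a standalone token.
Whether OpenNLP itself rejects that line is worth checking against the trainer
output.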

On Tue, Jul 1, 2014 at 8:04 AM, Vivekanand Ittigi <[email protected]>
wrote:

> Hi John,
>
> I went through your post. It was impressive, and I started implementing it
> as you described.
>
> This is my training data:
> who is working on <START> phone <END>.
> <START> mobile <END> is a product in our system.
>
>
> And these are the inputs i'm giving:
> static String sentence = "phone is a product in our system";
> static String sentence = "what is the risk on phone";
> static String sentence = "who is working on switch today";
>
> I should get these entities from the respective lines: "phone", "phone" and
> "switch".
>
> But I got nothing. I know I'm doing something wrong in the training data.
> I'm new to this field. Can you please guide me on what to put in the
> training data to process all these sentences?
>
> If I learn a little more about this, I can implement it in our base.
>
> Please help me!
>
> Thanks,
> Vivek
>
>
>
>
>
> On Tue, Jun 24, 2014 at 9:17 PM, John Miedema <[email protected]>
> wrote:
>
>> I recently wrote up a post on doing this in Java rather than from the
>> command line. Maybe it will help; it has Java code samples.
>> http://johnmiedema.com/?p=744
>>
>>
>> On Tue, Jun 24, 2014 at 8:53 AM, Vivekanand Ittigi <[email protected]>
>> wrote:
>>
>> > So you want me to write a short story that includes these entities?
>> >
>> >
>> > On Tue, Jun 24, 2014 at 5:59 PM, Mark G <[email protected]> wrote:
>> >
>> > > Hello, you need to annotate the entity within some of the sentences it
>> > > occurs in. The name finder needs context. It's giving you the same
>> > > sentence back because it was trained to find any token anywhere.
>> > > Mg
>> > >
>> > >
>> > > > On Jun 24, 2014, at 8:12 AM, Vivekanand Ittigi <[email protected]>
>> > > > wrote:
>> > > >
>> > > > Hi Jorn,
>> > > >
>> > > > Let me use training model itself.
>> > > >
>> > > > Let me just say what i've done so far
>> > > >
>> > > > 1. I've written the following text into a file called test.train
>> > > > <START:Product_entities>icm2500<END>
>> > > > <START:Product_entities>prd_234<END>
>> > > > .
>> > > > .
>> > > > .
>> > > >
>> > > > 2.  i ran the following
>> > > >
>> > > > ./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin
>> > > >
>> > > > 3. I've added the line below to "sometext.txt"
>> > > >
>> > > > What is the risk value on icm2500. Delivery of prd_234 will be arrived
>> > > > late. Watson is handling router_34.
>> > > >
>> > > > 4. I ran the command
>> > > >
>> > > > ./opennlp TokenNameFinder en-ner-person.bin <sometext.txt> output/output4.txt
>> > > >
>> > > > Result: it returned the same line unchanged, instead of: What is the
>> > > > risk value on <START:Product_entities>icm2500<END> Delivery of
>> > > > <START:Product_entities>prd_234<END> will be arrived late.......
>> > > >
>> > > > Please tell me, what am I doing wrong?
>> > > >
>> > > > Thanks,
>> > > > Vivek
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >> On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann <[email protected]>
>> > > >> wrote:
>> > > >>
>> > > >>> On 06/24/2014 01:10 PM, Vivekanand Ittigi wrote:
>> > > >>>
>> > > >>> Hi Jorn,
>> > > >>>
>> > > >>> I read the document
>> > > >>> http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline
>> > > >>> but I felt I needed more information to put it into code.
>> > > >>>
>> > > >>> I understand that we need to train the model, but I could not work
>> > > >>> out how. Can you please explain it so that I can start implementing
>> > > >>> it?
>> > > >>>
>> > > >>> Thanks,
>> > > >>> Vivek
>> > > >>>
>> > > >>>
>> > > >>>
>> > > >>> On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann <[email protected]>
>> > > >>> wrote:
>> > > >>>
>> > > >>>> On 06/24/2014 09:44 AM, Vivekanand Ittigi wrote:
>> > > >>>>
>> > > >>>> Hi,
>> > > >>>>>
>> > > >>>>> If I use a command like this on the command line
>> > > >>>>>
>> > > >>>>> ./opennlp TokenNameFinder en-ner-person.bin <input.txt> <output.txt>
>> > > >>>>>
>> > > >>>>> I get person names printed in output.txt, but I want to write my
>> > > >>>>> own models so that I can print my own entities.
>> > > >>>>>
>> > > >>>>> E.g.
>> > > >>>>>
>> > > >>>>> 1. what is the risk value on icm2500.
>> > > >>>>> 2. Delivery of prd_234 will be arrived late.
>> > > >>>>> 3. Watson is handling router_34.
>> > > >>>>>
>> > > >>>>> If I pass in these lines, it should parse them and extract
>> > > >>>>> product_entities. icm2500, prd_234, router_34, etc. are all
>> > > >>>>> products (we can save this information in a file and use it as a
>> > > >>>>> kind of lookup for the models or OpenNLP).
>> > > >>>>>
>> > > >>>>> Can anyone please tell me how to do this?
>> > > >>>>>
>> > > >>>>>
>> > > >>>> You need to train your own model. To do that you have to collect
>> > > >>>> some of the texts and annotate them with the entities you wish to
>> > > >>>> detect.
>> > > >>>>
>> > > >>>> Have a look at the documentation about the name finder. It explains
>> > > >>>> how the training works.
>> > > >> For the training you need to produce annotated texts like the sample
>> > > >> in the documentation.
>> > > >> If you have a training data file in that format you can use the
>> > > >> command line interface to actually train a model.
>> > > >>
>> > > >> The latest trunk version of OpenNLP can also be trained on files in
>> > > >> the brat data format, which can easily be created with brat.
>> > > >>
>> > > >> Have a look here:
>> > > >> http://brat.nlplab.org/index.html
>> > > >>
>> > > >> In my experience brat works quite well in the latest trunk version.
>> > > >>
>> > > >> To train with brat you need to suffix the training command like this:
>> > > >> bin/opennlp TokenNameFinderTrainer.brat
>> > > >> That command will print a help message explaining the inputs it needs.
>> > > >>
>> > > >> There is no need to write code to train a name finder model.
>> > > >>
>> > > >> Jörn
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > > >>
>> > >
>> >
>>
>>
>>
>> --
>> _________________________________________
>> johnmiedema.com
>>
>
>


-- 
_________________________________________
johnmiedema.com
