Re: Writing our own models in openNLP.

Mark G Tue, 24 Jun 2014 05:30:22 -0700

Hello, you need to annotate the entity inside some of the sentences it occurs
in, not as a bare token on its own line. The name finder needs context. It's
giving you the same sentence back because the model was trained on isolated
tokens, with no surrounding words to learn from.
Mg
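
To illustrate the point about context, here is a minimal sketch of how the
annotated training lines could be generated from the three example sentences
in this thread. It assumes the name finder training format described in the
OpenNLP manual: one sentence per line, whitespace-separated tokens, and a
space on each side of the <START:...> and <END> tags. The helper name
annotate, the regex tokenizer, and the PRODUCTS and SENTENCES lists are
hypothetical and not part of OpenNLP.

```python
import re

def annotate(sentence, entities, label="Product_entities"):
    """Wrap every known entity token in OpenNLP name finder tags."""
    # Simplified tokenizer (an assumption): runs of word characters become
    # one token, and each punctuation mark becomes its own token.
    tokens = re.findall(r"\w+|[^\w\s]", sentence)
    out = []
    for tok in tokens:
        if tok in entities:
            # Tags are separated from the token by spaces.
            out.append(f"<START:{label}> {tok} <END>")
        else:
            out.append(tok)
    return " ".join(out)

# Hypothetical lookup list and sentences, taken from the thread.
PRODUCTS = {"icm2500", "prd_234", "router_34"}
SENTENCES = [
    "What is the risk value on icm2500.",
    "Delivery of prd_234 will be arrived late.",
    "Watson is handling router_34.",
]

# One annotated sentence per line, ready to be saved as training data.
lines = [annotate(s, PRODUCTS) for s in SENTENCES]
print("\n".join(lines))
```

The printed lines could be saved as test.train and passed to the
TokenNameFinderTrainer command quoted below. Three sentences are only enough
to show the format; a usable model needs many more annotated sentences, and
the text passed to TokenNameFinder should be tokenized the same way as the
training data.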



> On Jun 24, 2014, at 8:12 AM, Vivekanand Ittigi <[email protected]> wrote:
> 
> Hi Jorn,
> 
> Let me use training model itself.
> 
> Let me just say what I've done so far:
> 
> 1. I've written the following text into a file called test.train
> <START:Product_entities>icm2500<END>
> <START:Product_entities>prd_234<END>
> .
> .
> .
> 
> 2. I ran the following:
> 
> ./opennlp TokenNameFinderTrainer -encoding UTF-8 -lang en -data test.train -model en-ner-person.bin
> 
> 3. I've added the below line in "sometext.txt"
> 
> What is the risk value on icm2500. Delivery of prd_234 will be arrived
> late. Watson is handling router_34.
> 
> 4. I ran the command
> 
> ./opennlp TokenNameFinder en-ner-person.bin <sometext.txt> output/output4.txt
> 
> Result: it returned the same line unchanged, instead of
> 
> What is the risk value on <START:Product_entities>icm2500<END> Delivery of
> <START:Product_entities>prd_234<END> will be arrived late.......
> 
> Please tell me what I am doing wrong.
> 
> Thanks,
> Vivek
> 
> 
> 
> 
> 
>> On Tue, Jun 24, 2014 at 5:06 PM, Jörn Kottmann <[email protected]> wrote:
>> 
>>> On 06/24/2014 01:10 PM, Vivekanand Ittigi wrote:
>>> 
>>> Hi Jorn,
>>> 
>>> I read the document
>>> http://opennlp.apache.org/documentation/manual/opennlp.html#tools.namefind.recognition.cmdline.
>>> But I felt I needed more information to put it into code.
>>> 
>>> I understand that we need to train the model, but I could not work out how.
>>> Can you please explain it, so that I can start implementing it?
>>> 
>>> Thanks,
>>> Vivek
>>> 
>>> 
>>> On Tue, Jun 24, 2014 at 3:28 PM, Jörn Kottmann <[email protected]>
>>> wrote:
>>> 
>>>> On 06/24/2014 09:44 AM, Vivekanand Ittigi wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> If I run a command like this on the command line
>>>>> 
>>>>> ./opennlp TokenNameFinder en-ner-person.bin <input.txt> <output.txt>
>>>>> 
>>>>> I get person names printed in output.txt, but I want to write my own
>>>>> models so that I can extract my own entities.
>>>>> 
>>>>> E.g.
>>>>> 
>>>>> 1. what is the risk value on icm2500.
>>>>> 2. Delivery of prd_234 will be arrived late.
>>>>> 3. Watson is handling router_34.
>>>>> 
>>>>> If I pass these lines, it should parse them and extract product_entities.
>>>>> icm2500, prd_234, router_34, etc. are all products (we can save this
>>>>> information in a file and use it as a kind of lookup for the models or
>>>>> OpenNLP).
>>>>> 
>>>>> Can anyone please tell me how to do this?
>>>>> 
>>>>> 
>>>> You need to train your own model. To do that you have to collect some of
>>>> the texts and annotate them with the entities you wish to detect.
>>>> 
>>>> Have a look at the documentation about the name finder. It explains how
>>>> the training works.
>> For the training you need to produce annotated texts like the sample in
>> the documentation.
>> If you have a training data file in that format you can use the command
>> line interface to actually train a model.
>> 
>> The latest trunk version of OpenNLP can also be trained on files in the
>> brat data format, which can be easily created with brat.
>> 
>> Have a look here:
>> http://brat.nlplab.org/index.html
>> 
>> In my experience brat works quite well in the latest trunk version.
>> 
>> To train with brat you need to suffix the training command like this:
>> bin/opennlp TokenNameFinderTrainer.brat
>> That command will print a help message explaining the inputs it needs.
>> 
>> There is no need to write code to train a name finder model.
>> 
>> Jörn
>> 
>> 
>> 
>> 
>>
