First, ask yourself if a statistical model is an appropriate tool for this
job. What is it going to learn that is more effective than the obvious
regular expression rules that you can write yourself? You will certainly
need some pattern-based features, and unless context words are essential
for disambiguation, training a model in that case is just the long way
around.
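To make the comparison concrete, here is a minimal regular-expression sketch in plain Java. It does not use OpenNLP. The patterns, the class name `MeasurementExtractor` and the `Measurement` type are illustrative assumptions based only on the sample strings quoted below. It also assumes that `35+81'` is station notation meaning 35 * 100 + 81 = 3581 feet, as Carlos describes. The rules also normalize the values, which a name finder would not do for you: a name finder only returns the matching token spans.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MeasurementExtractor {

    // Assumed station notation: 35+81' means 35 * 100 + 81 = 3581 feet.
    private static final Pattern STATION = Pattern.compile("(\\d+)\\+(\\d{2})'");
    // Plain feet value such as 39481.8'; the lookbehind keeps it from
    // matching the "81'" tail of a station value.
    private static final Pattern FEET = Pattern.compile("(?<![\\d.+])(\\d+(?:\\.\\d+)?)'");
    // Pressure such as 977 psig (case-insensitive unit).
    private static final Pattern PSIG =
            Pattern.compile("(\\d+(?:\\.\\d+)?)\\s*psig\\b", Pattern.CASE_INSENSITIVE);

    /** One extracted entity: its type, the matched text and a normalized value. */
    public static final class Measurement {
        public final String type;
        public final String text;
        public final double value;

        Measurement(String type, String text, double value) {
            this.type = type;
            this.text = text;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + "[" + text + " = " + value + "]";
        }
    }

    /** Applies each rule in turn; results are grouped by rule, not by position. */
    public static List<Measurement> extract(String line) {
        List<Measurement> out = new ArrayList<>();
        Matcher m = STATION.matcher(line);
        while (m.find()) {
            double feet = Long.parseLong(m.group(1)) * 100 + Long.parseLong(m.group(2));
            out.add(new Measurement("length", m.group(), feet));
        }
        m = FEET.matcher(line);
        while (m.find()) {
            out.add(new Measurement("length", m.group(), Double.parseDouble(m.group(1))));
        }
        m = PSIG.matcher(line);
        while (m.find()) {
            out.add(new Measurement("pressure", m.group(), Double.parseDouble(m.group(1))));
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = { "39481.8'  10.750\" x .25\"", "977 psig", "35+81'" };
        for (String line : lines) {
            System.out.println(line + " -> " + extract(line));
        }
    }
}
```

Three short patterns cover every example in the thread, including the `35+81'` arithmetic, which is the point above: unless context decides whether a number is a length or a pressure, a trained model has little to add here.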

On Sunday, July 8, 2012, William Colen wrote:

> Hi, Carlos,
>
> It is exactly the same. You can create a train corpus:
>
> Sometimes for length I might have <START:length> 35+81' <END> which means
> <START:length> 3500 + 81 = 3581' <END>
>
> <START:pressure> 977 psig <END>
>
>
> Note that the corpus should have one tokenized sentence per line.
>
> You could also check if the regular expression Name Finder implementation
> would be better for your task:
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinder.java?view=markup
>
> Regards,
> William
>
> On Sun, Jul 8, 2012 at 6:43 AM, Carlos Scheidecker <[email protected]> wrote:
>
> > Hello all,
> >
> > I would like to train the system to identify pressure and length entities
> > in a document.
> >
> > For instance, I might have 39481.8' 10.750" x .25"
> >
> > 977 psig
> >
> > Sometimes for length I might have 35+81', which means 3500 + 81 = 3581'.
> >
> > Are there any examples of how to train entities in OpenNLP?
> >
> > The manual has this:
> >
> >
> > http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training
> >
> > But I wonder how that would work, or whether I should follow the examples
> > for percentage or money entities.
> >
> > Thanks again.
> >
> > Carlos.
> >
>

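If you do follow William's suggestion and train a model, every corpus line is one tokenized sentence with each entity wrapped in `<START:type>` and `<END>`. Both markers must be separate whitespace-delimited tokens, so the type goes directly after the colon with no space. The sketch below is a hypothetical helper (`NameSampleLineWriter` and `EntitySpan` are not OpenNLP classes) that builds such a line from tokens and token-index spans, assuming your tokenizer already splits `35+81'` and `977 psig` the way you want them annotated.

```java
import java.util.ArrayList;
import java.util.List;

public class NameSampleLineWriter {

    /** Hypothetical span: tokens [start, end) form one entity of the given type. */
    public static final class EntitySpan {
        final int start;
        final int end;
        final String type;

        public EntitySpan(int start, int end, String type) {
            if (start < 0 || end <= start) {
                throw new IllegalArgumentException("bad span " + start + ".." + end);
            }
            // The tag must stay a single token, so reject spaces, ':' and '>'.
            if (!type.matches("[^\\s:>]+")) {
                throw new IllegalArgumentException("bad type '" + type + "'");
            }
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /**
     * Joins the tokens with single spaces and wraps each span in
     * <START:type> ... <END>. Spans must be sorted and must not overlap.
     */
    public static String toTrainingLine(String[] tokens, List<EntitySpan> spans) {
        List<String> out = new ArrayList<>();
        int next = 0;            // index of the next span to open
        EntitySpan open = null;  // span currently being written, if any
        for (int i = 0; i < tokens.length; i++) {
            if (open == null && next < spans.size() && spans.get(next).start == i) {
                open = spans.get(next++);
                out.add("<START:" + open.type + ">");
            }
            out.add(tokens[i]);
            if (open != null && open.end == i + 1) {
                out.add("<END>");
                open = null;
            }
        }
        // Any span left unopened or unclosed was unsorted, overlapping or out of range.
        if (open != null || next < spans.size()) {
            throw new IllegalArgumentException("spans overlap, are unsorted or exceed the tokens");
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        String[] tokens = { "Sometimes", "for", "length", "I", "might", "have", "35+81'" };
        List<EntitySpan> spans = new ArrayList<>();
        spans.add(new EntitySpan(6, 7, "length"));
        System.out.println(toTrainingLine(tokens, spans));
        // prints: Sometimes for length I might have <START:length> 35+81' <END>
    }
}
```

Building lines programmatically rather than typing tags by hand keeps the tag syntax consistent across the whole corpus.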