I was wondering that as well, Benson. A statistical model is only worthwhile
if the pressure and length values vary widely in format. If the number of
different formats is manageable, then no. However, a statistical model would
cover more variations, as the date and money entity models do.

So, once the text is tagged, what is the next step to generate the
.bin model file?

Also, I liked William's suggestion of RegexNameFinder.java. I would need
to write a few patterns for that, though, and try it out.
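
As a rough starting point, here is a minimal sketch of what such patterns might look like. It uses only java.util.regex, not RegexNameFinder itself, and the class name, field names, method names, and the patterns are all assumptions based on the examples in this thread (39481.8', 10.750", .25", 977 psig, 35+81'). Real documents will likely need more variants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: all names and patterns here are made up for this sketch.
class MeasurementPatterns {

    // Pressure: a number followed by psi, psia, or psig (e.g. "977 psig").
    static final Pattern PRESSURE =
            Pattern.compile("\\d+(?:\\.\\d+)?\\s*psi[ag]?\\b", Pattern.CASE_INSENSITIVE);

    // Length: station notation (35+81), a decimal (39481.8 or .25), or an
    // integer, followed by a foot (') or inch (") mark.
    static final Pattern LENGTH =
            Pattern.compile("(?:\\d+\\+\\d{2}|\\d*\\.\\d+|\\d+)['\"]");

    // Station notation on its own, with groups for the hundreds and the remainder.
    static final Pattern STATION = Pattern.compile("(\\d+)\\+(\\d{2})'?");

    // Returns every substring of text matched by the pattern, in order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<String>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    // Converts station notation to feet: 35+81' is 35 * 100 + 81 = 3581.
    static int stationToFeet(String station) {
        Matcher m = STATION.matcher(station.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not station notation: " + station);
        }
        return Integer.parseInt(m.group(1)) * 100 + Integer.parseInt(m.group(2));
    }
}
```

Calling findAll(MeasurementPatterns.LENGTH, line) on each line of a document would give the candidate length strings, and the same compiled Pattern objects could presumably be handed to a regex-based name finder once the patterns have settled.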

Thanks all!

On Sun, Jul 8, 2012 at 12:17 PM, Benson Margulies <[email protected]> wrote:

> First, ask yourself if a statistical model is an appropriate tool for this
> job. What is it going to learn that is more effective than the obvious
> regular expression rules that you can write yourself? You will certainly
> need some pattern-based features, and unless context words make an
> essential disambiguation, training a model in that case is just the long
> way around.
>
> On Sunday, July 8, 2012, William Colen wrote:
>
> > Hi, Carlos,
> >
> > It is exactly the same. You can create a training corpus:
> >
> > Sometimes for length I might have <START:length> 35+81' <END> which means
> > <START:length> 3500 + 81 3581' <END>
> >
> > <START:pressure> 977 psig <END>
> >
> >
> > Notice that the corpus should have one tokenized sentence per line.
> >
> > You could also check if the regular expression Name Finder implementation
> > would be better for your task:
> >
> >
> > http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinder.java?view=markup
> >
> > Regards,
> > William
> >
> > On Sun, Jul 8, 2012 at 6:43 AM, Carlos Scheidecker <[email protected]> wrote:
> >
> > > Hello all,
> > >
> > > I would like to train the system to identify pressure and length
> > > entities in a document.
> > >
> > > So for instance, if I have 39481.8'  10.750" x .25"
> > >
> > > 977 psig
> > >
> > > Sometimes for length I might have 35+81' which means 3500 + 81 3581'
> > >
> > > Are there any examples of how to train entities in OpenNLP?
> > >
> > > The manual has this:
> > >
> > > http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training
> > >
> > > But I wonder how that would work, or whether I should use the examples
> > > for percentage or money entities.
> > >
> > > Thanks again.
> > >
> > > Carlos.
> > >
> >
>
