First, ask yourself whether a statistical model is the appropriate tool for this job. What would it learn that is more effective than the obvious regular expression rules you can write yourself? You will certainly need some pattern-based features, and unless context words provide essential disambiguation, training a model in that case is just the long way around.
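To make the comparison concrete, here is a minimal sketch of what the hand-written rules might look like for the two entity types in this thread, using only java.util.regex. The class name, method names, and the list of pressure units are my own assumptions for illustration, not anything from OpenNLP, and the patterns only cover the examples quoted below.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of OpenNLP.
public class MeasurementRules {

    // Pressure: a number followed by a unit, e.g. "977 psig".
    // The unit list is an assumption; extend it to match your documents.
    static final Pattern PRESSURE =
        Pattern.compile("\\b\\d+(?:\\.\\d+)?\\s*(?:psig|psia|psi)\\b");

    // Length in feet: station notation such as "35+81'" or a plain
    // number such as "39481.8'". Inch marks (") are deliberately not matched.
    static final Pattern LENGTH =
        Pattern.compile("(?<![\\d.])(?:\\d+\\+\\d{2}|\\d+(?:\\.\\d+)?)'");

    // Collect every substring of text that matches the pattern.
    static List<String> findAll(Pattern p, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Convert a matched length to feet, reading "35+81'" as 3500 + 81.
    static double lengthInFeet(String match) {
        String body = match.substring(0, match.length() - 1); // drop the trailing '
        int plus = body.indexOf('+');
        if (plus < 0) {
            return Double.parseDouble(body);
        }
        int hundreds = Integer.parseInt(body.substring(0, plus));
        int remainder = Integer.parseInt(body.substring(plus + 1));
        return hundreds * 100 + remainder;
    }

    public static void main(String[] args) {
        String text = "39481.8' 10.750\" x .25\" 977 psig 35+81'";
        System.out.println(findAll(PRESSURE, text));
        for (String len : findAll(LENGTH, text)) {
            System.out.println(len + " = " + lengthInFeet(len) + " ft");
        }
    }
}
```

If rules of this kind already cover your documents, a trained model has little left to learn; it starts to pay off only when the surrounding words are needed to decide what a number means.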

On Sunday, July 8, 2012, William Colen wrote:

> Hi, Carlos,
>
> It is exactly the same. You can create a train corpus:
>
> Sometimes for length I might have <START:length> 35+81' <END> which means
>
> <START:length> 3500 + 81 3581' <END>
>
> <START: pressure> 977 psig <END>
>
> Notice that the corpus should have a tokenized sentence per line.
>
> You could also check if the regular expression Name Finder implementation
> would be better for your task:
>
> http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/main/java/opennlp/tools/namefind/RegexNameFinder.java?view=markup
>
> Regards,
> William
>
> On Sun, Jul 8, 2012 at 6:43 AM, Carlos Scheidecker <[email protected]> wrote:
>
> > Hello all,
> >
> > I would like to train the system to identify pressure and length entities
> > on a document.
> >
> > So for instance, if I have 39481.8' 10.750" x .25"
> >
> > 977 psig
> >
> > Sometimes for length I might have 35+81' which means 3500 + 81 3581'
> >
> > Is there any examples on how to train entities on OpenNLP?
> >
> > On the manual it has this
> >
> > http://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html#tools.namefind.training
> >
> > But then, I wonder how would that work or if I should use the examples on
> > percentage or money entities.
> >
> > Thanks again.
> >
> > Carlos.
