Hello,
I am trying to extract requirements from a large set of documents (sorry,
I can’t be more specific). The documents are split into sentences.
I have a small training set that has been annotated in a format similar to the
NameFinder format:
I think that you should <START:requirement> eat more chicken <END> .
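To make the annotation format concrete, here is a minimal sketch of reading one such line into tokens plus typed token spans. It is plain Java with no OpenNLP dependency; the class names AnnotatedSentenceParser, Sample and TypedSpan are hypothetical stand-ins for illustration, not OpenNLP classes, and whitespace tokenization is assumed.

```java
import java.util.ArrayList;
import java.util.List;

public class AnnotatedSentenceParser {

    /** A typed span over token indexes; 'end' is exclusive. Hypothetical helper type. */
    public static final class TypedSpan {
        public final int start;
        public final int end;
        public final String type;

        TypedSpan(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }

        @Override
        public String toString() {
            return "[" + start + ".." + end + ") " + type;
        }
    }

    /** The tokens of one sentence and the annotated spans over them. Hypothetical helper type. */
    public static final class Sample {
        public final List<String> tokens;
        public final List<TypedSpan> spans;

        Sample(List<String> tokens, List<TypedSpan> spans) {
            this.tokens = tokens;
            this.spans = spans;
        }
    }

    /** Parses one whitespace-tokenized line containing <START:type> ... <END> markers. */
    public static Sample parse(String line) {
        List<String> tokens = new ArrayList<>();
        List<TypedSpan> spans = new ArrayList<>();
        String openType = null; // type of the span currently open, if any
        int openStart = -1;     // token index at which the open span begins

        for (String part : line.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue; // blank input line
            }
            if (part.startsWith("<START:") && part.endsWith(">")) {
                if (openType != null) {
                    throw new IllegalArgumentException("nested <START> tag");
                }
                openType = part.substring("<START:".length(), part.length() - 1);
                openStart = tokens.size();
            } else if (part.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without <START>");
                }
                spans.add(new TypedSpan(openStart, tokens.size(), openType));
                openType = null;
            } else {
                tokens.add(part); // ordinary token
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("unclosed <START> tag");
        }
        return new Sample(tokens, spans);
    }

    public static void main(String[] args) {
        Sample s = parse("I think that you should <START:requirement> eat more chicken <END> .");
        System.out.println(s.tokens);
        System.out.println(s.spans);
    }
}
```

The markers are dropped from the token list, so the span indexes refer to positions in the clean sentence.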
I can get fair results with the NameFinder, which is actually surprising
considering I just ran the opennlp CLI application on my data. I would like
to customize the features for my needs.
I have spent some time looking into the NameFinder code. It appears that a lot
of the code is required to integrate with the opennlp CLI application. That is
not a requirement for me.
It appears that the minimum I need to create is an equivalent to a NameSample;
a cachedFeatureGenerator containing a set of adaptiveFeatureGenerators
including my custom featureGenerator (extend FeatureGeneratorAdapter); an
objectStream of Samples; and an eventStream (extend AbstractEventStream).
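As a rough illustration of the custom featureGenerator part, the sketch below is self-contained and does not compile against OpenNLP. The FeatureGen interface is a local stand-in whose createFeatures signature is assumed to mirror the one on OpenNLP's AdaptiveFeatureGenerator; the ModalWindowFeatureGen class and its modal-verb feature are purely hypothetical examples of a domain-specific feature.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ModalVerbFeatureSketch {

    /** Local stand-in interface; assumed to mirror the shape of OpenNLP's createFeatures method. */
    public interface FeatureGen {
        void createFeatures(List<String> features, String[] tokens, int index,
                            String[] previousOutcomes);
    }

    /** Hypothetical generator: the current word plus any modal verb up to two tokens to the left. */
    public static final class ModalWindowFeatureGen implements FeatureGen {
        private static final Set<String> MODALS =
                new HashSet<>(Arrays.asList("should", "must", "shall", "will"));

        @Override
        public void createFeatures(List<String> features, String[] tokens, int index,
                                   String[] previousOutcomes) {
            // Lower-cased current token as a basic lexical feature.
            features.add("w=" + tokens[index].toLowerCase());
            // Look one and two tokens back for a modal verb.
            for (int offset = 1; offset <= 2; offset++) {
                int j = index - offset;
                if (j >= 0 && MODALS.contains(tokens[j].toLowerCase())) {
                    features.add("modal-" + offset + "=" + tokens[j].toLowerCase());
                }
            }
        }
    }

    /** Convenience helper: collects the features produced for one token position. */
    public static List<String> featuresAt(String[] tokens, int index) {
        List<String> features = new ArrayList<>();
        new ModalWindowFeatureGen().createFeatures(features, tokens, index, new String[0]);
        return features;
    }

    public static void main(String[] args) {
        String[] tokens = {"you", "should", "eat", "more", "chicken"};
        for (int i = 0; i < tokens.length; i++) {
            System.out.println(tokens[i] + " -> " + featuresAt(tokens, i));
        }
    }
}
```

In a real setup the generator would extend the library's adapter class instead of the stand-in interface, and the feature strings would be whatever predicates suit the data.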
A few questions:
1) Did I get all the parts needed? The xxxME (NameFinderME) appears to wrap
all the training functionality for the application, but does not seem to be a
requirement. All the various factories appear to be a requirement to work with
the opennlp CLI application.
2) What exactly is adaptive about the adaptive data? The contextGenerators add
all the predicates to the context at each index. Do I clear the adaptiveData
at the end of each sentence?
3) The NameFinder does not appear to use the BeamSearch; by default it creates
a GIS object and trains using that. I think that the beam search would be
better for me, because it keeps multiple candidate local outcomes to
improve the global classification. Am I correct?
BTW: The example online
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
gives an example with a deprecated method. The only non-deprecated
NameFinderME.train method uses a TokenNameFinderFactory which doesn’t have a
method to set the context generators (except via XML).
Thank you for any advice.
Dan