Hello,

I am trying to extract requirements from a large set of documents (sorry, I
can’t be more specific). The documents are split into sentences.

I have a small training set that has been annotated in a format similar to the
one the NameFinder uses:

I think that you should <START:requirement> eat more chicken <END> .  
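
To make the format concrete, here is a minimal sketch of how one such line
could be split into tokens plus a labeled span. It assumes whitespace
tokenization and at most one annotated span per sentence. The class
AnnotatedSentence and its fields are hypothetical names used only for
illustration; they are not part of OpenNLP.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not an OpenNLP class. */
final class AnnotatedSentence {
    final List<String> tokens = new ArrayList<>();
    String type = null; // label of the annotated span, or null if there is none
    int start = -1;     // index of the first token in the span (inclusive)
    int end = -1;       // index one past the last token in the span (exclusive)

    /** Parses one whitespace-tokenized line with at most one START/END pair. */
    static AnnotatedSentence parse(String line) {
        AnnotatedSentence s = new AnnotatedSentence();
        for (String part : line.trim().split("\\s+")) {
            if (part.startsWith("<START:") && part.endsWith(">")) {
                // The text between "<START:" and ">" is the span label.
                s.type = part.substring("<START:".length(), part.length() - 1);
                s.start = s.tokens.size();
            } else if (part.equals("<END>")) {
                s.end = s.tokens.size();
            } else if (!part.isEmpty()) {
                s.tokens.add(part);
            }
        }
        return s;
    }

    public static void main(String[] args) {
        AnnotatedSentence s = parse(
            "I think that you should <START:requirement> eat more chicken <END> .");
        System.out.println(s.tokens + " " + s.type + " [" + s.start + "," + s.end + ")");
    }
}
```

The markers themselves are dropped from the token list, so the span indices
refer to positions in the clean sentence.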

I can get fair results with the NameFinder, which is actually surprising
considering I just used the opennlp CLI application with my data.  I would like
to customize the features for my needs.

I have spent some time looking into the NameFinder code.  It appears that a lot 
of the code is required to integrate with the opennlp CLI application.  That is 
not a requirement for me.  

It appears that the minimum I need to create is an equivalent to a NameSample;
a CachedFeatureGenerator containing a set of AdaptiveFeatureGenerators
including my custom feature generator (extending FeatureGeneratorAdapter); an
ObjectStream of samples; and an event stream (extending AbstractEventStream).

A few questions:
1) Did I get all the parts needed?  The xxxME classes (e.g. NameFinderME) appear
to wrap all the training functionality for the application, but do not seem to
be required.  All the various factories appear to be required to work with
the opennlp CLI application.
2) What exactly is adaptive about the adaptive data?  The contextGenerators add
all the predicates to the context at each index.  Do I clear the adaptiveData
at the end of each sentence?
3) The NameFinder does not appear to use the BeamSearch; by default it creates
a GIS object and trains using that.  I think that the beam search would be
better for me, because it keeps multiple candidate local outcomes to
improve the global classification.  Am I correct?
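
To illustrate what question 3 means by keeping multiple candidates, here is a
generic sketch of beam search over outcome sequences. It is not OpenNLP's
BeamSearch class. BeamSketch, Scorer, and Hypothesis are hypothetical names,
and the scorer is assumed to return the log-probability of an outcome given
only the previous outcome.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Generic beam search sketch for illustration; not OpenNLP's implementation. */
final class BeamSketch {
    /** Hypothetical scorer: log-probability of an outcome at a token index,
     *  given the previous outcome (null at index 0). */
    interface Scorer {
        double logProb(int index, String previous, String outcome);
    }

    /** A partial outcome sequence and its accumulated log-probability. */
    static final class Hypothesis {
        final List<String> outcomes;
        final double score;

        Hypothesis(List<String> outcomes, double score) {
            this.outcomes = outcomes;
            this.score = score;
        }
    }

    /** Returns the best-scoring outcome sequence found with the given beam size. */
    static List<String> search(int length, String[] labels, Scorer scorer, int beamSize) {
        if (beamSize < 1) {
            throw new IllegalArgumentException("beamSize must be at least 1");
        }
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int i = 0; i < length; i++) {
            // Extend every surviving hypothesis with every possible label.
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                String prev = h.outcomes.isEmpty()
                        ? null : h.outcomes.get(h.outcomes.size() - 1);
                for (String label : labels) {
                    List<String> next = new ArrayList<>(h.outcomes);
                    next.add(label);
                    expanded.add(new Hypothesis(next,
                            h.score + scorer.logProb(i, prev, label)));
                }
            }
            // Keep only the beamSize highest-scoring hypotheses.
            expanded.sort(Comparator.comparingDouble((Hypothesis h) -> h.score).reversed());
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam.get(0).outcomes;
    }
}
```

With a beam size of 1 this reduces to greedy decoding. A larger beam lets a
locally weaker outcome survive long enough to win once later tokens are scored.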

BTW: The example online at
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
uses a deprecated method. The only non-deprecated NameFinderME.train method
uses a TokenNameFinderFactory, which doesn’t have a method to set the context
generators (except via XML).

Thank you for any advice
Dan
