requirements information extraction

Daniel Russ Fri, 06 Nov 2015 08:32:41 -0800

Hello,

    I am trying to extract requirements from a large set of documents (sorry, I
can’t be more specific). The documents are split into sentences.

I have a small training set that has been annotated in a format similar to the
one used by the NameFinder:

I think that you should <START:requirement> eat more chicken <END> .
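
For illustration, here is a minimal sketch of how a line in this format could be
turned into tokens plus labeled spans. It is plain Java with no OpenNLP
dependency; the class name AnnotatedSentence and its Span type are hypothetical
names made up for this example, and it assumes whitespace-separated tokens with
no nested tags.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical holder for one annotated sentence (not an OpenNLP class). */
public class AnnotatedSentence {

    /** A labeled token span; start is inclusive, end is exclusive. */
    public static final class Span {
        public final int start;
        public final int end;
        public final String type;

        Span(int start, int end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    public final List<String> tokens = new ArrayList<>();
    public final List<Span> spans = new ArrayList<>();

    /** Parses a whitespace-tokenized line containing <START:type> ... <END> markers. */
    public static AnnotatedSentence parse(String line) {
        AnnotatedSentence sentence = new AnnotatedSentence();
        int openStart = -1;
        String openType = null;
        for (String part : line.trim().split("\\s+")) {
            if (part.isEmpty()) {
                continue; // happens only for an empty input line
            }
            if (part.startsWith("<START:") && part.endsWith(">")) {
                if (openType != null) {
                    throw new IllegalArgumentException("nested <START> tag");
                }
                openType = part.substring("<START:".length(), part.length() - 1);
                openStart = sentence.tokens.size();
            } else if (part.equals("<END>")) {
                if (openType == null) {
                    throw new IllegalArgumentException("<END> without <START>");
                }
                sentence.spans.add(new Span(openStart, sentence.tokens.size(), openType));
                openType = null;
            } else {
                sentence.tokens.add(part);
            }
        }
        if (openType != null) {
            throw new IllegalArgumentException("<START> without <END>");
        }
        return sentence;
    }
}
```

Parsing the sample line above with this sketch gives nine tokens and a single
"requirement" span covering "eat more chicken".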

I can get fair results with the NameFinder, which is actually surprising
considering I just used the opennlp CLI application with my data. I would like
to customize the features for my needs.

I have spent some time looking into the NameFinder code. It appears that a lot
of the code is required to integrate with the opennlp CLI application. That is
not a requirement for me.

It appears that the minimum I need to create is an equivalent to a NameSample;
a CachedFeatureGenerator containing a set of AdaptiveFeatureGenerators,
including my custom feature generator (extending FeatureGeneratorAdapter); an
ObjectStream of samples; and an event stream (extending AbstractEventStream).
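
To make the custom feature generator part concrete, here is a rough sketch of
the kind of per-token feature I have in mind. It deliberately does not depend
on OpenNLP: TokenFeatureGenerator is a hypothetical stand-in whose method shape
is modeled on my reading of createFeatures in the feature generator classes, and
ModalCueGenerator, the modal word list, and the three-token window are all
assumptions chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RequirementFeatures {

    /** Hypothetical stand-in interface (not the OpenNLP one). */
    public interface TokenFeatureGenerator {
        void createFeatures(List<String> features, String[] tokens, int index,
                            String[] previousOutcomes);
    }

    /** Emits the lowercased token and a flag when a modal verb occurs shortly before it. */
    public static final class ModalCueGenerator implements TokenFeatureGenerator {
        // Assumed cue words; a real list would come from the annotated data.
        private static final Set<String> MODALS =
                new HashSet<>(Arrays.asList("should", "must", "shall", "will"));
        private static final int WINDOW = 3; // assumed look-back, in tokens

        @Override
        public void createFeatures(List<String> features, String[] tokens, int index,
                                   String[] previousOutcomes) {
            features.add("w=" + tokens[index].toLowerCase(Locale.ROOT));
            for (int i = Math.max(0, index - WINDOW); i < index; i++) {
                if (MODALS.contains(tokens[i].toLowerCase(Locale.ROOT))) {
                    features.add("modal_before");
                    break;
                }
            }
        }
    }

    /** Collects the features that all given generators produce for one token. */
    public static List<String> featuresAt(String[] tokens, int index,
                                          TokenFeatureGenerator... generators) {
        List<String> features = new ArrayList<>();
        for (TokenFeatureGenerator generator : generators) {
            generator.createFeatures(features, tokens, index, new String[0]);
        }
        return features;
    }
}
```

In the sample sentence, "eat" would get the features w=eat and modal_before
because "should" is the token just before it, while "I" would only get w=i.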

A few questions:
1) Did I get all the parts needed? The xxxME (NameFinderME) appears to wrap
all the training functionality for the application, but does not seem to be a
requirement. All the various factories appear to be a requirement to work with
the opennlp CLI application.
2) What exactly is adaptive about the adaptive data? The contextGenerators add
all the predicates to the context at each index. Do I clear the adaptiveData
at the end of each sentence?
3) The NameFinder does not appear to use the BeamSearch; by default it creates
a GIS object and trains using that. I think that the beam search would be
better for me, because it keeps multiple candidate local outcomes to
improve the global classification. Am I correct?
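
To spell out the intuition behind question 3, here is a generic beam search
over a tag sequence, written as a sketch only. It is not OpenNLP's BeamSearch
class: the Scorer interface is a hypothetical stand-in for a trained model that
returns log-probabilities for each outcome at a position, given the outcomes
chosen so far.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BeamSearchSketch {

    /** Hypothetical model interface: log-probability of each outcome at a position. */
    public interface Scorer {
        double[] logProbs(int index, List<Integer> previousOutcomes);
    }

    /** A partial outcome sequence and its accumulated log-probability. */
    private static final class Hypothesis {
        final List<Integer> outcomes;
        final double score;

        Hypothesis(List<Integer> outcomes, double score) {
            this.outcomes = outcomes;
            this.score = score;
        }
    }

    /** Returns the best-scoring outcome sequence found with the given beam size. */
    public static List<Integer> decode(int length, int beamSize, Scorer scorer) {
        if (beamSize < 1) {
            throw new IllegalArgumentException("beamSize must be at least 1");
        }
        List<Hypothesis> beam = new ArrayList<>();
        beam.add(new Hypothesis(new ArrayList<>(), 0.0));
        for (int i = 0; i < length; i++) {
            // Extend every surviving hypothesis with every possible outcome.
            List<Hypothesis> expanded = new ArrayList<>();
            for (Hypothesis h : beam) {
                double[] logProbs = scorer.logProbs(i, h.outcomes);
                for (int outcome = 0; outcome < logProbs.length; outcome++) {
                    List<Integer> next = new ArrayList<>(h.outcomes);
                    next.add(outcome);
                    expanded.add(new Hypothesis(next, h.score + logProbs[outcome]));
                }
            }
            // Keep only the beamSize highest-scoring hypotheses.
            expanded.sort(Comparator.comparingDouble((Hypothesis h) -> h.score).reversed());
            beam = new ArrayList<>(expanded.subList(0, Math.min(beamSize, expanded.size())));
        }
        return beam.get(0).outcomes;
    }
}
```

With a beam size of 1 this reduces to greedy decoding; a larger beam can recover
a sequence whose first step looks slightly worse locally but leads to a better
overall score.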

BTW: The example online
https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
uses a deprecated method. The only non-deprecated
NameFinderME.train method uses a TokenNameFinderFactory, which doesn’t have a
method to set the context generators (except via XML).

Thank you for any advice.
Dan
