Re: requirements information extraction

Rodrigo Agerri Mon, 16 Nov 2015 00:33:07 -0800

Hello Daniel,

The NameFinderME.train(String lang, String type,
ObjectStream<NameSample>, TrainingParameters, TokenNameFinderFactory)
method requires a language, a type, the training samples, a
TrainingParameters object and a TokenNameFinderFactory.


If you do not specify a type, it will train for every type found in the data.

To create a TokenNameFinderFactory, use
TokenNameFinderFactory.create(String subclassName, byte[]
featureGeneratorBytes, final Map<String, Object> resources,
SequenceCodec<String> seqCodec).

If you would like to use your own feature set, you can pass it in as a
byte[] array, which can be constructed from the String representation
of an XML feature generator descriptor.
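
As a rough sketch of that step, the snippet below turns an XML descriptor
held in a String into the byte[] and checks that it is well-formed XML
first. It uses only the JDK. The descriptor follows the element names shown
in the feature generation section of the OpenNLP manual; the class and
method names (FeatureGeneratorBytes, toBytes) are made up for this example,
and the OpenNLP call in the trailing comment is untested.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper, not part of OpenNLP.
class FeatureGeneratorBytes {

    // Descriptor modelled on the example in the OpenNLP manual.
    static final String DESCRIPTOR =
          "<generators>\n"
        + "  <cache>\n"
        + "    <generators>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\">\n"
        + "        <tokenclass/>\n"
        + "      </window>\n"
        + "      <window prevLength=\"2\" nextLength=\"2\">\n"
        + "        <token/>\n"
        + "      </window>\n"
        + "      <definition/>\n"
        + "      <prevmap/>\n"
        + "      <bigram/>\n"
        + "      <sentence begin=\"true\" end=\"false\"/>\n"
        + "    </generators>\n"
        + "  </cache>\n"
        + "</generators>\n";

    // Encodes the descriptor as UTF-8 and parses it once, so that a
    // malformed descriptor fails here rather than during training.
    static byte[] toBytes(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(bytes));
        return bytes;
    }

    public static void main(String[] args) throws Exception {
        byte[] featureGeneratorBytes = toBytes(DESCRIPTOR);
        System.out.println(featureGeneratorBytes.length + " bytes");
        // With OpenNLP on the classpath, the array would then go to
        // (untested here):
        // TokenNameFinderFactory.create(
        //     TokenNameFinderFactory.class.getName(),
        //     featureGeneratorBytes,
        //     Collections.<String, Object>emptyMap(),
        //     new BioCodec());
    }
}
```

If I remember correctly, the descriptor also accepts a custom element
with a class attribute naming your own feature generator, which may be the
simplest way to plug in a class that extends FeatureGeneratorAdapter.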

The beam size is read from the TrainingParameters; failing that, it
defaults to 3.
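
The lookup amounts to something like the following sketch. A plain Map
stands in for the settings held by TrainingParameters, and the key name
"BeamSize" is my assumption about what the parameter is called; the class
and method names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the lookup done during training.
class BeamSizeLookup {

    // Assumed parameter name; check it against your OpenNLP version.
    static final String BEAM_SIZE_KEY = "BeamSize";

    // Default mentioned above.
    static final int DEFAULT_BEAM_SIZE = 3;

    // Returns the configured beam size, or the default when the key is absent.
    static int beamSize(Map<String, String> settings) {
        String value = settings.get(BEAM_SIZE_KEY);
        return value == null ? DEFAULT_BEAM_SIZE : Integer.parseInt(value.trim());
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println(beamSize(settings)); // key absent, so the default applies
        settings.put(BEAM_SIZE_KEY, "5");
        System.out.println(beamSize(settings)); // explicit value wins
    }
}
```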

In the NameSampleDataStream class you can see that the adaptive
features are cleared each time an empty line is encountered, so an empty
line in the training data acts as a document boundary.
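
To illustrate the idea (this is a simplified sketch, not the actual
NameSampleDataStream code), the snippet below walks over training lines
and records, for each sentence, whether the adaptive data should be
cleared before it. The class and method names are invented for the
example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the blank-line convention.
class AdaptiveDataBoundaries {

    // Returns one flag per non-empty line: true when at least one empty
    // line precedes the sentence, meaning a new document starts there.
    static List<Boolean> clearFlags(List<String> lines) {
        List<Boolean> flags = new ArrayList<>();
        boolean clear = false;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                clear = true;   // remember the boundary, emit nothing
                continue;
            }
            flags.add(clear);
            clear = false;      // reset until the next empty line
        }
        return flags;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "I think that you should <START:requirement> eat more chicken <END> .",
            "Second sentence of the same document .",
            "",
            "First sentence of the next document .");
        System.out.println(clearFlags(lines));
    }
}
```

So if your sentences come from different documents, put an empty line
between the documents in the training file; within a document the
adaptive data is kept from sentence to sentence.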

HTH,

R

On Fri, Nov 6, 2015 at 5:22 PM, Daniel Russ <[email protected]> wrote:
> Hello,
>
>     I am trying to extract requirements from a large set of documents (sorry
> can’t be more specific). The documents are split into sentences.
>
> I have a small training set that has been annotated in a format similar to 
> the NameFinder
>
> I think that you should <START:requirement> eat more chicken <END> .
>
> I can get fair results with the NameFinder, which is actually surprising
> considering I just used the opennlp CLI application with my data.  I would
> like to customize the features for my needs.
>
> I have spent some time looking into the NameFinder code.  It appears that a 
> lot of the code is required to integrate with the opennlp CLI application.  
> That is not a requirement for me.
>
> It appears that the minimum I need to create is an equivalent to a
> NameSample; a cachedFeatureGenerator containing a set of
> adaptiveFeatureGenerators including my custom featureGenerator (extend
> FeatureGeneratorAdapter); an objectStream of Samples; and an eventStream
> (extend AbstractEventStream)
>
> A few questions
> 1) did I get all the parts needed?  The xxxME (NameFinderME) appears to wrap 
> all the training functionality for the application, but does not seem to be a 
> requirement.  All the various factories appear to be a requirement to work 
> with the opennlp CLI application.
> 2) What exactly is adaptive about the adaptive data?  The contextGenerators
> add all the predicates to the context at each index.  Do I clear the
> adaptiveData at the end of each sentence?
> 3) The NameFinder does not appear to use the BeamSearch; by default it
> creates a GIS object and trains using that.  I think that the beam search
> would be better for me, because it keeps multiple candidate local
> outcomes to improve the global classification.  Am I correct?
>
> BTW: The example online
> https://opennlp.apache.org/documentation/1.6.0/manual/opennlp.html#tools.namefind.training.featuregen
> gives an example with a deprecated method. The only non-deprecated
> NameFinderME.train method uses a TokenNameFinderFactory which doesn’t have a
> method to set the context generators (except via XML).
>
> Thank you for any advice
> Dan
