Hello,
the command line trainer utility has an option to use only a specified set
of types.
I am not sure if we ever made this available as part of the API, but it
should be really easy to do.
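Until that is exposed in the API, the same effect can be had by preprocessing the training data. Below is a minimal sketch, assuming the standard name finder training format: one whitespace-tokenized sentence per line, with entities marked as <START:type> ... <END>. TypeFilter and keepOnly are hypothetical names, not part of OpenNLP. Treating an untyped <START> as type "default" is also an assumption.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical helper, not part of OpenNLP. It rewrites one line of name
// finder training data (whitespace-tokenized, entities marked with
// <START:type> ... <END>) so that only entities of the wanted types keep
// their markup. Tokens of all other entities stay in the sentence as plain text.
object TypeFilter {
  // Matches "<START:type>" or a bare "<START>"; the group is null when untyped.
  private val StartTag = """<START(?::([^>]+))?>""".r

  def keepOnly(line: String, wanted: Set[String]): String = {
    val out = ArrayBuffer.empty[String]
    var keepingMarkup = false // true while inside an entity we are keeping
    for (token <- line.trim.split("\\s+") if token.nonEmpty) {
      token match {
        case StartTag(tpe) =>
          // Assumption: an untyped <START> is treated as type "default".
          val entityType = Option(tpe).getOrElse("default")
          keepingMarkup = wanted.contains(entityType)
          if (keepingMarkup) out += token
        case "<END>" =>
          if (keepingMarkup) out += token
          keepingMarkup = false
        case other =>
          // Ordinary tokens are always kept, including those of dropped entities.
          out += other
      }
    }
    out.mkString(" ")
  }
}
```

For example, keepOnly on the line "<START:person> Pierre Vinken <END> , 61 years old , joined <START:organization> Elsevier <END> ." with Set("person") returns "<START:person> Pierre Vinken <END> , 61 years old , joined Elsevier .". Running this over the training file once per entity type gives one single-type file per model. The tokens of dropped entities are deliberately kept, so every sentence keeps its context and those tokens simply become untagged text for the type being trained.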
Jörn
On 11/21/2013 08:43 PM, Walrus theCat wrote:
Hi,
I'm using the training API, and I want to create a bunch of different
models. My training data has various entities in it. Unsurprisingly (at
least to the people on this list), when I train a model on my training
data, passing it the name of the entity I'm trying to train, it creates a
model that can detect all the entities in the input data. This is the line
of code I'm using to do the training, pardon my Scala:
NameFinderME.train("en", entityName, sampleStream,
    TrainingParameters.defaultParams(),
    null: Array[Byte], Collections.emptyMap[String, Object]())
The docs say this is how it will behave:
"A training file can contain multiple types. If the training file contains
multiple types the created model will also be able to detect these multiple
types. For now it's recommended to only train single type models, since
multi type support is still experimental."
What I was hoping was that the trainer would ignore the entities that don't
match entityName and train the model for entityName only. This seems like
useful functionality: the user could make multiple passes over the training
data, training a model for a different entity on each pass.
I guess my question is: can OpenNLP already do this? If not, would it be
easier to script new training data for each model I want to train (ugh), or
to modify OpenNLP to support it?
Cheers