I'm currently using OpenNLP with UIMA to label words in a sentence. It's
important that a single word can be labelled more than once. For example, "David
Cronenberg" should be labelled as both director and person.
I know the training process is implemented correctly because I have a
custom model file, and when all sentences with one of the labels are removed
from it, the other label is detected.
I would prefer to keep using OpenNLP to double-label words. Is there a way
to do this? If not, is this possible with another library such as Stanford
CoreNLP?
The code that gets the labels is below:
    List<NamedEntity> entities = JCasUtil.selectCovered(
            NamedEntity.class, aConstituent );
    if ( !entities.isEmpty() ) {
        // entities.size() is never more than 1
    }
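As far as I understand, a single name finder model tags each token with one outcome, so it would not return two overlapping labels for the same words. One possible workaround is to run one model per conflicting label over the same tokens and then group the results by span. The sketch below shows only that grouping step in plain Java. The names LabelMerger, Labelled and mergeBySpan are hypothetical and not part of OpenNLP or UIMA, and the offsets in main are hard-coded stand-ins for what each model would return.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LabelMerger {

    /** Hypothetical stand-in for one annotation: token offsets plus a label. */
    public static final class Labelled {
        public final int begin;
        public final int end;
        public final String label;

        public Labelled(int begin, int end, String label) {
            this.begin = begin;
            this.end = end;
            this.label = label;
        }
    }

    /**
     * Groups the labels from several models by identical span.
     * The key is "begin-end"; the value is the sorted set of labels for that span.
     */
    public static Map<String, Set<String>> mergeBySpan(List<List<Labelled>> perModelResults) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (List<Labelled> results : perModelResults) {
            for (Labelled l : results) {
                String key = l.begin + "-" + l.end;
                merged.computeIfAbsent(key, k -> new TreeSet<>()).add(l.label);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Assumed output of a model trained without the "director" sentences.
        List<Labelled> personModel = Arrays.asList(
                new Labelled(0, 2, "person"), new Labelled(3, 4, "film"));
        // Assumed output of a model trained without the "person" sentences.
        List<Labelled> directorModel = Arrays.asList(
                new Labelled(0, 2, "director"), new Labelled(3, 4, "film"));

        System.out.println(mergeBySpan(Arrays.asList(personModel, directorModel)));
    }
}
```

With this grouping, the span covering "David Cronenberg" ends up with both labels, while "film" appears once because the set removes duplicates.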
Some sample training data is below (there are hundreds of lines similar
to this):
    <START:person> David Cronenberg <END> directed <START:film> Crash <END> .
    <START:director> David Cronenberg <END> directed <START:film> Scanners <END> .
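The experiment described earlier (removing all sentences with one of the labels) could be automated to produce one training set per conflicting label, each of which could then be used to train its own model. This is only a sketch of that filtering step, assuming one sentence per line. The class TrainingDataSplitter and the method withoutLabel are hypothetical names, not OpenNLP API, and I have not verified that separately trained models give good results on this data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TrainingDataSplitter {

    /** Returns the sentences that do not contain a <START:label> tag for the given label. */
    public static List<String> withoutLabel(List<String> sentences, String label) {
        String marker = "<START:" + label + ">";
        List<String> kept = new ArrayList<>();
        for (String sentence : sentences) {
            if (!sentence.contains(marker)) {
                kept.add(sentence);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList(
                "<START:person> David Cronenberg <END> directed <START:film> Crash <END> .",
                "<START:director> David Cronenberg <END> directed <START:film> Scanners <END> .");

        // Training set for a model that should learn "person" (and "film").
        List<String> personData = withoutLabel(all, "director");
        // Training set for a model that should learn "director" (and "film").
        List<String> directorData = withoutLabel(all, "person");

        personData.forEach(System.out::println);
        directorData.forEach(System.out::println);
    }
}
```

Each resulting list would be written to its own training file, so that no sentence in a given file labels the same words with the competing label.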