Are there any known issues in the OpenNLP chunker, version 1.5.3, that would
inhibit the chunking of tokens at the beginning and end of short sentences
with patterns like these:
VP + NP + VP
VP + NP
VP + NP + ADJP:
did_VBD [NP the_DT phone_NN ] [VP vibrate_VB ] ._. ("did" is not chunked)
has_VBZ [NP activated_JJ account_NN ] ._. ("has" is not grouped)
will_MD [VP activate_VB ] [NP account_NN ] ._. ("will" is not grouped)
[VP was_VBD ] [NP his_PRP$ phone_NN expensive_JJ ] ._. ("expensive" is not
chunked separately)
[VP does_VBZ ] [DP this_DT ] [NP phone_NN ring_VB ] ._. ("ring" is not
chunked separately)
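For what it's worth, the bracketed output above is just a rendering of the
per-token B-XX / I-XX / O tags that ChunkerME emits, so a token printed with
no brackets is one the model tagged "O" (outside any chunk). Here is a
minimal, stdlib-only sketch of that rendering (this is my own illustration,
not OpenNLP code), using a hypothetical tag sequence matching the first
example:

```java
// Sketch: render tokens plus BIO chunk tags in the bracketed style above.
// Not OpenNLP's code; the tag sequence below is assumed, not model output.
public class BioDecode {
    static String render(String[] tokens, String[] tags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("B-")) {
                // Open a chunk labeled with the type after "B-".
                sb.append("[").append(tags[i].substring(2)).append(" ");
            }
            sb.append(tokens[i]);
            // Close the chunk if the next tag does not continue it.
            boolean close = !tags[i].equals("O")
                && (i + 1 == tokens.length || !tags[i + 1].startsWith("I-"));
            if (close) sb.append(" ]");
            if (i + 1 < tokens.length) sb.append(" ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "did" tagged O reproduces the unchunked leading auxiliary.
        String[] toks = {"did", "the", "phone", "vibrate", "."};
        String[] tags = {"O", "B-NP", "I-NP", "B-VP", "O"};
        System.out.println(render(toks, tags));
        // -> did [NP the phone ] [VP vibrate ] .
    }
}
```

So the question is really why the model assigns "O" rather than opening a
VP at these sentence-initial auxiliaries.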
This behavior shows up both in the pre-trained English chunker model
available online and in our locally trained models. Training on large
amounts of additional data containing these patterns does not seem to train
the behavior out of a model.
What might be going on here? Will version 1.6.0 of the chunker address
issues like these?
--
David Sanderson
<http://wysdom.com>