On 03/19/2014 11:22 AM, Richard Eckart de Castilho wrote:
> Of course you could probably always train your own models, at least
> for the tokenizer, sentence detector, and POS tagger. I believe the
> AnCora corpus should serve well [1].
> Not sure about the chunker though, and last time I looked, I believe
> the parser was pretty much hard-coded to English.
The chunker can be trained for Spanish without any modifications. All
you need is a training corpus and a tool that can convert it into the
OpenNLP format.
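
To illustrate what such a conversion tool might look like, here is a
minimal Python sketch. It assumes the source corpus has already been
exported to a tab-separated file with one token per line (word, POS tag,
chunk tag) and blank lines between sentences. That input layout, the
function name, and the sample tags are assumptions made for the example,
not something AnCora provides out of the box. The output follows the
CoNLL-2000 style layout that the OpenNLP chunker training format is based
on: three space-separated columns per token and an empty line between
sentences.

```python
def to_opennlp_chunk_format(lines, word_col=0, pos_col=1, chunk_col=2, sep="\t"):
    """Convert column-formatted corpus lines to chunker training lines.

    Hypothetical helper: the column indices and separator describe an
    assumed input layout and must be adapted to the real corpus export.
    """
    out = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line marks a sentence boundary; collapse repeats
            # and skip blank lines at the very start.
            if out and out[-1] != "":
                out.append("")
            continue
        cols = line.split(sep)
        # Emit "word POS chunk-tag", separated by single spaces.
        out.append(" ".join((cols[word_col], cols[pos_col], cols[chunk_col])))
    # Drop a trailing sentence separator, if any.
    while out and out[-1] == "":
        out.pop()
    return out


if __name__ == "__main__":
    # Illustrative sample only; the tags are made up for the example.
    sample = [
        "El\tDA\tB-NP\n",
        "gato\tNC\tI-NP\n",
        "duerme\tVM\tB-VP\n",
        "\n",
        "Llueve\tVM\tB-VP\n",
    ]
    print("\n".join(to_opennlp_chunk_format(sample)))
```

The resulting file could then be passed to the chunker trainer, for
example via the ChunkerTrainerME command line tool with the language set
to es.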
The parser needs a head rules file for Spanish. We recently received a
contribution for one, so it should soon be possible to train the parser
on Spanish too.
HTH,
Jörn