Hi Jörn,

Yes, I am currently training new models with more features and other corpora for English and Spanish.

About contributing: sure, no problem. I will do that in the future, as soon as the language-specific casting in the parser is corrected :)

https://issues.apache.org/jira/browse/OPENNLP-665

Cheers,

Rodrigo

On 2014/03/19 at 14:31, Joern Kottmann wrote:
> I had a short look at the paper. For English NER you might want in
> addition to publish OntoNotes models. There is format support for that
> in OpenNLP.
>
> Maybe it could be interesting for you to contribute the work you did
> on the tokenization or coref component to OpenNLP.
>
> Jörn
>
> On 03/19/2014 01:59 PM, Rodrigo Agerri wrote:
> >Hi,
> >
> >We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora) with
> >evaluations etc and so on as part of the IXA pipeline tools.
> >
> >We also have tokenizer (tried opennlp models and were not adaptable enough)
> >based on JFlex specification. Coreference resolution (loosely based on
> >Stanford NLP approach) coming very soon (for May).
> >
> >More info here:
> >
> >http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> >
> >Thanks,
> >
> >Rodrigo
> >
> >On 2014/03/19 at 12:39, Charles Jalin wrote:
> >>For tokenizer, sentence, pos tagger y tokchunk.
> >>
> >>I amn't sure that i can obtain Spanish corpora.
> >>
> >>Thanks.
> >>
> >>
> >>2014-03-19 12:08 GMT+01:00 Jörn Kottmann <[email protected]>:
> >>
> >>>On 03/19/2014 12:01 PM, Charles Jalin wrote:
> >>>
> >>>>How i do this?
> >>>>
> >>>Depends on the model. For which component?
> >>>
> >>>Anyway, the best way to improve the situation would be to
> >>>add support to OpenNLP to train it on the available Spanish corpora.
> >>>
> >>>Jörn
