Hello, I finally made the pull request that includes the models for the POS tagger for Spanish.
I created the models using their original tags and also the universal POS tags. For each tag set I trained two models: one using maxent and the other using perceptron.

The pull request contains the models and the scripts that I used to train them:

https://github.com/utcompling/OpenNLP-Models/pull/1

Cheers,

Juan Manuel Caicedo

On Mon, Feb 13, 2012 at 12:47 PM, Jason Baldridge <[email protected]> wrote:
>
> On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal
> <[email protected]> wrote:
>>
>> (Sorry for the late reply)
>>
>> I just cloned the repository and I'll add the scripts I used to
>> convert the input files and to train the models. This afternoon I'll
>> put them together in a pull request.
>>
>
> Great!
>
>>
>> Should we keep a copy of the training data in GitHub? I think it could
>> be useful for retraining the models, and it would also be helpful in
>> case the original files are no longer available (e.g. 404 errors).
>> Otherwise, would it be enough to include links to those files?
>>
> It depends on whether it is legal to do so. For example, the Norwegian data
> used to train the models there cannot be distributed. If it is fine to have
> it and the corpus isn't too massive, then it might make sense.
>
>>
>> I also have a script for generating a Maven repository for the models.
>> The GitHub project could also be used for hosting that repository,
>> what do you think?
>>
>
> +1 Sounds interesting, so if you want to set that up, it sounds good to me.
>
> -Jason
>
>> On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
>> <[email protected]> wrote:
>> > That's great! Would you be interested in contributing code and/or data
>> > to the OpenNLP Models repo?
>> >
>> > https://github.com/utcompling/OpenNLP-Models
>> >
>> > On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
>> > <[email protected]> wrote:
>> >>
>> >> Hello everyone,
>> >>
>> >> I trained POS tagging models for Spanish using the CoNLL data [1].
>> >>
>> >> I created two versions using a different model type (perceptron and
>> >> maxent) and I also created versions of the models using the universal
>> >> Part-of-Speech Tags [2].
>> >>
>> >> I uploaded the files to my server. You can read more details here,
>> >> including the evaluation results:
>> >>
>> >> http://cavorite.com/labs/nlp/opennlp-models-es/
>> >>
>> >> And the files are here:
>> >>
>> >> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
>> >>
>> >> Feel free to host them on the OpenNLP website and do not hesitate to
>> >> send me your questions or comments.
>> >>
>> >> Cheers,
>> >>
>> >> Juan Manuel Caicedo
>> >>
>> >> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
>> >> [2] http://code.google.com/p/universal-pos-tags/
>> >
>> > --
>> > Jason Baldridge
>> > Associate Professor, Department of Linguistics
>> > The University of Texas at Austin
>> > http://www.jasonbaldridge.com
>> > http://twitter.com/jasonbaldridge
