Thanks much! I merged the pull request yesterday.

On Mon, Apr 9, 2012 at 6:40 PM, Juan Manuel Caicedo Carvajal <[email protected]> wrote:
> Hello,
>
> I finally made the pull request that includes the models for the POS
> tagger for Spanish.
>
> I created the models using their original tags and also the universal
> POS tags. For each tag set I trained two models: one using maxent and
> the other using perceptron.
>
> The pull request contains the models and the scripts that I used to
> train them:
>
> https://github.com/utcompling/OpenNLP-Models/pull/1
>
> Cheers,
>
> Juan Manuel Caicedo
>
> On Mon, Feb 13, 2012 at 12:47 PM, Jason Baldridge
> <[email protected]> wrote:
> >
> > On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal
> > <[email protected]> wrote:
> >>
> >> (Sorry for the late reply)
> >>
> >> I just cloned the repository and I'll add the scripts I used to
> >> convert the input files and to train the models. This afternoon I'll
> >> put them together in a pull request.
> >>
> >
> > Great!
> >
> >>
> >> Should we keep a copy of the training data in GitHub? I think it could
> >> be useful for retraining the models, and it would also be helpful in
> >> case the original files are not available anymore (e.g. 404 errors).
> >> Otherwise, would it be enough to include links to those files?
> >>
> >
> > It depends on whether it is legal to do so. For example, the Norwegian
> > data used to train the models there cannot be distributed. If it is fine
> > to have it and the corpus isn't too massive, then it might make sense.
> >
> >>
> >> I also have a script for generating a Maven repository for the models.
> >> The GitHub project could also be used for hosting that repository.
> >> What do you think?
> >>
> >
> > +1 Sounds interesting, so if you want to set that up, it sounds good to
> > me.
> >
> > -Jason
> >
> >> On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
> >> <[email protected]> wrote:
> >> > That's great! Would you be interested in contributing code and/or data
> >> > to the OpenNLP Models repo?
> >> >
> >> > https://github.com/utcompling/OpenNLP-Models
> >> >
> >> > On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal
> >> > <[email protected]> wrote:
> >> >>
> >> >> Hello everyone,
> >> >>
> >> >> I trained POS tagging models for Spanish using the CoNLL data [1].
> >> >>
> >> >> I created two versions using different model types (perceptron and
> >> >> maxent), and I also created versions of the models using the universal
> >> >> Part-of-Speech Tags [2].
> >> >>
> >> >> I uploaded the files to my server. You can read more details here,
> >> >> including the evaluation results:
> >> >>
> >> >> http://cavorite.com/labs/nlp/opennlp-models-es/
> >> >>
> >> >> And the files are here:
> >> >>
> >> >> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
> >> >>
> >> >> Feel free to host them on the OpenNLP website, and do not hesitate to
> >> >> send me your questions or comments.
> >> >>
> >> >> Cheers,
> >> >>
> >> >> Juan Manuel Caicedo
> >> >>
> >> >> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
> >> >> [2] http://code.google.com/p/universal-pos-tags/
> >> >
> >> > --
> >> > Jason Baldridge
> >> > Associate Professor, Department of Linguistics
> >> > The University of Texas at Austin
> >> > http://www.jasonbaldridge.com
> >> > http://twitter.com/jasonbaldridge
> >
> > --
> > Jason Baldridge
> > Associate Professor, Department of Linguistics
> > The University of Texas at Austin
> > http://www.jasonbaldridge.com
> > http://twitter.com/jasonbaldridge

--
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge
