I once trained a POS tagger model for the Indonesian language. I defined my own tags for Indonesian words, some of which differ from the English POS tags.
I also used the token_tag pair format in my sentence list, and I didn't provide a tag dictionary. It worked well, without any problems: I was able to create an Indonesian POS tagger model and use it to evaluate some Indonesian text.

Hope this helps.

-- Dhito

On Fri, Jul 27, 2012 at 2:27 PM, Alessandra Donnini <[email protected]> wrote:

> OK, I know I'm new to OpenNLP, and my question may be wrong, but I would
> like to understand: can anyone answer?
> Thanks,
> Alessandra
>
> Begin forwarded message:
>
> > From: Alessandra Donnini <[email protected]>
> > Date: 20 July 2012 17:04:27 GMT+02:00
> > To: [email protected]
> > Subject: Training a POS tagger model
> >
> > I would like to train a POS tagger model for the Italian language.
> > I have some questions:
> >
> > - May I use a token_tag pair list in place of a sentence list? Something
> >   like:
> >
> >   casa_NOUN
> >   e_CON (conjunction)
> >   ...
> >
> >   in place of
> >
> >   la_ART casa_NOUN e_CON la_ART strada_NOUN
> >   ...
> >
> >   because I have found an Italian word list.
> >
> > - Do I need to provide a tag dictionary? Is there a default tag
> >   dictionary?
> >
> > Thanks,
> > Alessandra
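
To make the two input layouts in the question concrete, here is a minimal Python sketch that splits word_TAG training lines into (word, tag) pairs. It is not OpenNLP code: the function name `parse_tagged_line` is hypothetical, and splitting each item on its last underscore is my assumption about how such lines are read. As far as I understand, the trainer treats each line as one sentence, so a one-pair-per-line word list would come out as a series of one-word sentences.

```python
def parse_tagged_line(line):
    """Split one line of whitespace-separated word_TAG items into (word, tag) tuples.

    Assumption: the tag is whatever follows the LAST underscore of each item.
    """
    pairs = []
    for item in line.split():
        word, sep, tag = item.rpartition("_")
        if not sep or not word or not tag:
            raise ValueError(f"not a word_TAG pair: {item!r}")
        pairs.append((word, tag))
    return pairs


# Sentence layout: one full sentence per line.
sentence_style = "la_ART casa_NOUN e_CON la_ART strada_NOUN"

# Word-list layout: one pair per line, so each line is a one-word "sentence".
word_list_style = ["casa_NOUN", "e_CON"]

print(parse_tagged_line(sentence_style))
for line in word_list_style:
    print(parse_tagged_line(line))
```

Running something like this over a training file before handing it to the trainer is a cheap way to catch items that are missing a tag.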
