Thanks
I know I need to train a model with the <space> splits, but if I can add a list of domain abbreviations, I hope I will be able to solve some of the problems I have with tokenization. I will also expand the training set a bit with some sentences I find conflictive.
But to train the system I only found that file... which is small.
http://opennlp.cvs.sourceforge.net/viewvc/opennlp/opennlp/src/test/resources/opennlp/tools/tokenize/token.train?view=markup
which contains only 121 sentences. I don't know whether this is enough, or whether other annotated training sets exist.
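For what it's worth, the format of that token.train file is one sentence per line: whitespace marks ordinary token boundaries, and a <SPLIT> tag marks a boundary with no whitespace (e.g. between a word and attached punctuation), so the file can be extended by hand with conflictive sentences. A minimal sketch of how one such line decodes into tokens (parse_tokenizer_line is just an illustrative helper, not part of OpenNLP):

```python
# Sketch: decode one line of OpenNLP tokenizer training data.
# Whitespace is an implicit split; <SPLIT> marks a split with no whitespace.

def parse_tokenizer_line(line: str) -> list[str]:
    """Return the token sequence encoded by one training line."""
    tokens = []
    for chunk in line.split():  # whitespace-separated pieces
        # each piece may carry <SPLIT> tags gluing several tokens together
        tokens.extend(t for t in chunk.split("<SPLIT>") if t)
    return tokens

line = "Mr. Vinken is chairman of Elsevier N.V.<SPLIT>, the Dutch group<SPLIT>."
print(parse_tokenizer_line(line))
# ['Mr.', 'Vinken', 'is', 'chairman', 'of', 'Elsevier', 'N.V.', ',', 'the', 'Dutch', 'group', '.']
```

So adding your own sentences is mostly a matter of inserting <SPLIT> wherever the tokenizer should split without whitespace, and leaving abbreviations like "N.V." unsplit.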


Joan



On 10/04/12 15:20, Jim - FooBar(); wrote:
On 10/04/12 14:18, Jörn Kottmann wrote:
On 04/10/2012 03:15 PM, Jim - FooBar(); wrote:

But you still cannot "train" anything (maxent/perceptron) on the dictionary alone, can you? One needs training data for that, yes?

The dictionary is used to produce additional features on top of our standard feature set. Therefore you still need training data to train our statistical tokenizer, even though the feature
generation can use a dictionary to produce features.

Jörn

aha ok, that makes sense...

Jim

--

Joan Codina Filbà
Departament de Tecnologia
Universitat Pompeu Fabra
_______________________________________________________________________________

Before printing this e-mail, think about whether it is really necessary; if it is, remember that printing double-sided saves 25% of the paper, and the trees will thank you.
_______________________________________________________________________________

/The information in this electronic message is confidential, personal, and non-transferable, and is addressed only to the address(es) indicated above. If you are reading this message by mistake, please be advised that its disclosure, use, or distribution, in whole or in part, is prohibited, and we ask you to delete the original message together with its attachments without reading or saving it./

/Thanks./
