Never mind my last mail, I found well-formatted Spanish training files
at this link:

http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html

It's a bit confusing how it's explained here:

http://www.cnts.ua.ac.be/conll2002/ner/

because the Dutch data are well-formatted in this file
http://www.cnts.ua.ac.be/conll2002/ner.tgz but the Spanish data aren't
well-formatted there.
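For anyone hitting the same wall: judging from the "Expected three fields per line" error quoted further down, OpenNLP's conll02 reader wants three whitespace-separated columns per token (word, POS tag, IOB named-entity tag), with blank lines between sentences. As far as I can tell, the original esp.train from the CNTS tarball only has two columns, which would explain why the Dutch files convert and the Spanish ones don't. A minimal well-formed sample (tokens and tags invented for illustration):

```shell
# A tiny CoNLL-2002-style sample in the three-column layout the
# converter appears to expect: word, POS tag, IOB named-entity tag.
# Tokens and tags here are invented for illustration.
cat > sample.train <<'EOF'
Juan NP B-PER
Pérez NP I-PER
vive VB O
en SP O
Madrid NP B-LOC
. Fp O
EOF

# Every non-blank line should have exactly three fields;
# this prints nothing if the file is well-formed.
awk 'NF != 0 && NF != 3 { print NR ": " $0 }' sample.train
```

The LSI UPC files linked above pass this check; the CNTS esp.train does not.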



2012/7/4, Daniel <[email protected]>:
> I see, so I can't extend the existing OpenNLP model for detecting
> person names ("es-ner-person.bin"). I would need the .train file that
> OpenNLP used to create this model, and concatenate that file with my
> new training files, is that right?
>
> OpenNLP used the CoNLL 2002 data to create "es-ner-person.bin", so I
> have downloaded it from http://www.cnts.ua.ac.be/conll2002/ner.tgz,
> but I'm not able to use "esp.train", because when I run
>
> C:\>opennlp TokenNameFinderTrainer -lang es -data esp.train -model
> es_person.bin
>
> I get this error:
>
> java.lang.IllegalArgumentException: Model not compatible with name finder!
>
>
> so I guess I must convert this data file to the OpenNLP format, but when I use:
>
> C:\>opennlp TokenNameFinderConverter conll02 -data esp.train -lang es
> -types per > corpus_train.txt
>
> and I get this error:
>
> IO error while reading training data or indexing data: Expected three
> fields per line in training data!
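A quick way to locate the offending lines before converting (a sketch using standard awk; the file below simulates a malformed two-field line, since I don't have esp.train at hand):

```shell
# Simulate a file containing one malformed (two-field) line, then find it.
cat > check_me.train <<'EOF'
Melbourne NP B-LOC
( Fpa O
Australia B-LOC
) Fpt O
EOF

# Print line number and content of every non-blank line that does not
# have exactly three whitespace-separated fields.
awk 'NF != 0 && NF != 3 { print NR ": " $0 }' check_me.train
# → 3: Australia B-LOC
```

Running the same awk one-liner over esp.train should show exactly which lines the converter is choking on.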
>
>
> 2012/7/4, Jörn Kottmann <[email protected]>:
>> On 07/04/2012 08:18 AM, Daniel wrote:
>>> I have an easy question about training NameFinders: can I use 5-6
>>> different training files to train a NameFinderME, or can I only use
>>> one training file to generate one model.bin?
>>
>> You need to concatenate the files for the cli tools.
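For example (plain concatenation, assuming the files share the same format; the file names and the two sentences in OpenNLP's native `<START:person> ... <END>` training format are placeholders):

```shell
# Two placeholder training files, one sentence per line in OpenNLP's
# native name-finder format.
cat > part1.train <<'EOF'
<START:person> Pierre Vinken <END> is a director .
EOF
cat > part2.train <<'EOF'
He joined <START:person> Mr. Vinken <END> in 1982 .
EOF

# Concatenate them into one file for TokenNameFinderTrainer.
cat part1.train part2.train > all.train
```

On Windows, `copy part1.train+part2.train all.train` does the same thing.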
>>
>>> And one last question: if I want my application to detect English
>>> and Spanish person names, should I use both "es-ner-person.bin" and
>>> "en-ner-person.bin"? Or are these models 100% language-dependent, so
>>> that if my text is in Spanish I only have to use "es-ner-person.bin"?
>>
>> I usually detect the language first with our Document Categorizer,
>> and then use the model trained for that language.
>>
>> You can also try to train one name finder for both languages.
>>
>> Jörn
>>
>>
>
