I sent this a few days ago, but I got no answer :-(( :
To train a tokenizer I can use a dictionary, but where is the dictionary that was used to train the current English model? And where can I find information about the dictionary format, so that I can at least generate my own?
Thanks,
Joan Codina
