I checked the English models from the download page. They were not trained
using an abbreviation dictionary. If they had been, you would see the
dictionary when you extract the model like a zip file. So we don't have a
basic English abbreviation dictionary for you to start with; you will need
to create your own from scratch.

To create your own abbreviation dictionary, use the *DictionaryBuilder* tool:

$ bin/opennlp DictionaryBuilder
Usage: opennlp DictionaryBuilder -inputFile in -outputFile out [-encoding charsetName]

Arguments description:
    -inputFile in
        Plain file with one entry per line
    -outputFile out
        The dictionary file.
    -encoding charsetName
        specifies the encoding which should be used for reading and writing text.
        If not specified the system default will be used.

The output file is an XML dictionary like this one from the OpenNLP test resources:
http://svn.apache.org/viewvc/opennlp/trunk/opennlp-tools/src/test/resources/opennlp/tools/sentdetect/abb.xml?view=markup
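Since `-inputFile` is just a plain file with one entry per line, one way to prepare it is a small script like the sketch below. It is only an illustration: the function name `write_abbreviation_list` and the file names used afterwards are hypothetical, and the cleanup it does (trimming whitespace, skipping blanks, dropping duplicates) is my own assumption about what a tidy input file should look like, not something DictionaryBuilder requires.

```python
from pathlib import Path


def write_abbreviation_list(entries, path, encoding="UTF-8"):
    """Write one abbreviation per line (the plain input for -inputFile).

    Hypothetical helper: entries are trimmed, blank entries are skipped,
    and duplicates are dropped while keeping first-occurrence order.
    Returns the number of lines written.
    """
    seen = set()
    lines = []
    for entry in entries:
        token = entry.strip()
        if token and token not in seen:
            seen.add(token)
            lines.append(token)
    # Each entry ends with a newline, so the file holds exactly one entry per line.
    Path(path).write_text("".join(line + "\n" for line in lines), encoding=encoding)
    return len(lines)
```

For example, after writing a list such as `["Mr.", "Dr.", "Nov."]` to `abbreviations.txt`, you would build the dictionary with `bin/opennlp DictionaryBuilder -inputFile abbreviations.txt -outputFile abb.xml -encoding UTF-8`. Passing the same encoding to both steps avoids falling back on the system default.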

On Tue, Apr 10, 2012 at 6:31 AM, Jim - FooBar(); <[email protected]> wrote:

> To train models of any type you need training data. The pretrained
> English tokenizer was trained on the CoNLL shared task data, if I remember
> correctly; maybe one of the developers can shed some light on this.
> Anyway, I don't think you need a dictionary, but rather training data of
> the following form:
>
> Pierre Vinken<SPLIT>, 61 years old<SPLIT>, will join the board as a nonexecutive director Nov. 29<SPLIT>.
> Mr. Vinken is chairman of Elsevier N.V.<SPLIT>, the Dutch publishing group<SPLIT>.
> Rudolph Agnew<SPLIT>, 55 years old and former chairman of Consolidated Gold Fields PLC<SPLIT>, was named a nonexecutive director of this British industrial conglomerate<SPLIT>.
>
> Hope that helps,
>
> Jim
>
> P.S. Did you mean an abbreviation dictionary? You can't really train a
> model from an abbreviation dictionary.
>
>
> On 10/04/12 09:02, Joan Codina wrote:
>
>>
>> I sent this a few days ago but got no answer :-( Here it is again:
>>
>> To train a tokenizer I can use a dictionary, but where is the
>> dictionary that was used to train the current English model? And where
>> can I find information about the dictionary format, so that I can at
>> least generate my own?
>>
>> thanks
>> Joan Codina
>>
>>
>
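In the `<SPLIT>` training data quoted above, whitespace separates tokens as usual, and `<SPLIT>` marks a token boundary between characters written without a space, such as a word and the comma after it. Each sentence goes on its own line. The Python sketch below illustrates that mapping. It is not OpenNLP's own parser. `parse_split_line` turns a training line into tokens, and `to_split_line` goes the other way, building a training line from raw text plus gold tokens (for example, from an existing annotated corpus). Both function names are hypothetical. `to_split_line` assumes the tokens cover every non-whitespace character of the text, in order.

```python
SPLIT_TAG = "<SPLIT>"


def parse_split_line(line, split_tag=SPLIT_TAG):
    """Return the tokens encoded in one <SPLIT>-annotated training line."""
    tokens = []
    for chunk in line.split():
        # Inside a whitespace-delimited chunk, the tag separates adjacent tokens.
        tokens.extend(part for part in chunk.split(split_tag) if part)
    return tokens


def to_split_line(text, tokens, split_tag=SPLIT_TAG):
    """Build a training line from raw text and its gold tokens.

    Assumption: tokens appear in order and cover all non-whitespace
    characters of text; str.index raises ValueError if a token is missing.
    """
    pieces = []
    pos = 0
    for i, token in enumerate(tokens):
        start = text.index(token, pos)
        if i > 0:
            # A gap before the token means whitespace; no gap means a split.
            pieces.append(" " if start > pos else split_tag)
        pieces.append(token)
        pos = start + len(token)
    return "".join(pieces)
```

For example, `to_split_line("Mr. Vinken is chairman of Elsevier N.V., the Dutch publishing group.", tokens)` with the gold tokens `"N.V."`, `","`, `"group"` and `"."` kept separate gives the second quoted line. The resulting lines can then be written one sentence per line to a file for the OpenNLP tokenizer trainer.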
