You can also use a tool such as the Apache UIMA Cas Editor, brat, or WebAnno. Annotation is usually much faster if you don't need to edit a text file
yourself.
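
If you do end up producing the text file from a script instead of typing it, here is a minimal sketch of how one training line could be written. It assumes the sentence is already tokenized and that entity spans are given as (start, end, type) token offsets with an exclusive end. The function name `to_training_line` is made up for illustration and is not part of OpenNLP. As in the documentation example, each tag is written as its own blank-separated token.

```python
def to_training_line(tokens, spans):
    """Render one sentence as a whitespace-tokenized training line.

    tokens: list of token strings for one sentence.
    spans:  list of (start, end, type) token offsets, end exclusive,
            assumed to be non-overlapping (hypothetical input layout).
    """
    starts = {start: etype for start, _end, etype in spans}
    ends = {end for _start, end, _etype in spans}

    parts = []
    for i, token in enumerate(tokens):
        # Close a span that ends right before this token.
        if i in ends:
            parts.append("<END>")
        # Open a span that starts at this token.
        if i in starts:
            parts.append("<START:%s>" % starts[i])
        parts.append(token)

    # Close a span that runs to the end of the sentence.
    if len(tokens) in ends:
        parts.append("<END>")

    # Joining with single blanks keeps every tag a separate token.
    return " ".join(parts)


line = to_training_line(["Ich", "wohne", "in", "Hamburg", "."], [(3, 4, "loc")])
print(line)
```

Because everything is joined with single blanks, the tags can never end up glued to a neighbouring token.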

The Tagging Server in the sandbox can be used to pre-label data for brat or the Apache UIMA Cas Editor.

Another tool you should try is word2vec. It can create word clusters, which can be used as part of the feature generation. In my tests that increased recall by a few percent, but it is still a work in progress; it will take a few days until it works with the TokenNameFinderTrainer command line tool.
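
To illustrate the idea of cluster-based features only, here is a rough sketch. It assumes the clusters are available as lines of the form `word cluster_id`, which is an assumed layout, and the feature strings and function names (`load_clusters`, `cluster_features`) are hypothetical. It does not show how OpenNLP itself wires this into feature generation.

```python
def load_clusters(lines):
    """Parse 'word cluster_id' lines into a dict (assumed file layout)."""
    clusters = {}
    for line in lines:
        parts = line.split()
        # Skip blank or malformed lines instead of failing.
        if len(parts) == 2:
            word, cluster_id = parts
            clusters[word] = cluster_id
    return clusters


def cluster_features(tokens, clusters):
    """Return one feature list per token.

    Every token gets a word feature; tokens found in the cluster
    dictionary additionally get a cluster feature.
    """
    features = []
    for token in tokens:
        token_features = ["w=" + token]
        cluster_id = clusters.get(token)
        if cluster_id is not None:
            token_features.append("cluster=" + cluster_id)
        features.append(token_features)
    return features


clusters = load_clusters(["Hamburg 17", "Berlin 17", "wohne 3"])
print(cluster_features(["Ich", "wohne", "in", "Berlin"], clusters))
```

Words that share a cluster get the same cluster feature, so a model that saw "Hamburg" in training has something to go on when it meets "Berlin".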

HTH,
Jörn

On 10/14/2013 09:27 PM, Thomas Zastrow wrote:
Hello,

I have a question: when creating training material, does it make a
difference if there are " " (blanks) around the NE? In other words, is
it the same to have:

<START:loc>Hamburg<END>

or:

<START:loc> Hamburg <END>

The example in the documentation is shown with the " " ... ?

Best,

Tom

P.S.: ca. 1300 sentences for a free German NE model are done :-)
