You can also use a tool like the Apache UIMA Cas Editor, brat, or WebAnno.
Annotation is usually much faster if you don't have to edit a text file
yourself.
The Tagging Server in the sandbox can be used to pre-label data for brat
or the Apache UIMA Cas Editor.
Another tool you should try is word2vec. It can create word clusters
which can be used as part of the feature generation. In my tests that
increased recall by a few percent, but it is still a work in progress;
it will take a few days until it works with the TokenNameFinderTrainer
command line tool.
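
To make the word-cluster idea concrete, here is a minimal sketch. It assumes
the clusters are available as a plain-text file with one "word cluster_id"
pair per line; that layout is an assumption, as is the sample data. The
helpers load_clusters and cluster_features are hypothetical names for
illustration only and are not part of OpenNLP or word2vec.

```python
import io


def load_clusters(lines):
    """Map each word to its cluster id; expects 'word cluster_id' per line."""
    clusters = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            clusters[parts[0]] = parts[1]
    return clusters


def cluster_features(tokens, clusters):
    """Return one feature string per token; unknown words get 'cluster=NONE'."""
    return [f"cluster={clusters.get(token, 'NONE')}" for token in tokens]


# Made-up sample data standing in for a real cluster file (assumption).
sample = io.StringIO("Hamburg 17\nBerlin 17\nwohne 4\n")
clusters = load_clusters(sample)
print(cluster_features(["Ich", "wohne", "in", "Hamburg"], clusters))
```

The point is that words which share a cluster ("Hamburg" and "Berlin" in the
made-up sample) produce the same feature, so a name seen in training can help
with a similar name that was never seen.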
HTH,
Jörn
On 10/14/2013 09:27 PM, Thomas Zastrow wrote:
Hello,
I have a question: when creating training material, does it make a
difference if there are " " (blanks) around the NE? In other words, is
it the same to have:
<START:loc>Hamburg<END>
or:
<START:loc> Hamburg <END>
The example in the documentation shows it with the " " ... ?
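
For reference, the two variants can be brought into one form with a small
script. This is only a sketch: it assumes the training data is split on
whitespace, so that a tag is only recognized when it stands as a token of
its own, which matches the spaced form in the documentation example.
normalize_tags is a hypothetical helper name, not an OpenNLP function.

```python
import re

# Matches an opening tag such as <START:loc> (or a bare <START>) and the <END> tag.
TAG = re.compile(r"<START(?::[^>\s]+)?>|<END>")


def normalize_tags(line: str) -> str:
    """Put one blank on each side of every tag and collapse repeated blanks."""
    spaced = TAG.sub(lambda m: f" {m.group(0)} ", line)
    return " ".join(spaced.split())


print(normalize_tags("Ich wohne in <START:loc>Hamburg<END> ."))
```

Lines that already have blanks around the tags pass through unchanged, so the
script can be run over a whole training file without harm.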
Best,
Tom
P.S.: ca. 1300 sentences for a free German NE model are done :-)