Dear Jörn,

Thanks for your answer. I know these tools, but I'm happy (and effective) with my little self-written tool. If it becomes stable enough, I will publish it sometime.

word2vec sounds interesting; I will take a look.

Best,
Tom

On 15.10.2013 11:02, Jörn Kottmann wrote:
> You can also use tools like the Apache UIMA Cas Editor, Brat, WebAnno,
> etc.
> Usually the annotation speed is much higher if you don't need to edit a
> text file yourself.
>
> The Tagging Server in the sandbox can be used to pre-label data for brat
> or the Apache UIMA Cas Editor.
>
> Another tool you should try is word2vec. It can create word clusters
> which can be used as part of the feature generation. In my tests that
> increased the recall by a few percent, but it is still work in progress;
> it will take a few days until that works with the TokenNameFinderTrainer
> command line tool.
>
> HTH,
> Jörn
>
> On 10/14/2013 09:27 PM, Thomas Zastrow wrote:
>> Hello,
>>
>> I have a question: when creating training material, does it make a
>> difference if there are " " (blanks) around the NE? In other words, is
>> it the same to have:
>>
>> <START:loc>Hamburg<END>
>>
>> or:
>>
>> <START:loc> Hamburg <END>
>>
>> The example in the documentation shows up with the " " ... ?
>>
>> Best,
>>
>> Tom
>>
>> P.S.: ca. 1300 sentences for a free German NE model are done :-)
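
For the whitespace question quoted above, here is a minimal Python sketch that rewrites both spellings into the blank-separated form shown in the documentation example. The function name `normalize_tags` and the regular expression are illustrative assumptions and not part of OpenNLP; whether the trainer accepts the form without blanks is not settled in this thread, so the sketch simply sidesteps the question by always emitting the documented form.

```python
import re

# Matches an opening tag such as <START:loc> or <START>, or the closing <END> tag.
# This pattern is an assumption based on the two examples in the question.
TAG = re.compile(r"(<START(?::[^>\s]+)?>|<END>)")


def normalize_tags(line: str) -> str:
    """Put exactly one blank on each side of every <START:...> and <END> tag."""
    # Surround each tag with blanks, then collapse runs of whitespace.
    spaced = TAG.sub(r" \1 ", line)
    return " ".join(spaced.split())


if __name__ == "__main__":
    print(normalize_tags("Ich wohne in <START:loc>Hamburg<END> ."))
    # prints: Ich wohne in <START:loc> Hamburg <END> .
```

Running this over a training file line by line leaves already well-spaced annotations unchanged, so it should be safe to apply repeatedly.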
