I need to obtain Turkish training data for Sentence Detector training in order to build tr-sent.bin, which will later be used in both OpenNLP and Wikipedia Miner.
I downloaded a Turkish corpus from http://corpora.uni-leipzig.de/download.html and then ran the following command:

    $ bin/opennlp DoccatConverter leipzig -lang tr -data Leipzig/tr100k/sentences.txt >> lang.train

However, there is no DoccatConverter tool. How can I obtain the training data from sentences.txt?

By the way, I am working with opennlp-1.5.0.

Thanks in advance,
Duygu
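
For illustration, the conversion asked about here could be sketched roughly as follows. This is not an OpenNLP tool, only a minimal standalone script under stated assumptions: each line of sentences.txt is taken to have the form "<number><TAB><sentence>", the file is assumed to be UTF-8, and the function name and the output file name tr-sent.train are hypothetical. It writes one sentence per line, which is the plain-text layout the OpenNLP sentence detector trainer reads.

```python
def leipzig_to_sentence_train(src, dst):
    """Copy sentences from a Leipzig-style stream to dst, one per line.

    src and dst are text-mode file-like objects. Returns the number of
    sentences written. Assumes lines look like "<number>\t<sentence>";
    lines without a leading numeric id are copied through unchanged.
    """
    count = 0
    for line in src:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        # Drop the leading numeric id if the line has one.
        head, sep, tail = line.partition("\t")
        sentence = tail if sep and head.strip().isdigit() else line
        sentence = sentence.strip()
        if sentence:
            dst.write(sentence + "\n")
            count += 1
    return count


# Hypothetical usage (paths and encoding are assumptions):
# with open("Leipzig/tr100k/sentences.txt", encoding="utf-8") as src, \
#      open("tr-sent.train", "w", encoding="utf-8") as dst:
#     leipzig_to_sentence_train(src, dst)
```

The resulting file could then be given to the SentenceDetectorTrainer tool to produce tr-sent.bin; the exact options are best taken from that tool's own usage message for the installed version.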
