You should not use the general models, especially for data like tweets and Facebook posts! They are trained on a completely different kind of data, so it is no wonder you get a lot of errors.
My advice is to create a training set and retrain the sentence detector. The results will get much better. You will most probably need three models: one for tweets, one for Facebook and one for blogs, since the underlying data can be quite different in each of these. But maybe someone has some models that they can freely distribute…

Hope this helps.

Regards,
Светослав

________________________________________
From: Yasen Kiprov <[email protected]>
Sent: 21 January 2014 10:45
To: [email protected]
Subject: Sentence detection for user generated content

Hello,

Has anyone tried training the sentence detector on user generated content - tweets, Facebook or forum posts? Is there a model available for the purpose?

With the default models I see lots of errors with things like ...... , ?!!?!! , !!!, and also with incomplete sentences.

All the best,
Yasen
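
As a rough illustration of the "create a training set" step suggested in the reply, here is a minimal Python sketch that writes one training file per source. It assumes the sentences have already been segmented by hand, and that the trainer expects one sentence per line with an empty line marking a document boundary, which is how I understand the OpenNLP sentence detector training format. The function name `write_training_files`, the `SAMPLE` data and the `-sent.train` file suffix are made up for this example.

```python
from pathlib import Path

# Hypothetical hand-segmented samples: source -> documents -> sentences.
SAMPLE = {
    "tweets": [["omg ......", "did you see that ?!!?!!"], ["so  good !!!"]],
    "facebook": [["Great day at the beach!!!", "Who is coming tomorrow?"]],
    "blogs": [["This is the first sentence of a post.", "Here is the second one."]],
}


def write_training_files(docs_by_source, out_dir):
    """Write one training file per source and return {source: path}.

    Assumed format: one sentence per line, with an empty line
    after each document to mark the document boundary.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for source, docs in docs_by_source.items():
        lines = []
        for sentences in docs:
            for sentence in sentences:
                # Collapse internal whitespace so each sentence stays on one line.
                cleaned = " ".join(sentence.split())
                if cleaned:
                    lines.append(cleaned)
            lines.append("")  # empty line = document boundary
        path = out_dir / f"{source}-sent.train"
        path.write_text("\n".join(lines), encoding="utf-8")
        paths[source] = path
    return paths
```

Calling `write_training_files(SAMPLE, "training")` would produce `tweets-sent.train`, `facebook-sent.train` and `blogs-sent.train`. Each file could then be passed to the trainer separately, for example with something like `opennlp SentenceDetectorTrainer -model tweets-sent.bin -lang en -data training/tweets-sent.train -encoding UTF-8`. Please check the exact options against the manual for your OpenNLP version. A real training set would of course need far more than a handful of sentences per source.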
